Outliers Mining Algorithm | Outliers are the rows that are most dissimilar to the rest. Given a relation R(A1, A2, ..., An) and a similarity function between rows of R, find the rows in R that are dissimilar to most points in R. The objective is to maximize the dissimilarity function subject to a constraint on the number of outliers, or on the significant outliers if given. | The steps to determine outliers are: 1. cluster the regular rows via K-means, 2. compute the distance of each tuple in R to its nearest cluster center, and 3. choose the top-K rows, or those with scores outside the expected range. Finding outliers by hand is often impossible because the volume of cases is so high. Identifying outliers matters because it prompts new strategies to accommodate them. If there are numerous outliers, they will significantly increase organizational costs. If there are not, the patterns help identify efficiencies. |
Decision tree | This is probably one of the most heavily used and easiest to visualize mining algorithms. A decision tree can serve as both a classification tree and a regression tree. A function divides the rows into two datasets based on the value of a specific column. The two lists of rows that are returned are such that one matches the criterion for the split while the other does not. This works well when the attribute to split on is clear. | A Decision Tree algorithm uses the attributes of the service requests to make a prediction, such as the relief time for a case resolution. The ease of visualizing the split at each level sheds light on the importance of those attributes. This information is useful for pruning and for drawing the tree. |
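The three outlier steps in the first row (cluster with K-means, measure each tuple's distance to its nearest cluster center, take the top-K) can be sketched in plain Python. This is only an illustration under some assumptions: rows are numeric tuples, dissimilarity is Euclidean distance, and the centers are seeded with the first k rows so the run is repeatable. The names `kmeans`, `top_outliers`, and the sample `rows` are made up for this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means. Seeding with the first k points is an
    assumption made here so that the result is deterministic."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assign every point to its nearest center.
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def top_outliers(points, k_clusters, top_k):
    """Score each row by distance to its nearest center; return the top_k."""
    centers = kmeans(points, k_clusters)
    scored = [(min(math.dist(p, c) for c in centers), p) for p in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


# Hypothetical data: two tight groups plus one row far from both.
rows = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
        (8.0, 8.0), (8.1, 7.9), (7.9, 8.2),
        (4.0, 15.0)]
outliers = top_outliers(rows, k_clusters=2, top_k=1)
print(outliers)
```

Instead of a fixed top-K, the same scores could be compared against a threshold to pick the rows "outside the expected range"; which threshold to use depends on the data.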
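The row-dividing function described for the decision tree might look like the sketch below. It assumes rows are tuples, that numeric columns are split with `>=`, and that all other columns are split on equality. The function name `divide_rows` and the service-request fields (category, priority, hours to resolve) are hypothetical and chosen only for illustration.

```python
def divide_rows(rows, column, value):
    """Split rows into (matching, non_matching) on a single column.

    Numeric values are compared with >=, everything else with ==.
    """
    if isinstance(value, (int, float)):
        def matches(row):
            return row[column] >= value
    else:
        def matches(row):
            return row[column] == value

    matching = [row for row in rows if matches(row)]
    non_matching = [row for row in rows if not matches(row)]
    return matching, non_matching


# Hypothetical service requests: (category, priority, hours_to_resolve)
requests = [
    ("network", 1, 4.0),
    ("hardware", 3, 30.0),
    ("network", 2, 6.5),
    ("software", 4, 48.0),
]

# Numeric split on priority >= 3.
high, low = divide_rows(requests, column=1, value=3)

# Categorical split on category == "network".
network, others = divide_rows(requests, column=0, value="network")
```

A full tree builder would call this function for every candidate column and value, score each split (for example by variance of the target in a regression tree), and recurse on the two halves.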