Cluster computing

Sunday, October 20, 2013

Intermediate tutorial on data mining
We continue our discussion of using Analysis Services as an integrated environment for creating and working with data mining models. Previously we covered how to bind to data sources, create and test models, and use them for predictive analysis. In this tutorial, we build on that review with two new scenarios: forecasting and market basket analysis.
In forecasting, we create a time series model to forecast product sales in different regions around the world. We develop an individual model for each region and learn how to use cross prediction.
In market basket analysis, we create an association model to analyze groups of products that are purchased together online, so that we can recommend products to customers.
A forecasting mining structure is created from an existing relational database or data warehouse. In this case, the Microsoft Time Series algorithm is selected and the SalesByRegion data source view is used.
Next, the input and predictable columns are selected, and the content type and data type of each column are specified. Columns can also be dragged and dropped into the forecasting mining structure.
The mining models
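The forecasting model can be customized with the FORECAST_METHOD and PREDICTION_SMOOTHING parameters. FORECAST_METHOD controls whether the time series algorithm is optimized for short-term predictions (ARTXP), long-term predictions (ARIMA), or a mix of both (MIXED, the default). When the method is MIXED, PREDICTION_SMOOTHING, a value between 0 and 1, controls how the short-term and long-term predictions are blended: values closer to 0 favor the short-term model and values closer to 1 favor the long-term model. The default of 0.5, an even split, typically works well.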
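To make the idea of blending concrete, here is a minimal sketch that mixes two forecast series with a single weight. It assumes a simple linear mix, where `prediction_smoothing` = 0 uses only the short-term forecast and 1 uses only the long-term one; the function name and the sample numbers are hypothetical, and the actual weighting Analysis Services applies internally may differ (for example, across the prediction horizon).

```python
def blend_forecasts(short_term, long_term, prediction_smoothing=0.5):
    """Blend two forecast series with a fixed weight.

    Simplified illustration (assumption): a linear mix where a weight of 0
    keeps only the short-term forecast and 1 keeps only the long-term one.
    This is not the Analysis Services implementation.
    """
    if not 0.0 <= prediction_smoothing <= 1.0:
        raise ValueError("prediction_smoothing must be between 0 and 1")
    if len(short_term) != len(long_term):
        raise ValueError("forecast series must have the same length")
    w = prediction_smoothing
    # Each blended point is a weighted average of the two forecasts.
    return [(1 - w) * s + w * l for s, l in zip(short_term, long_term)]


if __name__ == "__main__":
    short_fc = [100.0, 110.0, 120.0]  # hypothetical short-term forecasts
    long_fc = [90.0, 100.0, 130.0]    # hypothetical long-term forecasts
    print(blend_forecasts(short_fc, long_fc))       # even 50/50 split
    print(blend_forecasts(short_fc, long_fc, 0.0))  # short-term only
```

With the even split, each point lands halfway between the two forecasts, which is why 0.5 is a reasonable starting point when neither horizon matters more.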
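Missing data can also be handled with the MISSING_VALUE_SUBSTITUTION parameter, which fills gaps in the historical data with the previous value, the mean, or a specified numeric constant.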
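The three substitution strategies can be illustrated with a small sketch. It assumes missing points are represented as `None` in a plain Python list and uses the hypothetical helper name `fill_missing`; it shows the idea behind each option, not how Analysis Services implements it.

```python
from statistics import mean


def fill_missing(series, method="Previous"):
    """Replace None entries in a time series (illustrative sketch).

    method: "Previous" carries the last observed value forward,
            "Mean" uses the mean of the observed values,
            a number substitutes that constant for every gap.
    """
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    if method == "Mean":
        fill = mean(observed)
        return [fill if v is None else v for v in series]
    if method == "Previous":
        result, last = [], None
        for v in series:
            if v is None:
                # A gap at the very start has nothing to carry forward.
                if last is None:
                    raise ValueError("leading gap has no previous value")
                v = last
            result.append(v)
            last = v
        return result
    if isinstance(method, (int, float)):
        return [method if v is None else v for v in series]
    raise ValueError(f"unknown method: {method!r}")
```

For example, `fill_missing([10, None, 14, None], "Mean")` fills both gaps with 12, while `"Previous"` repeats 10 and then 14.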
The forecasting model can be explored with a mining model viewer. Predictions and deviations can both be viewed here.
The market basket analysis follows a similar workflow. The mining structure is created with the wizard by selecting the Microsoft Association Rules algorithm and an available data source. We select the option to allow drill through and then display the mining structure we created.
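In this case, the key algorithm parameters are MINIMUM_PROBABILITY, the minimum probability a rule must have to be kept, and MINIMUM_SUPPORT, the minimum share of transactions an itemset must appear in. The values used here are 0.1 and 0.01 respectively.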
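To show what the two thresholds filter, here is a brute-force sketch restricted to single-item rules A -> B. It assumes each basket is a list of product names; support is the fraction of baskets containing both items, and probability is support(A, B) / support(A). The function name and sample data are hypothetical, and the real algorithm also handles larger itemsets far more efficiently.

```python
from itertools import permutations


def pair_rules(baskets, min_support=0.01, min_probability=0.1):
    """Return (A, B, support, probability) for rules A -> B passing both thresholds.

    support(A, B)     = fraction of baskets containing both A and B
    probability(A->B) = baskets with A and B / baskets with A
    """
    n = len(baskets)
    if n == 0:
        return []
    sets = [set(b) for b in baskets]
    items = sorted(set().union(*sets))
    item_count = {i: sum(i in s for s in sets) for i in items}
    rules = []
    for a, b in permutations(items, 2):
        both = sum(a in s and b in s for s in sets)
        support = both / n
        # Drop itemsets that are too rare before computing probability.
        if both == 0 or support < min_support:
            continue
        probability = both / item_count[a]
        if probability >= min_probability:
            rules.append((a, b, support, probability))
    return rules


if __name__ == "__main__":
    baskets = [["bike", "helmet"], ["bike", "helmet", "bottle"],
               ["bottle"], ["bike"]]  # hypothetical transactions
    for rule in pair_rules(baskets, min_support=0.3, min_probability=0.5):
        print(rule)
```

Raising either threshold prunes rules: here only bike -> helmet and helmet -> bike survive, because the bottle pairs appear in just one of four baskets.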
The association viewer contains three tabs: Rules, Itemsets, and Dependency Network. The dependency network lets us investigate how the different items in the model interact. Each node in the viewer represents an item, and each line between two nodes represents a rule. A line connecting two items means that these items are likely to appear in the same transaction.
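A simplified version of this graph can be built directly from transactions. The sketch below assumes an undirected graph where an edge joins two items when they co-occur in at least `min_support` of the baskets; the viewer itself draws directed rules, so this is only an approximation, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def dependency_network(baskets, min_support=0.01):
    """Build an undirected item graph as an adjacency dict.

    Nodes are items; an edge joins two items when they appear together
    in at least min_support (a fraction) of the baskets.
    """
    n = len(baskets)
    pair_counts = Counter()
    nodes = set()
    for basket in baskets:
        items = sorted(set(basket))
        nodes.update(items)
        # Count every unordered pair of items in this basket once.
        pair_counts.update(combinations(items, 2))
    graph = {item: set() for item in nodes}
    for (a, b), count in pair_counts.items():
        if n and count / n >= min_support:
            graph[a].add(b)
            graph[b].add(a)
    return graph
```

Items left with no neighbors, such as a product that is usually bought alone, show up as isolated nodes.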
Associations from the model can then be used to make predictions. This is done by building prediction queries against the mining models.
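In Analysis Services these predictions come from DMX prediction queries (for example, using the PredictAssociation function). The Python sketch below only mimics the idea outside the server: given hypothetical rules as (antecedent, consequent, probability) tuples, it recommends items not yet in the customer's basket, ranked by the best rule probability.

```python
def recommend(basket, rules, top_n=3):
    """Recommend items for a basket from (antecedent, consequent, probability) rules.

    A rule fires when its antecedent is in the basket and its consequent is
    not; each candidate keeps its highest probability, then candidates are
    ranked by probability (ties broken alphabetically).
    """
    in_basket = set(basket)
    best = {}
    for antecedent, consequent, probability in rules:
        if antecedent in in_basket and consequent not in in_basket:
            best[consequent] = max(probability, best.get(consequent, 0.0))
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    rules = [("bike", "helmet", 0.67), ("bike", "bottle", 0.33),
             ("helmet", "gloves", 0.5), ("bottle", "bike", 0.5)]  # hypothetical
    print(recommend(["bike"], rules))
    print(recommend(["bike", "helmet"], rules))
```

Keeping only the strongest rule per candidate avoids double counting an item that several rules point to; summing probabilities would be another reasonable design choice.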