Introduction: Data mining techniques add insights to a database that are
typically not available through standard query operators. IT operations rely
largely on some form of data store or CMDB for their inventory and associated
operations, which makes them a natural place to apply these techniques. OLE DB
for Data Mining (OLE DB for DM) standardized data mining language primitives
and became an industry standard. Prior to OLE DB for DM, it was difficult to
integrate data mining products. If one product was written using decision tree
classifiers and another was written with support vector machines, and they did
not share a common interface, then the application had to be rebuilt from
scratch. Furthermore, the data that these products analyzed was not always in a
relational database, which required data porting and transformation operations.
OLE DB for DM consolidates all of these. It was designed to allow data mining
client applications to consume data mining services from a wide variety of data
mining software packages. Clients communicate with data mining providers via a
SQL-like language.
The OLE DB for Data Mining stack uses Data Mining Extensions (DMX), a SQL-like
data mining query language, to talk to different DM providers. DMX statements
can be used to create, modify and work with different data mining models. DMX
also contains several functions that can be used to retrieve statistical
information.
Furthermore, the data access, and not just the interface, is unified. OLE DB
for DM integrates the data mining providers with the data stores, so a cube, a
relational database, or some other miscellaneous data source can be used to
retrieve and display statistical information.
The three main operations performed are model creation, model training, and
model prediction and browsing.
Model creation: A data mining model object is created just like a relational
table. The model definition has a few input columns, one or more predictable
columns, and the name of the data mining algorithm to be used when the model is
later trained by the data mining provider.
Model training: The data are loaded into the model and used to train it. The
data mining provider uses the algorithm specified during creation to search for
patterns. These patterns are the model content.
Model prediction and browsing: A SELECT statement is used to consult the data
mining model content in order to make predictions and browse the statistics
obtained by the model.
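The three operations can be mirrored with a toy sketch in Python. This is not the OLE DB for DM API: the class ToyMiningModel, its methods, and the sample columns are all hypothetical, and the "algorithm" is a plain frequency count chosen only to make the create, train, predict and browse phases visible.

```python
from collections import Counter, defaultdict


class ToyMiningModel:
    """Toy stand-in for a mining model (hypothetical, not a real provider)."""

    def __init__(self, inputs, predictable):
        # Creation: only the structure is defined, like an empty table.
        self.inputs = tuple(inputs)
        self.predictable = predictable
        self.content = defaultdict(Counter)  # the learned "model content"

    def train(self, rows):
        # Training: count which outcome co-occurs with each input combination.
        for row in rows:
            key = tuple(row[c] for c in self.inputs)
            self.content[key][row[self.predictable]] += 1

    def predict(self, row):
        # Prediction: return the most frequent outcome for matching inputs.
        key = tuple(row[c] for c in self.inputs)
        counts = self.content.get(key)
        return counts.most_common(1)[0][0] if counts else None

    def browse(self):
        # Browsing: expose the statistics gathered during training.
        return {key: dict(counts) for key, counts in self.content.items()}


model = ToyMiningModel(inputs=["gender", "age_group"], predictable="buys")
model.train([
    {"gender": "F", "age_group": "young", "buys": "yes"},
    {"gender": "F", "age_group": "young", "buys": "yes"},
    {"gender": "F", "age_group": "young", "buys": "no"},
    {"gender": "M", "age_group": "senior", "buys": "no"},
])
prediction = model.predict({"gender": "F", "age_group": "young"})
print(prediction, model.browse())
```

A real provider would replace the frequency count with the algorithm named at creation time, but the separation between defining the model, populating it, and querying it stays the same.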
An example of a model is one with columns for customer id, gender and age, plus
a nested table of purchases. Each purchase associates an item_name with an
item_quantity, and a customer can make more than one purchase, which is why the
purchases are nested. Models can be created with attribute types such as
ordered, cyclical, sequence_time, probability, variance, stdev and support.
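As a rough sketch of what such a definition could look like, the Python snippet below renders a DMX CREATE MINING MODEL statement as text. The helper functions, the model name age_prediction, the column types, and the choice of Microsoft_Decision_Trees as the algorithm are assumptions made for illustration; the exact types and flags accepted depend on the provider.

```python
# Hypothetical helpers that render a DMX CREATE MINING MODEL statement as text.
# Model name, column types, and algorithm are illustrative assumptions.

def render_columns(columns, indent="  "):
    """Render column specs; a spec with a 'table' entry becomes a nested TABLE."""
    lines = []
    for col in columns:
        if "table" in col:
            inner = render_columns(col["table"], indent + "  ")
            lines.append(f"{indent}[{col['name']}] TABLE (\n{inner}\n{indent})")
        else:
            lines.append(f"{indent}[{col['name']}] {col['type']}")
    return ",\n".join(lines)


def create_mining_model(model_name, columns, algorithm):
    """Assemble the full statement from the model name, columns, and algorithm."""
    body = render_columns(columns)
    return f"CREATE MINING MODEL [{model_name}] (\n{body}\n) USING {algorithm}"


customer_columns = [
    {"name": "customer_id", "type": "LONG KEY"},
    {"name": "gender", "type": "TEXT DISCRETE"},
    {"name": "age", "type": "DOUBLE DISCRETIZED PREDICT"},
    {"name": "purchases", "table": [
        {"name": "item_name", "type": "TEXT KEY"},
        {"name": "item_quantity", "type": "DOUBLE NORMAL CONTINUOUS"},
    ]},
]

statement = create_mining_model(
    "age_prediction", customer_columns, "Microsoft_Decision_Trees"
)
print(statement)
```

Here age is marked as the predictable column, and the nested purchases table has its own key (item_name) so that several purchases can hang off one customer row.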
Model training involves loading the data into the model. The OPENROWSET
statement supports querying data from a data source through an OLE DB provider.
The SHAPE command enables loading of nested data.
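A training statement along these lines might be composed as follows. This is a sketch only: the Python helpers, the model name age_prediction, the source tables customers and purchases, the cust_id column, the SQLOLEDB provider name, and the connection string placeholder are all assumptions, and the statement is built as text rather than sent to a provider.

```python
# Hypothetical sketch: compose a DMX training statement (INSERT INTO) as text.
# Provider name, connection string, and source tables are placeholders.

def openrowset(provider, connection, query):
    """Render an OPENROWSET call; single quotes inside arguments are doubled."""
    args = ", ".join(
        "'" + arg.replace("'", "''") + "'" for arg in (provider, connection, query)
    )
    return f"OPENROWSET({args})"


def shape(parent, child, parent_key, child_key, alias):
    """Render a SHAPE command that nests child rows under each parent row."""
    return (
        f"SHAPE {{ {parent} }}\n"
        f"APPEND ( {{ {child} }}\n"
        f"  RELATE [{parent_key}] TO [{child_key}] ) AS [{alias}]"
    )


CONNECTION = "<connection string>"  # placeholder, not a real connection string

customers = openrowset(
    "SQLOLEDB", CONNECTION,
    "SELECT customer_id, gender, age FROM customers ORDER BY customer_id",
)
purchases = openrowset(
    "SQLOLEDB", CONNECTION,
    "SELECT cust_id, item_name, item_quantity FROM purchases ORDER BY cust_id",
)

training_statement = (
    "INSERT INTO [age_prediction] (\n"
    "  [customer_id], [gender], [age],\n"
    "  [purchases] (SKIP, [item_name], [item_quantity])\n"
    ")\n"
    + shape(customers, purchases, "customer_id", "cust_id", "purchases")
)
print(training_statement)
```

The SKIP entry tells the model to ignore the child rowset's relating column (cust_id here), which is needed only to attach each purchase to its customer.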
When data mining is not sufficient in terms of grouping, ranking and sorting,
AI techniques are used. The Microsoft ML package provides:
fast linear for binary classification or linear regression
one class SVM for anomaly detection
fast trees for regression
fast forests for churn detection and building multiple trees
neural net for binary and multi-class classification
logistic regression for classifying sentiments from feedback