Cluster computing

Wednesday, April 24, 2019

The case for data mining over sequences.
Large sets of sequences can be partitioned. When they are stored in horizontally partitioned tables, they can participate in data mining just the same as any other data.
While machine learning uses concepts such as supervised and unsupervised classifiers, it can be understood as a set of algorithms. Data Mining on the other hand uses those and other algorithms in conjunction with a database so that the data can be queried to yield the result set that summarizes the findings. These result sets can then be drawn on charts and represented on dashboards.
Yet data mining and machine learning are separate domains in themselves. Machine learning may find use with text analysis and images and other static data that is not represented in tables. Data Mining on the other than translates most data into something that can be stored in a database and this has worked well for organizations that want to safeguard their data. Moreover, we can view the difference as top down and bottoms up view as well. For example, when we use statistics for building a regression model, we are binding different parameters together to mean something together and tuning it with experimental data. An unsupervised machine learning algorithm on the other hand builds a decision tree classifier based on the data as it is made available. The output from a machine learning algorithm may be input for a data mining process. Some of the machine learning algorithms are forms of batch processing while data mining techniques may be applied in a streaming manner.
Both data mining and machine learning have been domain specific such as in finance, retail or telecommunications industry These tools integrate the domain specific knowledge with data analysis techniques to answer usually very specific queries.
Tools are evaluated on data types, system issues, data sources, data mining functions, coupling with a database or data warehouse, scalability, visualization and user interface. Among these visual data mining is popular for its designer style user interface that renders data, results and process in a graphical and usually interactive presentation.
Visualization tools such as grafana stack for viewing elaborate charts and eye candies only require read permissions on the data as they execute queries on the result to fetch the data for making the charts.
Some of the algorithms included with models and predictions used in data mining fall in the following categories:
Classification algorithms - for finding similar groups based on discrete variables
Regression algorithms - for finding statistical correlations on continuous variables from attributes
Segmentation algorithms - for dividing into groups with similar properties
Association algorithms - for finding correlations between different attributes in a data set
Sequence Analysis Algorithms - for finding groups via paths in sequences

1 comment:

ramizFebruary 1, 2020 at 9:07 PM
Wow i can say that this is another great article as expected of this blog.Bookmarked this site..
omnidex mining
ReplyDelete
Replies

Add comment