Sunday, November 5, 2017

#classifier
Another way to do k-means: cexamples/classifier.c
But unit tests are missing. Sigh.
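
For readers who want to see the shape of k-means alongside a quick check, here is a minimal sketch of Lloyd's algorithm in Python. It is not the contents of cexamples/classifier.c. The function name `kmeans`, the choice to seed centroids with the first k points, and the sample points are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm sketch: returns (centroids, labels)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Assumption: seed with the first k points so that runs are repeatable.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it was
        if updated == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = updated
    return centroids, labels


# Hypothetical sample: two loose groups of 2-D points.
sample = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(sample, 2)
```

Seeding with the first k points keeps the sketch deterministic, which makes it easy to unit-test; a real implementation would more likely use random or k-means++ seeding.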

Yesterday we discussed visualization as a way to make sense of data. In fact, visualization is an important functional area of software development, and many tools have been written to find knowledge in vast sets of data.
Today we explore data visualization, which is part of what distinguishes data mining from machine learning.
Machine learning uses concepts such as supervised and unsupervised classifiers, and it can be understood as a set of algorithms. Data mining, on the other hand, uses those and other algorithms in conjunction with a database, so that the data can be queried to yield a result set that summarizes the findings. These result sets can then be drawn on charts and represented on dashboards.
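
As a rough illustration of pairing an algorithm's output with a database, the sketch below stores labeled rows in an in-memory SQLite table and queries a summary that a chart could be drawn from. The table name `observations`, its columns, the function name `summarize`, and the sample rows are all hypothetical.

```python
import sqlite3


def summarize(rows):
    """Load (id, cluster, spend) rows and return per-cluster count and mean spend."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (id INTEGER PRIMARY KEY, cluster TEXT, spend REAL)"
    )
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)
    # The result set is the summarized finding, one row per cluster.
    result = conn.execute(
        "SELECT cluster, COUNT(*), AVG(spend) "
        "FROM observations GROUP BY cluster ORDER BY cluster"
    ).fetchall()
    conn.close()
    return result


# Hypothetical rows: the cluster label is assumed to come from a classifier.
rows = [
    (1, "low", 12.0),
    (2, "low", 18.0),
    (3, "high", 95.0),
    (4, "high", 105.0),
    (5, "low", 15.0),
]
summary = summarize(rows)
```

Each tuple in `summary` holds a cluster name, its row count, and its average spend, which is the kind of small result set a dashboard panel would plot.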
Yet data mining and machine learning are separate domains in themselves. Machine learning may find use with text analysis, images, and other static data that is not represented in tables. Data mining, on the other hand, translates most data into something that can be stored in a database, and this has worked well for organizations that want to safeguard their data. Moreover, we can view the difference as a top-down versus a bottom-up view. For example, when we use statistics to build a regression model, we bind different parameters together to mean something and tune the model with experimental data. A machine learning algorithm, on the other hand, builds a model such as a decision tree classifier from the data as it is made available. The output from a machine learning algorithm may be input for a data mining process. Some machine learning algorithms are forms of batch processing, while data mining techniques may be applied in a streaming manner.
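
To make the regression example concrete, here is a minimal sketch of fitting a straight line by ordinary least squares. The model form y = slope * x + intercept is chosen up front (the top-down part) and the two parameters are then tuned from the data. The function name `fit_line` and the sample measurements are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical experimental data that lies exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```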
Both data mining and machine learning tools have been domain specific, for example in the finance, retail, or telecommunications industries. These tools integrate domain-specific knowledge with data analysis techniques to answer what are usually very specific queries.
Tools are evaluated on data types, system issues, data sources, data mining functions, coupling with a database or data warehouse, scalability, visualization, and user interface. Among these, visual data mining is popular for its designer-style user interface, which renders data, results, and process in a graphical and usually interactive presentation.
Visualization tools such as the Grafana stack, used for viewing elaborate charts and eye candy, require only read permissions on the data, because they execute queries on the results to fetch the data for making the charts.
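
The read-only point can be sketched with SQLite alone; Grafana itself is not involved here. One connection writes a result table, and a second connection, standing in for a dashboard, opens the same file with `mode=ro`. The file name `results.db`, the table `results`, and its rows are hypothetical.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "results.db")

# The mining job (writer) stores the result set.
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE results (label TEXT, total INTEGER)")
writer.executemany("INSERT INTO results VALUES (?, ?)", [("a", 3), ("b", 7)])
writer.commit()
writer.close()

# The dashboard (reader) opens the same file read-only via a URI.
reader = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
chart_data = reader.execute(
    "SELECT label, total FROM results ORDER BY label"
).fetchall()

# Any attempt to modify the data through the read-only connection is refused.
try:
    reader.execute("INSERT INTO results VALUES ('c', 1)")
    write_allowed = True
except sqlite3.OperationalError:
    write_allowed = False
reader.close()
```

The reader still gets everything it needs to draw a chart from `chart_data`, while the stored results stay out of its reach for modification.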

