Tuesday, June 18, 2013

book review

Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber

The book lists the data preprocessing steps as descriptive data summarization, data cleaning, data integration and transformation, data reduction, data discretization, and automatic generation of concept hierarchies.
Descriptive data summarization provides the analytical foundation for data preprocessing. It uses statistical measures such as the mean, weighted mean, median, and mode for central tendency; the range, quartiles, interquartile range, variance, and standard deviation for dispersion; and histograms, boxplots, quantile plots, scatter plots, and scatter plot matrices for visual representation.
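As a rough illustration of the measures of center and dispersion named above, here is a minimal sketch using only Python's standard `statistics` module. The function name `summarize` and the sample values are my own assumptions, not taken from the book, and quartile conventions differ between tools (this sketch uses the "inclusive" method).

```python
import statistics

def summarize(values, weights=None):
    """Return basic measures of center and dispersion for a list of numbers."""
    data = sorted(values)
    # Quartiles via the "inclusive" method; other tools may interpolate differently.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    summary = {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": data[-1] - data[0],
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        # Population variance and standard deviation (divide by n).
        "variance": statistics.pvariance(data),
        "stdev": statistics.pstdev(data),
    }
    if weights is not None:
        # Weighted mean: sum(w_i * x_i) / sum(w_i), paired in the original order.
        summary["weighted_mean"] = (
            sum(v * w for v, w in zip(values, weights)) / sum(weights)
        )
    return summary

if __name__ == "__main__":
    print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```

The visual summaries (histograms, boxplots, scatter plots) would need a plotting library and are left out of this sketch.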
Data cleaning routines fill in missing values, smooth out noise, and identify outliers and inconsistencies in the data.
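The three cleaning tasks above can be sketched in a few lines. This is only one simple choice for each task, and the choices are my assumptions rather than the book's prescribed routines: missing values (represented here as `None`) are filled with the mean, noise is smoothed by bin means over equal-size bins, and outliers are flagged with the common 1.5 × IQR rule. All function names are hypothetical.

```python
import statistics

def fill_missing(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = statistics.mean(known)
    return [mean if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the data, split it into bins of bin_size, replace each value by its bin mean."""
    data = sorted(values)
    smoothed = []
    for start in range(0, len(data), bin_size):
        chunk = data[start:start + bin_size]
        smoothed.extend([statistics.mean(chunk)] * len(chunk))
    return smoothed

def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

if __name__ == "__main__":
    print(fill_missing([1, None, 3]))
    print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
    print(iqr_outliers([10, 12, 11, 13, 12, 100]))
```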
Data integration combines data from multiple sources into a coherent data store by resolving data conflicts and semantic heterogeneity.
Data transformation routines convert the data into forms appropriate for mining, through steps such as normalization.
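To make normalization concrete, here is a small sketch of two widely used variants, min-max scaling and z-score standardization. The function names and the handling of constant columns are my own assumptions.

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the interval [new_min, new_max]."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption: a constant column maps to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    return [(v - low) / (high - low) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Center values on the mean and scale by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Assumption: a constant column maps to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

if __name__ == "__main__":
    print(min_max([10, 20, 30]))
    print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))
```

Min-max scaling preserves the relative spacing of the values but is sensitive to extreme values; z-scores are usually the safer choice when the minimum and maximum are unknown or dominated by outliers.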
Data reduction techniques such as data cube aggregation, dimensionality reduction, subset selection and discretization can be used to obtain a reduced representation of data.
Data discretization can involve techniques such as binning, histogram analysis, entropy-based discretization, cluster analysis, and intuitive partitioning. Data preprocessing methods continue to evolve because of the size and complexity of the problem.
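Binning is the simplest of the discretization techniques listed above. The sketch below shows two common flavors, equal-width and equal-frequency binning; the function names are hypothetical and the treatment of edge cases (the maximum value, constant data) is my own assumption.

```python
def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k intervals of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / k
    labels = []
    for v in values:
        index = int((v - low) / width) if width else 0
        # The maximum value would land in bin k, so clamp it into the last bin.
        labels.append(min(index, k - 1))
    return labels

def equal_frequency_bins(values, k):
    """Split the sorted values into k bins holding (roughly) the same number of items."""
    data = sorted(values)
    n = len(data)
    return [data[i * n // k:(i + 1) * n // k] for i in range(k)]

if __name__ == "__main__":
    print(equal_width_bins([0, 5, 10, 15, 20, 30], 3))
    print(equal_frequency_bins([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
```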

Data mining is the discovery of knowledge by finding hidden patterns and associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools. Data warehousing supports this by providing summarized data. The book defines a data warehouse as a subject-oriented, integrated, time-variant, and non-volatile collection of data organized for decision making. A multidimensional data model is used to design the data warehouse; it consists of a data cube with a large set of facts or measures and a number of dimensions. A data cube consists of a lattice of cuboids. Concept hierarchies organize the values of attributes or dimensions into levels of abstraction.
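The "lattice of cuboids" is easier to picture with a toy example. The sketch below, written under my own assumptions, treats each subset of the dimensions as one cuboid and sums a single measure over it, so n dimensions give 2^n cuboids, from the fully summarized apex cuboid (no dimensions) to the base cuboid (all dimensions). The names `build_cuboids`, `time`, `item`, `location`, and `sales` are illustrative only, and a real warehouse would not materialize every cuboid this naively.

```python
from collections import defaultdict
from itertools import combinations

def build_cuboids(rows, dimensions, measure):
    """Aggregate `measure` (by sum) for every subset of `dimensions`.

    Returns a dict mapping a tuple of dimension names to a dict that maps
    a tuple of dimension values to the summed measure.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube

if __name__ == "__main__":
    facts = [
        {"time": "Q1", "item": "TV", "location": "Toronto", "sales": 100.0},
        {"time": "Q1", "item": "PC", "location": "Toronto", "sales": 50.0},
        {"time": "Q2", "item": "TV", "location": "Vancouver", "sales": 70.0},
    ]
    cube = build_cuboids(facts, ("time", "item", "location"), "sales")
    print(len(cube))          # number of cuboids in the lattice
    print(cube[()])           # apex cuboid: grand total
    print(cube[("time",)])    # sales summarized by time only
```

Moving from the `("time", "item")` cuboid to the `("time",)` cuboid corresponds to dropping a dimension, which is one form of roll-up.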
Data mining can use OLAP queries as well as on-line analytical mining (OLAM), which integrates OLAP with data mining.
