Thursday, June 20, 2013

Mining Stream, Time-Series, and Sequence Data
Stream data can come from telecommunications, financial markets, and satellites. Such data are typically continuous, arrive at varying update rates, are ordered in time, change quickly, and are massive in volume. Summaries of stream data are called synopses; they allow queries on the stream to be answered approximately. Synopsis techniques include random sampling, sliding windows (analogous to the sliding window used in TCP), which restrict attention to the most recent elements, histograms, multiresolution methods, sketches, and randomized algorithms.
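Random sampling is the simplest synopsis. The post does not name a particular sampling method; one common way to sample a stream whose length is not known in advance is reservoir sampling (often called Algorithm R), sketched here in a single pass. The function name `reservoir_sample` and its `seed` parameter are hypothetical, chosen for illustration only.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream in one forward pass."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot index uniformly from 0..i (inclusive); keep the item only if it lands in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1_000_000), k=5, seed=42)
```

Because the i-th element (counting from 1) replaces a reservoir slot with probability k/i, every element seen so far ends up in the sample with the same probability k/n, using only O(k) memory.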
The tilted time frame model stores data at the finest granularity for the most recent time and at the coarsest granularity for the most distant time. A stream data cube can hold compressed data by (1) applying the tilted time frame model to the time dimension, (2) storing data only at a few critical layers, which reflect the levels of data of most interest to the analyst, and (3) performing partial materialization along "popular paths" through the critical layers.
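The tilted time frame idea can be illustrated with a simplified logarithmic variant for a single additive measure such as a count. This is a minimal sketch under stated assumptions (the class `LogTiltedTimeFrame`, its `capacity` parameter, and the merge rule are hypothetical), not the exact structure a stream data cube uses: level i holds summaries covering 2**i base time units, and when a level overflows, its two oldest summaries are added together and promoted, so recent data stays fine-grained while older data is kept only coarsely.

```python
from typing import List, Tuple


class LogTiltedTimeFrame:
    """Hypothetical logarithmic tilted time frame for one additive measure (e.g., a count)."""

    def __init__(self, capacity: int = 2) -> None:
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.capacity = capacity
        # levels[i] holds summaries covering 2**i base units; index 0 is the newest.
        self.levels: List[List[float]] = []

    def add(self, value: float) -> None:
        """Record the measure for one new base time unit."""
        carry = value
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append([])
            self.levels[level].insert(0, carry)  # newest summary goes first
            if len(self.levels[level]) <= self.capacity:
                return
            # Overflow: merge the two oldest summaries into one coarser summary.
            oldest = self.levels[level].pop()
            second_oldest = self.levels[level].pop()
            carry = oldest + second_oldest
            level += 1

    def snapshots(self) -> List[Tuple[int, float]]:
        """Return (base_units_covered, value) pairs ordered from newest to oldest."""
        return [(2 ** i, v) for i, level in enumerate(self.levels) for v in level]


frame = LogTiltedTimeFrame(capacity=2)
for _ in range(5):
    frame.add(1)
print(frame.snapshots())  # [(1, 1), (2, 2), (2, 2)]
```

With `capacity=2`, at most two summaries exist per level, so a stream of n units needs only about 2·log2(n) stored values; the price is that older periods can no longer be queried at fine granularity.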
Stream-based operations are typically single-pass (forward only). The frequent itemset mining, classification, and clustering methods discussed earlier tend to scan the data multiple times, which is infeasible for streams. Stream-based methods instead find approximate answers; examples include the lossy counting algorithm for approximate frequency counting and the CluStream algorithm for stream data clustering.

A time-series database consists of sequences of values or events that change over time, usually measured at regular time intervals, as in the weather data used for forecasting. Such data are studied with trend analysis, which separates the movement of a series into long-term trend movements, cyclic movements, seasonal movements, and irregular movements. Subsequence matching is a form of similarity search that finds subsequences similar to a given query subsequence. These methods match subsequences that have the same shape while accounting for gaps and for differences in baseline and scale.
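Two of these ideas can be shown in a short sketch. The first pair of functions follows the lossy counting scheme for single items (bucket width ⌈1/ε⌉, pruning at every bucket boundary), simplified to items rather than itemsets. The second pair is a brute-force subsequence search that z-normalizes each window so that differences in baseline and scale are ignored; it does not handle gaps. All function names (`lossy_count`, `frequent_items`, `z_normalize`, `best_subsequence_match`) are hypothetical.

```python
import math
import statistics
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def lossy_count(stream: Iterable[Hashable], epsilon: float) -> Tuple[Dict[Hashable, Tuple[int, int]], int]:
    """One pass of lossy counting; returns item -> (count, max_error) and the stream length n."""
    width = math.ceil(1 / epsilon)  # bucket width w
    entries: Dict[Hashable, Tuple[int, int]] = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        # New items start at count 0 with a possible undercount of bucket - 1.
        count, max_error = entries.get(item, (0, bucket - 1))
        entries[item] = (count + 1, max_error)
        if n % width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            entries = {k: (c, d) for k, (c, d) in entries.items() if c + d > bucket}
    return entries, n


def frequent_items(entries: Dict[Hashable, Tuple[int, int]], n: int,
                   support: float, epsilon: float) -> List[Hashable]:
    """Items whose estimated count is at least (support - epsilon) * n."""
    return [k for k, (c, _) in entries.items() if c >= (support - epsilon) * n]


def z_normalize(values: Sequence[float]) -> List[float]:
    """Remove baseline (mean) and scale (standard deviation) so only the shape remains."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def best_subsequence_match(series: Sequence[float], query: Sequence[float]) -> Tuple[int, float]:
    """Slide a window over the series; return (start, distance) of the best shape match."""
    m = len(query)
    if m == 0 or m > len(series):
        raise ValueError("query must be non-empty and no longer than the series")
    q = z_normalize(query)
    best_start, best_dist = -1, math.inf
    for start in range(len(series) - m + 1):
        window = z_normalize(series[start:start + m])
        dist = math.dist(window, q)  # Euclidean distance between normalized shapes
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist
```

Lossy counting never overestimates an item's count and underestimates it by at most εn, which is why the query threshold is lowered to (support - ε)·n. For matching, a query such as `[1, 2, 1]` finds the window `[10, 20, 10]` in a series at near-zero distance, because the two differ only in baseline and scale.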
A sequence database consists of sequences of ordered elements or events, not necessarily tied to time, such as web clickstreams. A sequential pattern is frequent if its support (the fraction of sequences in the database that contain it) meets a minimum support threshold; such patterns capture behavior like "customers who bought this later bought that." In constraint-based sequential pattern mining, user-defined constraints further narrow the patterns being searched. These constraints can be expressed in terms of the duration of a sequence, a time window within which events must occur, and the gaps between events. The analysis of recurring patterns is called periodicity analysis; it may look for full periodic patterns, partial periodic patterns, or periodic association rules.

Biological sequence analysis is another form of sequence analysis: it compares, aligns, indexes, and analyzes biological sequences such as sequences of amino acids. Alignment comes in two types, pairwise sequence alignment and multiple sequence alignment, and is usually computed with dynamic programming. Common techniques for analyzing biological sequences are Markov chains and hidden Markov models, which model the probability of observing a symbol x given the symbols that precede it in the sequence.
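The sketch below illustrates three of these ideas under simplifying assumptions: support counting for a sequential pattern, where each sequence is a list of itemsets; a global pairwise alignment score computed by dynamic programming in the Needleman-Wunsch style, with hypothetical scores of +1 for a match and -1 for a mismatch or gap; and the log-probability of a sequence under a first-order Markov chain. Hidden Markov models, multiple sequence alignment, and constraint handling are not covered, and all function names are hypothetical.

```python
import math
from typing import AbstractSet, Mapping, Sequence


def contains_pattern(sequence: Sequence[AbstractSet[str]], pattern: Sequence[AbstractSet[str]]) -> bool:
    """True if each pattern element is a subset of a distinct, later sequence element, in order."""
    pos = 0
    for element in pattern:
        # Greedily match the earliest remaining sequence element that contains this pattern element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(database: Sequence[Sequence[AbstractSet[str]]], pattern: Sequence[AbstractSet[str]]) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    return sum(contains_pattern(seq, pattern) for seq in database) / len(database)


def global_alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b, filled in row by row."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning an empty prefix of a against b
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Choose the best of: align a[i-1] with b[j-1], gap in b, or gap in a.
            curr[j] = max(diagonal, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def markov_log_probability(seq: str, initial: Mapping[str, float],
                           transition: Mapping[str, Mapping[str, float]]) -> float:
    """log P(seq) = log P(x1) + sum of log P(x_i | x_(i-1)) for a first-order Markov chain."""
    log_p = math.log(initial[seq[0]])
    for previous, current in zip(seq, seq[1:]):
        log_p += math.log(transition[previous][current])
    return log_p


db = [[{"a"}, {"b", "c"}, {"d"}], [{"a", "b"}, {"d"}], [{"b"}, {"a"}]]
print(support(db, [{"a"}, {"d"}]))  # 0.6666666666666666
print(global_alignment_score("ACGT", "AGT"))  # 2
```

Working with log-probabilities in the Markov chain function avoids numerical underflow when many small transition probabilities are multiplied over a long sequence.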
