Cluster computing

Monday, April 8, 2019

Uses of object storage with Sequences:

Sequence databases are niche storage. They see little everyday use in commercial systems because traditional relational and non-relational databases can already store large data sets and their indexes. The availability of these databases in the cloud has made the notion of unlimited data practical in representations such as BigTable. Sequences, however, do not expand along the columns of a table. Instead, they grow along the rows, which can number in the billions.

The processing of large sets of rows has remained largely the same, so traditional databases have served for all data, including sequences stored in tables. Sequence processing, however, has deviated from this conventional analytical stack. Sequences are often indexed with a prefix tree. The processing stages tend to prune, clean, and canonicalize the data before sequence patterns are discovered. A dynamic bit vector data structure or a Bloom filter may also be used to test whether an element is part of a sequence, with a small chance of false positives in the Bloom filter's case.
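
As a minimal sketch of the two structures mentioned above, the following Python shows a prefix tree that counts how often each prefix occurs and a small Bloom filter for membership tests. The class names, the SHA-256 based hashing, and the default sizes are assumptions made for illustration, not something prescribed by any sequence database.

```python
import hashlib


class PrefixTree:
    """Trie over sequences of hashable elements; each node counts the
    sequences that pass through it."""

    def __init__(self):
        self.children = {}
        self.count = 0

    def insert(self, sequence):
        node = self
        for element in sequence:
            node = node.children.setdefault(element, PrefixTree())
            node.count += 1

    def prefix_count(self, prefix):
        """Number of inserted sequences that start with a non-empty prefix."""
        node = self
        for element in prefix:
            node = node.children.get(element)
            if node is None:
                return 0
        return node.count


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit vector

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))
```

Inserting the sequences `a b c`, `a b d`, and `a x` and then asking for the count of the prefix `a b` returns 2, which is the kind of lookup that pruning and pattern discovery rely on.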

Generating sequences is also a multi-stage process. The elements of the sequences must be discovered before the sequences can be collected. Extracting the elements requires cleaning, stemming, and sometimes running neural nets so that candidates can be weighted before they are extracted. Sequences merely help with the formation of groups. Sometimes the ordering is important, and at other times the sequences degenerate into unordered groups.
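
The stages described above can be sketched roughly as follows. This is a simplified illustration: the stopword list is made up, stemming is omitted, and a plain frequency count stands in for the weights that a neural net might assign. All function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list, an assumption for this sketch.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def clean(text):
    """Cleaning and canonicalization: lowercase and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_elements(tokens, min_weight=1):
    """Weight candidate elements (here by frequency) and keep those that
    reach min_weight, preserving their original order."""
    weights = Counter(t for t in tokens if t not in STOPWORDS)
    elements = [t for t in tokens if weights.get(t, 0) >= min_weight]
    return elements, weights


def collect_sequences(elements, length=2):
    """Collect ordered sequences with a sliding window of fixed length."""
    return [tuple(elements[i:i + length])
            for i in range(len(elements) - length + 1)]


def as_group(sequence):
    """Drop the ordering so that a sequence degenerates to a group."""
    return frozenset(sequence)
```

With this sketch, the sequences `("cat", "sat")` and `("sat", "cat")` are distinct, but `as_group` maps both to the same group, which is the degeneration described above.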

Groups have had limited application in analysis because they proliferate, and there is no good way to determine which ones are important. There is also no easy way to tell how many elements should remain in a group or which to exclude. This makes groups harder to analyze than vectorized elements, whose latent features and associations lend themselves to pattern discovery with data mining techniques. Statistical and other forms of analysis also prefer vectorization. Graphs and page ranks also work better on elements that are vectors rather than scalars.

However, groups can act as pseudo elements, and these pseudo elements can also participate in the formation of graphs and their analysis via page ranks. The use of a search engine over web resources demonstrates that page ranking can assign weights to elements. With groups as pseudo elements, there is some categorization, which can lead to hierarchies or levels. These hierarchies or levels add value to the otherwise flat representation of resource rankings, where only the top few ever get noticed and the rest are ignored.
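
To make the idea of groups as pseudo elements concrete, here is a small power-iteration page rank in plain Python. It is a sketch under assumptions: the damping factor of 0.85 is the commonly used default, and the `group:` prefix for pseudo-element nodes is a naming convention invented for this example.

```python
def page_rank(edges, damping=0.85, iterations=50):
    """Power-iteration page rank over directed (source, target) edges.
    Rank held by nodes without outgoing links is spread evenly."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    total = len(nodes)
    rank = {n: 1.0 / total for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / total + damping * dangling / total
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def split_levels(rank, group_prefix="group:"):
    """Separate the flat ranking into a group level and an element level."""
    groups = {n: r for n, r in rank.items() if n.startswith(group_prefix)}
    elements = {n: r for n, r in rank.items() if not n.startswith(group_prefix)}
    return groups, elements
```

Linking each element to its group node, and the group node back to its members, lets a group accumulate rank like any other node. `split_levels` then yields two rankings instead of one flat list, so lower-ranked elements can still surface through a highly ranked group.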

Meaningful unordered groups or ordered sequences can improve search as well as the prediction of queries. This has been the underlying basis for collaborative filtering, where users are interested in items that others viewed while looking for something similar. Therefore, groups and sequences hold a lot of promise.
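
A very small item-to-item variant of collaborative filtering can be sketched by treating each user's viewing session as an unordered group and counting co-views. The function names and the tie-breaking rule are assumptions for illustration. Production recommenders typically normalize these counts.

```python
from collections import Counter
from itertools import combinations


def co_view_counts(sessions):
    """Count how often each ordered pair of items appears in the same session."""
    counts = Counter()
    for items in sessions:
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(item, counts, top_n=3):
    """Items most often viewed together with `item`; ties break alphabetically."""
    scored = [(other, c) for (first, other), c in counts.items() if first == item]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scored[:top_n]]
```

For the sessions `[a, b, c]`, `[a, b]`, and `[b, d]`, asking for recommendations for `a` returns `b` first, because the two were viewed together twice.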

Object storage presents virtually unlimited, web-accessible storage of key-value collections in a hierarchical namespace, which makes it well suited for groups and collections. With compute resources able to access the storage directly over the web, and with object storage embodying the best practices of the storage industry, analysis using groups and sequences becomes much more agile.
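
One way to picture groups laid out in such a namespace is the sketch below. It uses an in-memory stand-in rather than any real object storage API. The class, the key layout `namespace/groups/id.json`, and the helper names are all hypothetical.

```python
import json


class InMemoryObjectStore:
    """Stand-in for an object store: byte values under string keys,
    with listing by key prefix to mimic a hierarchical namespace."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list(self, prefix=""):
        return sorted(k for k in self._objects if k.startswith(prefix))


def group_key(namespace, group_id):
    # Hypothetical key layout chosen for this example.
    return f"{namespace}/groups/{group_id}.json"


def save_group(store, namespace, group_id, members):
    payload = json.dumps(sorted(members)).encode("utf-8")
    store.put(group_key(namespace, group_id), payload)


def load_group(store, namespace, group_id):
    raw = store.get(group_key(namespace, group_id))
    return set(json.loads(raw.decode("utf-8")))
```

Because each group is a single object under a predictable key, a compute job can list one prefix such as `corpus1/groups/` and fetch only the groups it needs.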
