Thursday, April 18, 2019

The sequence joins

Unlike rows in relational tables, sequences are generally stored in a columnar manner. Since each sequence is represented as a string, an index on the sequence column gives fast lookups on that table. Joining the table to itself then matches sequences to each other.
Prefix trees help with sequence comparisons based on prefixes. Unlike joins, where the values have to match exactly, prefix trees allow comparisons between sequences that match only partially. A prefix tree also gives the level down to which two sequences match, which helps determine how close they are. The distance between two sequences is the distance between their leaves in the prefix tree. This similarity measure also gives a quantitative metric that can be used for clustering.
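The prefix-tree distance described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names PrefixTree, match_level and distance are hypothetical, and the distance is assumed to be the number of edges on the path between the two leaves, that is, the elements left over on each side once the shared prefix is removed.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # element -> TrieNode
        self.terminal = False  # True when a stored sequence ends here


class PrefixTree:
    """Hypothetical prefix tree over sequences of hashable elements."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, sequence):
        node = self.root
        for element in sequence:
            node = node.children.setdefault(element, TrieNode())
        node.terminal = True

    def match_level(self, a, b):
        """Depth of the deepest node shared by the paths of a and b.

        Assumes both sequences were inserted beforehand.
        """
        node, level = self.root, 0
        for x, y in zip(a, b):
            if x != y or x not in node.children:
                break
            node = node.children[x]
            level += 1
        return level

    def distance(self, a, b):
        """Edges between the two leaves: up to the shared node, then down."""
        level = self.match_level(a, b)
        return (len(a) - level) + (len(b) - level)


tree = PrefixTree()
for s in ["ACGT", "ACGA", "ACTT", "TTGA"]:
    tree.insert(s)

print(tree.match_level("ACGT", "ACGA"))  # shared prefix "ACG"
print(tree.distance("ACGT", "ACTT"))     # diverge after "AC"
```

Under this assumed definition, sequences that share a long prefix are close and sequences that differ at the first element are as far apart as their combined lengths.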
Common clustering techniques assign each sequence to the nearest cluster and form cohesive clusters by reducing the sum of squared errors.
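A rough sketch of such a clustering loop is given below. Since sequences have no natural mean, it swaps in a medoid-style update (each cluster is represented by one of its own members) in place of a centroid; that choice, the function names, and the prefix-based distance are all assumptions made for illustration.

```python
def prefix_distance(a, b):
    """Elements left on each side after removing the shared prefix."""
    level = 0
    for x, y in zip(a, b):
        if x != y:
            break
        level += 1
    return (len(a) - level) + (len(b) - level)


def sum_squared_error(clusters, distance=prefix_distance):
    """Sum of squared distances from each member to its cluster's medoid."""
    return sum(distance(medoid, s) ** 2
               for medoid, members in clusters.items()
               for s in members)


def cluster(sequences, medoids, distance=prefix_distance, max_rounds=10):
    """Assign sequences to the nearest medoid, then re-pick each medoid."""
    medoids = list(medoids)
    for _ in range(max_rounds):
        # Assignment step: every sequence joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for s in sequences:
            nearest = min(medoids, key=lambda m: distance(m, s))
            clusters[nearest].append(s)
        # Update step: the new medoid is the member with the smallest
        # sum of squared distances to the rest of its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(distance(c, s) ** 2 for s in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


sequences = ["ACGT", "ACGA", "ACGG", "TTGA", "TTGC", "TTAA"]
clusters = cluster(sequences, medoids=["ACGT", "TTGA"])
print(clusters)
print(sum_squared_error(clusters))
```

The loop stops when the medoids no longer change, which is the point where neither step can reduce the sum of squared errors further for this starting choice.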

Besides these usual forms of representation, nothing prevents breaking up the sequences into an elements table that relates back to the sequence table. Similarly, sequences may have group identifiers associated with them. With this organization, a group helps in finding a sequence and a sequence helps in finding an element. With the help of these relations, we can compose standard query operations using a builder pattern.
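One way this organization might look is sketched below using Python's built-in sqlite3 module. The table names, column names, and the SequenceQuery builder are all hypothetical and chosen only to illustrate the group, sequence, and element relations; the builder is deliberately small and supports calling in_group only once per query.

```python
import sqlite3

# Assumed schema: a group has many sequences, a sequence has many elements.
SCHEMA = """
CREATE TABLE sequence_group (
    group_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE sequence (
    sequence_id INTEGER PRIMARY KEY,
    group_id    INTEGER REFERENCES sequence_group(group_id),
    value       TEXT NOT NULL
);
CREATE TABLE element (
    sequence_id INTEGER NOT NULL REFERENCES sequence(sequence_id),
    position    INTEGER NOT NULL,
    symbol      TEXT NOT NULL,
    PRIMARY KEY (sequence_id, position)
);
"""


def load(conn, groups):
    """Create the tables and store each sequence along with its elements."""
    conn.executescript(SCHEMA)
    for name, sequences in groups.items():
        group_id = conn.execute(
            "INSERT INTO sequence_group (name) VALUES (?)", (name,)).lastrowid
        for value in sequences:
            sequence_id = conn.execute(
                "INSERT INTO sequence (group_id, value) VALUES (?, ?)",
                (group_id, value)).lastrowid
            conn.executemany(
                "INSERT INTO element VALUES (?, ?, ?)",
                [(sequence_id, i, ch) for i, ch in enumerate(value)])


class SequenceQuery:
    """Hypothetical builder: each method adds a join and returns self."""

    def __init__(self, conn):
        self.conn = conn
        self.joins, self.conditions, self.params = [], [], []

    def in_group(self, name):
        self.joins.append(
            "JOIN sequence_group g ON g.group_id = s.group_id")
        self.conditions.append("g.name = ?")
        self.params.append(name)
        return self

    def with_element(self, symbol, position):
        alias = f"e{len(self.joins)}"  # unique alias per element join
        self.joins.append(
            f"JOIN element {alias} ON {alias}.sequence_id = s.sequence_id")
        self.conditions.append(
            f"{alias}.symbol = ? AND {alias}.position = ?")
        self.params.extend([symbol, position])
        return self

    def fetch(self):
        sql = "SELECT DISTINCT s.value FROM sequence s " + " ".join(self.joins)
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        sql += " ORDER BY s.value"
        return [row[0] for row in self.conn.execute(sql, self.params)]


conn = sqlite3.connect(":memory:")
load(conn, {"a": ["ACGT", "ACGA"], "b": ["TTGA"]})
print(SequenceQuery(conn).in_group("a").with_element("A", 3).fetch())
```

Each builder call narrows the result through one relation, so a query can start from a group, from an element, or from both.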
