Cluster computing

Saturday, February 23, 2019

Today we continue the discussion on the best practice from storage engineering:
496) Tags can be used to make recommendations against the data to be searched. Tags point to groups and the preferences of the group is used to make a ranked list of suggestions. This technique is called collaborative filtering. A common data structure that helps with keeping track of preferences is a nested dictionary. This dictionary could use a quantitative ranking say on a scale of 1 to 5 to denote the preferences of the participants in the selected group.

497) A useful data structure for mining the logical data model is the decision tree. Structure involves interior nodes = set (A1, … An) of categorical attributes . The leaf is the class label from domain(C). The edge is a value from domain(Ai), Ai associated with parent node. The property is a search tree. The tuples in R -> leafs in class labels . The decision tree's property is that it associates the tuples in R to the leafs i.e. class labels. The advantage of using a decision tree is that it can work with heterogeneous data and the decision boundary is parallel to the axis.

498) Clustering is a technique for categorization and segmentation of tuples. Given a relation R(A1, A2, ..., An), and a similarity function between rows of R. Find a set of those groups of rows in R with the objectives that the groups should be cohesive and not coupled. The tuples within a group are similar to each other. The tuples across group are dissimilar. The constraint is that the number of clusters may be given and the clusters should be significant.

499) Outliers are the rows that are most dissimilar. Given a relation R(A1, A2, ..., An), and a similarity function between rows of R, find rows in R which are dissimilar to most point in R. The objective is to maximize dissimilarity function in with a constraint on the number of outliers or significant outliers if given.

500) There are no containers for native support of decision tree, classified and outlier data in unstructured storage but since they can be represented in key values, they can be assigned to objects themselves or maintained in dedicated metadata.

Saturday, February 23, 2019