Cluster computing

Use of annotations in cloud inventory
Most public clouds have a notion of tags or labels that can be associated with their inventory. For example, as AWS explains, tags enable us to categorize our cloud resources in different ways, such as by grouping, color coding and action labeling. This is useful when we have many resources of the same type: we can quickly identify a specific resource based on the tags we have assigned to it. Each tag consists of a key and an optional value, both of which we define, and we can add as many tags as meet our needs for each resource type. Using a consistent set of tag keys makes it easier for us to manage our resources. We can search and filter the resources based on the tags we add.
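As a minimal sketch of searching and filtering by tags, assume a hypothetical in-memory inventory that maps each resource id to its tags. The resource ids, tag keys and the find_by_tags helper are made up for illustration and are not any cloud provider's API.

```python
# Hypothetical inventory: resource id -> {tag key: tag value}.
inventory = {
    "vm-001": {"env": "prod", "owner": "team-a"},
    "vm-002": {"env": "dev", "owner": "team-a"},
    "vol-001": {"env": "prod", "owner": "team-b"},
}


def find_by_tags(inventory, **wanted):
    """Return the sorted ids of resources whose tags include every wanted key/value pair."""
    return sorted(
        rid
        for rid, tags in inventory.items()
        if all(tags.get(key) == value for key, value in wanted.items())
    )


print(find_by_tags(inventory, env="prod"))                  # ['vm-001', 'vol-001']
print(find_by_tags(inventory, env="prod", owner="team-a"))  # ['vm-001']
```

A consistent set of tag keys is what makes a filter like this dependable: a resource tagged "environment" instead of "env" would simply be missed.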
In a private cloud, we can put these tags to much more use:
First, we can use a variety of data mining techniques to expand our analytics.
Second, we can expose a lot more data for each resource, which we can then associate with tags. This data can be gathered and saved over time, so we are not limited to what is known at instance creation time.
Third, tags can become dimensions of intent for a possible match against a term used for search. These intentions can then be used as lines of search.
Fourth, tags can generate more tags.
This write-up describes some of these techniques.
Let us first describe the tags. Tags don't have any semantic meaning to the functional aspects of the resource and are interpreted strictly as a string of characters. Also, tags are not automatically assigned to our resources. That said, in our new use cases, we may suggest a few tags out of the box based on common usage. Tags can easily be authored and managed through the console, the command line interface or the API.

We can assign tags only to resources that already exist. If we add a tag that has the same key as an existing tag on that resource, the new value overwrites the old value. We can edit tag keys and values, and we can remove tags from a resource at any time. We can set a tag's value to the empty string, but we can't set a tag's value to null. We can even control who can see these tags.
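The rules above can be sketched with a toy tag store for a single resource. TagStore and its method names are hypothetical and exist only to illustrate the semantics: an existing key is overwritten, an empty value is allowed, and a null value is rejected.

```python
class TagStore:
    """Toy tag store for one existing resource (hypothetical, not a cloud SDK)."""

    def __init__(self):
        self._tags = {}

    def set(self, key, value=""):
        # The value is optional and may be the empty string, but not null.
        if value is None:
            raise ValueError("a tag value may be empty but not null")
        # Tags are plain strings; setting an existing key overwrites the old value.
        self._tags[key] = str(value)

    def remove(self, key):
        # Removing a key that is not present is a no-op in this sketch.
        self._tags.pop(key, None)

    def as_dict(self):
        return dict(self._tags)


tags = TagStore()
tags.set("env", "dev")
tags.set("env", "prod")  # same key: the new value overwrites the old one
tags.set("reviewed")     # empty-string value
tags.remove("reviewed")
print(tags.as_dict())    # {'env': 'prod'}
```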

Now let's describe the new usages:

Data mining: The idea in data mining is to apply a data-driven, inductive and backward technique to identify a model. We get a model from learning data, check it against test data, and refine the model if there's a mismatch between prediction and reality. This is different from forward, deductive methods, which build the model first, then deduce conclusions and then match them with data. If there's a mismatch between the model prediction and reality, the model would then be refined.

The conceptual data model involves tables for items (instance-id, item-type-id, attribute1, attribute2, …), transactions (xid, subset of item-types, customer-id, timestamp) and item-types (item-type-id, type-name, …).

With this data model, a family of patterns called associations is established. They involve subsets of similar item-types; for example, a customer buying item-type IT1 also buys item-type IT2. Another example is sequential associations, where the customers buying item-type IT1 will buy item-type
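To make the "also buys" association concrete, here is a small sketch that scores pair rules by confidence, meaning the fraction of transactions containing one item-type that also contain the other. Confidence is a common association-rule measure but is our assumption here, as are the sample transactions and the 0.6 threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the subset of item-types in one xid.
transactions = [
    {"IT1", "IT2"},
    {"IT1", "IT2", "IT3"},
    {"IT1"},
    {"IT2", "IT3"},
]


def pair_rules(transactions, min_confidence=0.6):
    """Return sorted (a, b, confidence) triples for rules 'buys a -> also buys b'."""
    item_count = Counter()
    pair_count = Counter()
    for items in transactions:
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), together in pair_count.items():
        # Each co-occurring pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(confidence, 2)))
    return sorted(rules)


for lhs, rhs, confidence in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: {confidence}")
```

In the tagging setting, the item-types could be tag keys or key/value pairs, so a strong rule suggests a tag that is probably missing from a resource.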

Clustering

Clustering is a technique for categorization and segmentation of tuples. Given a relation R(A1, A2, ..., An) and a similarity function between rows of R, the task is to find a set of groups of rows in R such that the groups are cohesive and not coupled: the tuples within a group are similar to each other, and the tuples across groups are dissimilar. The constraint is that the number of clusters may be given and the clusters should be significant.
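One rough way to check the two objectives is to compare the mean distance within groups against the mean distance across groups. This is only an illustrative measure that we are assuming, with made-up 2-D points, and it expects at least one within-group pair and one across-group pair.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available since Python 3.8


def cohesion_and_coupling(groups):
    """Return (mean within-group distance, mean across-group distance)."""
    labeled = [(g, p) for g, points in enumerate(groups) for p in points]
    within, across = [], []
    for (g1, p1), (g2, p2) in combinations(labeled, 2):
        (within if g1 == g2 else across).append(dist(p1, p2))
    return sum(within) / len(within), sum(across) / len(across)


groups = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
within, across = cohesion_and_coupling(groups)
print(within, across)  # a good grouping has within much smaller than across
```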

The choices for similarity measures include distance functions such as Euclidean (the L2 metric), Manhattan, string edit distance and graph distance.
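Three of these distance functions can be sketched in a few lines. The string edit distance below is the Levenshtein variant, which is our choice for illustration; graph distance is left out because it needs a graph structure.

```python
def euclidean(a, b):
    """L2 distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """L1 distance between two equal-length numeric tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def edit_distance(s, t):
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


print(euclidean((0, 0), (3, 4)))            # 5.0
print(manhattan((0, 0), (3, 4)))            # 7
print(edit_distance("kitten", "sitting"))   # 3
```

Since tags are plain strings, edit distance is a natural fit for spotting near-duplicate tag keys or values, while the numeric distances apply to resource attributes.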

The choices for group representations include a center (such as a mean, a medoid, or a mean with standard deviation), a boolean expression on attributes, or a named subset of rows.
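The difference between a mean and a medoid can be sketched as follows, assuming numeric tuples and Manhattan distance for the medoid. The mean is a computed point that need not be a member of the group, whereas the medoid is always an actual row.

```python
def mean_point(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def medoid(points, distance):
    """The member with the smallest total distance to all members."""
    return min(points, key=lambda p: sum(distance(p, q) for q in points))


group = [(0, 0), (2, 0), (10, 0)]
print(mean_point(group))         # (4.0, 0.0), not one of the rows
print(medoid(group, manhattan))  # (2, 0), an actual row
```

A medoid only needs a distance function, so it also works for rows made of strings or tags, where a mean is not defined.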

In addition to these usages, tags can generate more tags. Background processing and automation can work with tags to generate more tags. For example, a clustering operation on the existing data using similarity measures on existing tags will generate more tags.
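As a sketch of tags generating more tags, the following groups resources by the similarity of their existing tags and writes the group back as a new tag. The Jaccard similarity, the greedy seed-based grouping, the 0.5 threshold and the "cluster" tag key are all assumptions made for illustration; a real background job could use any clustering method.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def add_cluster_tags(inventory, threshold=0.5):
    """Return a copy of the inventory with a derived 'cluster' tag on each resource."""
    seeds = []    # (cluster name, tag items of the cluster's first member)
    derived = {}  # resource id -> cluster name
    for rid in sorted(inventory):
        items = set(inventory[rid].items())
        for name, seed_items in seeds:
            if jaccard(items, seed_items) >= threshold:
                derived[rid] = name
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            name = f"cluster-{len(seeds) + 1}"
            seeds.append((name, items))
            derived[rid] = name
    return {rid: {**tags, "cluster": derived[rid]} for rid, tags in inventory.items()}


resources = {
    "vm-a": {"env": "prod", "tier": "web"},
    "vm-b": {"env": "prod", "tier": "web", "owner": "x"},
    "vm-c": {"env": "dev", "tier": "db"},
}
for rid, tags in add_cluster_tags(resources).items():
    print(rid, tags["cluster"])
```

Here vm-a and vm-b share two of three distinct tags and land in the same cluster, while vm-c starts its own. The original inventory is left unmodified.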

Conclusion: These tags prove very useful in expanding options for analytics. They can be put to better use than in conventional usages and applied over existing resources without any disruption.
#codingexercise

We were looking at counting the number of increasing subsequences in an earlier post. We try the same for distinct subsequences:

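using System.Collections.Generic;

// Counts the distinct subsequences of A, including the empty one.
int GetCount(List<int> A)
{
    // dp[i]: new distinct subsequences ending at A[i-1]; dp[0] is the empty subsequence.
    var dp = new int[A.Count + 1];
    // sum[i]: dp[0] + ... + dp[i]
    var sum = new int[A.Count + 1];
    // last[v]: 0-based index of the most recent occurrence of v
    var last = new Dictionary<int, int>();
    dp[0] = 1;
    sum[0] = 1;

    for (int i = 1; i <= A.Count; i++)
    {
        var total_dp_upto_current = 0;
        var total_dp_upto_repetition = 0;

        for (int j = 0; j <= i - 1; j++)
        {
            total_dp_upto_current += dp[j];
        }

        // Extensions already made at the previous occurrence would be duplicates.
        if (last.ContainsKey(A[i - 1]))
        {
            for (int j = 0; j <= last[A[i - 1]]; j++)
            {
                total_dp_upto_repetition += dp[j];
            }
        }
        dp[i] = total_dp_upto_current - total_dp_upto_repetition;
        sum[i] = sum[i - 1] + dp[i];
        last[A[i - 1]] = i - 1;
    }

    return sum[A.Count];
}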
For example, ABA has seven distinct subsequences: "", A, B, AA, AB, BA, ABA.
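A quick way to gain confidence in the recurrence is to compare it against brute-force enumeration on small inputs. The sketch below is a Python port of the same idea, using prefix sums in place of the inner loops; the function names are ours.

```python
from itertools import combinations


def count_distinct_subsequences(seq):
    """DP count of distinct subsequences of seq, including the empty one."""
    n = len(seq)
    dp = [0] * (n + 1)      # dp[i]: new subsequences ending at seq[i-1]
    prefix = [0] * (n + 1)  # prefix[i]: dp[0] + ... + dp[i]
    dp[0] = prefix[0] = 1   # the empty subsequence
    last = {}               # value -> 0-based index of its latest occurrence
    for i in range(1, n + 1):
        x = seq[i - 1]
        # Everything before i can be extended by x, minus what the previous x already extended.
        repeated = prefix[last[x]] if x in last else 0
        dp[i] = prefix[i - 1] - repeated
        prefix[i] = prefix[i - 1] + dp[i]
        last[x] = i - 1
    return prefix[n]


def brute_force(seq):
    """Enumerate every index subset and count the distinct results."""
    n = len(seq)
    return len({
        tuple(seq[i] for i in idx)
        for r in range(n + 1)
        for idx in combinations(range(n), r)
    })


for s in ["", "A", "AA", "ABA", "ABAB", "ABCABC"]:
    assert count_distinct_subsequences(s) == brute_force(s)
print(count_distinct_subsequences("ABA"))  # 7
```

The brute force is exponential in the input length, so it is only suitable as a check on short sequences.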


Tuesday, March 28, 2017
