Cluster computing

Saturday, March 18, 2017

We started reading the paper "Big Data and Cloud Computing: A survey of the State-of-the-Art and Research Challenges" by Skourletopoulos et al. The paper compares the data warehouse with big data as a cloud offering. IBM data scientists argue that the key dimensions of big data are volume, velocity, variety and veracity, and the size and type of existing deployments vary along these dimensions. Many of these deployments get data from external providers. A Big Data as a Service stack may get data from other big data sources, operational data stores, staging databases, data warehouses and data marts. Zheng et al. showed that service-generated big data includes service logs, service QoS and service relationships in the form of service identification and migration.

A cloud-based big data analytics service provisioning platform named CLAaaS is presented in the literature to help describe the significant features of workflow systems, such as multi-tenancy for a wide range of analytic tools and back-end data sources, user group customizations and web collaboration. The system consists of several layers, with the backend comprising data sources such as a DBMS, a data warehouse and data streams. The data from these sources is mixed, preprocessed, filtered, aggregated, transformed and otherwise routinely processed so that it can be staged for the next layer. The staged data is then modeled. Because the model may be imperfect, it is estimated, validated and scored. With the model and the data in hand, results can be analyzed and visualized. Based on the results, the data may be further transformed or the model may be improved with feedback. The improvements come from result interpretations, predictions, prescriptions, action impact evaluation, visualization and so on.

This workflow may be familiar to many who work with graph databases, because these are generally very large databases and require a similar drill. The difference is in the formalization of the data management and modeling steps. The algorithms and analysis can be similar to most NoSQL processing, whether batch or streaming. While graphs require their own query languages, it is likely that the industry will converge on a standard for these dialects (courtesy Raghu Ramakrishnan).

#codingexercise

Get sum of all averages of subsequences

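// Requires: using System.Collections.Generic; using System.Linq;

// Brute force: collect every non-empty subsequence of A, then sum their averages.
// Call with an empty b, start = 0 and level = 0.
void GetSumOfAllAverages(List<int> A, ref List<int> b, int start, int level, ref List<List<int>> subsequences)
{
    for (int i = start; i < A.Count; i++)
    {
        b.Add(A[i]);                         // A[i] becomes element 'level' of the current subsequence
        subsequences.Add(new List<int>(b));  // store a copy, since b keeps changing
        // only elements after A[i] may extend this subsequence, so recurse from i + 1
        GetSumOfAllAverages(A, ref b, i + 1, level + 1, ref subsequences);
        b.RemoveAt(level);                   // backtrack before trying the next element
    }
}

// After the top-level call returns:
double sumOfAllAverages = subsequences.Sum(x => x.Average());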

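// Closed form over A[start..end]: each element appears in NChooseK(count-1, n-1)
// subsequences of size n, so those subsequences have a combined sum of
// sum * NChooseK(count-1, n-1) and a combined average of that value divided by n.
double GetSumOfAllAvgs(List<int> A, int start, int end)
{
    if (start > end) return 0;
    if (start == end) return A[start];
    int count = end - start + 1;
    long sum = 0;
    for (int i = start; i <= end; i++)
        sum += A[i];
    double sumOfAllAvgs = 0;
    for (int n = 1; n <= count; n++)
        sumOfAllAvgs += (double)sum * NChooseK(count - 1, n - 1) / n; // cast avoids integer division
    return sumOfAllAvgs;
}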

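long NChooseK(int n, int k) // this can also be a dynamic programming method, although that is not used here
{
    if (k >= 0 && k <= n) // k == n is valid: NChooseK(n, n) is 1
    {
        return Factorial(n) / (Factorial(n - k) * Factorial(k));
    }
    else
    {
        return 0;
    }
}

// Helper assumed by NChooseK; the result overflows a long for n > 20.
long Factorial(int n)
{
    long result = 1;
    for (int i = 2; i <= n; i++)
        result *= i;
    return result;
}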
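
The closed form used by GetSumOfAllAvgs can be checked with a short derivation. This is a sketch, and the symbols are introduced here only for illustration: N is the number of elements, S is their sum, and T ranges over the non-empty subsequences. Each element belongs to exactly C(N-1, n-1) subsequences of size n, and C(N-1, n-1)/n equals C(N, n)/N, which collapses the sum:

```latex
\[
\begin{aligned}
\sum_{T \neq \emptyset} \operatorname{avg}(T)
  &= \sum_{n=1}^{N} \frac{1}{n} \sum_{|T| = n} \sum_{a \in T} a
   = \sum_{n=1}^{N} \frac{S \binom{N-1}{n-1}}{n} \\
  &= \frac{S}{N} \sum_{n=1}^{N} \binom{N}{n}
   = \frac{S \, (2^{N} - 1)}{N}
\end{aligned}
\]
```

For A = {1, 2, 3}, S = 6 and N = 3, so the total is 6 * 7 / 3 = 14. Enumerating by hand gives the same value: the single elements contribute 1 + 2 + 3 = 6, the pairs contribute 1.5 + 2 + 2.5 = 6, and the full sequence contributes 2.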
