Cluster computing

Friday, March 17, 2017

Today we start reading the paper "Big Data and Cloud Computing: A survey of the State-of-the-Art and Research Challenges" by Skourletopoulos et al. The paper compares data warehouses with big data as a cloud offering. Gartner has projected that more than 20 billion connected devices will be in use by 2020, and that the data exchanged by sensors will far exceed the data exchanged by human beings. The size of data is only growing, and many find it easier to work directly on such large-scale data. Big Data refers to data sets so large and complex that traditional data processing systems cannot handle them. For a more detailed comparison, I refer to an earlier blog post. The main takeaway is that Big Data is not only about storage but also about a different class of algorithms. These algorithms load, store and query massive amounts of data in batches using a technique called MapReduce, and they run in parallel across a distributed cluster. Social networks are one example of Big Data. Many cloud providers have established new datacenters for hosting social networking, business media content, or scientific applications and services. In fact, cloud storage is priced by the gigabyte-month and compute by the CPU-hour.
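To make the MapReduce idea concrete, here is a minimal single-machine sketch of the classic word-count job, written as a C# top-level program (C# 9 or later). It is not Hadoop's API: the names Map, Reduce and WordCount are hypothetical, PLINQ's AsParallel stands in for the worker nodes of a cluster, and GroupBy plays the role of the shuffle step that routes all values for one key to a single reducer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Map phase (hypothetical name): each input record, a line of text,
// emits one (word, 1) pair per word.
static IEnumerable<KeyValuePair<string, int>> Map(string line) =>
    line.Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Select(w => new KeyValuePair<string, int>(w.ToLowerInvariant(), 1));

// Reduce phase (hypothetical name): all values for one key collapse into one result.
static int Reduce(string key, IEnumerable<int> values) => values.Sum();

// Driver: map tasks run in parallel (PLINQ stands in for cluster workers),
// GroupBy acts as the shuffle, then each key group is reduced.
static Dictionary<string, int> WordCount(IEnumerable<string> lines) =>
    lines.AsParallel()
         .SelectMany(line => Map(line))
         .GroupBy(kv => kv.Key)
         .ToDictionary(g => g.Key, g => Reduce(g.Key, g.Select(kv => kv.Value)));

var counts = WordCount(new[] { "big data in the cloud", "the cloud scales big data" });
foreach (var kv in counts.OrderBy(kv => kv.Key))
    Console.WriteLine($"{kv.Key}: {kv.Value}");
```

On a real cluster the map and reduce functions stay this small; the framework takes over partitioning the input, moving intermediate pairs between machines and retrying failed tasks.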

IBM data scientists argue that the key dimensions of big data are volume, velocity, variety and veracity. Existing deployments vary in size and type along these dimensions. Many of these deployments get data from external providers. A Big Data as a Service stack may get data from other big data sources, operational data stores, staging databases, data warehouses and data marts. Typically the operational data stores, staging databases and warehouses hold relational data. Data marts allow analysis along the dimensions of a cube. Big Data sources can include source systems in Compliance, Trading, CRM, Research, Finance, MDM and Pricing, as well as IoT data sources.
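As a rough illustration of analysis along the dimensions of a cube, the sketch below treats a handful of hypothetical fact rows as cells of a cube with three dimensions (Region, Product, Quarter) and one measure (Revenue). It shows two common cube operations with LINQ: a roll-up, which sums the measure over every dimension except one, and a slice, which fixes one dimension before aggregating. The data and dimension names are invented for the example.

```csharp
using System;
using System.Linq;

// Hypothetical fact rows: each row is one cube cell with three
// dimensions (Region, Product, Quarter) and one measure (Revenue).
(string Region, string Product, string Quarter, decimal Revenue)[] facts =
{
    ("East", "Bonds",    "Q1", 120m),
    ("East", "Equities", "Q1",  80m),
    ("West", "Bonds",    "Q1",  50m),
    ("West", "Equities", "Q2",  70m),
};

// Roll-up: aggregate Revenue along the Region dimension,
// summing over all Products and Quarters.
var byRegion = facts.GroupBy(f => f.Region)
                    .ToDictionary(g => g.Key, g => g.Sum(f => f.Revenue));

// Slice: fix Quarter = "Q1", then roll up by Product.
var q1ByProduct = facts.Where(f => f.Quarter == "Q1")
                       .GroupBy(f => f.Product)
                       .ToDictionary(g => g.Key, g => g.Sum(f => f.Revenue));

foreach (var kv in byRegion) Console.WriteLine($"{kv.Key}: {kv.Value}");
foreach (var kv in q1ByProduct) Console.WriteLine($"Q1 {kv.Key}: {kv.Value}");
```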

Zheng et al. described a Big Data as a Service offering for service-generated data. They showed that the stack for this service includes three layers (analytics, platform and infrastructure, in that hierarchy). The data feeding into this service is service-generated big data: service logs, quality of service (QoS) data and service relationships. Log analysis is useful for visualization and diagnosis. QoS data supports fault tolerance and prediction. Service relationships support service identification and migration.
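As a rough illustration of the analytics layer working over service logs, the sketch below summarizes hypothetical log records per service: the request count, the error count (a simple diagnosis signal) and the mean latency (a simple QoS figure). The record format "service,status,latencyMs" and the rule that status codes of 500 and above are errors are assumptions for this example, not part of Zheng et al.'s design.

```csharp
using System;
using System.Linq;

// Hypothetical service-generated log records: "service,status,latencyMs".
string[] log =
{
    "auth,200,35", "auth,500,900", "search,200,120",
    "search,200,80", "auth,200,40", "search,503,1500",
};

// Parse each record, group by service, and compute per-service figures:
// request count, error count (status >= 500) and mean latency in ms.
var summary = log
    .Select(line => line.Split(','))
    .Select(p => (Service: p[0], Status: int.Parse(p[1]), LatencyMs: int.Parse(p[2])))
    .GroupBy(r => r.Service)
    .ToDictionary(
        g => g.Key,
        g => (Requests: g.Count(),
              Errors: g.Count(r => r.Status >= 500),
              MeanLatencyMs: g.Average(r => r.LatencyMs)));

foreach (var (service, s) in summary)
    Console.WriteLine($"{service}: {s.Errors}/{s.Requests} errors, mean latency {s.MeanLatencyMs:F1} ms");
```

In a full stack these aggregates would be computed by the platform layer over logs held in the infrastructure layer; here an in-memory array stands in for both.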

#codingexercise

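Count all palindromic subsequences in a given string. Subsequences taken from different positions count separately, so "aaa" has 7 (three copies of "a", three of "aa", and "aaa").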

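// Counts palindromic subsequences of A[start..end]; initial call: GetCountPalin(A, 0, A.Length - 1)
int GetCountPalin(string A, int start, int end)
{
    if (String.IsNullOrEmpty(A)) return 0;
    // Assert(start >= 0 && start < A.Length && end >= 0 && end < A.Length);
    if (start > end) return 0;   // empty range (reached by the start+1, end-1 call)
    if (start == end) return 1;  // a single character is a palindrome
    int count = 0;
    if (A[start] == A[end])
    {
        // The overlap A[start+1..end-1] is counted twice below, but the extra copy
        // matches the palindromes wrapped by A[start]...A[end]; +1 for the pair itself.
        count += GetCountPalin(A, start + 1, end);
        count += GetCountPalin(A, start, end - 1);
        count += 1;
    }
    else
    {
        // Inclusion-exclusion: remove the overlap A[start+1..end-1] counted twice.
        count += GetCountPalin(A, start + 1, end);
        count += GetCountPalin(A, start, end - 1);
        count -= GetCountPalin(A, start + 1, end - 1);
    }
    return count;
}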

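// Enumerates every subsequence of A (characters kept in order) by backtracking and
// collects the palindromic ones; initial call: Combine(A, new StringBuilder(), 0, list).
// The list's final size equals GetCountPalin(A, 0, A.Length - 1).
void Combine(string A, StringBuilder b, int start, List<string> palindromeCombinations)
{
    for (int i = start; i < A.Length; i++)
    {
        b.Append(A[i]);                                 // choose A[i]
        if (IsPalindrome(b.ToString()))
            palindromeCombinations.Add(b.ToString());
        Combine(A, b, i + 1, palindromeCombinations);   // extend with later characters only
        b.Length--;                                     // backtrack: un-choose A[i]
    }
}

bool IsPalindrome(string s)
{
    for (int i = 0, j = s.Length - 1; i < j; i++, j--)
        if (s[i] != s[j]) return false;
    return true;
}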
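The GetCountPalin recursion recomputes the same (start, end) ranges many times, so it takes exponential time. A common remedy, sketched here as a C# top-level program, is to fill the same recurrence bottom-up in an n-by-n table, which takes O(n^2) time and space. The function name CountPalindromicSubsequences is hypothetical, and the sketch assumes the count fits in a long (it can reach 2^n - 1 for a string of n identical characters).

```csharp
using System;

// dp[i, j] = number of palindromic subsequences (by position) of s[i..j].
// Rows are filled from the end so dp[i + 1, *] is ready before row i.
static long CountPalindromicSubsequences(string s)
{
    if (string.IsNullOrEmpty(s)) return 0;
    int n = s.Length;
    var dp = new long[n, n];
    for (int i = n - 1; i >= 0; i--)
    {
        dp[i, i] = 1;                                   // single character
        for (int j = i + 1; j < n; j++)
        {
            long inner = i + 1 <= j - 1 ? dp[i + 1, j - 1] : 0;  // empty range counts 0
            dp[i, j] = s[i] == s[j]
                ? dp[i + 1, j] + dp[i, j - 1] + 1       // overlap offsets wrapped palindromes; +1 for the pair
                : dp[i + 1, j] + dp[i, j - 1] - inner;  // inclusion-exclusion
        }
    }
    return dp[0, n - 1];
}

Console.WriteLine(CountPalindromicSubsequences("aaa"));   // 7
Console.WriteLine(CountPalindromicSubsequences("abcb"));  // 6
```

For "abcb" the six are "a", "b", "c", "b", "bb" and "bcb", which is also what Combine would collect.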
