Tuesday, June 13, 2017

Data architectures in Cloud Computing. 
We were discussing how traditional data processing architecture has changed a lot. It used to be part of the ubiquitous three-tier architecture built around databases. It has since become more distributed, scaled up and scaled out, sharded and hosted on private and public clouds, maintained on clusters and containers with shared volumes, hosted in memory, and even hybrid, combining SQL and NoSQL technologies. We continue reviewing some of the improvements in this field.
Today we look at data storage for social networking applications as an extreme case of the storage needed for this purpose. We recap the considerations presented in a video by Facebook engineers on their data infrastructure needs. They have data arriving to the tune of several hundred terabytes and a total storage of over 300 petabytes, only a fraction of which is processed. For example, their log storage flows into HDFS, which is massively distributed storage. They use Hadoop over HDFS, together with Hive for data warehouse operations and SQL-style querying. This makes it easy for data tools and pipelines to work with this data stack.
Then they introduced Presto over HDFS for interactive analysis and Apache Giraph over Hadoop for graph analytics. We will discuss Presto and Giraph shortly, but first let us look at the kind of data stored in these systems. All the images from the social network flow into Haystack, users are stored in MySQL, and all chat is stored in HBase. They use Scribe for log storage, Scuba for real-time slice and dice, and Puma for streaming analytics. These also indicate the major types of data processing involved in a social networking application.
Apache Giraph is an iterative graph processing system built for high scalability. It is used to analyze the graph formed by users and their connections.
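
To make "iterative graph processing" a little more concrete, here is a minimal single-machine sketch of the vertex-centric, superstep style of computation that systems like Giraph are built around. It is not Giraph's API (Giraph jobs are written in Java against its own classes). The names VertexCentricSketch, ConnectedComponents and Send are hypothetical, and the sketch assumes an undirected graph in which every vertex appears as a key and every edge is listed in both directions. It labels each user with the smallest user id reachable through their connections, which groups users into connected components.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; this is not Giraph's API.
public static class VertexCentricSketch
{
    // Queues a message (a candidate label) for delivery to a target vertex.
    private static void Send(Dictionary<int, List<int>> box, int target, int message)
    {
        if (!box.TryGetValue(target, out var messages))
        {
            messages = new List<int>();
            box[target] = messages;
        }
        messages.Add(message);
    }

    // Returns, for each vertex, the smallest vertex id in its connected component.
    // Assumes every vertex is a key and every edge appears in both directions.
    public static Dictionary<int, int> ConnectedComponents(Dictionary<int, List<int>> adjacency)
    {
        // Each vertex starts with its own id as its label.
        var label = adjacency.Keys.ToDictionary(v => v, v => v);

        // Superstep 0: every vertex sends its label to all of its neighbors.
        var inbox = new Dictionary<int, List<int>>();
        foreach (var v in adjacency.Keys)
            foreach (var n in adjacency[v])
                Send(inbox, n, label[v]);

        // Later supersteps: a vertex that receives a smaller label adopts it
        // and forwards it; the computation halts when no messages are in flight.
        while (inbox.Count > 0)
        {
            var outbox = new Dictionary<int, List<int>>();
            foreach (var entry in inbox)
            {
                int v = entry.Key;
                int best = entry.Value.Min();
                if (best < label[v])
                {
                    label[v] = best;
                    foreach (var n in adjacency[v])
                        Send(outbox, n, best);
                }
            }
            inbox = outbox;
        }
        return label;
    }
}
```

For example, with users 1-2-3 connected in a chain, users 4 and 5 connected to each other, and user 6 on their own, the labels come out as 1, 1, 1, 4, 4 and 6. In a real distributed system the vertices would be partitioned across workers and the messages would cross the network between supersteps; the loop above only mimics that structure in memory.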

#codingexercise
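Given an array, find the maximum j - i such that arr[j] > arr[i].
// Returns the largest j - i with A[j] > A[i], or int.MinValue if no such pair exists.
int GetMaxDiff(List<int> A)
{
    int diff = int.MinValue;
    for (int i = 0; i < A.Count; i++)
    {
        // Scan j from the right; the first match is the farthest valid j for this i.
        for (int j = A.Count - 1; j > i; j--)
        {
            if (A[j] > A[i])
            {
                if (j - i > diff)
                    diff = j - i;
                break;
            }
        }
    }
    return diff;
}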

