Friday, June 23, 2017

Today we discuss the design of the WhatsApp messenger. It offers free texting in an SMS world: no ads, no gimmicks, no games and no feeds. They run hundreds of nodes with thousands of cores and hundreds of terabytes of RAM, serving billions of smartphones. Their server infrastructure is based on Erlang/FreeBSD. Erlang is a programming language used to build massively scalable soft real-time systems with high-availability requirements. With it they send out about 70 million messages per second.
This architecture is famous for “it just works”: only a phone number is needed as the address to send messages to, and messages can be voice, video, images or text. It also features voice messages, group chats and read receipts.
The backend consists of Erlang, Jabber, FreeBSD, the Yaws web server, PHP, BEAM, and custom XMPP. The frontend consists of seven client platforms for the different mobile devices and SQLite. Every user is an actor. Messages flow between actors when they are online, or they are stored offline; when the actor comes back online, it pulls the messages sent to it. User and actor differ in that the actor, a proxy notion, exists only in the WhatsApp backend framework. A message sent from a user goes to its actor's inbox, which sends the message to the recipient's actor, and that actor repeatedly tries to deliver it to the user behind it. When that user comes online, the messages appear. The Mnesia backend DB stores the actors and their inboxes. The message is encrypted with a key as a tag. They use tools for system activity monitoring on all their server operating systems, as well as BEAM processor hardware perf counters and dtrace. BEAM is the Erlang VM that processes the messages.
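A minimal sketch of this store-and-forward inbox pattern, in C# rather than WhatsApp's actual Erlang code, with hypothetical types:

// Hypothetical sketch of the store-and-forward inbox described above.
using System.Collections.Concurrent;
using System.Collections.Generic;

class ActorInbox
{
    private readonly ConcurrentQueue<string> pending = new ConcurrentQueue<string>();

    // A sender enqueues regardless of whether the recipient is online.
    public void Deliver(string message) => pending.Enqueue(message);

    // When the user behind the actor comes online, the client drains the inbox.
    public IEnumerable<string> Drain()
    {
        while (pending.TryDequeue(out var message))
            yield return message;
    }
}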
The highlight of this architecture is its performance-oriented design. Even a similar feature from Facebook, also written in Erlang, could not match the widespread adoption of this messenger. They have proven that Erlang and FreeBSD have unmatched SMP scalability.
#codingexercise
Let C be the cost of a string S, defined as the length of the longest substring of S in which all characters are the same.
int GetCost(string S)
{
    int max = 0;
    for (int i = 0; i < S.Length; i++)
    {
        int cost = 0;
        for (int j = i; j < S.Length; j++)
        {
            if (S[j] != S[i]) break;
            cost += 1;
        }
        if (cost > max)
            max = cost;
    }
    return max;
}
Given a language L with c distinct characters and strings all of length n, find the expected value of the cost as Sum(cost)/number_of_strings, where the number of strings is c^n.
void FormStringInL(List<char> c, int n, ref List<char> b, ref List<string> combinations)
{
    if (b.Count == n) { combinations.Add(new string(b.ToArray())); return; }
    foreach (var a in c)
    {
        b.Add(a);
        FormStringInL(c, n, ref b, ref combinations);
        b.RemoveAt(b.Count - 1); // backtrack
    }
}
double value = combinations.Sum(x => GetCost(x)) / (double)combinations.Count;
Console.WriteLine(value);
# other System Design : https://1drv.ms/w/s!Ashlm-Nw-wnWsBPfKqVUIHGkJ4Rf

#tree delete operation for node z
if z has at most one child, the node y to splice out is z itself; otherwise y is the tree successor of z
if y has a child, let it take y's place and fix the parent pointer of that child
if the parent of y was null then y was the root, so fix the root; otherwise fix the parent's left or right pointer
if y is not the same as z
    then copy the data of y into z
return y
Node TreeSuccessor(Node z)
{
    if (z == null) return null;
    if (z.right != null)
        return TreeMinimum(z.right); // leftmost node of the right subtree
    Node parent = z.parent;
    while (parent != null && parent.right == z)
    {
        z = parent;
        parent = parent.parent;
    }
    return parent;
}
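The delete steps above can be put together in a C# sketch, assuming a Node with data, left, right and parent fields to match TreeSuccessor:

Node TreeDelete(ref Node root, Node z)
{
    // y is the node to splice out: z itself when z has at most one child, else z's successor
    Node y = (z.left == null || z.right == null) ? z : TreeSuccessor(z);
    // x is y's only child, if any
    Node x = (y.left != null) ? y.left : y.right;
    if (x != null) x.parent = y.parent;
    if (y.parent == null)
        root = x;                 // y was the root
    else if (y == y.parent.left)
        y.parent.left = x;
    else
        y.parent.right = x;
    if (y != z)
        z.data = y.data;          // copy the successor's data into z
    return y;
}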

Thursday, June 22, 2017

We were reviewing DynamoDB and Cosmos DB from our discussion on the design of an online store. Then we started discussing the design of YouTube. We now continue with this discussion.
We listed the design considerations as follows:
We keep track of users and videos in our inventory. Videos may also have associated images such as thumbnails. It may not be surprising to find images far exceeding videos in number, while the videos could themselves number under a billion. Images and videos are best served from a file storage cluster such as an Isilon OneFS cluster, although performance may need to be compared with that of partitioned datacenter storage.
The user model has information about user credentials as well as profile and statistics. These can be maintained in a relational database and served from the cloud with a managed-service database. User authorization may also flow via OAuth, and therefore API services for user and video management may help. A typical user interface, API services and data stores will help make the service available on a variety of devices.
The videos are large in number. They will be organized into a pool and a repository. The pool videos will be the most watched or trending and will be available from a web server that implements an LRU-like policy (a sketch follows below). A pool can also be maintained per user and aged. A user's watched videos can also be eagerly loaded into the user's pool on page load. Dynamic queries will be honored with immediate fetches from disk while statistics are updated to enable them to be served again from cache. The storage of videos presents significant challenges. There can be an alternative to letting them be managed by OneFS NAS storage: in this case, we place them in datacenter storage separated by some partitioning mechanism for videos. Streaming from the datacenter storage will continue to happen with one or more web servers dedicated for that purpose. Streaming services can maintain their own cache and performance improvements.
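A minimal sketch of such an LRU-like pool, with an assumed fixed capacity and a byte[] payload standing in for the video handle:

using System.Collections.Generic;

class VideoPool
{
    private readonly int capacity;
    private readonly Dictionary<string, LinkedListNode<KeyValuePair<string, byte[]>>> map =
        new Dictionary<string, LinkedListNode<KeyValuePair<string, byte[]>>>();
    private readonly LinkedList<KeyValuePair<string, byte[]>> order =
        new LinkedList<KeyValuePair<string, byte[]>>();

    public VideoPool(int capacity) { this.capacity = capacity; }

    public bool TryGet(string id, out byte[] data)
    {
        data = null;
        if (!map.TryGetValue(id, out var node)) return false;
        order.Remove(node);   // move to the front: most recently used
        order.AddFirst(node);
        data = node.Value.Value;
        return true;
    }

    public void Put(string id, byte[] data)
    {
        if (map.TryGetValue(id, out var existing))
        {
            order.Remove(existing);
            map.Remove(id);
        }
        else if (map.Count >= capacity)
        {
            map.Remove(order.Last.Value.Key);  // evict the least recently used
            order.RemoveLast();
        }
        var node = new LinkedListNode<KeyValuePair<string, byte[]>>(
            new KeyValuePair<string, byte[]>(id, data));
        order.AddFirst(node);
        map[id] = node;
    }
}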
CDNs will be used wherever possible to bring content geographically closer to users. They will serve static resources and alleviate the load on the web servers. Caching will be offered at every level and as much as possible so that the backend is not hit unnecessarily.
Video replication and availability from multiple regions may be offered to improve availability and reliability. Indexes may be maintained for all video metadata, including authoring and uploading by owners. The index need not be in a relational store; such information or mapping, along with statistics, can also live in key-value stores and be queried with scatter-gather operations.
Grouping and collaborative filtering may be used to provide recommendations to the user. The top few videos for the most common search terms may also be maintained, which also determines the candidacy of videos available in the pool. Social features on videos such as likes and comments can also be enabled with their own microservice and non-relational databases.
#codingexercise
void GeneratePrimesUpto(int N)
{
    // multiples[k] == 1 marks k as composite; 0 and 1 are marked up front
    var multiples = new List<int>(Enumerable.Repeat(0, N + 1));
    multiples[0] = 1;
    multiples[1] = 1;
    for (int i = 2; i <= N; i++)
        for (int k = 2; i * k <= N; k++)
            multiples[i * k] = 1;      // sieve out every multiple of i
    for (int k = 2; k <= N; k++)
        if (multiples[k] == 0)
            Console.WriteLine("{0}", k);
}

We were discussing another problem yesterday:
Find the minimum length for repeated brackets of count N:
if x is the number of opening brackets and y is the number of closing brackets in the minimum-length repeated sequence,
then x*y = N or x*y + () = N,
and we can increment x by 1 or y by 1.
http://ideone.com/s0dxCX
This strategy does not take into consideration minimization based on consecutive x and y multiples with x == y
which means
static int GetCountADP(int N, int m, int n)
{
    if (N <= 0) return 0;
    if (N == 1) return 2;
    if (m * n == N) return m + n;
    if (m * n > N) return N + N + N * 2;
    int count1 = GetCountADP(N - n, m + 1, n) + 1;
    int count2 = GetCountADP(N - m, m, n + 1) + 1;
    int count3 = GetCountADP(N - 1, m, n) + 2;
    int count4 = GetCountADP(N - m * n, m + 1, n + 1) + m + n;
    var ret = new List<int>() { count1, count2, count3, count4 };
    return ret.Min();
}

Wednesday, June 21, 2017

We were reviewing DynamoDB and Cosmos DB from our discussion on the design of an online store. Today we look at another system design question: how to design YouTube? There are quite a few blog posts that answer this question. This one is not expected to be very different; however, we will benefit from our discussion of cloud databases in earlier posts.
YouTube serves videos over the web. They grew incredibly fast, with over a hundred million video views per day. Their platform consists of the Apache web server, a Python web application on a Linux stack, the MySQL database server, Psyco (a Python-to-C compiler) and lighttpd for video. They run Apache with mod_fast_cgi. The Python web application mashes up all the content from different data sources. This application is usually not the bottleneck; it is the RPCs to the data sources that take time. They use the Psyco compiler for CPU-intensive activities, along with a lot of caching. Fully formed Python objects are cached. Row-level data in databases is cached. Static content is cached. They use NetScaler for load balancing and caching static content.
Data is constituted and sent to each application, and the values are cached in local memory. An agent can watch for changes and push the data to each application.
Each video is hosted by a mini-cluster. The cluster comes in useful for scaling and for serving video from more than one machine. There is also no downtime for backups. The servers use the lighttpd web server for video because it uses epoll to wait on multiple file descriptors. It can also scale with connections because there is little thrashing of content. CDNs are used to serve content wherever possible. A few videos are requested many times, while a large number of videos are requested only a few times. The latter translate to random disk reads, which means caching may not always help; therefore the RAIDed disks are tuned. Initially the database was just MySQL over a monolithic RAID 10. They leased hardware and went from a single master with multiple read slaves to partitioning the database and using a sharded approach. No matter how the data was organized, cache misses were frequent, especially with slow replication. One of their solutions was to prioritize traffic by splitting the data into two clusters: a video watch pool and a general cluster. Subsequently they reduced cache misses and replication lag, and now they can scale the database arbitrarily. This is probably a good case-study candidate for migrating data to the public cloud because they were already storing data in MySQL and most of the replication and caching was to distribute the load. They use data centers for file storage and a CDN. They have a large collection of videos and images. Images suffer from latency; videos are bandwidth-dependent. They use BigTable to look up images in different data centers.
#codingexercise
Given a list of numbers, insert +, -, * operators between them so that the result is divisible by a given number
http://ideone.com/7UlJn3
With these combinations, we can now evaluate each one to filter out the ones we want.
http://ideone.com/ZeaznW

The solution for the operators can also include filters to select only those combinations that return a value divisible by the divisor.
Brackets are considered repeated when they have matching opening and closing parentheses consecutively.
For N=2, we have ()() or (()) and the length is 4.
For N=3, we have ())) and the length is 4.
Find the minimum length for repeated brackets of count N:
if x is the number of opening brackets and y is the number of closing brackets in the minimum-length repeated sequence,
then x*y = N or x*y + () = N,
and we can increment x by 1 or y by 1.

Tuesday, June 20, 2017

We were reviewing DynamoDB. We also reviewed Cosmos DB. Azure Cosmos DB was written to be the industry's first globally-distributed, multi-model, fully managed database service. It is built to collect data from a plethora of IoT devices as well as to handle machine-learning processing. Today we compare DynamoDB and Cosmos DB.
First, Cosmos DB offers global distribution. As more and more regions are added, the same entity can be spread over any number of regions. DynamoDB does not offer region-by-region addition for the same entity; each entity is local to a region.
Second, the criterion above also implies that storage and throughput can now be drawn from more than one region, whereas DynamoDB cannot make that claim. Although the first and second points are not functional database features an application might use, they improve the availability and performance of the database for the application.
Third, spanning multiple regions also provides multi-homing capabilities that are otherwise missing from DynamoDB.
Fourth, there is no compromise on read/write latency, which Cosmos guarantees to be low even at the 99th percentile. The same kind of metric is, at least, not mentioned for DynamoDB.
Fifth, Cosmos offers five consistency models where Dynamo offers only two of those, namely strong and eventual. The availability of more consistency models allows an application to pick the behaviour of the database that best suits its persistence requirements (see the sketch after this list).
Sixth, Cosmos offers support for multiple models and APIs. Dynamo does too. This is the flexibility that applications want, especially since these APIs are fast becoming standards.
Seventh, Cosmos does not require maintenance: it does not maintain any schemas, whereas Dynamo requires indexes to be managed. This also leads to the feature of Cosmos where indexes are automatically maintained for all data.
Eighth, the service level agreements for Cosmos are far more comprehensive than for Dynamo. Although the SLAs for NoSQL databases do not seem to have the same rigor as those of transactional systems, the performance and availability guarantees make the database more appealing.
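As an illustration of the fifth point above, the 2017-era DocumentDB .NET SDK (one of the Cosmos APIs) lets a client request a consistency level when connecting; the endpoint and key below are placeholders:

using System;
using Microsoft.Azure.Documents;        // ConsistencyLevel
using Microsoft.Azure.Documents.Client; // DocumentClient

// Placeholders stand in for a real account endpoint and auth key.
var client = new DocumentClient(
    new Uri("https://myaccount.documents.azure.com:443/"),
    "<auth-key>",
    null,                         // default connection policy
    ConsistencyLevel.Session);    // or Strong, BoundedStaleness, ConsistentPrefix, Eventual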
#codingexercise
Convert multiset to distinct combinations where order does not matter:
Solution: http://ideone.com/ARkeZe
Some more  http://ideone.com/lKxg5s
Also, it is possible to use a tree structure for the solution involving nested dictionaries shared yesterday.

Monday, June 19, 2017

Fulltext Search versus NLP in databases
Many databases in the 1990s started providing full text search techniques. It was a text retrieval technique that relied on searching all of the words in every document stored in a specific full text database. Database engines and web search engines started providing this capability so that search did not just cover the metadata and selected sections but the entire content. Since it was difficult to scan the entire content of every file one by one, the work was offloaded to its own process, separate from the OLTP path, to create a concordance. Stop words and stemming helped keep the index relevant. Querying also improved significantly with fulltext, and new forms of queries became available. Still, fulltext did not really make use of clustering or page-ranking techniques.
The difference in techniques between fulltext and NLP lies largely in the data structures used. While one uses inverted document lists and indexes, the other makes use of word vectors. The latter can be used to create similarity graphs and applied with inference techniques. The word vectors can also be used with a variety of data mining techniques. With the availability of managed-service databases in the public clouds, we are now empowered to create NLP databases in the cloud.
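A toy sketch of the inverted-document-list idea, with an assumed stop-word set standing in for real stop-word filtering and stemming:

using System.Collections.Generic;
using System.Linq;

class InvertedIndex
{
    private readonly HashSet<string> stopWords = new HashSet<string> { "the", "a", "of", "and" };
    private readonly Dictionary<string, HashSet<int>> postings = new Dictionary<string, HashSet<int>>();

    public void Add(int docId, string text)
    {
        // tokenize, drop stop words, and record which documents contain each term
        foreach (var term in text.ToLowerInvariant().Split(' ')
                                 .Where(t => t.Length > 0 && !stopWords.Contains(t)))
        {
            if (!postings.TryGetValue(term, out var docs))
                postings[term] = docs = new HashSet<int>();
            docs.Add(docId);
        }
    }

    public IEnumerable<int> Query(string term) =>
        postings.TryGetValue(term.ToLowerInvariant(), out var docs) ? docs : Enumerable.Empty<int>();
}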
Discussion to be continued ...
A text summarization attempt:  http://scratchpad.io/scintillating-spot-7410
API
UI
#codingexercise
Given a rectangular matrix with row-wise and column-wise sorted distinct integers, find a given integer
Tuple<int, int> GetPosition(int[,] A, int Rows, int Cols, int X)
{
    // returns the (x, y) coordinates of X, or (-1, -1) if not found
    for (int j = 0; j < Cols; j++)
    {
        int index = binary_search(A, Rows, j, X); // search column j down the rows
        if (index != -1)
            return Tuple.Create(index, j);
    }
    return Tuple.Create(-1, -1);
}
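The binary_search helper is not shown in the post; a minimal version that searches column j down the rows (which are sorted top to bottom) might look like this:

int binary_search(int[,] A, int Rows, int j, int X)
{
    int lo = 0, hi = Rows - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (A[mid, j] == X) return mid;
        if (A[mid, j] < X) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;  // not found in this column
}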
Sorting nested dictionary :  http://ideone.com/SlHBib

Sunday, June 18, 2017

We were reviewing DynamoDB. We will compare it with Cosmos DB, but first let us review Cosmos DB. Azure Cosmos DB was written to be the industry's first globally-distributed, multi-model database service. It is built to collect data from a plethora of IoT devices as well as to handle machine-learning processing. It is a globally distributed data service that can elastically scale throughput and storage with data consistency, low latency and high availability. There are five different consistency levels: strong, bounded-staleness, session, consistent-prefix and eventual. No schema or index is required because the database automatically indexes all data and the engine does not know or care about any schemas. Key-value, document and graph data models are easily supported.
Perhaps one of the most undeniably desirable features of Cosmos DB is that developers get to choose the data model and the API. Developers are no longer bound to one model; they can easily switch from a key-value to a graph model as they write one application after another. This makes the DB second nature to developers and earns mindshare, which is just as important as acquiring data because the two are related. Similarly, they can manipulate the data with the same set of APIs that they are already familiar with. This database does not impose any new APIs: we can use the same dialects such as those of DocumentDB and MongoDB, and the API of the Gremlin graph database, in addition to the native Azure Table Storage API. Not only this, but other data access APIs can be added to the offerings going forward. Azure Cosmos is the first and only globally distributed database service in the industry to offer financially backed comprehensive SLAs, as per their documentation. This gives us the assurance that the APIs will not only be easier to work with but also highly available with low latency, high consistency and throughput. In other words, customers who were using a NoSQL database can now forgo their own maintenance and adopt a managed Database-as-a-Service layer with little or no disruption to their existing applications. In our discussion of dataset architectures here, we saw important concepts around new trends, but nothing seems to bring it all together like this managed service platform from Azure Cosmos.

A text summarization attempt:  http://scratchpad.io/scintillating-spot-7410
API
UI

#codingexercise
Given a rectangular matrix with row-wise and column-wise sorted distinct integers, find a given integer
Tuple<int, int> GetPosition(int[,] A, int Rows, int Cols, int X)
{
    // returns the (x, y) coordinates of X, or (-1, -1) if not found
    for (int i = 0; i < Rows; i++)
    {
        int index = binary_search(A, Cols, i, X); // search row i across the columns
        if (index != -1)
            return Tuple.Create(i, index);
    }
    return Tuple.Create(-1, -1);
}
We can also do binary search along columns as well as rows. 
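Here too the binary_search helper is assumed; the row-wise analogue searches row i across the sorted columns:

int binary_search(int[,] A, int Cols, int i, int X)
{
    int lo = 0, hi = Cols - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (A[i, mid] == X) return mid;
        if (A[i, mid] < X) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;  // not found in this row
}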

Saturday, June 17, 2017

We were reviewing DynamoDB. We will compare it with Cosmos DB, but first let us review Cosmos DB. Azure Cosmos DB was written to be the industry's first globally-distributed, multi-model database service. It is built to collect data from a plethora of IoT devices as well as to handle machine-learning processing. It is a globally distributed data service that can elastically scale throughput and storage with data consistency, low latency and high availability. There are five different consistency levels: strong, bounded-staleness, session, consistent-prefix and eventual. No schema or index is required because the database automatically indexes all data and the engine does not know or care about any schemas. Key-value, document and graph data models are easily supported.
This database allows developers to add any number of regions while retaining the same fidelity as the closest region. The database also seamlessly replicates data across regions and elastically scales throughput and storage, so applications can be built to be more responsive. The database engine is said to be write-optimized, log-structured and latch-free, which enables it to guarantee writes at single-digit-millisecond latencies anywhere in the world.

#codingexercise
Node BuildTree(char[] Preorder, char[] InOrder, int index = 0)
{
    Node root = new Node();
    root.data = Preorder[index];
    root.left = null;
    root.right = null;

    int inIndex = Array.IndexOf(InOrder, Preorder[index]);

    // the next preorder element is this node's left child if it appears
    // before this node in the inorder sequence
    if (index + 1 < Preorder.Length &&
        IsLeftSubtree(Preorder[index + 1], InOrder, inIndex) == true)
        root.left = BuildTree(Preorder, InOrder, index + 1);

    // the inorder successor begins the right subtree if it appears after this
    // node in preorder; note this treats the successor as the right child,
    // which holds for the example below, while the range-based version that
    // follows is more general
    if (inIndex + 1 < InOrder.Length &&
        IsPredecessor(InOrder[inIndex + 1], Preorder, index) == false)
        root.right = BuildTree(Preorder, InOrder, Array.IndexOf(Preorder, InOrder[inIndex + 1]));

    return root;
}

bool IsPredecessor(char c, char[] Preorder, int index)
{
    return Array.IndexOf(Preorder, c) < index;
}

bool IsLeftSubtree(char c, char[] InOrder, int index)
{
    return Array.IndexOf(InOrder, c) < index;
}

// Preorder: A B C D E F
// Inorder:  C B D A E F
// The tree built from these traversals:
//        A
//      /   \
//     B     E
//    / \     \
//   C   D     F
Node BuildTree(char[] Preorder, char[] InOrder, char parent, int index, int start, int end)
{
    Node root = new Node();
    char current = Preorder[index];
    root.data = current;
    root.left = null;
    root.right = null;
    if (start == end) return root;
    // size of the left subtree, derived from the inorder position of current
    int size_left = indexOfFromStart(InOrder, current, parent, index, start, end);
    if (size_left > 0)
        root.left = BuildTree(Preorder, InOrder, current, index + 1, index + 1, index + size_left);
    int size_right = end - start - size_left;
    if (size_right > 0)
        root.right = BuildTree(Preorder, InOrder, current, index + size_left + 1, index + size_left + 1, end);
    return root;
}
int indexOfFromStart(char[] InOrder, char c, char parent, int index, int start, int end)
{
    if (start >= end) return 0;
    int inIndex = Array.IndexOf(InOrder, c);
    // start and end are in pre-order; inIndex is from inorder, but the size
    // of the left subtree must be contained in preorder between start and
    // end and must lie between 0 and inIndex in inorder
    if (inIndex < start) return 0;
    Debug.Assert(inIndex <= end);
    int begin = -1;            // sentinel for the root, which has no parent
    if (parent != '\0')
        begin = Array.IndexOf(InOrder, parent);
    if (begin >= 0 && begin < inIndex)
        return inIndex - begin - 1;
    return inIndex;
}