Cluster computing

Thursday, January 11, 2018

Today we continue discussing the AWS papers on software architecture which suggests five pillars:
- Operational Excellence for running and monitoring business critical systems.
- Security to protect information, systems, and assets with risk assessments and mitigation strategies.
- Reliability to recover from infrastructure or service disruptions
- Performance Efficiency to ensure efficiency in the usage of resources
- Cost Optimization to help eliminate unneeded cost and keeps the system trimmed and lean.
The guidelines to achieve the above pillars include:
1. Infrastructure capacity should be estimated not guessed
2. Systems should be tested on production scale to eliminate surprises
3. Architectural experimentation should be made easier with automation
4. There should be flexibility to evolve architectures
5. Changes to the architecture should be driven by data
6. Plan for peak days and test at these loads to observe areas of improvement
It should be noted that Amazon's take on architecture is that there is no need for centralized decisions as described by TOGAF or the Zachman Framework. TOGAF is a framework that pools in requirements from technology vision, business demands, deployment and operations, implementations, migrations and upgrades to form central recommendations The Zachman framework provides a table with columns such as why, how, what, who, when, and rows as layers of organization so that nothing is left out of the considerations from architecture point of view. There are definitely risks associated with the distributed recommendations where internal teams may not adhere to strict standards. This is mitigated in two ways - there are mechanisms that carry out automated checks to ensure standards are being met and second - a culture that work backwards from the customer focused innovation.
#codingexercise
We were discussing finding the largest sum submatrix in a matrix of integers.
As we evaluated the growing range from column to column, we iterated inwards to outwards but the reverse is more efficient.

The advantage here is that the outer bounds of the entire matrix has the largest possible array to apply Kadane's algorithm which we discussed earlier as the one used for finding the max subarray.

Since we find the maximum subarray (i1,i2) along the rows and (j1,j2) along the columns at the first and last row and column of the matrix, the submatrix with the largest sum will lie within (i1,j1) and (i2,j2). We merely refine it with every iteration that shrinks the matrix.

Note that we can use the min common (i-topleft,j-topleft) and (i-bottom-right,j-bottom-right) from the first and last row and column subarray determinations.

Wednesday, January 10, 2018

We start reading AWS whitepapers. We begin with the AWS well-architected framework.
This is based on five pillars:
Operational Excellence which is the ability to run and monitor systems which are business cirtical and delivering functionality to customers. This pillar also includes all the processes and procedures that come with software support.
Security which is not only a non-functional requirement for a cloud hosted software but also a mandate from users and governance as well. It includes the ability to protect information, systems, and assets with risk assessments and mitigation strategies.
Reliability which is the pillar that safeguards against failures both sporadic and frequent. This is the ability of a system to recover from infrastructure or service disruptions, handling failures by acquiring and seamlessly moving to computing resources to meet demand, and mitigating disruptions that come via say misconfigurations or transient network issues.
Performance Efficiency which is the pillar to make sure there is efficiency in the usage of computing resources to meet system requirements, and to maintain it in the face of business and technology changes.
Cost Optimization which is the pillar that helps eliminate unneeded cost and keeps the system trimmed and lean.
This paper enforces the guidelines for a sound architecture to be based around the following:
1. Infrastructure capacity should be estimated not guessed
2. Systems should be tested on production scale to eliminate surprises
3. Architectural experimentation should be made easier with automation
4. There should be flexibility to evolve architectures
5. Changes to the architecture should be driven by data
6. Plan for peak days and test at these loads to observe areas of improvement

#codingexercise
We were discussing finding the largest sum submatrix in a matrix of integers.
As we evaluated the growing range from column to column, we iterated inwards to outwards but the reverse is more efficient.

The advantage here is that the outer bounds of the entire matrix has the largest possible array to apply Kadane's algorithm which we discussed earlier as the one used for finding the max subarray.

Since we find the maximum subarray (i1,i2) along the rows and (j1,j2) along the columns at the first and last row and column of the matrix, the submatrix with the largest sum will lie within (i1,j1) and (i2,j2). We merely refine it with every iteration that shrinks the matrix.

Tuesday, January 9, 2018

Graded noise reduction

Introduction: Devices and applications demand our attention with notifications. Have you ever felt overwhelmed by the notifications from every application including games on your smartphone? Wouldn't it be nice if there was an automated way to turn off notifications depending on your interest level? Sure, we could turn on or off notifications from individual apps but how often do we toggle that. Besides how do we know which applications' notifications do we turn on or off at this time? Instead if there was a single slider that would do this for us the right way each time, wouldn't it be more convenient? This is the purpose of this writeup.

Description: Consider the number of on-off switches to be one each for every application. Since there can be many applications, this list may run into several pages. Some of these applications may be merely games which are notorious for their notifications. Outlook and Calendar, on the other hand, would only serve reminders if they are configured. All these applications are decided jointly. If we could string through the apps with a single switch, then we could turn off the notifications all at once. Since the applications are decided dynamically, it could choose to include the noisy ones depending on our tolerance levels. Therefore, we have a sliding scale of tolerance and as we move the slider over it, apps will start going silent starting from a few and ending with all. And the greatest benefit is the enhanced control for the region in between. Its takes away the effort involved in figuring out which applications to silence at any given time and learns this dynamically based on users' initial selections and the rate at which notifications are sent by the applications. This articulation of a single control gives us graded silence so we can adjust it instead of resorting to mute our phone all the time which still does not address the cluttered notifications on the screen and the requirement to browse them before dismissing one or all and that too repeatedly. Internally the sliding scale uses a simple weighted average of the number of notifications from the applications in a full day and the severity of the application. Reminders will be considered a high severity application while games and ads will be considered less severe. This easy setting of notifications will also reflect the toggled state before and after the switch is flipped. The applications are then ranked by priority and the threshold set by the user with the help of the slider is used to determine which applications are affected. These applications are then individually silenced. The algorithm to determine this can also use grouping and ranking techniques. Additionally, the means of notification may also be made consistent across all applications.

Testing: If there is a metric for the number and relevance of the notifications corresponding to the bar set by the slider such as the F1 score, then it is appropriate to grade the sliding scale and check if the notifications permitted match the setting for any and all assortment of applications.

Conclusion: A single control across all applications that more useful than the mute button on the smartphone has some appeal to improve productivity for the owner.

#codingexercise
We were discussing finding the largest sum submatrix in a matrix of integers.
As we evaluated the growing range from column to column, we iterated inwards to outwards but the reverse is more efficient.

The advantage here is that the outer bounds of the entire matrix has the largest possible array to apply Kadane's algorithm which we discussed earlier as the one used for finding the max subarray.

Monday, January 8, 2018

#codingexercise
We find the maximum number of deletions to make a string palindrome
We can use modified Levenshtein distance to find this number

int GetCount(String A)
{
String R = A.reverse();
int n = A.Length;
int[,] dp = new int [n+1,n+1];
for (int i = 0; i < =n; i++)
{
dp[0,i] = i;
dp[i,0] = i;
}
for (int i = 1; i <= n; i++)
for (int j = 1; j <= n; j++)
{
if (A[i-1] == R[j-1])
dp[i,j] = dp[i-1,j-1];
else
d[[i,j] = 1 + Math.Min(dp[i-1,j], dp[i,j-1]);
}
int res = INT_MAX;
int i = n;
int j = 0;
while (i >= 0)
{
res = Math.Min(res, dp[i,j]);
if ( i < n)
res = Math.Min(res, dp[i+1,j]);
if (i > 0)
res = Math.Min(res, dp[i-1,j]);
i--;
j++;
}
return res;
}

"book"
"koob"
dp
0 1 2 3 4
1 2 3 4 3
2 3 2 3 4
3 4 3 2 3
4 3 4 5 6
Minimum value is 2
Courtesy: geeksforgeeks

Sunday, January 7, 2018

#codingexercise
Yesterday we were discussing finding largest sum sub-matrix. It required largest sum subarray to be efficient. This we have seen earlier as Kadane's algorithm. Today we require to compute k such maximum sum subarray that are non-overlapping. The approach in this case is
repeat for k times :
First find a subarray with maximum sum using techniques discussed earlier.
If such an array exists:
print the bounds of the subarray along with its sum
replace this subarray with values lower than all elements so that it does not participate in the next find.
Note that the existence of the array might fail because there might not be k such subarray. Each iteration is greedy and inclusive of as many contiguous elements as possible. If the number of such arrays fall short of k, we can always slice the arrays to meet or exceed k based on the descending order of subarrays ordered by their sum.
We discuss the ways to find the max subarray sum:
Largest subarray sum - Kadane’s algorithm

Initialize

max_so_far = 0

max_ending_here = 0

For each element of the array

                max_ending_here = max(0, max_ending_here+ A[i])

                max_so_far = max(max_so_far, max_ending_here)

Alternatively

Initialize and maintain the following variables

x = sum of elements

y = minimum of sum encountered

z = maximum of sum encountered

For each element of the array

           y = min(x, y)

           z = max(z, x-y)
return s

For example:

-1, 0, 2, 1 array
has following x, y, and z values in each iteration:
x y z
-1 -1 0
-1 -1 0
1 -1 2
2 -1 3
max sub array sum = 3

Saturday, January 6, 2018

#codingexercise
Find the largest rectangular sub-matrix having sum divisible by k.
The naive approach is to exhaust all the rectangular sub matrices to find the sums. check that the sums are divisible by k and then update the max of the rectangular size found so far.
This appears something like
int GetLargest(int[,] M, int R, int C, int k)
{
int max = 0;
// Every element forms the top left corner of a rectangular area of different sizes
for (int i = 0; i < R; i++)
for (int j = 0; j < C; j++)
{
int kx = i;
int ky = j;

for (int m = kx; m < R; m++)
for (int n = ky; n < C; n++)
{
int sum = GetSum(M, i,j,m,n);
if (sum % k == 0)
{
int size = GetSize(i,j,m,n);
if (size > max)
max = size;
}
}
return max;
}
The above is O(N^3) complexity as the 2D iterations are performed for every coordinate in the rectangular area
As we can see that this has a lot of overlapping regions that are summed again and again.
Instead, we could enumerate all the possible rectangular areas to be bounded in the matrix between a pair of left and right columns and a pair of top and bottom rows with the temporary sums along each row and column The temporary sums are in a linear array and we can now use the algorithm to find the longest sub-array having sum divisible by k to find the rectangular sub area in this 2D scan.

Friday, January 5, 2018

Let us take a closer look at the event application framework we mentioned yesterday. Several designs compete for this space. We enumerate some below:

First – we mention the design of MSMQ with its dead letter and poison letter queues. This design includes support for retries by the queue processor, non-invasive approach which lets the clients send and forget, ability to create an audit log, support for transactional as well as non-transactional messages, support for distributed transactions, support for clusters instead of standalone single machine deployments and ability to improve concurrency control.

This design is somewhat different from that of System.Messaging. The system.messaging library transparently exposes the underlying Message queuing windows APIs . For example, it provides GetPublicQueues method that enumerates the public message queues. It takes the message queue criteria as a parameter. This criterion can be specified with parameters such as category and label. It can also take machine name or cluster name, created and modified times as filter parameters. The GetPublicQueuesEnumerator is available to provide an enumerator to iterate over the results.

Second – we mention the design of ZeroMQ, RabbitMQ and others of this family with their flexibility for several publishers and consumers topology. These are superfast guaranteed delivery message bus but they are mostly for standalone deployments and not as cloud services.

In this regard, Amazon Simple Queue Services offers almost unparalleled fully managed message queuing service that makes it easy to integrate with microservices and serverless applications. SQS also makes it simple and cost-effective to intermediate between these producers and consumers.

Third – we mention the highly performant Erlang-based messaging framework. Whatsapp is a messaging architecture. It offers free texting in a SMS world. There are no ads, no gimmicks, no games and no feeds. They have hundreds of nodes, thousands of cores, hundreds of terabytes of RAM, serve billions of smartphones. Their server infrastructure is based on Erlang/FreeBSD. The Erlang is a programming language that is used to build massively scalable soft real-time systems with requirements on high availability. With this programming language they send out about 70 million messages per second.

This architecture is famous for “it just works” because only a phone number is used as the address to send messages to and messages can be voice, video, images or text. It also features voice message, group chats and read-receipts.

The backend consists of Erlang, Jabber, FreeBSD, Yaws web server, PHP, BEAM, and custom XMPP. The Frontend consists of seven client platforms for the different mobile devices and SQLite. Every user is an actor. Messages flow between actors when they are online or they are stored offline. When the actor comes online, he pulls the messages sent to him. User and actor differ in that the actor notion or proxy exists only in the WhatsApp backend framework. Message sent from user goes to actor inbox which then sends the message to the other actor who then repeatedly sends to the user behind the actor. When that user comes online, the messages appear. The Mnesia backend DB stores the actors and their inboxes. The message is encrypted with a key as a tag. They use tools for system activity monitoring on all their server operating systems as well as BEAM Processor hardware perf counters and dtrace. BEAM is the erlang VM that is used to process the messages.

The highlight of this architecture is its performance-oriented design. Even similar feature from Facebook also written in erlang could not match the widespread adoption of this messenger. They have proven that Erlang and FreeBSD have unmatched SMP scalability.

With these three different forms of design, we can now utilize cloud services or build a limited scale light-weight customized solution for the requirements we mentioned.

#codingexercise
Yesterday we were discussing ways to climb up the staircase with jumps of 1, 2 or 3 steps.
We can generalize this for climbing up staircase with jumps of 1 to N-1 steps
The number of ways to climb up the staircase with N steps by taking a leap of 1 to N steps at a time is simply the sum of binomial coefficients. We see this by cumulating that there are n-1Choose1 ways to take one stop jump and n-1Choose2 ways to take two jumps and so on.
static int GetCount(int n)
{
if (n == 0)
return 1;
return (1 << (n - 1));
}

The binomial coefficient for increasing degrees form the Pascals triangle.