Cluster computing

Tuesday, January 9, 2018

Graded noise reduction

Introduction: Devices and applications demand our attention with notifications. Have you ever felt overwhelmed by the notifications from every application including games on your smartphone? Wouldn't it be nice if there was an automated way to turn off notifications depending on your interest level? Sure, we could turn on or off notifications from individual apps but how often do we toggle that. Besides how do we know which applications' notifications do we turn on or off at this time? Instead if there was a single slider that would do this for us the right way each time, wouldn't it be more convenient? This is the purpose of this writeup.

Description: Consider the number of on-off switches to be one each for every application. Since there can be many applications, this list may run into several pages. Some of these applications may be merely games which are notorious for their notifications. Outlook and Calendar, on the other hand, would only serve reminders if they are configured. All these applications are decided jointly. If we could string through the apps with a single switch, then we could turn off the notifications all at once. Since the applications are decided dynamically, it could choose to include the noisy ones depending on our tolerance levels. Therefore, we have a sliding scale of tolerance and as we move the slider over it, apps will start going silent starting from a few and ending with all. And the greatest benefit is the enhanced control for the region in between. Its takes away the effort involved in figuring out which applications to silence at any given time and learns this dynamically based on users' initial selections and the rate at which notifications are sent by the applications. This articulation of a single control gives us graded silence so we can adjust it instead of resorting to mute our phone all the time which still does not address the cluttered notifications on the screen and the requirement to browse them before dismissing one or all and that too repeatedly. Internally the sliding scale uses a simple weighted average of the number of notifications from the applications in a full day and the severity of the application. Reminders will be considered a high severity application while games and ads will be considered less severe. This easy setting of notifications will also reflect the toggled state before and after the switch is flipped. The applications are then ranked by priority and the threshold set by the user with the help of the slider is used to determine which applications are affected. These applications are then individually silenced. The algorithm to determine this can also use grouping and ranking techniques. Additionally, the means of notification may also be made consistent across all applications.

Testing: If there is a metric for the number and relevance of the notifications corresponding to the bar set by the slider such as the F1 score, then it is appropriate to grade the sliding scale and check if the notifications permitted match the setting for any and all assortment of applications.

Conclusion: A single control across all applications that more useful than the mute button on the smartphone has some appeal to improve productivity for the owner.

#codingexercise
We were discussing finding the largest sum submatrix in a matrix of integers.
As we evaluated the growing range from column to column, we iterated inwards to outwards but the reverse is more efficient.

The advantage here is that the outer bounds of the entire matrix has the largest possible array to apply Kadane's algorithm which we discussed earlier as the one used for finding the max subarray.

Monday, January 8, 2018

#codingexercise
We find the maximum number of deletions to make a string palindrome
We can use modified Levenshtein distance to find this number

int GetCount(String A)
{
String R = A.reverse();
int n = A.Length;
int[,] dp = new int [n+1,n+1];
for (int i = 0; i < =n; i++)
{
dp[0,i] = i;
dp[i,0] = i;
}
for (int i = 1; i <= n; i++)
for (int j = 1; j <= n; j++)
{
if (A[i-1] == R[j-1])
dp[i,j] = dp[i-1,j-1];
else
d[[i,j] = 1 + Math.Min(dp[i-1,j], dp[i,j-1]);
}
int res = INT_MAX;
int i = n;
int j = 0;
while (i >= 0)
{
res = Math.Min(res, dp[i,j]);
if ( i < n)
res = Math.Min(res, dp[i+1,j]);
if (i > 0)
res = Math.Min(res, dp[i-1,j]);
i--;
j++;
}
return res;
}

"book"
"koob"
dp
0 1 2 3 4
1 2 3 4 3
2 3 2 3 4
3 4 3 2 3
4 3 4 5 6
Minimum value is 2
Courtesy: geeksforgeeks

Sunday, January 7, 2018

#codingexercise
Yesterday we were discussing finding largest sum sub-matrix. It required largest sum subarray to be efficient. This we have seen earlier as Kadane's algorithm. Today we require to compute k such maximum sum subarray that are non-overlapping. The approach in this case is
repeat for k times :
First find a subarray with maximum sum using techniques discussed earlier.
If such an array exists:
print the bounds of the subarray along with its sum
replace this subarray with values lower than all elements so that it does not participate in the next find.
Note that the existence of the array might fail because there might not be k such subarray. Each iteration is greedy and inclusive of as many contiguous elements as possible. If the number of such arrays fall short of k, we can always slice the arrays to meet or exceed k based on the descending order of subarrays ordered by their sum.
We discuss the ways to find the max subarray sum:
Largest subarray sum - Kadane’s algorithm

Initialize

max_so_far = 0

max_ending_here = 0

For each element of the array

                max_ending_here = max(0, max_ending_here+ A[i])

                max_so_far = max(max_so_far, max_ending_here)

Alternatively

Initialize and maintain the following variables

x = sum of elements

y = minimum of sum encountered

z = maximum of sum encountered

For each element of the array

           y = min(x, y)

           z = max(z, x-y)
return s

For example:

-1, 0, 2, 1 array
has following x, y, and z values in each iteration:
x y z
-1 -1 0
-1 -1 0
1 -1 2
2 -1 3
max sub array sum = 3

Saturday, January 6, 2018

#codingexercise
Find the largest rectangular sub-matrix having sum divisible by k.
The naive approach is to exhaust all the rectangular sub matrices to find the sums. check that the sums are divisible by k and then update the max of the rectangular size found so far.
This appears something like
int GetLargest(int[,] M, int R, int C, int k)
{
int max = 0;
// Every element forms the top left corner of a rectangular area of different sizes
for (int i = 0; i < R; i++)
for (int j = 0; j < C; j++)
{
int kx = i;
int ky = j;

for (int m = kx; m < R; m++)
for (int n = ky; n < C; n++)
{
int sum = GetSum(M, i,j,m,n);
if (sum % k == 0)
{
int size = GetSize(i,j,m,n);
if (size > max)
max = size;
}
}
return max;
}
The above is O(N^3) complexity as the 2D iterations are performed for every coordinate in the rectangular area
As we can see that this has a lot of overlapping regions that are summed again and again.
Instead, we could enumerate all the possible rectangular areas to be bounded in the matrix between a pair of left and right columns and a pair of top and bottom rows with the temporary sums along each row and column The temporary sums are in a linear array and we can now use the algorithm to find the longest sub-array having sum divisible by k to find the rectangular sub area in this 2D scan.

Friday, January 5, 2018

Let us take a closer look at the event application framework we mentioned yesterday. Several designs compete for this space. We enumerate some below:

First – we mention the design of MSMQ with its dead letter and poison letter queues. This design includes support for retries by the queue processor, non-invasive approach which lets the clients send and forget, ability to create an audit log, support for transactional as well as non-transactional messages, support for distributed transactions, support for clusters instead of standalone single machine deployments and ability to improve concurrency control.

This design is somewhat different from that of System.Messaging. The system.messaging library transparently exposes the underlying Message queuing windows APIs . For example, it provides GetPublicQueues method that enumerates the public message queues. It takes the message queue criteria as a parameter. This criterion can be specified with parameters such as category and label. It can also take machine name or cluster name, created and modified times as filter parameters. The GetPublicQueuesEnumerator is available to provide an enumerator to iterate over the results.

Second – we mention the design of ZeroMQ, RabbitMQ and others of this family with their flexibility for several publishers and consumers topology. These are superfast guaranteed delivery message bus but they are mostly for standalone deployments and not as cloud services.

In this regard, Amazon Simple Queue Services offers almost unparalleled fully managed message queuing service that makes it easy to integrate with microservices and serverless applications. SQS also makes it simple and cost-effective to intermediate between these producers and consumers.

Third – we mention the highly performant Erlang-based messaging framework. Whatsapp is a messaging architecture. It offers free texting in a SMS world. There are no ads, no gimmicks, no games and no feeds. They have hundreds of nodes, thousands of cores, hundreds of terabytes of RAM, serve billions of smartphones. Their server infrastructure is based on Erlang/FreeBSD. The Erlang is a programming language that is used to build massively scalable soft real-time systems with requirements on high availability. With this programming language they send out about 70 million messages per second.

This architecture is famous for “it just works” because only a phone number is used as the address to send messages to and messages can be voice, video, images or text. It also features voice message, group chats and read-receipts.

The backend consists of Erlang, Jabber, FreeBSD, Yaws web server, PHP, BEAM, and custom XMPP. The Frontend consists of seven client platforms for the different mobile devices and SQLite. Every user is an actor. Messages flow between actors when they are online or they are stored offline. When the actor comes online, he pulls the messages sent to him. User and actor differ in that the actor notion or proxy exists only in the WhatsApp backend framework. Message sent from user goes to actor inbox which then sends the message to the other actor who then repeatedly sends to the user behind the actor. When that user comes online, the messages appear. The Mnesia backend DB stores the actors and their inboxes. The message is encrypted with a key as a tag. They use tools for system activity monitoring on all their server operating systems as well as BEAM Processor hardware perf counters and dtrace. BEAM is the erlang VM that is used to process the messages.

The highlight of this architecture is its performance-oriented design. Even similar feature from Facebook also written in erlang could not match the widespread adoption of this messenger. They have proven that Erlang and FreeBSD have unmatched SMP scalability.

With these three different forms of design, we can now utilize cloud services or build a limited scale light-weight customized solution for the requirements we mentioned.

#codingexercise
Yesterday we were discussing ways to climb up the staircase with jumps of 1, 2 or 3 steps.
We can generalize this for climbing up staircase with jumps of 1 to N-1 steps
The number of ways to climb up the staircase with N steps by taking a leap of 1 to N steps at a time is simply the sum of binomial coefficients. We see this by cumulating that there are n-1Choose1 ways to take one stop jump and n-1Choose2 ways to take two jumps and so on.
static int GetCount(int n)
{
if (n == 0)
return 1;
return (1 << (n - 1));
}

The binomial coefficient for increasing degrees form the Pascals triangle.

Thursday, January 4, 2018

Event Application Framework for debugging.
Most of us have relied on the debugger for examining the execution of code. Undeniably, debugging is the authoritative way to figure out what the code is doing when unexpected execution happens especially if there are no exceptions or logs. We discussed a few of the debugging rules of thumb here (http://ravinote.blogspot.com/2013/05/today-i-had-to-do-some-debugging-and-it.html)
However, today I add event application framework support to the troubleshooting toolbox in addition to the debugger. One such example is the circuits library (https://pypi.python.org/pypi/circuits)
So we can do things like everything on the wire:
#!/usr/bin/env python
from circuits import handler, Debugger
from circuits.net.sockets import TCPServer
class EchoServer(TCPServer):

@handler("read")
def on_read(self, sock, data):
return data
app = EchoServer(9000)
Debugger().register(app)
app.run()
While this kind of packet capture may not be very helpful as compared to the http archive file for the web applications since most of the handlers are already responding to http requests, it goes to show how the event application framework can be used for internal events. Moreover we can expose these handlers as web methods as well such as when we want to list some internal statistics or in-memory object audit history.
#!/usr/bin/env python
from circuits.web import Server, Controller
class Root(Controller):
def index(self):
return "Hello World!"
app = Server(("0.0.0.0", 9000))
Root().register(app)
app.run()

#codingexercise
We were discussing counting the number of ways to climb up the staircase and we can modify the number of steps at any time to 1 or 2
int GetCountWays(uint n)
{
int [] dp = new int[n+2];
dp [0] = 0;
dp [1] = 1;
dp [2] = 2;

for (int k = 3; k <= n; k++) {

dp [k] = dp [k-1] + dp [k-2];

}
return dp [n];
}

and recursively as :
int GetCount(uint n)
{
if ( n == 0) return 0;
if (n == 1) return 1;
if (n == 2) return 2;
return GetCount(n-1)+GetCount(n-2);
}

Some people like to start the dp[0] as 1 so that this follows the Fibonacci pattern for 1 and 2 steps

Note we can generalize this for climbing up staircase from 1 to N-1 steps

Wednesday, January 3, 2018

We were discussing Monte Carlo simulations. Historically simulations were used to test a previously understood deterministic problem. The sampling was used to generate uncertainties in the simulations. Monte Carlo simulations inverted this approach. It tries to solve deterministic problems using optimizations based on probabilistic interpretations. It draws samples from a probability distribution. We had seen another such method earlier. It was called Simulated Annealing. This is a special case of Monte Carlo. In Simulated annealing, we computed the current cost and the new cost based on the direction of change. If it improves, we decrease the temperature.

#codingexercise
We were discussing the bottom up manner of counting ways to climb the staircase
int GetCountWays(uint n)
{
int [] dp = new int[n+2];
dp [0] = 0;
dp [1] = 1;
dp [2] = 2;
dp [3] = 4;

for (int k = 4; k <= n; k++) {

dp [k] = dp [k-1] + dp [k-2] + dp [k-3];

}
return dp [n];
}
Some people like to start the dp[0] as 1 so that this follows the Fibonacci pattern for 1 and 2 steps

The above memoization can be reused for computing results for consecutive integers