Cluster computing

Monday, June 29, 2015

Trie data structure:
A trie is also referred to as a prefix tree or radix tree and unlike B-Trees which store the keys in the node, a trie uses the position of the node to construct the prefix. All the descendants of a node have a common prefix associated with that node. The prefix starts out empty at the root and gets appended as we descend down the children. The term 'trie' comes from retrieval. A trie comes in very useful for auto-complete and spell-checker.
A trie stores words with the letters on the edges and as we walk down, we read one letter at a time. Words can end or be a prefix of others. To do a lookup, we just walk down the tree letter by letter and then see if the node has an ending $ to the node. If we get to a null pointer, we know that the word is not there. To efficiently save the prefixes, paths that don't branch into a single edge are compressed and only split when needed.
To lookup the words, we use radix sort.
void RadixSort(Array A, int digits)
{
for (int i = 1; i <= digits; i++)
      InsertionSort(A, i);
}

void InsertionSort(A, d) {
for (int i = 1; i <= n; i++)
{
var x = A[i];
int j = i -1;
while (j >= 0 && A[j] > x)
{
// like arranging a hand of playing cards.
A[j + 1] = A[j];
j = j - 1;
}
A[j + 1] = x;
}
}

We now look at inserting words into a trie. When we add a word to a vertex, we will add word to the corresponding branch of vertex cutting the leftmost character of the word. If the needed branch does not exist, we will have to create it.

class TrieNode:
         def addword(self, word):
                if len(word) == 0:
                    self.isWord = True
                    return
                 key = word[0]
                 word = word [1:]
                 if self.next.has_key[key]:
                     self.next[key].add_item(input)
                 else:
                      node = TrieNode()
                      self.next[key] = node
                      node.add_item(string)

         def dfs (self, sofar = None):
                # we print in two cases
                if self.next.keys() == []:
                    print 'Match:'+sofar
                       # this could be modified to look for similar words by backtracking to the parent and looking for siblings
                if self.isWord:
                    print 'Match:' + sofar
                for key in self.next.keys():
                    self.next[key].dfs(sofar)

        def autocomplete(self, word, sofar=""):
                if len(word) == 0:
                    if self.isWord:
                        print 'Match:', sofar
                    for key in self.next.keys():
                        self.next[key].dfs(sofar)
               else:
                    key = word[0]
                    word = word[1:]
                    if self.next.has_key(key):
                        sofar = sofar + key
                        self.next[key].autocomplete(word, sofar)



courtesy:sarathlakshman.com

WebAPIs often use various authentication methods such as basic authentication, token authentication, or even session authentication. However, applications using the webAPIs may also require sessions to associate the resources for the same user using the same user-agent* (defined in RFC 2616). For example, consider the Cart API. To get the items in the shopping cart of a user the application may have to use the GET method on the Cart API as follows:
GET Cart/ API
“X-Merchant-ID: client_key”
“X-Session: <user session identifier>”
The application can also add or edit items in the shopping cart or delete it. This example illustrates the need for the application to associate an identifier for the scope of a set of webAPI calls.
This session identifier is usually obtained as a hash of the session cookie. And is provided by the authentication and authorization server. The session identifier can be requested as part of the login process or by a separate API Call to a session endpoint. An active session removes the need to re-authenticate. It provides a familiar end-user experience and functionality. The session can also be used with user-agent features or extensions to assist with authentication such as password-manager or 2-factor device reader.
The session identifier is usually a hash of the session. It is dynamic in nature, for external use and useful only for identifying something that is temporary. It can be requested based on predetermined client_secrets.
Sessions can time out or be explicitly torn down – either by the user or by the system forcing re-authentication. Therefore session management must expire/clear the associated cookie and identifiers.
Sessions will need to be protected against csrf attack and clickjacking just as they are for other resources.
Sessions are treated the same as credentials in terms of their lifetime.The user for the session can be looked up as follows:
{HttpCookie cookie = HttpContext.Current.Request.Cookies[FormsAuthentication.FormsCookieName];
FormsAuthenticationTicket ticket = FormsAuthentication.Decrypt(cookie.Value);
SampleIdentity id = new SampleIdentity(ticket);
GenericPrincipal prin = new GenericPrinicipal(id, null);
HttpContext.Current.User = prin;}

Sunday, June 28, 2015

monitoring and alerts

Systems have to be monitored round the clock for availability and readiness. Sincethis is a tedious time consuming process, each component that is added to thesystem is responsible for its own health and to attract attention when somethinggoes wrong. While there may be many ways to call for attention, they systemadministrator can facilitate a sink for collecting such monitoring information. Forexample, a logging handler or an email handler can be added to the system. Ofcourse, components will have to judiciously use levels for the alarms they raise.However monitoring need not be intrusive or inlined by the components. Healthchecks such as whether an application is running or suspended or crashed can bedetermined by the platform it is running on. These non-invasive handlers work at a layer different from the one for the components.
Take dtrace for example and this gives great visibility to the platform it operates on, but it requires turning on instrumentation and targeting the malfunctioning item – both of which have their own overheads. Instrumentation overheads and impact involves:

Slowing down the system

Setting or targeting the malfunctioning item

Affect the timing of certain operations where problems don’t surface anymore

Require tremendous amount of setup especially when trying to nail the culprit.

And above all else, it works against the performance of the system.

If the components give out some kind of response, then tools such as packet sniffers give a lot of information without being intrusive. Such non-intrusive techniques are often preferred over intrusive ones because they don’t affect the system and the same time, have the ability to capture all the requests and responses for analysis.

It is not recommended however to require periodic API requests and responses for a notion of heartbeat or pulse on the application. Such requests and responses are unnecessary when the same information can be gleaned via the levers provided by the sophisticated platform or engine with which the application runs.

Monitoring can be automated so that human intervention can not always berequired but certain actions are required to attract attention and these are called alerts. Monitoring and alerts often go hand in hand allowing administrators to focus on investigating the system only when there is a problem. The alerts are typically preferred to be in mail format but even banner messages, pop-ups and flash can be used. The information in these alerts are more actionable than just the monitoring results. Typically they advice on the remedy actions as well. These therefore are complimentary to the monitoring efforts. An example of the monitoring and alertsinfrastructure can be seen inhttp://1drv.ms/1BPk7Hn

#codingexercise
Count the number of prefixes in a dictionary using a trie

Pseudocode :
Struct trie {
Integer words;
Integer prefixes;
Reference edges [26];
}

countPrefixes(vertex, prefix) k=firstCharacter(prefix)
if isEmpty(word)
return vertex.prefixes
else if notExists(edges[k])
return 0
else cutLeftmostCharacter(prefix)
return countWords(edges[k], prefix)

Saturday, June 27, 2015

Today we continue with one more problem solving discussion. We look at box stacking problem. We are given a set of n types of rectangular three dimensional boxes of length Li width WI and height hi . We have to stack the boxes so they for tallest tower. Each box must sit on top of a bigger or equal box and there can be many boxes of the same size. A rectangular box can be rotated so that it can lay on any of its faces.
This problem is similar to the previous increasing subsequence we have seen. We generates all the three base areas for a rectangle of a box. Then we sort them in increasing order. At this point, the pro lem is that of finding the longest common subsequence of height hi
Let MSH (i) be the max height of the tower.
MSH (i) = max ( MSH (j) + height hi) when w × l (j) is bigger and j < I and
If no such MSH (j) exists then MSH (i) = h (i)
This we can write as

for (int i = 1; i < n; i ++)

for (int j = 0; j <i; j++)

if (w (j)*l(j) > w (i)*l (i) && max (MSH (j) + h (i)) > msh (i)
MSH (i) = max (MSH (j) + h (i))

To get the maximum overall stack height, we return the max (MSH (i)) where 0 < I < n

We now look at a problem that involves backtracking instead of dynamic programming.
Let us say we want to color the map of United States such that no two bordering states have the same color. We are allowed to use four colors : red orange green and blue
Let us say we number the states say from 1 to 50 and we denote the current state with the number n
Let us say we consider each color for a state say denoted by c. If its' valid we choose the color and try to recursively find solutions applying that color for the n+1 going forward. If we found a solution applying this color or during the termination check when n == 50 then we return true. For backtracking, we remove the color and proceed with the next color as candidate. Finally we return false if we didn't find a solution using up all the available colors.

bool findColors(int n, Pair<int,color> states)
{
if (n == 51) {
    printColors();
    return true;
}
    foreach (color c in colors) {
      if (isValid(c,n)){
           states[n] = c; //apply color
           if (findColors(n+1, states))
              return true;
           states[n] = '/0';//remove color
        }
    }
return false;
}
Back tracking never considers an invalid option and reduces the number of options from the permutations.

Friday, June 26, 2015

Today we look at a few more problem solving.
First we look at a balanced partition of an array C of numbers such that the same of the two partitions have very little difference. Typically the partitions have half the total sum The numbers are each in the range 0 < x < k
This is also a dynamic programming problem. We create a boolean array T of size N + 1. T[x] will be true only if and only if there is a subset whose sum = x when we have this boolean array, we take the T[N/2] value. If this is true then there is a subset that adds up to this value.
We take C[i] and look at the table T. When we consider a new number C[i], we could simply ignore the number or include it. It may be possible to make the sum without including it. Now consider some location T[j] that has a one in it. It corresponds to some subset of numbers that adds up to the required sum. When we include C[i], we get a new sum C[i] + j and the corresponding T[j+sum(C[i])] should be set to true as well.
for (int i = 0; i < n; i ++)
for (int j = N - C[i]; j>= 0; j--)
if (T[j] ) then T[j+C[i]] = true;

return T[n/2]

N is the total sum
T is the boolean array with the first element set to True and everything else False.

#codingexercise

Double GetNthRootProductAlternateTermRaisedPPlusQoverPTimesQ (Double [] A,Double p, Double q)

{

If ( A== null) return 0;

Return A.NthRootProductAlternateTermRaisedPPlusQoverPTimesQ(p, q);

}

Thursday, June 25, 2015

Today we review some more coding problems. Let us take a look at uni-color triangles. You are given 10 points in a plane such that no three of them are collinear Each pair is connected by a green dash or a blue solid line How many uni color triangles with vertices at the given points are possible.
Since we know that it takes three points to form a triangle, the total number of triangles possible is 10 choose 3 = 120. This number three is helpful. In a multicolored triangle, two edges of the triangle will be the same color. Moreover, multicolored triangles + unicolor triangles = 120. Therefore we attempt to find unicolor triangles by finding the number of multicolored triangles. We make a table of the number of green and blue edges incident on each vertex. For e.g.:
Point 1 2 3 4 5 6 7 8 9 10
Blue 3 2 5 6 5 3 3 2 7 4
Green 6 7 4 3 4 6 6 7 2 5
We can now multiply the number of green and blue edges incident to the given points, and sum the products to find all possible multicolored triangles but each multi color triangle is counted twice because the ways to choose two edges of the same or different color still results in the same triangle. Therefore the number of uni-color triangle possible is 120 - 174/2 = 33 for the given table.
The problem is reduced from nChoose2 with complexity of O(n3) to counting n choose two edges O(n^2) for each of the vertices.
Thus instead of writing a combinatorial coding solution as

Void CombineToTriangles (Edges a, EdgeBuilder b, int start, int level)
{

For (int I = start ; I < a.Length; i++)
{
If (b.Length == a.Length) print b;
b[level] = a[i];
Print b.ToTriangles();
If (I < a.Length)
CombineToTriangles(a,b, start+1,level+1)
b[level] = ‘/0’;
}
}
we can reduce the problem to n!/(n-3)!3! - Sum-for-i-from-1-to-n(Blue-i * green-i / 2)

#codingexercise
Double GetNthRootProductAlternateTermRaisedPPlusQoverPDividedQ (Double [] A,Double p, Double q)
{
If ( A== null) return 0;
Return A.NthRootProductAlternateTermRaisedPPlusQoverPDividedQ(p, q);

}

Wednesday, June 24, 2015

Today we briefly revisit the bridge problem and see how to apply the longest increasing subsequence.
Let us say the cities on both sides of the river are 2, 3, 1 and 1, 2, 3.
from the previous discussions on the longest increasing subsequence, we can find it recursively by including the last element or not.
Let us keep track of the indices in an array
we take the first sequence and iterate from left to right. say i = 0
for an increasing subsequence of size j=1 and i = 0 upto and inclusive of 2, the minumum value of the length of the increasing subsequence is one greater than the length of the longest prefix at this element and now equals 1
we initialize IndexArray at 1 to be 1 and update the longest subsequence encountered so far to be 1

for an increasing subsequence of size j =1 and i = 1 upto and inclusive of 3, the minimum value by binary search is that for first element plus 1 which now equals 2

for the last element, the binary search points to the second element and has not changed the length L computed so far and updated last with 2.

If it increases the current sum, the element is included Thus we use a greedy strategy here

Formally,

Let X[i] represent the sequence 2, 3, 1
Let M[j] store the index k of the smallest value X[k] so that there is an increasing subsequence of length j ending at X[k] on the range k <=i and j <=k <= i. j represents the length of the increasing subsequence and k represents the index of termination.
and Let P[k] store the index predecessor of X[k]
The longest Increasing subsequence at any given point is given by X [M [1]], X [M [2]], ... X [M [L]] and if there is an increasing subsequence at i then there is one at i-1 which is X [P [M[i]]] and between these two bounds we can do a logarithmic search for j <=k <= i
Therefore the steps are:
P = array of length N - all initialized to zero
M = array of length N + 1 // pad on the left so that we can use zero based index.
L = 0
for i in range 0 to N-1:
Binary search for the largest positive j <= L
such that X[M[j]] < X[i]
lo = 1
hi = L
Binary Search between lo and hi adjusts lo and hi
After searching lo is 1 greater than the length of the longest prefix
newL = lo
The predecessor of X[i] is the last index of the subsequence of length newL-1
P[i] = M[newL -1]
M[newL] = i
if newL > L:
if we found a subsequence longer than L, update L
L = newL

// Reconstruct the longest increasing subsequence
S = array of length L
k = M[L]
for i in range L-1 to 0:
S[i] = X[k]
k = P[k]

return S

For the bridge problem, we find the longest increasing subsequence first for the north side and then for the south side of the river and take the smaller of the two.

#codingexercise
Double GetNthRootProductAlternateTermRaisedPMinusQoverPDividedQ (Double [] A,Double p, Double q)
{
If ( A== null) return 0;
Return A.NthRootProductAlternateTermRaisedPMinusQoverPDividedQ(p, q);
}