Cluster computing

Wednesday, September 7, 2016

Maximum flows can also be calculated using "push-relabel" method. This method works in a more localized manner than Ford Fulkerson method. It works on one vertex at a time, looking only at the vertex's neighbour in the residual network. Thus it avoids examining the entire residual network to find an augmenting path. Also, they do not maintain flow conservation. Instead they maintain a preflow property which goes to say that the total flow into a vertex may exceed the flow out.

The excess at a vertex is the amount by which the flow in exceeds the flow out and the vertex is said to be overflowing. This method applies two operations : pushing preflow and relabeling a vertex.

Both Ford Fulkerson and Push Relabel envision the graph to be a flow network as a system of interconnected pipes of given capacities. The former uses augmenting paths in the network to give rise to an additional stream of fluid with no branches between source and sink and iteratively adds streams until there are no more. The latter assumes that each vertex has an outflow pipe leading to an arbitrarily large reservoir that can accumulate fluid. In addition, it assumes that each vertex, it's reservoir, and all its pipe connections sit on a platform whose height increases as the algorithm progresses. The height is used to push fluids only downhill which means from a higher vertex to a lower vertex even though there may be flows from a lower vertex to a higher vertex that may be positive.

The only outgoing flow from a vertex u that is not already saturated with flow are those that connect to vertices at the same level or at a higher level. In order to make a flow outbound from u we relabel it by pushing it one unit higher than its neighbors. This helps us rid an overflowing vertex u of its excess flow.

By repeating this iteratively, we induce as much flow downwards as possible. The excess in the reservoirs of the overflowing vertices is sent back up to a height higher than the source. Throughout the iterations and at this point, the preflow property holds and coverges to having no excess flow at the vertices which also results in maximum flow as the capacity constraints hold.

#codingexercise

Void getkthlast(node head, int k, ref int count, ref node kthlast)

{

If (head == null) return;

Getkthlast(head.next, k, ref count, ref kthlast);

count = count + 1;

if (count==k) { kthlast = head; return;}

}

You have a you tube video. A person watches the video in random order. You are given the start and end time of various intervals he watched. How will you confirm whether he has watched the full video or not.

Bool hasWatchedFull( interval full, List<interval> slices)
{
Var merged = merge(slices);
If (merged.first.start <= full.start && merged.last.end >= full.end && merged.sum() == full.end - full.start) return true;

return false;

}
LIst<Interval> merge(List<Interval> slices)
{
Var s = new Stack<Interval> ();
Slices.sort(); // by starting time
If (slices.count == 0) return new List<Interval>{Interval(0,0)};
If (slices.count == 1) return new List<Interval>{slices[0]};
S.push(slices[0]);
For(int i =1; i < slices.count; i++)
{
var top = slices.top();
If (top.end()< slices[i].start)
S.push(slices[i]);
Else{
Top.end = slices[i].end;
S.push(top);
}
Return s.toList():
}

}

Tuesday, September 6, 2016

#codingexercise

In the isInterleaved method earlier, we assume that the match has to be with one of the strings and not with both based on the premise in the question. If this restriction were to be relaxed we would match both one at a time.
In this case, we could do this recursively.
Bool isInterleaved(char* A, char* B, char* C)
{
if (*c == '/0') return true;
if (*a == '/0' && '*b' == '/0') return false;
if (*c == *a && *c == *b){
return isInterleaved(++A, B, ++C) || isInterleaved(A, ++B, ++C);
}
if (*c == *a){
return isInterleaved(++A, B, ++C);
}
if (*c == *b) {
return isInterleaved(A, ++B, ++C);
}
return false;
}

node reverse(node head, node prev)
{
if (head==null) return prev;
head.next = prev;
return reverse(head.next, head);
}

Monday, September 5, 2016

#algorithm discussion continued
Today we continue reading the flow networks. We were looking at max flow min cut theorem and the properties associated with it. We saw how it applied to Ford Fulkerson method. Each iteration of the method augmented the flow by considering a path in the residual network until there is no more. For each path considered, the residual capacity of that path is found and augmented to the flow network.
Maximum flow problems have a direct application in some combinatorial problems and we saw how it applied to matching in bipartite graphs. A maximum bipartite matching is a matching of maximum cardinality.
To find this maximum matching, we convert the bipartite graph into a flow network in which the flows correspond to matchings. A source s and a sink v are added to either sides of the bipartite graph. We assign directed edges from L to R with unit capacity for all edges with flows and zero otherwise. This is called an integer valued flow.
If the capacity function c takes on only integral values, then the maximum flow f produced by the Ford Fulkerson method has the property that f is an integer. Moreover for all vertices u and v, the flow f(u,v) is an integer.
We can prove this by induction on the number of iterations. Let us take only one iteration. Since the Ford Fulkerson method uses residual capacity and the capacity function is integral, we know that for the first iteration, this is trivially true.
Now let us assume that the nth iteration is an integral flow network. In the next iteration, we find the residual capacity and this will be for a path not encountered earlier. In this path, the residual capacity will be found as integer because the capacity function is integral. Any addition or subtraction from integers will result in integers. Therefore, the n+1th iteration will also result in an integer flow network. Consequently the claim holds.
Another claim made is that the cardinality of maximum matching M in a bipartite graph G equals the value of maximum flow f in a corresponding flow network G'.
This follows from the contradiction that if the corresponding flow f in G' is not maximum, then there is a maximum f' corresponding to the bipartite graph that is more than f and is integral. This corresponds to a maximum matching M' greater than M which contradicts our assumption and therefore the claim holds.
Maximum flows can also be calculated using "push-relabel" method. This method works in a more localized manner than Ford Fulkerson method.
#codingexercise
Check if a given string C is an interleaving of other two strings A and B with non repeating characters:

Bool isInterleaved(char* A, char* B, char* C)
{
While(*C)
{
If (*C == *B)
   B++;
Else if (*C == *A)
A++;
Else
Return false;
}
C++;
}
Return true;
}

Permutations of a string:
Void Permute (String a, StringBuilder b, bool[] used)
{
If ( b.Length == a.Length) { print b.ToString(); return;}
For (int I = 0; I < a.Length ; i++)
{
    If (used[i]) continue;
     used[i] = true;
     B += A[i];
     Permute(a, b, used);
     B [B.Length – 1] = ‘/0’;
     used[i] = false;
}
}

Sunday, September 4, 2016

#algorithm discussion continued
Today we continue reading the flow networks. We were looking at max flow min cut theorem and the properties associated with it. We saw how it applied to Ford Fulkerson method. Each iteration of the method augmented the flow by considering a path in the residual network until there is no more. For each path considered, the residual capacity of that path is found and augmented to the flow network.
How the path is chosen is not mentioned by the method. This is usually done with Breadth First Search
We saw that with BFS, the shortest path distance for all vertices in the residual network increases monotonically with each flow augmentation. We saw that in the forward direction, when we assumed the distance to decrease, we contradicted ourselves because the next vertex v was one hop away from the current vertex u and since this vertex was not already considered its distance was the sum of the shortest distance to current vertex and one hop. This would mean that the distance to the new vertex before and after the decrease would be the same which is a contradiction.Therefore this distance increases monotonically.
Maximum flow problems have a direct application in some combinatorial problems. The multiple-source multiple-sink maximum flow. One such example is to find the maximum matching in a bipartite graph. This can be done in O(VE) time in a G = (V,E) graph
A matching is a subset of edges M such that for all vertices v in V, at most one edge of M is incident on v. We say that a vertex v belongs to V is matched by the matching M if some edge in M is incident on V and unmatched otherwise. A maximum matching is one where the number of matchings is maximum. A graph is called bipartite when all its vertex set can be partitioned into disjoint sets L and R and all edges in E go between L and R.
To find this maximum matching, we convert the bipartite graph into a flow network in which the flows correspond to matchings. A source s and a sink v are added to either sides of the bipartite graph. We assign directed edges from L to R with unit capacity for all edges with flows and zero otherwise. Since all the flows are thus integers, the flow network is called integer valued.
If M is a matching in the bipartite graph G, then there is an integer valued flow f in flow network G' with cardinality of f to be the same as that of M. Conversely if f is an integer valued flow in G', then there is a matching M in G with cardinality M to be same as that of f. Since the flow f is maximum as found by the Ford-Fulkerson method, the cardinality of maximum matching M equals the value of maximum flow.
#codingexercise
Given an array of heights of poles. Find the no of poles which are visible if you are standing at the kth pole.
int Getvisible(List<int>h, int n, int k)
{
var left = new int[n]{};
var right = new int[n]{};
int count = 1; // the current pole

left[0] = h[0];
for (int i = 1; i < n; i++)
left [i] = max(left[i-1], h[i]);

right[n-1] = h[n-1];
for (int i = n-2; i >= 0; i--)
right[i] = max(right[i+1], h[i]);

for (int i =k-1; i >= 0; i--){
count++;
if(h[i] != left[k])
break;
}
for (int i =k+1; i < n; i++){
count++;
if (h[i] != right[k])
break;
}
return count;
}

Saturday, September 3, 2016

Connecting AWS connector with Okta

Many organizations participate in multi factor authentication for their applications. At the same time, they expect the machines deployed in their private cloud to be joined to their corporate network. If this private cloud were to be hosted on the AWS as a VPC, it would require some form of AD Connector to talk to the on-premise Active Directory. The AD connector is a proxy and the Active directory is on – premise for the entire organization. In this writeup targeted at a software developer audience, I explain how to configure AWS connector with Okta.

We use AWS to connect AD using the Okta agent. The Okta AD or LDAP agent is downloaded and installed on any Windows Server with access to the Domain Controller. The Okta LDAP agent is downloaded to the Linux environment.

We can eliminate login and password hassles by connecting the AWS with existing corporate credentials.

Further more, for new credentials, this lets us automatically provision, update or deprovision AWS accounts when we update AD or LDAP.

#codingexercise
Given a bst and a lower boundary values. Prune the tree if the node data lies over the boundary values

Node trimRangeLower( node root, int min)

{

If (root == null) return root;

root.left = trimRangeLower(root.left, min,);

root.right = trimRangeLower(root.right, min);

If (root.data < min)

{

Var right = root.right;

delete root;

Return right;

}

Return root;

}
Given a bst and a higher boundary values. Prune the tree if the node data lies under the boundary value
Node trimRangeHigher( node root, int max)

{

If (root == null) return root;

root.left = trimRangeHigher(root.left, max);

root.right = trimRangeHigher(root.right, max);

If (root.data > max )

{

Var left = root.left;

delete root;

Return left;

}

Return root;

}

Friday, September 2, 2016

In house or out sourced cloud

In this article, I’d like to make a case for doing away with private cloud and suggest alternatives including Amazon virtual private cloud and masquerading public cloud as private.

First, customers are falsely attracted to “private cloud” offerings. What they really want are the benefits of cloud computing such as scalability, elasticity, rolling applications and virtual machines, etc. but most of them are misled into thinking a “public cloud” is less secure and costlier than a “private cloud”. I will make a case that there is a total cost of ownership in a private cloud that makes it less attractive than alternatives.

Second, as an IT provider, it is easier to provision new compute and storage using traditional hosting albeit in the form of datacenters. This generally allows secure, dynamic, scale-able and reusable architecture that can host the business applications from the customers. Yet this does not necessarily result in cost savings, more flexibility or even more security as much as the robust, reliable and comprehensive public cloud from vendors such as Amazon Web Services.

In this article, we will explore the technical feasibility of alternatives (and skip the cost comparison for later between existing and proposed solution)

Amazon Virtual Private Cloud lets us provision an isolated set of virtual machines in a private network but still hosted on AWS cloud. This is similar in nature to the private cloud hosted in data centers but with the benefits of using a scaleable infrastructure. And we have complete control over the virtual network. Generally they are used when we don't want internet access in private facing subnet.

A sample use case may be where the application and the database servers are separated into subnets with the forward facing application on the public cloud and the isolated database servers in a private VPC. This adds a layer of security to the database servers since they don't have internet connectivity.

Moreover, security can be increased in a VPC with the use of security groups, network access control lists (ACLs) and flow logs. The security groups control both the inbound and outbound traffic at the instance level. The network ACLs control both the inbound and outbound traffic at the subnet level. Flow logs capture information about the IP traffic going to and from the network interfaces.

In a VPC, all aspects of a network can be controlled, we can select its IP address range, create subnets, and configure route tables, network gateways, and security settings. We can control how the instances launched into a VPC access resources outside the VPC. An internet gateway is available to enable our instances to connect to the internet through the Amazon EC2 network edge.

If we want to add corpnet connectivity of the VPC instances to the company, we could make use of AWS directory services. AWS directory services makes it easy to setup and run Microsoft AD in AWS cloud or connect AWS resources with an existing on-premises Microsoft Active Directory. This helps manage users and groups, provide single sign on to applications and services, create and apply group policy, domain join EC2 instances, as well as simplify the deployment and management of cloud based Linux and Microsoft Windows workloads.

AWS automatically brings compliance and governance standards that would have otherwise taken more labour and time on the private cloud. AWS also cares about data privacy and provides simple tools to manage ownership and control of sensitive customer content

#codingexercise
Given a bst and two boundary values. Prune the tree if the node data lies outside the boundary values

Node trimRange( node root, int min, int max)

{

If (root == null) return root;

root.left = trimRange(root.left, min,max);

root.right = trimRange(root.right, min, max);

If (root.data < min)

{

Var right = root.right;

delete root;

Return right;

}

If(root.data > max)

{

Var left = root.left

delete root;

Return left;

}

Return root;

}

If we were to find only one end of the range, we would exclude just the check towards the end of the method above

Thursday, September 1, 2016

Find minimum distance between two numbers in an array
Int getMinDistance(list<int> nums, int X, int Y)
{
Int min = int_max;
For(int i = 0; i < nums.count; i++)
   For(int j = i + 1; i < nums.count; i++)
{
     If (((nums[i] == X && nums[j] == Y) ||
          (nums[i] == X && nums[j] == Y)) &&
          Math.abs(i-j) < min)
           Min = math.abs(i-j);
}
Return min;
}
Int getMinDistance(List<int> nums, int X, int Y)
{
Int min = int_max;
Int prev = min(nums.IndexOf(x), nums.IndexOf(y));
For (int I = prev+1; I < nums.Count; i++)
{
    If (num[i] == x || num[i] == y)
   {
       If (nums[prev] != nums[i] && (i-prev) < min)
       {
              min = I – prev;
       }
       Else
             Prev = I;
   }
}
Return min;
}
Int getMinDistance(List<int> nums, int X, int Y)
{
Var xi = nums.IndexesOf(x); // sorted and ascending
Var yi = nums.IndexesOf(y); // sorted and ascending and different from xi
Int I = 0; int j = 0;
Int val = -1; // neither x nor y
Int prev = -1;
Int min = int_max;
While ( I < xi.count && j < yi.count)
{
If ( xi[i] < yi[j])
   If (val == -1) {val = x; prev = xi[i];}
   If (val != x &&xi[i]-prev < min){
        Min = xi[i]-prev;
        Val = x;
        Prev=xi[j];
}
   I++;
Else{
   If (val == -1){val = y; prev = yi[j];}
   If (val != y &&yi[j]-prev < min){
        Min = yi[j]-prev;
        Val = y;
         Prev = yi[j];
}
   J++;
}
}
While (I < xi.count)
{
   If (val == -1) {val = x; prev = xi[i];}
   If (val != x &&xi[i]-prev < min){
        Min = xi[i]-prev;
        Val = x;
        Prev=xi[j];
}
I++;
}
While (j < yi.count)
{
   If (val == -1){val = y; prev = yi[j];}
   If (val != y &&yi[j]-prev < min){
        Min = yi[j]-prev;
        Val = y;
         Prev = yi[j];
}
   J++;
}
Return min;
}