Cluster computing

Saturday, September 3, 2016

Connecting AWS connector with Okta

Many organizations participate in multi factor authentication for their applications. At the same time, they expect the machines deployed in their private cloud to be joined to their corporate network. If this private cloud were to be hosted on the AWS as a VPC, it would require some form of AD Connector to talk to the on-premise Active Directory. The AD connector is a proxy and the Active directory is on – premise for the entire organization. In this writeup targeted at a software developer audience, I explain how to configure AWS connector with Okta.

We use AWS to connect AD using the Okta agent. The Okta AD or LDAP agent is downloaded and installed on any Windows Server with access to the Domain Controller. The Okta LDAP agent is downloaded to the Linux environment.

We can eliminate login and password hassles by connecting the AWS with existing corporate credentials.

Further more, for new credentials, this lets us automatically provision, update or deprovision AWS accounts when we update AD or LDAP.

#codingexercise
Given a bst and a lower boundary values. Prune the tree if the node data lies over the boundary values

Node trimRangeLower( node root, int min)

{

If (root == null) return root;

root.left = trimRangeLower(root.left, min,);

root.right = trimRangeLower(root.right, min);

If (root.data < min)

{

Var right = root.right;

delete root;

Return right;

}

Return root;

}
Given a bst and a higher boundary values. Prune the tree if the node data lies under the boundary value
Node trimRangeHigher( node root, int max)

{

If (root == null) return root;

root.left = trimRangeHigher(root.left, max);

root.right = trimRangeHigher(root.right, max);

If (root.data > max )

{

Var left = root.left;

delete root;

Return left;

}

Return root;

}

Friday, September 2, 2016

In house or out sourced cloud

In this article, I’d like to make a case for doing away with private cloud and suggest alternatives including Amazon virtual private cloud and masquerading public cloud as private.

First, customers are falsely attracted to “private cloud” offerings. What they really want are the benefits of cloud computing such as scalability, elasticity, rolling applications and virtual machines, etc. but most of them are misled into thinking a “public cloud” is less secure and costlier than a “private cloud”. I will make a case that there is a total cost of ownership in a private cloud that makes it less attractive than alternatives.

Second, as an IT provider, it is easier to provision new compute and storage using traditional hosting albeit in the form of datacenters. This generally allows secure, dynamic, scale-able and reusable architecture that can host the business applications from the customers. Yet this does not necessarily result in cost savings, more flexibility or even more security as much as the robust, reliable and comprehensive public cloud from vendors such as Amazon Web Services.

In this article, we will explore the technical feasibility of alternatives (and skip the cost comparison for later between existing and proposed solution)

Amazon Virtual Private Cloud lets us provision an isolated set of virtual machines in a private network but still hosted on AWS cloud. This is similar in nature to the private cloud hosted in data centers but with the benefits of using a scaleable infrastructure. And we have complete control over the virtual network. Generally they are used when we don't want internet access in private facing subnet.

A sample use case may be where the application and the database servers are separated into subnets with the forward facing application on the public cloud and the isolated database servers in a private VPC. This adds a layer of security to the database servers since they don't have internet connectivity.

Moreover, security can be increased in a VPC with the use of security groups, network access control lists (ACLs) and flow logs. The security groups control both the inbound and outbound traffic at the instance level. The network ACLs control both the inbound and outbound traffic at the subnet level. Flow logs capture information about the IP traffic going to and from the network interfaces.

In a VPC, all aspects of a network can be controlled, we can select its IP address range, create subnets, and configure route tables, network gateways, and security settings. We can control how the instances launched into a VPC access resources outside the VPC. An internet gateway is available to enable our instances to connect to the internet through the Amazon EC2 network edge.

If we want to add corpnet connectivity of the VPC instances to the company, we could make use of AWS directory services. AWS directory services makes it easy to setup and run Microsoft AD in AWS cloud or connect AWS resources with an existing on-premises Microsoft Active Directory. This helps manage users and groups, provide single sign on to applications and services, create and apply group policy, domain join EC2 instances, as well as simplify the deployment and management of cloud based Linux and Microsoft Windows workloads.

AWS automatically brings compliance and governance standards that would have otherwise taken more labour and time on the private cloud. AWS also cares about data privacy and provides simple tools to manage ownership and control of sensitive customer content

#codingexercise
Given a bst and two boundary values. Prune the tree if the node data lies outside the boundary values

Node trimRange( node root, int min, int max)

{

If (root == null) return root;

root.left = trimRange(root.left, min,max);

root.right = trimRange(root.right, min, max);

If (root.data < min)

{

Var right = root.right;

delete root;

Return right;

}

If(root.data > max)

{

Var left = root.left

delete root;

Return left;

}

Return root;

}

If we were to find only one end of the range, we would exclude just the check towards the end of the method above

Thursday, September 1, 2016

Find minimum distance between two numbers in an array
Int getMinDistance(list<int> nums, int X, int Y)
{
Int min = int_max;
For(int i = 0; i < nums.count; i++)
   For(int j = i + 1; i < nums.count; i++)
{
     If (((nums[i] == X && nums[j] == Y) ||
          (nums[i] == X && nums[j] == Y)) &&
          Math.abs(i-j) < min)
           Min = math.abs(i-j);
}
Return min;
}
Int getMinDistance(List<int> nums, int X, int Y)
{
Int min = int_max;
Int prev = min(nums.IndexOf(x), nums.IndexOf(y));
For (int I = prev+1; I < nums.Count; i++)
{
    If (num[i] == x || num[i] == y)
   {
       If (nums[prev] != nums[i] && (i-prev) < min)
       {
              min = I – prev;
       }
       Else
             Prev = I;
   }
}
Return min;
}
Int getMinDistance(List<int> nums, int X, int Y)
{
Var xi = nums.IndexesOf(x); // sorted and ascending
Var yi = nums.IndexesOf(y); // sorted and ascending and different from xi
Int I = 0; int j = 0;
Int val = -1; // neither x nor y
Int prev = -1;
Int min = int_max;
While ( I < xi.count && j < yi.count)
{
If ( xi[i] < yi[j])
   If (val == -1) {val = x; prev = xi[i];}
   If (val != x &&xi[i]-prev < min){
        Min = xi[i]-prev;
        Val = x;
        Prev=xi[j];
}
   I++;
Else{
   If (val == -1){val = y; prev = yi[j];}
   If (val != y &&yi[j]-prev < min){
        Min = yi[j]-prev;
        Val = y;
         Prev = yi[j];
}
   J++;
}
}
While (I < xi.count)
{
   If (val == -1) {val = x; prev = xi[i];}
   If (val != x &&xi[i]-prev < min){
        Min = xi[i]-prev;
        Val = x;
        Prev=xi[j];
}
I++;
}
While (j < yi.count)
{
   If (val == -1){val = y; prev = yi[j];}
   If (val != y &&yi[j]-prev < min){
        Min = yi[j]-prev;
        Val = y;
         Prev = yi[j];
}
   J++;
}
Return min;
}

Wednesday, August 31, 2016

Persistent connections versus pooling - php framework
When the traffic to one or more php application is high, the application may open several connections to the database server using tge same credentials. The database server may reject these saying that there are too many connections open and the application throws an exception saying CDBconnection could not open a connection. In such cases, the database server expects the application to not open new connections but to use some form of connection pooling. Pooling helps conserve the resources which would otherwise have been wasted one for every connection. Also, it makes the application run fast as the coonections dont have to be opened on every request. In addition to max_connections parameter to pooling, another parameter that plays important role is thread_cache_size. Requests for threads are satisfied by reusing threads taken from the cache whenever possible and only if the cache is enpty, a new thread is created. Typically, this variable is set so that tge fraction of requests for wgich threads are newly created is rather small.
Consequently in a high traffic server this maybe setvto a arbitrarily high value
However,
Php framework operates on a request basis. It does not support connection pooling. The alternatives are to use something else that the framework provides as we cannot use anything outside the framework. One such technique is to use persisted connections. We set this property to true in the application config file under the db section.
A caveat with this approach is that it does not address any issues originating from the sql used with a connection by the user or application. If there's a bug in the sql wjich leads to long running queries, then this won't mitigate that. However, the first step is to lower the number of connections opened before analysing any locks held and orphaned by SQL. It is therefore prudent to use register_shutdown_function that cleansup after each SQL execution.

# a microservice example
https://drive.google.com/file/d/0B-H0-bJJQAq5eFhuYjIyQXNFR3c/view?usp=sharing
https://drive.google.com/file/d/0B-H0-bJJQAq5aFlQLS0wZ3pPc28/view?usp=sharing
#codingexercise
Find if there exists a pair in an array that can add up to a sum that us divisible by k
We gave a few solutions yesterday.

Another approach for small range of numbers in the list is as follows:
bool hasPairDivisibleByK(List<int> nums, int k)
{
Assert (nums.all(x => x > 0));
Assert( k > 0);
Var min = nums.min();
Var max = nums.max();
Int low = (min + min ) /k;
Int high = (max + max) /k;
nums.sort();
For (int I = low; I <= high+1; i++)
If (hasPairSum(nums, I * k) == true)
Return true;
Return false;
}
bool hasPairSum(List<int> nums, int sum)
{
int left = 0;
int right = nums.Count -1;
if (nums.Count < 2 ) return false;
// nums.Sort();
while (left < right)
{
if(nums[left] + nums[right] == sum)
return true;
else if (nums[left] + nums[right] < sum)
left++;
else
right--;
}
return false;
}

Find minimum distance between two numbers in an array
Int getMinDistance(list<int> nums, int X, int Y)
{
Int min = int_max;
For(int i = 0; i < nums.count; i++)
For(int j = i + 1; i < nums.count; i++)

{

If (((nums[i] == X && nums[j] == Y) ||

(nums[i] == X && nums[j] == Y)) &&

Math.abs(i-j) < min)

Min = math.abs(i-j);

}

Return min;

Tuesday, August 30, 2016

#algorithm discussion continued
Today we continue reading the flow networks. We were looking at max flow min cut theorem and the properties associated with it. We saw how it applied to Ford Fulkerson method. Each iteration of the method augmented the flow by considering a path in the residual network until there is no more. For each path considered, the residual capacity of that path is found and augmented to the flow network.
How the path is chosen is not mentioned by the method. This is usually done with Breadth First Search
We saw that with BFS, the shortest path distance for all vertices in the residual network increases monotonically with each flow augmentation. We saw that in the forward direction, when we assumed the distance to decrease, we contradicted ourselves because the next vertex v was one hop away from the current vertex u and since this vertex was not already considered its distance was the sum of the shortest distance to current vertex and one hop. This would mean that the distance to the new vertex before and after the decrease would be the same which is a contradiction.Therefore this distance increases monotonically.
We considered the case when (u,v) is not part of the flow before the augmentation Now let us say that it is part of the flow after the augmentation.. This means the augmentation increased the flow from v to u. This algorithm always augments flow along the shortest paths and therefore the shortest path from s to u has (v,u) as the last edge. In this case, the shortest distance to v is one hop less than the shortest distance to u. This in turn must be less than the shortest distance to u after augmenation because we maintianed that the shortest distance to u did not change from the get go. Also we know that after augmentation, the distance between u and v are only one hop apart now that they are connected. So the shortest path distance between source and v before augmentation works out to be two hops less than the shortest path distance between source and v after augmentation which contradicts our assumption that augmentation decreased the shortest distance . Therefore such a vertex v cannot exist and the shortest distance increases monotonically.
#codingexercise
Determine the floor of an input value in binary search tree

Floor is the value of a node that is the largest of all those that are lower than the given one

Int floor(node root, int val)
{
If (root == null) return -1;
If (root.data == val) return val;
If (root.data > input) return floor(root.left, val);
Int floorright = floor(root.right, val);
If (floorright == -1)
Return root.data;
Else
Return floorright;
}

Determine if there are two elements in a list that adds up to a number divisible by k
bool hasPairDivisiblyByK(List<int> nums, int k)
{
for (int i = 0; i < nums.Count; i++)
for (int j = i+1; j < nums.Count; j++)
if ((nums[i] + nums[j]) % k == 0)
return true;
return false;
}

bool hasPairDivisiblyByK(List<int> nums, int k)
{
Var h = new Hashtable<int, int>();
for (int i = 0; i < nums.Count; i++)
h[nums[i] % k] += 1;
for (int i = 0; i < nums.Count; i++){
Int rmndr = nums[i] % k;
If (rmdr == 0)
Return h[rmdr] > 1;
If( h[k - rmndr] > 0)
Return true;

}
return false;
}

If we want to save on space while keeeping the time complexity less than O(n^2), we could sort the numbers and for each of the multiples of k that lie between a min and a max in the range, we could find pairs that add up to it.

Floor is the value of a node that is the largest of all those that are lower than the given one

Monday, August 29, 2016

#algorithm discussion continued
Today we continue reading the flow networks. We were looking at max flow min cut theorem and the properties associated with it. We saw how it applied to Ford Fulkerson method. Each iteration of the method augmented the flow by considering a path in the residual network until there is no more. For each path considered, the residual capacity of that path is found and augmented to the flow network.
How the path is chosen is not mentioned by the method. This is usually done with Breadth First Search
When the BFS is used to choose the augmenting path as a shortest path from s to t in the residual network, where each edge has unit distance, the algorithm is then called Edmonds-Karp algorithm
It can be shown that with this algorithm for all vertices, the shortest path distance in the residual network increases monotonically with each flow augmentation.
This is explained in terms of contradiction. The distance between the start and a vertex v that has not already been included is assumed to decrease and then we derive the contradiction. Let delta be the shortest path just before the decrease and delta-dash be the shortest path immediately after. We know that the decrease is one hop because BFS traversal is for neighbors. and The distance of the vertex u is not decreased.
Moreover, we know that the edge (u,v) was not part of the shortest distance earlier because if it were the shortest distance from source to v would have been less than or equal to the shortest distance from s to u plus one hop. This in turn would have been smaller than source s to u after the current decrease of the shortest distance from s to u together with the one hop to v which we just now assumed. This again in turn would have been equal to the shortest distance from source to v after the decrease because the distance from s to u plus one hop is the distance from s to v. This means that the distance from source to v before and after the decrease is the same which is a contradiction.
The same reasoning by contradiction works out in the reverse direction. Together by both reasoning in both directions, the assumption that such a vertex v exists is proved false. Hence the shortest path distance in the residual network increases monotonically.

Question: Why does streaming work with aggregates in processing graphs and tables ?
Answer : This is because the aggregate can be projected as and when the results arrive and they were the ones being done in batch.
User defined functions and user defined operators can also be used similarly.

The Microsoft StreamInsight Queries follow a five step procedure:

1) define events in terms of payload as the data values of the event and the shape as the lifetime of the event along the time axis

2) define the input streams of the event as a function of the event payload and shape. For example, this could be a simple enumerable over some time interval

3) Based on the events definitions and the input stream, determine the output stream and express it as a query. In a way this describes a flow chart for the query

4) Bind the query to a consumer. This could be to a console. For example

a. Var query = from win in inputStream.TumblingWindow( TimeSpan.FromMinutes(3)) select win.Count();

5) Run the query and evaluate it based on time

#codingexercise
Determine the ceiling of an input value in binary search tree

Ceiling is the value of a node that is the smallest of all those that are higher than the given one

Int ceil(node root, int val)
{
If (root == null) return -1;
If (root.data == val) return val;
If (root.data < input) return ceil(root.right, val);
Int ceilleft = ceil(root.left, val);
If (ceilleft == -1)
Return root.data;
Else
Return ceilleft;
}

Similarly, the floor of a value can be found in BST.