Cluster computing

Saturday, May 14, 2016

#coding exercise
Find the minimum number of steps to reach the end of array from start. Array values give how much we can move.
Int GetMin( List <int> arr, int start, int end )
{
If (start == end) return 1;
Return min (
1 + GetMin (Arr, start+ arr [start], end),
GetMin (arr, start+1,end));
}

median of a stream of two numbers:
Maintain two heaps max and min
on a new number
if (total size is odd)
if max_heap size is >0 && num > median
max_heap.push(num)
heapify
num = max_heap.pop_back()
heapify
min_heap.add(num)
heapify
else
if min_heap size is > 0 && num < median
min_heap.push(num)
heapify
num = max_heap.pop_back()
heapify
max_heap.add(num)
heapify
median :
heap_size is odd
min_heap[0]
else
(max_heap[0] + min_heap[0] ) /2

It may be better to rename the heaps as left and right because one will be max heap and the other will be min heap and we merely extract the top and insert. In other words we balance.

Find the maximum difference in the values between a node and an ancestor in a tree
int GetMin(Node root, ref int max)
{
if (root == null) return INT_MAX;
int min_left = GetMin(root.left);
int min_right = GetMin(root.right);
int min = min(root.data, min_left, min_right);
if ( root.data - min > max)
max = (root.data - min);
return min;
}

Given n eggs and k floors where any egg might break when dropped from a certain minimum height, find the minimum number of tries to find the answer
int W(int n, int k)
{
if (k == 1) return 1;
if( n== 1) return k;
int min = INT_MAX;
for (int i = i; i <= k; i++)
int val = max(W(n-1,i-1), W(n, k-i))
if ( val < min)
min = val;
return 1 + min;
}
#autocomplete : https://github.com/ravibeta/javascriptexamples/blob/master/Autocompleteextender
Given an array A of integers, find the maximum of j - i subjected to the constraint of A[i] <= A[j]
int GetMax(List<int> A, int start, int end)
{
int max = 0;
for (int k = start; k <= end; k++)
{
int ret = k;
int maxindex = k;
for(int I = end; I > k; I--)
     if (A[I] > A[k]){
           maxindex = I;
           break;
   }
if (maxindex-k> max)
   max = maxindex-k;
}
return max;
}

Friday, May 13, 2016

Applications of Machine Learning for private cloud operations
We continue listing some more scenarios from yesterday's post.

Use case 8) Feature Extraction 

We want to find server request patterns from a set which can help with identifying themes that are independently present in various combinations. Like clustering, we are not interested in making predictions but in finding interesting things about the data. Similarly from server usage patterns, we may be able to find multiple underlying causes that combine to create results.

We find a feature matrix from that has a row for each feature and a column for chosen set of server requests. We also find a weights matrix that maps the features to the requests matrix. Each row is a request and each column is a feature. From the data converted into a nested dictionary, we factorize to get the features and the weights matrix.
Use case 9) matching customers
Customers may request various resources from the cloud over time. They may be interested in servers, file shares at specific regions. If we help find a buddy for the customer, they may benefit from their common interests.
We could use an advanced classifier like kernel methods and support vector machines to make predictions whether people with a given set of attributes will match or not.
#codingexercise
Find the minimum number of swaps required for arranging pairs adjacent to each other.
eg:
pairs[] = { 1->3, 2->6, 4->5 }
arr[] = {3, 5, 6, 4, 1, 2}

int GetMinSwaps(List<Tuple<int,int>> pairs, List<int> arr, int start, int end) start and end are for the pairs
{
if (arr.Count <= 1)return 0;
if ( start >= arr.Count) return 0;
// no-op
if (arr[start] == pairs.find_match(arr[1])) return GetMinSwaps(pairs, arr, start+2, end);

int index = arr.find(pairs.find_match(arr[0));
Swap(arr, start+1, index);
int a = GetMinSwaps(pairs, arr, start+2, end);
Swap(arr,start+1, index);

index = arr.find(pairs.find_match(arr[1]));
Swap(arr, start, index);
int b = GetMinSwaps(pairs, arr, start+2, end);
Swap(arr, start, index);
return 1 + min(a,b);
}

int RemoveExtraSpaces(stringbuilder input)

{

var words = input.split();

return words.aggregate( (sent, next) => sent + " " + next);

}

Find the sum of bit differences between all pairs in an array of n integers

Int bitdiff(List<int> arr, int n)

{

int total = 0;

For ( int I =0; I < 32; I ++)

{

int count = 0;

for (int j = 0; j <n ; j++)

if (arr[j] & (1 << i))

count++;

total += (count * (n-count) * 2);

}

return total;

}

Find next greater element for every element of an array to its right in O(n)
List<int> GetNextLargest(List<int> arr)
{
var greater = new List<int>(); // stores index
for (int i = 0; i < arr.Count; i++)
{
int next = -1;
for (int j = i+1; j <arr.Count; j++)
{
if (arr[j] > arr[i])
{
next = j;
break;
}
}
greater[i] = next;
}
return greater;
}
Arrange balls in a line so that no two of the same type are together
Void arrange (ref List <ball> candidate, List <int> colornum, int start, int level)
{

For (int I = start, I < colornum.sum (); i++)
Candidate [level] = 0;
for ( int color = 0; color < colornum.count; color++)
If (level > 0 && candidate [level-1] != color && candidate( x => x == color) != colornum [color])
Candidate [level] = color;
If (candidate.length == colornum.sum ())
Print candidate.toString ();
If (I < colornum.sum ())
{
Arrange( ref candidate, colornum, start+1, level+1);
}

Candidate [level] = -1;
}

}

Thursday, May 12, 2016

Applications of Machine Learning for private cloud operations
We continue listing some more scenarios from yesterday's post.

Use case 4) Optimization 

Users may want the portal to search for the lowest cost set of resources when there are many choices for their solution when there are many different solutions. Optimization finds the best solution to a problem by trying many different solutions and scoring them to determine their quality. 

First we propose a cost function to determine the costs for different sets of resources. Then we use hill climbing or simulated annealing to reduce the cost 

Use case 5) Document Filtering 

This is about training a classifier to classify documents that are published as resource usages and logs. The classifiers start off very uncertain and increase in certainty as it learns which features are important for making a distinction.  By classifying the documents, we expect to find resources that require attention either immediate or in the near future. 

This will require a training set as well as a test set. The training set will be used to tune the classifier and the classifier will then be run on the test sample. If the features are independent of each other they can be used in a naiive Bayes classifier. 

Use Case 6) Find which customers will likely pay for premium access. 

In order to predict the likelihood that a user will become a paying customer, we collect information and put it in say a table with the attributes that we pick from server logs, such as location, read FAQ, pages viewed, and whether a premium service was chosen. 

Then we use decision trees to construct a tree from real data that is easy to understand where each node tells how important it is and how it will impact the outcome. 

Use case 7) Building price models for premium customers. 

We can build models that predict prices. We can price something manually and then by finding the items that are similar to the item that interests the customer, the algorithm can average their prices and make a guess at what the price should be for this item.

#codingexercise

Write your own pow(x,n) 

int pow(int x, int n)

{

if (n == 0) return 1;

if (n%2 == 0)

   return pow(x,n/2) * pow(x,n/2);

else

   return x*pow(x,n/2) * pow(x,n/2);

}
Find the median of two sorted arrays
int median (List<int> first)
{
int n = first.count;
assert(n> 0);
if (n %2 == 0)
return (first[n/2-1] + first[n/2])/2
else
return first[n/2];
}
int GetMedian(List<int> first, List<int> second)
{
int p = first.count;
int q = second.count;
if ( p == 0) return median(second);
if (q == 0) return median(first);
if (p == 1 && q == 1) return (first[0]+second[0])/2;
int m1 = median(first);
int m2 = median(second);
if ( m1 == m2) return m1;
if (m1 < m2)
{
if (p%2==0)
return GetMedian(first.Range(p/2-1, p/2+1), second);
return GetMedian(first.Range(p/2, p/2), second);
}
if (q%2==0)
return GetMedian(first, second.Range(q/2-1,q/2+1));
return GetMedian(first, second.Range(q/2,q/2));
}

Wednesday, May 11, 2016

Applications of Machine Learning for private cloud operations 

The following are some use cases for building smart web applications for a private cloud provider.  

Use case 1) Making recommendations 

Customers often have a variety of choices when they come to request resources from a private cloud. One way to help them would be to tell them what others like them have been leasing from the cloud provider. If we search a large group of people and smaller set with tastes similar to a customer in terms of what they have been liking or borrowing, then we can make a ranked list of suggestions by finding the people who are similar and combining their choices to make a list. 

Different people and their preferences can be represented as a nested dictionary. To find similarity among people, we could use some sort of similarity measures based on Euclidean distance or Pearson correlation. We can then score the items by producing a weighted score that ranks the participants. 

Use case 2) Discovering groups 

Customers can be pigeonholed by clustering and categorizing them based on their requests.  By determining the frequency of a certain type of resource in the requests made by the customers, we may be able to find customers who are similar in their needs or have similar planning. Such a result could be very useful in searching, cataloging and discovering the themes in the vast number of adhoc requests made over time. 

We can employ hierarchical clustering or K-means clustering to categorize the requests and relate personas to this shopping pattern. 

Use case 3) Searching and Ranking 

Searching and Ranking form the core of analytics for a customer when she wants to filter through support documents and literature. Very often customers will get a buzzword to search for and help themselves but they might not know the site layout or the categories to walk down to filter it. 

They can be helped with a page ranking that gives a weighted score based on the fraction of in-references to out references. Other ways of arranging them or finding results based on clustering can also help. 

#codingexercise 

Determine if three points are on a line

bool onSegment(Point p, Point q, Point r)

{

// q is on p r

if (q.x <= max(p.x,r.x)  && q.x >= min(p.x, r.x) &&

     q.y <= max(p.y, r.y) && q.y >= min(p.y, r.y))

{

 return true;

}

return false;

}

Determine if three points are oriented clock wise (1) or anti clockwise(2)

int orientation(Point p, Point q, Point r)

{

int val = (q.y-p.y)*(r.x-q.x) -

                (q.x-p.x)*(r.y-p.y);

if (val == 0) return 0;

return (val > 0) ? 1: 2;

}

Determine if two line segments intersect

bool intersect(Point p1, Point q1, Point p2, Point q2)

{

// (p1,q1,p2) and (p1,q1,q2) have different orientations
and (p2,q2,p1) and (p2,q2,p1) have different orientations
// and they are all collienar and one point lies on the segment of the other two

}

Find the squares of the sides using distances (x2-x1)^2+ (y2-y1)^2

Lets call them a-sq, b-sq and c-sq, 

then if two of the squares add up to the third - it is right triangle

if two of the squares are greater than the third no matter which two are picked it is acute
else obtuse.

Tuesday, May 10, 2016

Today we continue to read up on security zones. A security zone is nothing more than a network segment with protected ingress. We said that the external facing security zones are typically prevented from talking to one another. Each is a separate VLAN and no routing is allowed between them. These edge switches typically trunk to an L2 aggregation switch. Aggregating external traffic allows implementation of single point packet, session and network behaviour monitoring. The L2 switch trunks to a L3 Q Switch. An Intrusion Prevention system may be placed on this trunk. Q Switches provide dynamic port configuration. This allows a switch to configure port as an access port or a trunk port.They dynamic trunking protocol automatically creates trunk port between Q switches. A DTP can be compromised if the there are two VLAN tags in the packet. This concept is similar to VPN. When an attacker sends a packet with two VLAN tags, the first Q-switch strips the packet one tag and send it out to the other switch which treats it as a regular packet and forwards to all appropriate ports. The double tagging must be disabled to mitigate this. This is also another reason why don't have the edge switches talk to one another. A private VLAN extends this restriction further by configuring the ports in a way where none of the ports in the VLAN set can communicate with each other. For example, database servers are put in a private VLAN because they don't need to communicate with each other. Finally access is only one of the instruments for control. ACLs, separation of duties, least privilege and need to know are all applied to improve security.
Some examples of securing packet traffic with advanced techniques can be seen as in the case of IP Packet security with IPSEC protocols that authenticate and encrypt IP Packets. The IPSEC protocol uses the concept of security association in order to secure the IP. A security association is a bundle of algorithms and parameters (such as keys) that is used to authenticate and encrypt a particular flow in one direction. A flow is usually denoted as a five tuple <src address, src port, destination address, destination port, protocol> While TLS and SSH operate in the upper layers of the Application which involve data and transport, only IPSec works on Internet layer. The link layer is secured by 802.1x, Point to Point protocols PPP and Point to Point Tunneling protocols.

Is Partitioning a good idea ?
In computing industry, there is a widespread understanding that partitioning is best done only when it is absolutely necessary. In other words we don't involve any partitioning of resources because such planning doesn't always scale to future needs and often is more binding with lot more work to undo if business needs change.
take ulimits for example and almost everyone will understand it is more onerous than it does good. Some operating systems do not provide this option.
Take disk partitioning and we only find ourselves shooting in the foot when we partition up front and discover that we have improperly managed resources. A large unfragmented disk does not cause such symptoms.
Even Solaris introduced partitions for different flavors and soon applications would start failing in one partition because they didn't have enough resources on this as they earlier had on the undivided host.
Quotas often cause resource starvation and non-deterministic behavior of otherwise sound applications. Soft limits complicate the matter even further.
Containers are useful when they are elastic.

#coding exercise
Write a function that takes two parameters n and k and returns the binomial coefficient C(n, k)
int C(int n, int k)
{
if ( k == 0 || k == n) return 1;
return C(n-1,k-1) + C(n-1,k);
}

Given a boolean expression with the following symbols
'T' - true
'F' - false and logical operators
& - AND
| - OR
^ - XOR
Count the number of ways we can paranthesize the expression so that the value remains true.
We have to do this operator by operator basis:
Let T(i,j) represent the number of ways to paranthesize the symbols between i and j ( such that the subexpression between i and j evaluates to true
Let F(i,j) represent the number of ways to paranthesize the symbols between i and j ( both inclusive) such that the subexpression between i and j evaluates to false.

Total(i,j) = T(i,j) + F(i,j)
T(i,j) = sum k from I to j-1 [ T (i,k) × T (k+1,j) for the AND operation
Total(i,k) x Total (k+1,j) - F (i,k) × F (k+1, j) for the OR operation
T (i,k) × F (k+1, j) + F (i,k) x T (k+1, j) for the XOR operation

The reverse holds for F(i,j)

Monday, May 9, 2016

Today we discuss network segmentation from Tom Olzak's book : Enterprise security : a practitioner's guide.
The book argues that segmentation is important.Segmentation breaks a network into multi layer attack surface that hinders threat agents / actions that are both internal as well as external. Segmentation is not just about basic switch security but also how to control packets at the network and data link layer.
External threats can be mitigated by perimeter defenses which establish a trust boundary either on or outside the data center perimeter such as with a DMZ or SSL VPN appliance. Perimeter controls prevent unauthorized access to system attack surfaces. An authenticated user has unlimited access internally. But the perimeter defence does not always prevent unauthorized access. The flat data center network is one large broadcast domain. Any device sending an ARP broadcast looking for an IP address in the data center will receive a reply if the address is assigned to an active server. This lower layer penetration also poses a risk.
Another advantage of segmentation is protocol separation. IPX or AppleTalk can get their own VLANs. This limits traffic in each VLAN. Finally the use of VLANs enable secure, flexible user mobility. A user may be assigned to a specific VLAN and will connect to it regardless of whether he is coming in via the wireless network so long as the intermediary 802.1x is used for port authentication. The authentication server has the advantage that it can let the user groups be in active directory and assign the VLAN dynamically. If it were static, an attacker could simply plugin to the network.
VLANs are configured using Layer 2 switches.The medium access control (MAC) address has two parts - the first part being organizationally unique identifier (OUI) and the second being Network Interface Controller identifier (NIC). ARP broadcasts help resolve connectivity between devices but these increases traffic on the same network. Switches use content addressable memory (CAM) table to track MAC address / port pairs. From the update at the time of the first packet forwarding through the entry's aging period, the switch forwards all packets the same way. A VLAN is a set of switch ports. Devices connected to the same VLAN can talk to one another but are logically isolated from devices that are not connected to the same VLAN. When there are more than one switch in an organization, a trunk is configured between them using a port on each switch.
A datacenter may have one or more internal security zones. A security zone is merely a network segment with protected ingress.For example, one zone can provide a secure bridge between the internet and the data center. Usually external facing data zones do not communicate with one another. Essentially this aggregates all traffic to come through the same trunk. An intrusion prevention system (IPS) may be setup on this trunk.
#codingexercise
Convert a binary tree into doubly linked list in spiral order.
// same as level order but with odd levels having reversed listing
void spiralLevelOrder(Node* root)
{
if (root == null) return;
deque<Node*> q;
q.push_front(root);
stack<Node*> stk;
int level = 0;
while (!q.empty())
{
int nodecount = q.size();
if (level %2 != 0)
{
while (nodecount > 0)
{
Node* node = q.front();
q.pop_front();
stk.push(node);
if (node->left != null)
q.push_back(node->left);
if (node->right!= null)
q.push_back(node->right);
nodecount --;
}
}
else{
while (nodecount > 0)
{
Node* node = q.back();
q.pop_back();
stk.push(node);
if (node->left != null)
q.push_front(node_left);
if (node->right != null)
q.push_front(node_right);
nodecount--;
}
}
level++;
}
Node* head = null;
while (!stk.empty())
{
push (&head, stk.top());
stk.pop();
}
for(Node* cur = head; cur; cur = cur->next)
printf("%s", cur->data);
}

# print root to leaf paths in a binary tree
void PrintRootToLeaves(Node root)
{
if (root == null) return;
var stk = new Stack<Node>();
var parent = new Dictionary<Node, Node> ();
parent[root] = null;
stk.push(root);
while (stk.empty() == false)
{
var cur = stk.top()
stk.pop();
if (cur.left == null && cur.right == null) printStack( current, parent);
if (cur.right) {parent[current.right] = current; stk.push(cur.right)}
if (cur.right) {parent[current.left] = current; stk.push(cur.left)}
}
}

Sunday, May 8, 2016

#coding exercises
1) Find minimum number of merge operations to make a palindrome.
Ex: {15, 4, 15} => 0
{1, 4, 5, 1} => 1 because 4 and 5 are merged.
{11, 14, 15, 19} => 2 because all elements need to be merged.

int GetMergeCount(List<int> numbers)
{
int start = 0;
int end = numbers.Count;
while (start < end)
{
if (numbers[start] != numbers[end])
{
count += 1;
}
start++;
end--;
}
return count;
}

2) Print diagonal traversal of binary tree

void PrintDiagonal(Node root, int level. ref Dict<int,List<int>> levelValues)
{
if (root == null) return;
levelValues[level].Add(root.data);
PrintDiagonal(root.left, level+1, ref levelValues);
PrintDiagonal(root.right, level, ref levelValues);
}

void PrintDiagonal(ref Dict<int, List<int>> levelValues)
{
foreach (kvp in levelValues)
{
var level = kvp.Key;
console.WriteLine('Level='+level);
var items = kvp.Value;
for (int i = 0; i < items.Count; i++)
{
Console.Write(items[i].toString());
}
console.writeLine();
}
}

3) bool isPower(intN, int X)
{
return IsInteger(Math.Log(N)/Math.Log(x));
}

4) void Boggle(List<char> input, List<string> knownwords, ref List<char>candidate, int start, int level)
{
for (int I =start; I< input.Count; I++)
{
candidate[level] = input[I];
if (knownwords.Contains(candidate.toString()) console.writeline(candidate.toString());
if (i < input.Count)
{
Boggle(input, knownwords, ref candidate, start+1, level+1);
}
candidate[level] = '/0';
}
}

5) To determine if there is a backedge from any of the vertex to its ancestors in a graph other than the tree it is part of, we can maintain two arrays : a discovery time array which measures the time it takes to discover this vertex from root aka levels and a low value array that for a vertex u is the minimum of the discovery time for the vertex u and that of its ancestor w
low[u] = min(disc[u], disc[w])
if ( low[v] >= disc[u]) then there is a backedge.
Also note that by definition of a tree, every node has a parent p

6) void ConnectNodesAtSameLevel(Node root)
{
List<Node> levels = LevelWiseTraversal(root); // using a queue described earlier, append NULL for delimiters
for (int i = 0;i < levels.Count -1; i++)
if (levels[i])
levels[i].side = levels[i+1];
}

7) Print top view of a binary tree ( nodes that are on the periphery)

void PrintTopView(Node root)
{
// similar to level wise traversal
var ret = List<Node> ();
int left = 0;
int right = 0;
var q = new Queue();
q.enqueue(root);
root.dist = 0; // or use a hashtable
q.enqueue(null);
while (q.empty() == false)
{
var item = q.dequeue();
if (item == null){
q.enqueue(null);
}else
{
if (root.dist == 0 { ret.Add(root);}
if (root.dist < left ) { left = root.dist; ret.Add(root);}
if (root.dist > right) { right = root.dist;. ret.Add(root);}
root.left.dist = root.dist-1;
q.enqueue(root.left);
root.right.dist = root.dist+1;
q.enqueue(root.right);
}
}
ret.ForEach( x => console.write(x.toString()));
}