Cluster computing

Friday, August 19, 2016

We were discussing the flow networks and the Ford Fulkerson method. It introduced us to residual network, augmenting paths and cuts. These are used in the max-flow min-cut theorem which explains the maximum flow in terms of cuts
Residual network consists of edges with capacities that represent how we can change the flow on the edges of G. This is typically seen as the amount of flow that an edge can admit and is equal to the edge's capacity minus the flow on that edge. If that value is positive, then it is included into the residual network and are excluded otherwise. But it may also contain edges that are not in G. For example an edge whose flow has been decreased by admitting a counter flow from the opposite direction that can at most cancel out the flow is also part of residual network.
These reverse edges in the residual network allow an algorithm to send back flow that it has already sent on an edge.
The residual capacity cf(u,v) is therefore defined as
cf(u,v) = c(u,v) - f(u,v) if (u,v) belongs to E
            = f(u,v)              if (v,u) belongs to E
             = 0                    otherwise
The above equation covers all the three cases of admitting a positive, negative or unchanged flow in terms of capacity.
Each edge of a residual network is a residual edge.
Augmentation of a flow by f to f' is defined as the previous flow f + the increase in the flow in the forward direction - the decreased flow in the reverse direction. If the augmentation does not happen, the flow is unchanged.
Some of the lemmas associated with the maximum flow algorithm include the following:
Let G = (V,E) be a flow network with source s and sink t, and let f be a flow in G. Let Gf be the residual network of G induced by f and let f' be a flow in Gf. Then the augmentation is defined as a flow in G with value equal to the individual magnitude of the two flows.
Another lemma is that if f is a flow in G and and p is an augmenting path
then the flow along the path between u and v in the residual network is the residual capacity of the path if (u,v) is on the path or zero otherwise. This binary definition of the flow along the path suggests that either and edge is counted or it isn't and when it is, the residual capacity of the path and not just between (u,v) determines the flow in the residual network.
#puzzles
Two children - a boy and a girl play in the mud. Their mother states at least one of them has a muddy head. Then she asks them if they know whether they have a muddy head. What will the children answer if they speak the truth.
The children will say No first and Yes afterwards collectively.
Let s be the statement that the son has a muddy head
Let d be the statement that the daughter has a muddy head
The mother statement is true but the kids don't see their own head so they don't know if their head is muddy and answer no.
subsequently, each one has an additional information from their response given earlier and therefore know their state. They would have known their respective state even if both of them answered yes in the first place.
#codingquestion
Given n ropes of different length, combine them into a single rope,such that total cost is minimum. You can tie two ropes at a time,and cost of tying is sum of length of ropes.
Int getmin (list <int> Len)
{
Int cost = 0
Heap h = build_heap(len)
While (h.size () >1)
{
First = extract_min (h)
Second = extract_min (h)
Cost += first + second
H.insert (first+second );
}
Return cost;
}
bool hasPythagoreanSum(List<int> sorted)
{
    assert(sorted.all(x=>x> 0));
    sorted.sort();
    Int len = sorted.count;
    For (int i=0; i<len; i++)
    {
          Int start = I + 1, end = len-1;
          While (start<end)
          {
            int sum = sorted[i] ^ 2 + sorted[start] ^2;
            int target = sorted[end] ^ 2;
            If (sum == target)
            {
                return true;
            }
            If (sum < target)
            {
                     start++;
            }
           Else
           {
               end--;
           }
      }
}
   Return false;
}

Find excel column name from given number for example
26 Z
52 AZ
702 ZZ
String getColName(int n)
{
Stringbuilder ret;
Int i = 0;

While(n)
{
Int rem = n%26;
If (rem==0)
{
Ret[i++] = 'Z';
N = n/26 -1;
}
Else{
Ret[i++] = 'A' +(rem-1);
N = n/26;
}
}
Ret += '/0';
Ret.reverse();
Return ret.toString();
}
This could also be done as pre compitations from powers of 26
For example 26 power 1 is Z
26 power 2 exceeds 52.
So 52 must be some multiple of 26 and offset from the previous computation which is Z. Since (52 -26) / 26 = 1, we have AZ and so on.

Thursday, August 18, 2016

Today we continue to discuss flow networks and the max flow problem. In particular, we discuss the Ford Fulkerson method. In the maximum-flow problem, we are given a flow network G
with source s and sink t , and we wish to find a flow of maximum value. A maximum flow problem may have several sources and sinks but these are not treated any different from a single source and sink.
The Ford-Fulkerson method depends on three important ideas that transcend the method and are relevant to many flow algorithms and problems: residual networks, augmenting paths, and
cuts. This method iteratively increases the value of the flow. We start with f (u,v) = 0 for all u,v belonging to V and giving an initial flow of value 0. At each
iteration, the flow value in G is increased by finding an “augmenting path” in an
associated “residual network” Gf . Once the edges of an augmenting
path in Gf is known , specific edges in G can be identified for which the flow can be changed so as to increase the value of the flow. Although each iteration of the
Ford-Fulkerson method increases the value of the flow, the flow
on any particular edge of G may increase or decrease; decreasing the flow on some
edges may be necessary in order to enable an algorithm to send more flow from the
source to the sink. The flow is augmented repeatedly until the residual network has
no more augmenting paths. The max-flow min-cut theorem will show that upon
termination, this process yields a maximum flow.
FORD-FULKERSON-METHOD(G,s,t)
initialize flow f to 0
while there exists an augmenting path p in the residual network Gf
augment flow f along p
return f

# puzzle
It’s summer time and three of the college friends, A, B and C went to sports club to enjoy the vacation. They saw an interesting game being played there.
Game name is “hit and win”. There was a triangular setup of green ropes on ground. You have to move on the triangular path and on the turn you have to hit other mate using paper ball until only one is left for winning.
A’s chances of hitting is 0.3.
B has hitting chance of 0.5 .
C has 100% chance of hitting the target.
If a person is hit, he is out from path and cannot take a turn.
Suppose order of throwing balls is A, B, C. You being close to A, what strategy would you suggest A for having maximum chance of winning.

The order is A, B, C
if A misses B and C eliminate either of them and A has a chance to eliminate the remaining.

#coding interview
Find distance between two nodes in a binary tree
Distance between nodes can be found out this way:
dist(n1, n2) = dist(n1, root) + dist(n2, root) - 2 * dist(root, lca)

int getDistance(node root, int n1, int n2)
{
if (root == null) return;
int d1 = INT_MAX;
int d2 = INT_MAX;
int distance = 0;
int level = 1;
node lca = getlca(root, n1, n2, ref d1, ref d2, ref distance, level);
if (d1 != INT_MAX && d2 != INT_MAX)
return distance;
if (d1 != INT_MAX) // n1 is ancestor of n2
{
distance = getlevel(lca, n2);
return distance
}
if (d2 != INT_MAX) // n2 is ancestor of n1
{
distance = getlevel(lca, n1);
return distance;
}
return INT_MAX;
}

node getlca(node root, int n1, int n2, ref int d1, ref int d2, ref distance, int level)
{
if (root == null) return null;
if (root.data == n1){
d1 = level;
return root;
}
if (root.data == n2){
d2 = level;
return root;
}
var leftlca = getlca(root.left, n1, n2, ref d1, ref d2, ref distance, level+1);
var rightlca = getlca(root.right, n1, n2, ref d1, ref d2, ref distance, level+1);
if (leftlca && rightlca)
{
distance = d1 + d2 - 2 * level;
return root;
}
if (leftlca != null){
return leftlca;
}else{
return rightlca;
}
}

Wednesday, August 17, 2016

We were discussing skip graphs yesterday. Graphs also come useful in visualizing flow networks
Flow networks can model many problems, including liquids through pipes, parts through assembly lines and information through communication networks.
Each directed edge can be considered a conduit with a stated capacity. Vertices are conduit junctions. One may be a source another may be a sink. The total amount of flow entering a vertex must be the same as the outgoing. This is called flow conservation.
One of the most talked about problems with flow networks is to maximize the flow where we compute the greatest rate at which we can ship material from source to sink without violating any capacity constraints.
Denoting the flow f as a non-negative quantity f(u,v) from vertex u to vertex v. The absolute value of f is defined as the sum of flows f(s,v) - the sum of flows f(v,s). The second component involving the sum f(v,s) is usually zero.
Generally for a flow network, the edges are directed. If an edge (v1,v2) belongs to the set of edges E, then edge (v2, v1) should not belong to the edges E. If they exist they are called antiparallel edges. If a flow network has antiparallel edges, it is transformed to one without any.
A maximum flow problem may have several sources and sinks. Fortunately, this is no harder than an ordinary maximum flow problem with a single source and a sink. We just need to add an imaginary supersource and an imaginary supersink and directed edges with infinite capacity to each of the source and from each of the sink.

#puzzle
A king divides his fifteen elephants among his three sons such that the first gets half, the send gets three quarters of remaining and the third gets half of remaining. How much does each get?

Let us add an imaginary elephant. Then the first gets 8, the second gets 6 and the last gets one leaving the imaginary one aside.

#codingexercise
Add all the greater values to every node in a BST

void AddGreater(node root, ref int greater)
{
if (root == null) return;
AddGreater(root.right);
greater += root.data;
root.data = greater;
AddGreater(root.right, greater);
}

# Given a histogram in terms of a list of bar heights of unit width, find the maximum water that can be retained between the bars
int GetRetention(List<int>h, int n)
{
var left = new int[n]{};
var right = new int[n]{};
int held = 0;

left[0] = h[0];
for (int i = 1; i < n; i++)
left [i] = max(left[i-1], h[i]);

right[n-1] = h[n-1];
for (int i = n-2; i >= 0; i--)
right[i] = max(right[i+1], h[i]);

for (int i =0; i < n; i++)
held += min(left[i], right[i]) - h[i];

return held;
}

We were discussing skip graphs yesterday. Graphs also come useful in visualizing flow networks
Flow networks can model many problems, including liquids through pipes, parts through assembly lines and information through communication networks.
Each directed edge can be considered a conduit with a stated capacity. Vertices are conduit junctions. One may be a source another may be a sink. The total amount of flow entering a vertex must be the same as the outgoing. This is called flow conservation.
One of the most talked about problems with flow networks is to maximize the flow where we compute the greatest rate at which we can ship material from source to sink without violating any capacity constraints.
Denoting the flow f as a non-negative quantity f(u,v) from vertex u to vertex v. The absolute value of f is defined as the sum of flows f(s,v) - the sum of flows f(v,s). The second component involving the sum f(v,s) is usually zero.
Generally for a flow network, the edges are directed. If an edge (v1,v2) belongs to the set of edges E, then edge (v2, v1) should not belong to the edges E. If they exist they are called antiparallel edges. If a flow network has antiparallel edges, it is transformed to one without any.

#puzzle
A king divides his fifteen elephants among his three sons such that the first gets half, the send gets three quarters of remaining and the third gets half of remaining. How much does each get?

Let us add an imaginary elephant. Then the first gets 8, the second gets 6 and the last gets one leaving the imaginary one aside.

#codingexercise
Add all the greater values to every node in a BST

void AddGreater(node root, ref int greater)
{
if (root == null) return;
AddGreater(root.right);
greater += root.data;
root.data = greater;
AddGreater(root.right, greater);
}

# Given a histogram in terms of a list of bar heights of unit width, find the maximum water that can be retained between the bars
int GetRetention(List<int>h, int n)
{
var left = new int[n]{};
var right = new int[n]{};
int held = 0;

left[0] = h[0];
for (int i = 1; i < n; i++)
left [i] = max(left[i-1], h[i]);

right[0] = h[n-1];
for (int i = 1; i < n; i++)
right[i] = max(right[i-1], h[i]);

for (int i =0; i < n; i++)
held += min(left[i], right[i]) - h[i];

return held;
}

Tuesday, August 16, 2016

Skip Graphs

Conventional Skip graphs are based on skip-lists that provide the additional benefits of using a balanced tree. Thus they improve upon searching peer-to-peer networks and by providing the ability to perform queries based on key-ordering as opposed to the hash table based functionality.

Conventional Skip graphs also introduce new algorithms for searching, constructing and inserting into the new graph. However, reducing the graph to fewer nodes helps us provide approximate answers faster while retaining the original algorithms. Moreover they can be used in conjunction with the original graph because the simpler representation is logically overlayed on top of the original graph.

With fewer nodes and aggregated statistics per node, it is easier to answer questions about how many nodes satisfy a given criteria especially when we want to return a good short ranked list and not require the correctness of the entire result set. This is often the case for similarity based measures.

A simpler graph that has fewer nodes than the original will also fit in memory. This makes processing faster. We can select those nodes that have higher centrality. Then we can add links that represent an aggregated link between these higher order nodes. A query involving a simple where clause as in a SQL statement will yield results that are expectedly more important nodes than all the others. This is useful in case where we want to find connectivity. Similarity measures can rely on connectivity distances between two nodes. The higher centrality nodes may have to partition the graphs based on centrality to aid with the connection information. A simple graph with fewer nodes is a new graph of existing nodes and has no overlap with the existing original graph. It answers some of the queries directly with the nodes and edges in the graph using traditional graph algorithms.

A skip list is a tower of increasingly sparse linked lists sorted by key. The lists at the higher level act as express lanes that allow the sequence of nodes to be traversed quickly. A skip graph, as per James Aspnes and Gauri Shah, is distinguished from a skip list in that there may be many lists at a particular level and every node participates in one of these lists until the nodes are splintered into singletons. A skip graph is equivalent to a collection of upto n skip lists that happen to share some of their lower levels. In distributed computing, the graph has to tolerate arbitrary node failures and it should be usable even when a huge fraction of the nodes fail. Therefore each node that appears in higher level skip lists occur based on some membership vector generally using prefix memberships to a word. This form of membership makes a skip graph a trie of skip lists. A membership vector is a random variable. In other words, there is an edge between x and y whenever x and y are adjacent in some list.

We can also have a tower of centrality levels and where the lowest level represents the original graph that may be very large in size and the topmost consisting of far less number of nodes. The nodes of higher levels recur in lower levels.

The search operation in a conventional skip graph starts at the top most level of the node seeking a key and proceeds along the same level without overshooting the key, continuing at a lower level if required until it reaches the lowest level where the key will be found. The same procedure holds for the proposed skip graph because the edges appear at a current level if there was connectivity at underlying levels and we proceed at lower level when we overshoot the preceding level.

Skip graphs can also support range queries because now we have to find only one of the nodes in the interval and then broadcast throughout the m nodes in the interval by flooding. This is a technique used in conventional skip graphs

The range query algorithm is somewhat like this

Void findOne(Node n, int searchOp, node startNode, int searchKey, int level)

{

If (n.key == searchKey) then

Inform n to start_node

If n.key < searchKey then

While level >= 0 do

If (nRlevel).key < searchKey then

Recursively call at nRlevel

Break;

Else level = level –1

Else

While level >= 0 do

If (nLlevel).key > searchKey then

Recursively call at nLlevel

Break;

Else level = level –1

If (level < 0) then

Inform n to startNode

}

Void findallInrange(ref List<Node> result, int min, int max)

{

// all the nodes at the lowest level between min and max are the candidates

FindOne say at min

Then binary chop between min and max at lowest level.

}

The nodes in a conventional skip graph occur at least once in one of the skip lists at every level. The nodes in the proposed graph occur selectively at different levels but maintain connectivity information as aggregation of underlying lower level connections. This is ideal for graphs that don't change much over time or where changes are incremental, possibly in a time series database or data warehouse.

In addition, we can also have a full set of nodes at every level although the nodes appear in different skip lists at each level. This gives the advantage that the connectivity need not be aggregated to higher levels but the memberships are based on increasing path lengths. This adds more information than the proposed graph. Redundancy is not a problem as storage is cheap more so than ever.

Skip graphs are effective on time series data such as in warehouses. The data gets accumulated over time in an incremental manner and there is little or no change to already accumulated data.

#codingexercise

Get the maximum difference between a node and its ancestor in a binary tree:

int getMaxDiff(node root, ref int diff)

{

If (root == null)

Return INT_MAX;

Int left = GetMaxDiff(root.left, ref diff);

Int right = GetMaxDiff(root.right, ref diff);

Int val = min(left, right);

Diff = max( diff, root.data – val);

Return min(val, root.data);

}

Get the lowest common ancestor of nodes having values n1 and n2 in a binary tree:

Node getLCA( node root, int n1, int n2)

{

If (root == null) return null;

If (root.data == n1 || root.data == n2) return root;

Int left = getLCA(root.left, n1, n2);

Int right = getLCA(root.right, n1, n2);

If (left && right) return root;

If (left) return left;

If (right)return right;

Return null;

}

Monday, August 15, 2016

Steps to join a Linux computer to Active Directory using AD are as follows:

First, the benefits of using SSSD:

· Reduced load on authentication servers.
· Option for offline authentication
· Single user account

Steps:

1) Make sure that both the Active Directory and Linux systems have a properly configured environment

2) On the Linux client, add the Active Directory domain to the client's DNS configuration so that it can resolve the domain's SRV records.

Search adserver.example.com

Nameserver 192.168.1.1

3) Set up the linux system as an AD client and enroll it within the AD domain.

1) set up the kerberos to use AD realm

1) vim /etc/krb5.conf

2) configure the logging and libdefaulrs section

[Logging]

FILE: /var/logs/krb5libs.log

[Libdefaults]

Default_realm=example.com

Dns_lookup_realm=true

Dns_lookup_kdc = true

Ticket_lifetime=24h

Renew_lifetime=7d

Rdns = false

Forwardable = yes

2) Configure the samba server to connect to AD server

1) vim /etc/samba/smb.conf

2) configure [globals]

[global]

workgroup = EXAMPLE

client signing = yes

client use spnego = yes

kerberos method = secrets and

keytab log file = /var/log/samba/%m.log password server = AD.EXAMPLE.COM

realm = EXAMPLE.COM

security = ads

3) add the linux machine to the AD domain.

Kinit Adninistrator

Net ads join -k

Klist -k

4) configure sssd

[sssd]

config_file_version = 2

domains = ad.example.com

services = nss, pam, pac

Create a new domain section at the bottom of the file for the Active Directory domain. This section has the format domain/NAME, such as domain/ad.example.com. For each provider, set the value to ad, and give the connection information for the specific Active Directory instance to connect to.

[domain/ad.example.com]

id_provider = ad

auth_provider = ad

chpass_provider = ad

access_provider = ad

#codingexercise

Find the maximum width of a binary tree where the width is the count of nodes at a given level

int GetMaxWidth(Node root)

{

int max = 0;

for (int i =l; i < height(root); i++){

int count = GetWidth(root, i);

if (count > max)

max = count;

}

return root;

}

int GetWidth(Node root, int level)

{

if (root == null) return 0;

if (level == 1) return 1;

if (level > 1){

return GetWidth(root.left, level -1) + GetWidth(root.right, level-1);

}

return 0;

}

#monty hall problem

A tv show host gives you the option to open one of three doors behind one of which is a car and the others have goats. You pick a door and the tv show host picks another. His door has goat. Do you want to switch doors ?

Solution to the above problem is that you should switch because the probability that the second door hides the car has increased to 2/3.

Previously, the two remaining doors haf a combined probability of 2/3 but with the hosts choice, that probability has now become entirely that of one unopened door to switch to

An employee works for an employer for 7 days. The employer has a gold rod of 7 units. How does the employer pays to the employee so that the employee gets 1 unit at the end of everyday.The employer can make at most 2 cuts in rod.

Solution : the rod should be cut in 4,2,1 length pieces.

A box contains n coins, of which 7 of them are counterfeit with tails on both sides and the rest are fair coins. If one coin is selected from the bag and tossed, the probability of getting a tail is 17/20. Find the value of ‘n’.

N = 2. Working backwards from the last pèrson.

7/n x 1 + (n-7)/n x 1/2 = 17/20

N = 10

A boy goes to 20 of his friend’s houses with ‘n’ number of newly purchased marbles in his hands. At every house he visits , he gives away half of marbles he have and take one of his friend’s marble’s and adds it with the one’s he is left with , he never had a problem of dividing an odd number of marbles left and finally after leaving the his 20th friends house, he is left with 2 marbles, can you guess the ‘n’ value?