Friday, October 30, 2015

A diagonal of a regular 2006-gon is called odd if its endpoints divide the boundary into
two parts, each composed of an odd number of sides. Sides are also regarded as odd diagonals.
Suppose the 2006-gon has been dissected into triangles by 2003 non-intersecting diagonals.
Find the maximum possible number of isosceles triangles with two odd sides.

An isosceles triangle is called odd if it has two odd sides. A triangle in the dissection above that is both odd and isosceles is called iso-odd.
We make an observation as follows:
Let AB be one of the dissecting diagonals and let Lambda be the shorter of the two parts of the boundary of the 2006-gon with endpoints A and B. Suppose that Lambda consists of n segments. Then the number of iso-odd triangles with all vertices on Lambda does not exceed n/2.
We can prove this by induction on n. In the base case n = 2 there is room for only one such triangle, so the claim holds. Assume the observation holds for every such arc of length less than n, and consider an arc Lambda of length n. Let PQ be the longest diagonal that is a side of an iso-odd triangle PQS with all vertices on Lambda. If there is no such triangle, the count is zero and the observation holds trivially. If, on the other hand, there is such a triangle, then the five points A, P, S, Q, B appear on Lambda in this order and partition it into four arcs AP, PS, SQ and QB, where the two outer arcs may reduce to a point. Since PQ is the longest such diagonal and PQS is an iso-odd triangle, no iso-odd triangle can have vertices on both of the outer arcs AP and QB. Therefore every iso-odd triangle inscribed in Lambda, other than PQS itself, has its vertices on just one of the four arcs. Applying the induction hypothesis to each arc and adding the four inequalities (the odd lengths of PS and SQ leave room to count PQS as well), the total does not exceed n/2.

The intuition behind this approach is that only three vertices need to be picked for such a triangle, and since they can lie anywhere along the arcs, we categorize the forms the triangle can take and where those vertices may fall. Once we have a lemma bounding the number of such triangles, we can easily study it in each of the categories.
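The arcs formulation also lends itself to a sanity check on small cases. In a regular n-gon (n even), a triangle whose vertices cut the boundary into arcs (a, b, c) with a + b + c = n is isosceles exactly when two arcs are equal, and a side is an odd diagonal exactly when its arc is odd. A small dynamic program over triangulations can then maximize the iso-odd count; this is my own verification sketch in Python, not part of the original solution:

```python
from functools import lru_cache

def max_iso_odd(n):
    """Maximum number of iso-odd triangles over all triangulations of a
    regular n-gon (n even). A triangle with boundary arcs (a, b, c),
    a + b + c = n, is iso-odd exactly when two of its arcs are equal
    and odd: equal arcs subtend the isosceles pair of sides, and an odd
    arc makes the corresponding side an odd diagonal."""
    def iso_odd(a, b, c):
        return ((a == b and a % 2 == 1) or
                (b == c and b % 2 == 1) or
                (a == c and a % 2 == 1))

    @lru_cache(maxsize=None)
    def best(i, j):
        # best count over triangulations of the sub-polygon i, i+1, ..., j
        if j - i < 2:
            return 0
        return max(best(i, k) + best(k, j) +
                   iso_odd(k - i, j - k, n - (j - i))
                   for k in range(i + 1, j))

    return best(0, n - 1)
```

For n = 4, 6, 8 this agrees with the n/2 bound above, matching the answer 1003 for the 2006-gon.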

Thursday, October 29, 2015

Today we review the book The 5 Choices by Kory Kogon, Adam Merrill, and Leena Rinne.
This book is about the path to extraordinary productivity at a time when the workplace is getting overwhelming and exhausting. Texts, emails, interruptions, meetings, and the like inundate us and distract us from thinking clearly, making good decisions, and accomplishing what matters. The book therefore shows how to manage time and energy through five fundamental choices.
The book promises to increase our productivity at the individual, team, and organization levels, and to help us feel engaged, accomplished, and satisfied.
The book shares new ideas on how to be productive, how to get clear and focus on the things that matter to us, how to increase our capability to make decisions, and how to recognize that we have the ability to do extraordinary work. The five choices are enumerated as follows:
Choice 1 is to act on the important and not to react to the urgent. The authors say that we either react or we think. Reacting is the lower level, where we process feelings and emotions into a response but don't necessarily think it through. Thinking is the upper level, where we make conscious and intentional decisions. The longer we operate on the upper level, the better the outcome. We have some tools to do this: FranklinCovey's Time Matrix provides the framework and Pause-Clarify-Decide provides the process. The Time Matrix is a four-quadrant visualizing tool that helps us categorize activities. The top-left quadrant is the quadrant of necessity and contains things that are both urgent and important. The lower-left quadrant is the quadrant of distraction; it gives a negative return on attention and energy. The lower-right quadrant is the quadrant of waste; we should try not to be there at all. The top-right quadrant is the one for extraordinary productivity. To spend more time in this quadrant, we pause to think about which quadrant an item belongs in, and once that is clear, we decide accordingly.
Choice 2 is to go for the extraordinary and not to settle for the ordinary. For each item that we place in quadrant 2, we can attach a statement or story, as we do in our scrum meetings, anchoring it to a purpose and a passion. This helps us tap into motivation and higher performance.
Choice 3 is to schedule the big rocks and not to sort gravel. Quantifying our work items is akin to attaching story points to our stories. Consequently, we don't spend time on the little things that clutter up our day and prevent us from executing on the big items. A master task list is probably one of the best visualizing tools for this. Since the list is the master and it contains the items we want to execute, everything stays under consideration and ranked appropriately.
Choice 4 is to rule our technology and not let it rule us. Very often we spend time learning to use or work with a tool on the assumption that the tool will save us, but as a Japanese swordsman once said, being independent of your tools and being present is the first step. We can manage the information we need to process by organizing it into appointments, tasks, contacts, and notes/documents.
Choice 4 also includes a set of master moves that make our execution efficient. The first master move is to win without fighting: here we automate much of our decision-making process. The second master move is to eliminate any additional quadrant 3 and quadrant 4 items from our inbox. The third master move is to enable more preparedness, for example by including links to the material we need, so that we spend hardly any time in Q1.
Choice 5 is to fuel our fire and not to burn out. The five drivers of mental and physical energy are moving, eating, sleeping, relaxing, and connecting. Each of them is powerful in making us more productive.

Wednesday, October 28, 2015

Q: An equilateral triangle is partitioned into n^2 congruent equilateral triangles by n-1 equidistant parallels to each of its three sides. Two chess-like bishops placed at any two vertices of the small triangles are said to menace one another if they lie on the same parallel. The problem is to determine the largest number of bishops that can be placed so that none menaces another.

A: A bishop may be assigned three coordinates a, b, c, namely the numbers of small-triangle sides it is away from each of the three sides of the big triangle; these satisfy a + b + c = n.

In other words, the non-menacing condition is that if i != j, then ai != aj, bi != bj and ci != cj.
If two bishops shared a coordinate along any one of the three axes, they would lie on the same parallel to the corresponding side and hence menace each other.


Since the coordinates along each axis are pairwise distinct nonnegative integers, the sum of the a-coordinates is at least 0 + 1 + ... + (N-1) = N(N-1)/2, and likewise for the b- and c-coordinates.
Summing all three coordinates, we have 3N(N-1)/2 <= sum over all i of (ai + bi + ci).
Since each bishop's coordinates satisfy ai + bi + ci = n, the right-hand side equals nN.
Therefore 3(N-1)/2 <= n, and the largest number of non-menacing bishops is N = floor(2n/3) + 1.

By solving this problem, we have obtained the same two conditions as in the triples problem from an earlier post, and again shown that the limiting case of equality is attained as a solution.

#codingexercise
Determine the type of a triangle given its edge lengths
void printTriangle(int a, int b, int c)
{
    if (a == b && b == c) printf("equilateral");
    else if (a == b || b == c || c == a) printf("isosceles");
    else printf("scalene");
}

Tuesday, October 27, 2015

#coding question.

Each row of a binary matrix has all its 1's followed by all its 0's. Find the row with the maximum number of 1's.

int GetRowWithMost1(int[,] matrix, int rows, int cols)
{
    int maxCount = 0;
    int ret = 0;
    for (int i = 0; i < rows; i++)
    {
        int count = 0;
        for (int j = 0; j < cols; j++)
        {
            if (matrix[i, j] == 1) count++;
        }
        if (count > maxCount)
        {
            maxCount = count;
            ret = i;
        }
    }
    return ret;
}
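Because each row has its 1's before its 0's, the full scan above can be reduced to O(rows + cols): keep a pointer equal to the best count seen so far and advance it only when the current row has a 1 at that column. A sketch in Python (the function name is mine):

```python
def row_with_most_ones(matrix):
    """Each row is all 1's followed by all 0's, so row i beats the
    current best exactly when it has a 1 where the best row had its
    first 0. Keep j = number of 1's in the best row so far and walk
    it right; total work is O(rows + cols)."""
    best_row, j = 0, 0
    for i, row in enumerate(matrix):
        # row i has more 1's than the best so far iff row[j] == 1
        while j < len(row) and row[j] == 1:
            j += 1
            best_row = i
    return best_row
```

On ties it keeps the earliest row, matching the `count > maxCount` comparison above.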

Q: For any integer n >= 2, let N(n) be the maximal number of triples (ai, bi, ci), i = 1, ..., N(n), consisting of nonnegative integers ai, bi and ci such that the following two conditions are satisfied:
(1) ai + bi + ci = n for all i = 1, ..., N(n);
(2) if i != j, then ai != aj, bi != bj and ci != cj.


Determine N(n) for all n >= 2.

A: The second condition above says that the a-coordinates across all the triples are pairwise distinct, and likewise for the b- and c-coordinates.
Since the a-coordinates are N distinct nonnegative integers, their sum is at least 0 + 1 + ... + (N-1), the sum of the N smallest possibilities.
Therefore the sum of the a-coordinates is at least N(N-1)/2.
The same inequality holds for each of the coordinates. Consequently, 3N(N-1)/2 <= sum over all i of (ai + bi + ci) = nN.
Therefore 3(N-1)/2 <= n,
and N has to be less than or equal to floor(2n/3) + 1.
By constructing examples we can see that N can equal floor(2n/3) + 1, and this is the value we are required to determine.

There are three cases for n that we can take:
n = 3k-1
n = 3k
n = 3k+1
and for each we can exhibit floor(2n/3) + 1 triples satisfying the two conditions above, so the limiting case of floor(2n/3) + 1 is attained. Hence the determination of N(n) is complete.
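The bound and its attainment can be double-checked by brute force for small n; this exhaustive search is my own verification aid, not part of the solution:

```python
def triples(n):
    # all triples (a, b, c) of nonnegative integers with a + b + c = n
    return [(a, b, n - a - b) for a in range(n + 1) for b in range(n - a + 1)]

def brute_N(n):
    """N(n) by exhaustive search: grow sets of triples while keeping
    the a's, b's and c's pairwise distinct, tracking the largest size
    reached. Only feasible for small n."""
    pool = triples(n)
    best = 0

    def extend(idx, chosen):
        nonlocal best
        best = max(best, len(chosen))
        for k in range(idx, len(pool)):
            t = pool[k]
            if all(t[0] != s[0] and t[1] != s[1] and t[2] != s[2]
                   for s in chosen):
                chosen.append(t)
                extend(k + 1, chosen)
                chosen.pop()

    extend(0, [])
    return best
```

For n = 2 through 5 the search returns 2, 3, 3, 4, agreeing with floor(2n/3) + 1.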





Monday, October 26, 2015

This paper introduces a technique for evaluating the transformation rules that drive query optimization in a relational database server such as Microsoft SQL Server. It studies the usage of transformation rules by the query optimizer with the help of profiling tools. Profiling in this case means finding the set of rules that a particular query needs to obtain an optimal plan, referred to as an essential set of rules for that query. There may be more than one essential set for a query if there are different ways to generate an optimal plan, and consequently there may be some overlap of rules across these sets. A rule that is included in every essential set for a query is deemed a relevant transformation rule for that query. The paper empirically studies these metrics for a variety of workloads. The plan and the cost of an execution plan depend on what the optimizer chooses and are considered functions of the input query.
While this paper succeeds in articulating metrics that yield fewer transformation rules for the purpose of studying different workloads under the same query optimizer, it could have considered alternate metrics such as precision and recall. Precision here measures what fraction of the selected items are relevant: it is the ratio of the true positives to all the rules the query optimizer selected for the query. A true positive is a rule that improves the query execution plan; a false positive shows up in the plan but does not improve it. Recall, on the other hand, measures how many of the relevant items are selected: it is the ratio of the true positives to all the rules that would have improved the query execution plan, drawn from the global set of transformation rules including those the optimizer did not select. Together, precision and recall yield a further metric called the F-score, which gives the effectiveness of retrieval with respect to a given query.
The advantages of the paper's metrics are not lost with the proposed metric; on the contrary, it comes closer to the evaluation the paper set out to do. The paper suggests that fewer than twenty transformation rules fall under its "relevant" definition. The inclusion of such a rule in every essential set merely strengthens the criterion that it is "relevant". This equates to identifying the common rules among the "true positives" of every essential set, but their inclusion alone is no guarantee that they are indeed improving the plan by bringing the cost down. At the same time, the demarcation of true versus false positives is much cleaner about what the optimizer has selected that is indeed relevant to the optimization of the plan. The essential set and the combination of true and false positives are similar in nature, although it can be argued that without any comparison across optimization plans for the same query, the true and false positives give a better indicator within the chosen execution plan. Furthermore, while most if not all the transformation rules may be true positives for a highly effective query optimizer, the differentiation from another choice of query plan cannot be as well quantified by the paper's "relevant" definition alone as by precision and recall. Even the differentiation in terms of false positives is brought to better light with the use of precision and recall.
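To make the proposed metric concrete, here is a small sketch in Python; the rule names and the labeling of which rules are "relevant" below are hypothetical, purely for illustration:

```python
def precision_recall_f(selected, relevant):
    """selected: rules the optimizer applied for a query;
    relevant: rules that actually improve the plan.
    precision = TP / |selected|  (how much of what was selected helps)
    recall    = TP / |relevant|  (how much of what helps was selected)
    F-score   = harmonic mean of precision and recall."""
    tp = len(selected & relevant)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```

For example, if the optimizer selected {JoinCommute, IndexSeek, SelectToFilter} and the rules that improve the plan were {JoinCommute, IndexSeek, PushGroupByBelowJoin}, both precision and recall come out to 2/3.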
#codingexercise
Check whether a given linked list represents a palindrome.
The challenge here is the sequential access of elements for comparison, rather than index-based access.
Solution 1:
Determine the length of the linked list in a first pass.
Traverse exactly half, then check that the latter half is the reverse of the first half.
Linear time complexity.
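The two-pass approach above can be sketched as follows in Python (the class and function names are mine):

```python
class Node:
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def build(vals):
    # build a singly linked list from a Python list
    head = None
    for v in reversed(vals):
        head = Node(v, head)
    return head

def is_palindrome(head):
    """Two passes: count the nodes, stack the first half's values,
    then compare the second half against the stack (reversed order)."""
    n, cur = 0, head
    while cur:
        n, cur = n + 1, cur.next
    first, cur = [], head
    for _ in range(n // 2):
        first.append(cur.val)
        cur = cur.next
    if n % 2:              # odd length: skip the middle node
        cur = cur.next
    while cur:
        if cur.val != first.pop():
            return False
        cur = cur.next
    return True
```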


Sunday, October 25, 2015


Mobile databases

Phones and mobile devices are becoming ubiquitous and more and more powerful, and their software can support a large ecosystem of applications, all vying for the limited resources on the device. As a textbook says, some of the software problems involving data management, transaction management, and database recovery have their origins in distributed data systems, and in mobile computing these problems become more difficult to solve, mainly because of the nature of the bandwidth of wireless communication channels, the relatively short active life of the power supply, and the changing locations of the required information (sometimes in cache, sometimes in the air, sometimes at the server).

The mobile computing architecture is also slightly different: mobile units and base stations communicate through wireless channels having bandwidths significantly lower than those of a wired network. The mobile units operate within a geographic mobility domain that is divided into smaller domains called cells. The mobile discipline requires that the movement of mobile units be unrestricted within the geographic mobility domain, while preserving contiguity of information access during movement. The average duration of a user's stay in a cell, referred to as the residence latency, is a parameter that is computed and continuously adjusted.

In this world, the mobile units suffer from the drawbacks that database connectivity may be intermittent and the power source limited. Clients slip into doze mode to conserve energy; they can be woken up when the server wants to communicate or when the user keeps them active.

Moreover, the type of data and data access is different for mobile applications. Data may be classified into 1) private data, restricted to one user; 2) public data, which can be accessed by anyone who can read it; and 3) shared data, which is accessed in both read and write modes by groups of users. Mobile databases can be distributed in one of two ways: either the entire database is distributed mainly among the wired components, possibly with full or partial replication, or the database is distributed among wired and wireless components. If part of the database is local to the device, it may need to be synchronized with the remote copy. This environment, referred to as the intermittently synchronized database environment, has the following characteristics: the communication between client and server is unicast or multicast; a server cannot connect to a client at will; flakiness of the connection is immaterial; a client is free to manage its own data and transactions; and a client has multiple ways of connecting to the server.

While today's applications, and even mobile database providers, increasingly rely on Wi-Fi hotspots, where internet ubiquity, connection speed, and bandwidth approach a wired environment, the classical problem of mobile databases still has potential in the untapped market of areas without Wi-Fi coverage.

Snapshots can provide the last saved version enabling both recovery and file synchronization services. Since snapshots can be taken at any granularity, the low disk and memory footprint for mobile device usage can be maintained. This kind of service becomes external to the database.

#codingexercise

If you are given a number, find whether it is colorful. A number is colorful when all the combinations of its digits result in distinct products when multiplied.
Solution: first we generate the combinations, then we look for duplicates among the products they produce.
void Combine(ref string digits, ref StringBuilder b, int start, int level, ref List<int> products)
{
    for (int i = start; i < digits.Length; i++)
    {
        b[level] = digits[i];
        // record the product of the current combination b[0..level]
        products.Add(CalculateProduct(b.ToString(0, level + 1)));
        if (i < digits.Length - 1)
            Combine(ref digits, ref b, i + 1, level + 1, ref products);
        b[level] = '\0';
    }
}

bool IsColorful(List<int> products)
{
    return products.Count == products.Distinct().Count();
}

Observation: the given number should not contain the digit 1, since the combination {1, d} yields the same product as the digit d alone.
Observation: the given number should not contain duplicate digits.
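Following the post's definition (products over all combinations of digits; note that a common variant of this puzzle restricts to contiguous runs of digits instead), a compact Python version:

```python
from itertools import combinations
from math import prod

def is_colorful(number):
    """True when the products of all non-empty combinations of the
    number's digits are pairwise distinct."""
    digits = [int(d) for d in str(number)]
    products = [prod(c)
                for r in range(1, len(digits) + 1)
                for c in combinations(digits, r)]
    return len(products) == len(set(products))
```

For example, 3245 is colorful, while 326 is not (3 * 2 duplicates the digit 6), consistent with both observations above.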

Saturday, October 24, 2015

#coding interview question
Given an unsorted array in which every pair of adjacent elements differs by either +1 or -1,
write efficient code to find a given element x.
Solution 1:
Binary chop the elements until we encounter x-1, x, or x+1, then search nearby for x.

Solution 2:
(an approach when many occurrences of x are involved)
Perform an insertion sort followed by a binary search.
Insertion sort is usually O(n^2), but here adjacent values differ by only +-1, so elements are nearly in place and few swaps are needed; after sorting, x and however many duplicates there may be sit together between the cells containing x-1 and x+1.
void InsertionSort(int A[], int length) {
    for (int i = 1; i < length; i++) {
        int j = i;
        while (j > 0 && A[j-1] > A[j]) {
            int temp = A[j];    // swap A[j] and A[j-1]
            A[j] = A[j-1];
            A[j-1] = temp;
            j = j - 1;
        }
    }
}

// binary search for x in the sorted array; returns an index or -1
int exists(int A[], int x, int lo, int hi) {
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (A[mid] == x) return mid;
        if (A[mid] < x) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}

int Search(int A[], int length, int x) {
    InsertionSort(A, length);
    return exists(A, x, 0, length - 1);
}