Cluster computing

Wednesday, November 4, 2015

Today we continue discussing Big Data.
We compare the following NoSQL databases :
Cassandra: This has an Apache license and is written in Java. It claims to store huge datasets with a query language that is similar to SQL except that it has no joins and no aggregate functions. The queries are done by the keys or key ranges and there is tunable parameter for determining the tradeoffs between distribution and replication. Data can have expiration set on insert. Writes can be much faster than reads. It has very good and reliable cross center data replication. It enables Map/Reduce with Apache Hadoop
MongoDB: This is written entirely in C++. It claims to retain query and indexing properties from SQL It has an AGPL license. It enables Master or Slave replication and auto failover with replica sets. Sharding is built-in. It uses a custom binary protocol called BSON and queries are written as Javascript expressions. These are executed server side. It runs arbitrary javascript functions on server-side.It has a better update in place than CouchDB. It uses memory -mapped files for data storage. Text Search is integrated. This is a convenient transition from SQL to NoSQL where we want to move away from schema and still retain the indexing and querying
ElasticSearch: this has an advanced search capability. It stores json documents and versions them. It can maintain hierarchy between documents. It enables stat-groups that help with debugging. It is querying is rich and powerful and moreover it is script able. It sorts by score and enables fuzzy or approximate searches.
CouchDB is a classic document and big table store. It is written in Erlang and claims to have DB consistency. It has Apache license. It uses HTTP/REST protocol. It supports bidirectional replication in both continuous and adhoc mode, with conflict resolution and hence master to master replication. Because it supports multiversion concurrency document, it supports older versions of documents. It has embedded map/reduce and shares updates in real time with _changes.
Courtesy:kkovacs.eu

Tuesday, November 3, 2015

A comparision of SQL and NoSQL databases. 

The following is literature derived from slides at the Metadata Open Forum. 

NoSQL databases are known to be horizontally scalable and provide an alternative to Relational databases.  We explore the differences between both in terms of standards, characteristics and examples.  

In SQL databases, data is stored in columns and tables. Relationships are represented by primary and foreign keys There is language for data manipulation and data definition.  Transactions are facilitated and generally there is an abstraction from physical layer. This means that the physical layer can change without requiring changes in the application.  Applications specify what they want and not how they want it. Generally the following stages are involved: 

Parsing – The query string is converted into an Abstract Syntax tree 

Binding – This resolves the column or table names such that they have ids as represented in their metadata. 

Optimization - This applies a set of tree transformations to the tree so that the best possible execution plan is drawn. 

Code generation – The generated tree may be stored as is, or transformed partly into code (scalar expressions) or fully into byte code and saved in the plan cache. 

Query execution – While the code generated may be directly run, some of them may be converted into physical operators  

Cleanup – Transactions may be rolled back, locks may be released, logging as appropriate and memory may be reclaimed. 

 A NoSQL Database – on the other hand, is non-relational and distributed by nature. Its characteristics include schema-free representation of data, easy replication support, simple API and a different transaction model called BASE ( Basically available, soft state, eventually consistent ) They are able to handle data akin to warehouses because they scale based on nodes not merely horizontal or vertical partitions.  Inserts and updates are asynchronous. They are versioned and enable optimistic concurrency. Queries start returning answers quickly because because they approximate answers, return stale data, ensure availability first,  provide best effort, and are aggressive, simpler and faster. 

Although the consistency is weak, all node see the same data at the same time and the client perceives that a set of operations has occurred all at once. In other words, they are atomic but error tolerant and can be called multiple times. Node failures do not prevent survivors from continuing to operate. Every operation must terminate in an intended response. Operations will complete , even if individual components are unavailable.  NoSQL databases can appear in many forms but fundamentally the map-reduce operations are similar.  NoSQL databases can be Column Store,  Document Store, and Key-Value Store. A key-value store has a hash table of keys and values stored with keys. It provides fast access to small data values. The Map-Reduce operation consists of two phases :  

Map – extracts the sets of Key-Value pairs from underlying data and potentially in parallel from multiple nodes  

Reduce – merges and sorts sets of key-value pairs and the results may be useful for other searches. Map-Reduce operations are implemented by the application developers not the database. 

Monday, November 2, 2015

We continue discussing the problem on the 2006 sided regular polygon described earlier as
A diagonal of a regular 2006-gon is called odd if its endpoints divide the boundary into
two parts, each composed of an odd number of sides. Sides are also regarded as odd diagonals.
Suppose the 2006-gon has been dissected into triangles by 2003 non-intersecting diagonals.
Find the maximum possible number of isosceles triangles with two odd sides.

We used the lemma earlier that when a dissecting diagonal leaves a shorter path of the boundary of the 2006-gon that consists of n -segments, then the number of iso-odd triangles with vertices on this shorter path does not exceed n/2.
We said that if there is an iso-odd triangle PQS with all vertices on this shorter path and with PQ the diagonal being the longest, then S is the summit of the triangle PQS and the vertices divide the shorter path into four segments of AP, PS, SQ and QB.
Since PQ is part of an iso-odd triangle and is the longest diagonal, an iso-odd triangle cannot have vertices on both the first and the last segment. This means that every iso-odd triangle that fall on the shorter path has all its vertices on one of the four segments.
Given each of these four segments and applying the lemma we started with, we get the sum of the triangles in all these four regions to be less than n/2. Furthermore the two inner segments will have odd number of sides which makes the inequality of the number of triangles to be strictly less than n/2 and leaves one out of two in excess for both these inner edges. Because of this excess, PQS is also covered by this estimate n/2 and the lemma holds.
We can argue the same when the iso-odd triangle with the longest diagonal is drawn the other way with its third vertex not falling on the shorter path. In this case, the triangle is acute or right triangle not the obtuse we had seen earlier. And we have three regions XZ, YZ and XY where XZ and YZ are shorter than XY. For each of the three regions, we apply the lemma and infer that there are no more than n/2 iso-odd triangles in all unless XYZ is one of them. Thus the total number of triangles in the dissection is not greater than n/2.
We can also reason it for arbitrary triangles. Let us say ABC is an iso-odd triangle with AB and BC odd sides. This implies there are odd number of sides between A and B and B and C. We call these sides as belonging to iso odd triangle ABC.
At least one side in each of these triangles do not belong to any other iso-odd triangle because an iso-odd triangle with two sides of equal length and therefore there are even number of sides belonging to it. Eliminating all sides that belong to iso-odd triangles, we get one that isn't. Let us assign these two sides one in each group to the triangle ABC. To each iso-odd triangle, we have thus assigned a pair of sides with no two triangles sharing an assigned side. Therefore there 1003 such iso-odd triangles that can appear.

Friday, October 30, 2015

A diagonal of a regular 2006-gon is called odd if its endpoints divide the boundary into
two parts, each composed of an odd number of sides. Sides are also regarded as odd diagonals.
Suppose the 2006-gon has been dissected into triangles by 2003 non-intersecting diagonals.
Find the maximum possible number of isosceles triangles with two odd sides.

An isosceles triangle is called odd if it has two odd sides. A triangle in the dissection above which is odd and isosceles is called iso-odd.
We make an observation as follows:
Let AB be one of the dissecting diagonals and let Lambda be one of the shorter part of the boundary of the 2006-gon with endpoints A, B. Suppose that lambda consists of n segments. Then the number of iso-odd triangles with vertices on Lambda does not exceed n/2.
We can prove this by induction . For the initial case of n = 2, there can only be one such triangle. So this holds. Let us assume this observation holds for all lambda of length < n. Then we consider the case when Lambda == n. Let PQ be the longest diagonal which is a side of an iso-odd triangle PQS with all vertices on Lamda. If there is no such triangle, then we are not increasing the number of iso-odd triangles and hence the observation holds. If on the other hand, there is such a triangle, then every triangle whose vertices lie on lambda is right angled or obtuse with the maximum possible when it form the triangle PQS with the summit at S. Then we may assume that the five points APSQB lie on lambda and partition lambda into four pieces of AP, PS, SQ and QB each where possibly the outer two segments reduce to a point. Since PQ is the longest diagonal and PQS is an iso-odd triangles, it cannot have vertices on both AP and QB the outer segments. Therefore every iso -odd triangle lying within lambda has its vertices on just one of four pieces.Each of these pieces we can apply the same induction and we can add the four inequalities,we get the required number to not exceed n/2.

The intuition behind this approach is that there are only three vertices we need to pick to for such triangles and given that they can lie arbitrarily along the segments we categorize them into forms that the triangle can take and where those vertices may fall. Once we have a lemma for the number of such triangles, we can easily study it in each of the categories

Thursday, October 29, 2015

Today we review the book The 5 Choices by Kory Kogon, Adam Merrill, and Leena Rinne.
This book is about the path to extraordinary productivity when the workplace is getting a whole lot overwhelming and exhausting. Texts, Emails, interruptions meetings etc inundate us and distract us from thinking clearly to make good decisions and accomplish what matters. Therefore this book shows how to manage time and energy through five fundamental choices.
The book promises to increase our productivity at the individual team and organization levels and to feel engaged, accomplished and satisfied.
The book shares new ideas on how to be productive, how to get clear and focus on the things that matter to us, how to increase our capability to make decisions and how to recognize that you have the ability to do extraordinary work. The Five choices are enumerated as follows:
Choice 1 is to act on the important and not to react to the urgent. The authors say that we either react or we think. The reacting is the lower level where we process our feelings and emotions for a response but don't necessarily think it out. Thinking on the other hand is the upper level where we make conscious and intentional decisions. The longer we use the upper plane the better the outcome. We have some tools to do this. FranklinCovey's Time Matrix provides the framework and Pause-Clarify-Decide provides the process. The time matrix is a four quadrant visualizing tool that helps us categorize activities. The top left quadrant is the quadrant of necessity and contains things that are both urgent and important. The lower left quadrant is the quadrant of distraction. This is the one that gives a negative return on your attention and energy. The lower right quadrant is the quadrant of waste. We should try not to be here at all. The upper right quadrant is the one for extraordinary productivity. To spend more time in this quadrant, we should pause to think what item belongs in which quadrant and once its clear, we decide accordingly.
Choice 2 is to go for the extraordinary and not to settle for the ordinary. For each item that we place in the quadrant 2, we can attach a statement or story as we do on our scrum meetings. We anchor it to a purpose and passion. It helps us to tap into our motivation and higher performance.
Choice 3 is to schedule the big rocks and not to sort gravel. Quantifying our work items is akin to attaching story points to our stories. Consequently, we don't spend time on the little things that clutter up and prevent us from executing on the big items. A master task list is probably one of the best visualizing tools for this Since the list is the master and it contains items that we want to execute, we have all the items in our consideration and ranked appropriately.
Choice 4 is to rule the technology and not let it rule us. Very often we spend more time learning to use a tool or to work with a tool with the assumption that the tool will save us, but as a Japanese swordsman once said being independent of your tools and being present is the first step. You can manage the information you need to process by organizing them into appointments, tasks, contacts and notes/documents.
Choice 4 also includes a set of master moves that make our execution efficient. The first master move is to win without fighting Here we automate much of our decision making process. The second master move is to eliminate any additional quadrant 3 and 4 items from our inbox. The third master move is to enable more preparedness by including a link to locate so we spend hardly any time in Q1.
Choice 5 is to fuel our fire and not to burn it out. The five drivers of mental and physical energy are Moving,eating, sleeping, relaxing and connecting. Each of them is powerful in making us more productive.

Wednesday, October 28, 2015

Q: An equilateral triangle, partitioned into n2 congruent equilateral triangles by n-1 equidistant parallels to each of its three sides. Two chess-like bishops placed at any two vertices of the small triangles are said to menace one another if they lie on a same parallel. The problem is to determine the largest number of bishops that can be placed so that none menaces another.

A: A bishop may be assigned three coordinates a; b; c, namely the numbers of sides of small triangles they are off each of the sides of the big triangle.

In other words the condition above is that if i != j; then ai != aj ; bi != bj and ci != cj .
If any pair of these axes coordinates were same, then the bishops would be on the same triangle. If they are on the same triangle, they share the same parallel that made the triangle and hence would menace each other.

In fact the only solution that divides the number of congruent triangles is when the smaller triangle formed by the bishops is itself an equilateral triangle.
The first condition gives us the inequality that pairwise parallel-coordinates are distinct so we have their sum as N(N-1)/2
summing all the three co-ordiantes we have 3N(N-1)/2 <= sum of all( ai+bi+ci)
Since all the triangles are congruent,N = 4
therefore a + b + c = n

By solving this problem, we have the obtained the first condition as in the problem earlier. And again showed that the limiting case of equality is also included as a solution.

#codingexercise
Determine the type of a triangle given its edge lengths
Void printTriangle( int a, int b, int c)
{
If (a == b && b==c) printf ("equilateral");
If (a ==b || b == c || c == a) printf ("isosceles");
printf ("scalene");
}

Tuesday, October 27, 2015

#coding question.

Matrix has 1's followed by 0's. Find the row with the maximum 1

int GetRowWithMost1(int[,] matrix, int row, int column)
{
int maxCount = 0;
int ret = 0;
for (int I =0; I < row;i++)
{
int count = 0;
for (int j = 0; j < col; j++)
{
   if (matrix[I,j] == 1) count++;
}
if (count > maxCount){
    maxCount = count;
    ret = I;
}
}
return ret;
}

Q: For any integer n >= 2, let N(n) be the maximal number of triples (ai, bi, ci), i = 1, ...., N(n), consisting of nonnegative integers ai; bi and ci such that the following two conditions are satisfied:

(1) ai + bi + ci = n for all i = 1, ...., N(n);

(2) If i != j; then ai != aj ; bi != bj and ci != cj .

Determine N(n) for all n >= 2.

A: The second condition above suggests that a-coordinates in all the triples are pairwise distinct.
If we sum the a-coordinates we get a value that has to be greater than or equal to the sum of the suffixes i-1 from 1 to N. This is because ai has to be less than n and and i can only range from 1 to N(n)
Therefore the sum of a-coordinates has to be greater than or equal to the sum of consecutive numbers which is equal to N(N-1)/2
This same inequality holds for all the coordinates. Consequently, we have 3N(N-1)/2 <= Sum of all coordinates from 1 to N = nN
Therefore 3(N-1)/2 <= n
and N has to be less than or equal to the floor of (2n/3) + 1
If we take different examples we can see that N can be equal to the floor of (2n/3) + 1 and this is the value we are required to determine.

There are three different cases for n that we can take. These are
n=3k-1
n=3k
n=3k+1
and for each we can draw the examples up to floor(2n/3) + 1 and we will see that the limiting case of floor(2n/3) + 1 also holds the above two conditions as given. Hence the determination of N(n) is correct.