Saturday, June 28, 2014

Good to Great is a book written by Jim Collins. The following is a summary review of the book. The title positions it against the author's earlier book, Built to Last, which was about enduring greatness, and hence the suggestion that many companies that are good never get to the next stage of being great. Here he focuses on why more companies don't do what worked for others, or don't do it more often. He picked eleven companies: Abbott, Fannie Mae, Kimberly-Clark, Nucor, Pitney Bowes, Wells Fargo, Circuit City, Gillette, Kroger, Philip Morris and Walgreens. These companies had to meet the criterion of fifteen-year cumulative stock returns at or below the general stock market, punctuated by a transition point, then cumulative returns of at least three times the market over the next fifteen years. They were selected over comparison companies that either did not show such a leap or did not sustain it. From these selections, the author describes the patterns that emerged.
The first topic is leadership. Contrary to the belief that celebrity executives influence market performance with their charisma, these companies had leaders who demonstrated what Collins calls Level 5 leadership. The term refers to those who, unlike the celebrities, combine personal humility with intense professional will. Such leaders often set up their successors for success. Their humility comes with a steely resolve and an intolerance for mediocrity; Abbott Laboratories, for example, attacked nepotism in order to make the leap. The levels from 1 to 5 are graded as follows: level 1 comprises highly capable individuals who make productive contributions through talent, knowledge, skills and good work habits; level 2 comprises contributing team members who work well with others; level 3 is the competent manager who organizes people and resources towards objectives; level 4 is the leader who draws commitment to and pursuit of a clear vision; level 5 is the leader who establishes a vision independent of themselves to move the company to the next plane.
However, the author argues that vision or strategy is not the primary prerequisite to make the leap; instead it is the selection of people, getting the right people on the bus before deciding where to go. As an example, Wells Fargo continuously hired the best to prepare for changes in its industry. The book claims the following truths:
If we focus on who is with us rather than on the mission, change is going to be easier.
If we have the right people, we spend less time on managing them.
If we have the wrong people, we cannot make the most of the right direction.
Essentially it is about people who practice a culture where they don't worry about their positions.
It also implies not hiring too much too fast, and putting existing people not on the biggest problems but on the biggest opportunities.
Another pattern that emerges from these companies is the notion of disciplined thought. Self-assessment by a company is a prerequisite to knowing what transition to make. Called the brutal facts, these assessments spell out the weaknesses clearly. The author highlights that some leaders were open enough to say "I don't know", and that is welcome. The point is to evaluate and seek honest feedback often, refining the picture until it becomes clear. Clearer facts are easier to execute on, to measure and to put into a cycle of improvement. Such facts can even be opened up to the public. To do this the book suggests the following:
- Lead with questions, not answers, so that we discover more than what we initially perceive.
- Engage in dialogue, not debate, sermon, ridicule or, worse, coercion.
- Review past work without blame and with a resolve to do better.
- Keep metrics and thresholds so that they can translate to actionable alerts.
The book also introduces the hedgehog concept, which divides the world into two groups: the hedgehogs, who translate the complex world into a single idea or principle, and the foxes, who take the complex world for what it is and work on many levels, often missing a unified direction. In this parable the hedgehog wins the match with the fox.
The use of the hedgehog concept is to illustrate that leaders with a simple mission can help drive the company from good to great. To find this mission, the concept talks about identifying the intersection of three dimensions: what we can be best at, what drives the economic engine, and what we are deeply passionate about.
Another pattern mentioned in the book is disciplined action. This involves building a culture around the idea of freedom and responsibility within a framework, and getting passionate people who will take their responsibilities forward. Adherence to the discipline should be reasonable and anchored to the hedgehog concept rather than becoming a religion.
Lastly, the book describes the pattern of using technology as an accelerator and encourages the use of the right technologies as opposed to merely new technologies. In fact, it cautions against the zeal to adopt new technology for its own sake.
With the patterns mentioned, the book suggests that there is a flywheel effect that can result in good to great transformations. The opposite of the flywheel is the doom loop which can be seen in the cases where the patterns are not followed.
  

Friday, June 27, 2014

Having looked at the bottom up approaches to finding sub-networks in a graph, I will now read the top down approaches from the Hanneman online text.  The bottom up approaches help us understand the processes by which actors build networks. On the other hand, the top down approach helps us find holes, vulnerabilities, or weak spots that define lines of division in a larger group. These help to describe the levels of group selection and the constraints under which actors build networks.
We look at components first in this methodology. Components are subgraphs with high cohesion and low adhesion. Components can be weak, where the direction of ties does not matter, or strong, where ties are directed. If we use a metric that connects actors when they participated in something common, then depending on the threshold we choose, we may end up with few or no independent actors, or we may group nearly everyone into one component. This is therefore too strong a definition to find weak points.
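As a hedged illustration (the toy graph and the use of the networkx library are my own, not part of the Hanneman text), weak and strong components of a directed graph can be computed as follows:

```python
# Toy directed graph; networkx is an assumption of this sketch.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([(1, 2), (2, 1), (2, 3), (4, 5)])

# Weak components ignore the direction of ties.
weak = list(nx.weakly_connected_components(G))
# Strong components require directed paths in both directions.
strong = list(nx.strongly_connected_components(G))

print(weak)    # e.g. [{1, 2, 3}, {4, 5}]
print(strong)  # e.g. [{1, 2}, {3}, {4}, {5}]
```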
Blocks and cutpoints are an alternative approach to finding key weak spots in graphs. If removing a node divides the structure into disjoint parts, that node is called a cutpoint, and it acts as a broker among otherwise disconnected groups. These groups are called blocks. We can find the blocks by locating the cutpoints. While component analysis looks at missing links, this bi-component analysis looks at vulnerable links.
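A minimal sketch of the same idea, again assuming networkx and a toy graph rather than the text's data:

```python
# Find cutpoints (articulation points) and blocks (biconnected components).
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 4)])

cutpoints = set(nx.articulation_points(G))   # removing these disconnects the graph
blocks = list(nx.biconnected_components(G))  # maximal subgraphs with no cutpoint

print(cutpoints)  # {3, 4}
print(blocks)     # e.g. [{1, 2, 3}, {3, 4}, {4, 5, 6}]
```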
Lambda sets and bridges are another alternative approach to the same goal. This method ranks each relationship in the network by importance, evaluating how much of the flow between nodes goes through each link and how much the flow would be disrupted if the link were removed.
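networkx does not implement lambda sets directly; as a rough stand-in for the ranking idea (my assumption, not the text's method), edge betweenness scores how much shortest-path flow each link carries, and nx.bridges finds links whose removal disconnects the graph:

```python
# Rank links by shortest-path flow and find bridges in a toy graph.
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 4)])

ranked = sorted(nx.edge_betweenness_centrality(G).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked[0])            # the (3, 4) link carries the most flow
print(list(nx.bridges(G)))  # [(3, 4)]
```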
Factions are ideal groupings where the members are closely tied to one another but not to anybody else. This concept helps us assess the degree of factionalization in the population.
With the bottom up and top down approaches, we find sub-networks in a graph. The groupings or cliques - their number, size and connections can tell us about the behavior of the network as a whole. 

Wednesday, June 25, 2014

A graph of the Knoke information data shows strong symmetric ties. It also answers questions such as: how separate are the sub-graphs? How large are the connected sub-graphs? Are there particular actors that appear to play network roles?
We now look at ways to find sub-graphs. One approach works bottom up. A clique extends a dyad by adding members that are tied to all the members of the group. The strict definition can be relaxed to include nodes that are not quite so tied, as we will see shortly with n-cliques, n-clans and k-plexes. The notion, however, is to build outwards to construct the network. The whole network can then be put together by joining cliques and clique-like groupings.
Formally, a clique is the maximum number of actors who have all possible ties among themselves. It can be considered a maximal complete sub-graph. Cliques can be viewed in conjunction with the Knoke information matrix mentioned earlier. We might be interested in the extent to which these sub-structures overlap and in which actors are more central or more isolated within cliques.
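As a small illustration (toy data, not the actual Knoke matrix), maximal complete sub-graphs can be enumerated with networkx:

```python
# Enumerate maximal cliques of size three or more in a toy graph.
import networkx as nx

G = nx.Graph([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)])

cliques = [c for c in nx.find_cliques(G) if len(c) >= 3]
print(cliques)  # e.g. [[1, 2, 3], [2, 3, 4]] -- actors 2 and 3 sit in both
```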
We can examine these by evaluating the actor-by-actor clique co-memberships. Then we can do hierarchical clustering of the overlap matrix, which gives us an idea of the closeness of the cliques. Visually, we can see the co-membership and the hierarchical clustering as matrices of actors by cliques and of clustering levels by cliques, respectively.
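A sketch of the actor-by-actor co-membership matrix and a hierarchical clustering of it, assuming the same toy graph and the scipy clustering routines (not the UCINET procedure described in the notes):

```python
# Build a clique co-membership matrix and cluster actors by their profiles.
import numpy as np
import networkx as nx
from scipy.cluster.hierarchy import linkage, dendrogram

G = nx.Graph([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)])
actors = sorted(G.nodes())
cliques = list(nx.find_cliques(G))

# co[i, j] = number of cliques in which actors i and j appear together
co = np.zeros((len(actors), len(actors)), dtype=int)
for c in cliques:
    for i in c:
        for j in c:
            co[actors.index(i), actors.index(j)] += 1

# Cluster actors by the similarity of their co-membership rows.
Z = linkage(co, method="average")
print(co)
print(dendrogram(Z, no_plot=True)["ivl"])  # order in which actors cluster
```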
 Two major approaches to relaxing the definition for cliques are the N-cliques and N-clan approaches.
For N-cliques we say that an actor is a member of a clique if it is connected to every other member of the clique at a distance no greater than N; in this case we choose N = 2. A path distance of two corresponds to the actor being a friend of a friend.
The cliques that we saw before have been made more inclusive by this relaxed definition of group membership. 
The N-clique approach tends to find long, string-like groupings instead of the tight, discrete ones given by the original definition. It is possible for actors to be connected through others who are themselves not part of the clique.
To overcome this problem, an additional restriction is imposed on the total span or path distance between any two members. This is the N-clan approach, where all ties are forced to occur by way of other members of the n-clique.
If we are not comfortable with the idea of using a friend of a clique member as a member of the clique, we can use the K-plex approach. Here we say that a node is a member of a clique of size n if it has direct ties to at least n-k members of that clique. This tends to find a relatively large number of small groupings, and it shifts the focus to overlaps and centralization rather than solidarity and reach.
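networkx has no built-in n-clique or k-plex routines, so the following are small hand-rolled checks of the two membership conditions; they are a sketch of the definitions above, not the Hanneman code:

```python
# Check the n-clique and k-plex conditions for a candidate set of members.
import itertools
import networkx as nx

def is_n_clique(G, members, n):
    """Every pair of members lies within path distance n in the whole graph G."""
    return all(nx.shortest_path_length(G, u, v) <= n
               for u, v in itertools.combinations(members, 2))

def is_k_plex(G, members, k):
    """Each member has direct ties to at least len(members) - k other members."""
    S = set(members)
    return all(len(S & set(G[v])) >= len(S) - k for v in S)

G = nx.Graph([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)])
print(is_n_clique(G, [1, 2, 3, 4], 2))  # True: everyone is within two steps
print(is_k_plex(G, [1, 2, 3, 4], 2))    # True: each member misses at most two ties
```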
Courtesy: Hanneman notes

Monday, June 23, 2014

In this post, we will cover an interesting topic: cliques and sub-groups.
In graphs and networks, one of the common interests is the presence of "sub-structures" within a network. Neighborhoods and groups of nodes fall into this category. When small, compact sub-networks are joined in a bottom-up manner to form larger networks, these compact building blocks are known as cliques. In cliques there is generally more interaction among the members than with others.
A clique can be considered a closed elite. We can also look for this kind of substructure from the top down. The idea that some regions of a graph may be less connected than the whole may lead to insights into cleavage and division. Weaker parts in the social fabric also create opportunities for brokerage and less constrained action. Most computer algorithms for locating sub-structures operate on binary symmetric data. We can use the Knoke information exchange data to illustrate these sub-networks with strong ties. The Knoke information exchange can be viewed as binary connectedness values in the adjacency matrix of a directed graph.
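A hedged sketch of that representation (the matrix values are made up; the real Knoke matrix is 10 x 10):

```python
# Turn a binary adjacency matrix into a directed graph of exchange ties.
import numpy as np
import networkx as nx

adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [0, 1, 0, 0],
                [0, 0, 1, 0]])

G = nx.from_numpy_array(adj, create_using=nx.DiGraph)
print(G.edges())  # each 1 in row i, column j becomes a directed tie i -> j
```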

I'm taking a short break as I'm spending my time on another task from work today.
The use of matrices to describe social relations is as follows:
The Transform -> Block operation allows us to select a matrix to be blocked, a row and/or column partition, and a method for calculating the entries in the resulting blocks. We first specify the row and column partitions; these are just data sets whose elements we group into partitioned sets. The operation also requires a method for summarizing the information within each block, and it outputs two new matrices. The PreImage data set contains the original scores, but permuted. The reduced image data set is a new block matrix containing the block densities.
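A minimal numpy sketch of the blocking idea, with partitions and data of my own choosing rather than UCINET output:

```python
# Block an adjacency matrix and summarize each block by its density.
import numpy as np

adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [0, 0, 0, 0, 1],
                [0, 1, 0, 1, 0]])

row_parts = [np.array([0, 1, 2]), np.array([3, 4])]  # partition of the rows
col_parts = [np.array([0, 1, 2]), np.array([3, 4])]  # partition of the columns

# Reduced image: each block is summarized by the mean of its entries
# (for simplicity the diagonal is not excluded here).
image = np.array([[adj[np.ix_(r, c)].mean() for c in col_parts] for r in row_parts])
print(image)
```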
The Transform -> Collapse method allows us to combine rows and/or columns by specifying which elements are to be combined and how. Combinations can be by maximum, minimum or sum. The result is a new matrix with the specified operation performed.
The Data -> Permute operation allows us to re-arrange the rows and/or columns and/or matrices. It requires us to list the new order we want.
The Data -> Sort operation re-arranges the rows, columns or both of the matrix according to a criterion we select.
The Data -> Transpose operation switches the columns with the rows, a re-arrangement very commonly used in matrix algebra.
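A short numpy sketch of the permute, sort and transpose operations on a toy matrix (the criterion used for sorting, row sums, is just an example):

```python
# Permute, sort and transpose a small matrix with numpy indexing.
import numpy as np

adj = np.array([[0, 2, 1],
                [3, 0, 2],
                [1, 1, 0]])

order = [2, 0, 1]
permuted = adj[np.ix_(order, order)]          # Permute: re-order rows and columns
by_rowsum = adj[np.argsort(adj.sum(axis=1))]  # Sort: re-order rows by row sums
transposed = adj.T                            # Transpose: switch rows and columns

print(permuted)
print(by_rowsum)
print(transposed)
```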

Sunday, June 22, 2014

In tonight's blog post, we revert to the discussion on graphs and matrix operations. We talked about matrix multiplication, which is written in the form (a1, a2, ..., an in the same row) times (x1, x2, ..., xn in the same column), resulting in a1x1 + a2x2 + ... + anxn. Note that the matrix product can also be represented as A(b1, b2, ..., bn) = (Ab1, Ab2, ..., Abn), that is, the matrix A acts separately on each column of B.
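A quick numpy check of that column-by-column view (the matrices are arbitrary examples):

```python
# Verify that A @ B equals A applied to each column of B separately.
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

full = A @ B
by_column = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
print(np.array_equal(full, by_column))  # True
```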
Row reduction is another operation. It can be done in one of three ways:
1) Multiplying a row by a non-zero scalar.
2) Adding a multiple of one row to another.
3) Swapping the positions of two rows.
Each of these steps is reversible, so if you start from one state and go to another, you can undo the change. Matrices related by such steps are called row-equivalent.
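The three operations can be sketched on a toy numpy matrix as follows (the matrix itself is arbitrary):

```python
# The three elementary row operations applied in place.
import numpy as np

M = np.array([[2.0, 4.0, -2.0],
              [1.0, 3.0,  0.0],
              [0.0, 1.0,  5.0]])

M[0] = 0.5 * M[0]         # 1) multiply a row by a non-zero scalar
M[1] = M[1] - 1.0 * M[0]  # 2) add a multiple of one row to another
M[[1, 2]] = M[[2, 1]]     # 3) swap the positions of two rows
print(M)
```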
A matrix is said to be in row-echelon form if any rows made entirely of zeroes lie at the bottom of the matrix and the first non-zero entries form a staircase pattern: the first non-zero entry of the (k+1)th row is to the right of that of the kth row.
If the matrix is in row-echelon form, then the first non-zero entry of each row is called a pivot, and the columns in which pivots appear are called pivot columns. It is always possible to convert a matrix to row-echelon form; the standard algorithm is called Gaussian elimination or row reduction.
 The rank of a matrix is the number of pivots in its reduced row-echelon form.
The solution to Ax = 0 gives the pivot variables in terms of the free variables.
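A sympy sketch tying pivots, rank and the solution of Ax = 0 together on an assumed example matrix:

```python
# Reduced row-echelon form, rank and nullspace of a small example matrix.
from sympy import Matrix

A = Matrix([[1, 2, 1, 0],
            [2, 4, 0, 2],
            [3, 6, 1, 2]])

rref, pivot_cols = A.rref()  # reduced row-echelon form and pivot columns
print(pivot_cols)            # (0, 2): columns 0 and 2 hold the pivots
print(A.rank())              # 2: the number of pivots
print(A.nullspace())         # basis expressing pivot variables via the free ones
```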

Saturday, June 21, 2014


In today's post we take a short break to discuss spatial and temporal control in storage management. First we discuss spatial control. In terms of access to and from disk, sequential access is ten to a hundred times faster than random access, or more. Disk density has been doubling, and bandwidth increases as the square root of density. As a result, storage managers often organize large data so that it can be accessed sequentially. In fact, database management systems have exercised full control over how the database is partitioned on disk.
The best way for a storage manager such as a DBMS to do this is to lay the data out on the disk itself and avoid the file system. This works especially well when raw disk device addresses correspond to physical proximity. But avoiding the file system has drawbacks. First, it requires attention from a DBA and dedicates resources such as an entire disk partition. Second, raw disk access is often OS-specific and introduces portability concerns. Third, logical storage managers such as RAID and SAN have become popular. Because these interpose on raw disk addresses and virtualize them, alternatives have to be considered. For example, we can create a very large file on disk and manage offsets within it. This is facilitated by the file system and also by virtualized storage managers. It performs well enough that the degradation from using a single large file on a commercially sized system was found to be about 6%.
We now look at temporal buffering, which is about when the data gets written rather than where it gets written. If the DBMS wants to control when to write, that gets harder when the OS implements its own buffering mechanisms. One challenge is the difficulty of guaranteeing the ACID transaction promises: the transaction manager cannot guarantee atomic recovery after software and hardware failures without explicitly controlling the timing and ordering of disk writes. For example, writes to the log device must precede the corresponding writes to the database device, as in the write-ahead logging protocol. The second set of concerns with OS buffering is about performance rather than correctness. The file system wants to read ahead speculatively and write behind with delayed batch writes, which are poorly suited to a storage manager. A DBMS, for instance, can better predict its I/O based on future read requests, such as when reading ahead to scan the leaves of a B+ tree. Similarly, when writing, it can control, say, when it flushes the log tail by making decisions about locks or I/O throughput. Finally, there may be double buffering and the CPU overhead of memory copies. Copies degrade performance by contributing to latency, consuming CPU cycles and potentially flooding the CPU cache.
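As a hedged sketch of the write-ahead ordering described above (the file names and record format are my own, and this is a bare-bones illustration rather than a real transaction manager), the log record is forced to disk with fsync before the corresponding data page is written:

```python
# Force the log record to stable storage before writing the data page.
import os

def apply_update(log_path, db_path, log_record, page_offset, page_bytes):
    # 1) Append the log record and force it to disk first.
    log_fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    os.write(log_fd, log_record)
    os.fsync(log_fd)              # log must be durable before the data write
    os.close(log_fd)

    # 2) Only then write the data page; its timing is now safe to control.
    db_fd = os.open(db_path, os.O_WRONLY | os.O_CREAT, 0o644)
    os.lseek(db_fd, page_offset, os.SEEK_SET)
    os.write(db_fd, page_bytes)
    os.fsync(db_fd)
    os.close(db_fd)

apply_update("wal.log", "data.db", b"update page 0\n", 0, b"new page contents")
```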
Courtesy: Paper by Hamilton, Krishnamoorthy et al.