Monday, June 23, 2014

In this post, we will cover an interesting topic: cliques and sub-groups.
In graphs and networks, a common interest is the presence of "sub-structures" within the larger network; neighborhoods and groups of nodes fall into this category. Working from the bottom up, small, compact sub-networks can be joined to form larger, extended networks. A clique is such a compact sub-structure, one in which there is generally more interaction among members than with outsiders.
A clique can be considered a closed elite. We can also look for such sub-structures from the top down: the idea that some regions of a graph may be less connected than the whole can lead to insights into cleavage and division. Weaker parts in the social fabric also create opportunities for brokerage and less constrained action. Most computer algorithms for locating sub-structures operate on binary symmetric data. We can use the Knoke information exchange data to illustrate these sub-networks with strong ties; the data can be viewed as binary connectedness values in the adjacency matrix of a directed graph.

I'm taking a short break today as I'm tied up with another task from work.
Some of the matrix operations used to describe social relations are as follows:
The Transform -> Block operation allows us to select a matrix to be blocked, a row and/or column partition, and a method for calculating the entries in the resulting blocks. We first specify the row and column partitions; these are just data sets that group the rows and columns into partitioned sets. The operation also requires a method for summarizing the information within each block, and it outputs two new matrices: the PreImage data set, which contains the original scores but permuted, and the Reduced image data set, a new block matrix containing the block densities.
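As a rough sketch of what such a blocking step computes (the matrix, partition, and density summary here are invented for illustration, not UCINET's actual interface):

```python
import numpy as np

# Hypothetical 5-node binary adjacency matrix of a directed graph.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 0, 0],
              [0, 0, 1, 0, 1],
              [0, 1, 0, 1, 0]])

# Made-up partition: nodes 0-2 in group 0, nodes 3-4 in group 1.
part = np.array([0, 0, 0, 1, 1])

order = np.argsort(part)                # group rows/columns together
pre_image = A[np.ix_(order, order)]     # "PreImage": original scores, permuted

groups = np.unique(part)
# "Reduced image": block matrix of densities (mean tie value per block,
# diagonal included for simplicity)
reduced = np.array([[A[np.ix_(part == g, part == h)].mean()
                     for h in groups] for g in groups])
print(pre_image)
print(reduced)
```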
The Transform -> Collapse method allows us to combine rows and/or columns by specifying which elements are to be combined and how; the combination can be a maximum, minimum, or sum. The result is a new matrix with the specified operation applied.
The Data -> Permute operation allows us to re-arrange the rows, columns, and/or matrices. This operation requires us to list the new order we need.
The Data -> Sort operation re-arranges the rows, columns, or both of the matrix according to a criterion we select.
The Data -> Transpose operation switches the columns with the rows, a transformation very commonly used in matrix algebra.
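A minimal numpy sketch of these remaining operations on an invented matrix (the names mirror the operations above, not any actual API):

```python
import numpy as np

A = np.array([[0, 2, 1],
              [3, 0, 2],
              [1, 1, 0]])

# Collapse: combine rows 0 and 1 by sum (maximum or minimum work similarly)
collapsed = np.vstack([A[0] + A[1], A[2]])

# Permute: re-arrange rows and columns into a listed new order
order = [2, 0, 1]
permuted = A[np.ix_(order, order)]

# Sort: re-order rows by a chosen criterion, here the row sums
sorted_rows = A[np.argsort(A.sum(axis=1))]

# Transpose: switch the columns with the rows
transposed = A.T
```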

Sunday, June 22, 2014

In tonight's blog post, we return to the discussion of graphs and matrix operations. We talked about matrix multiplication, in which a row (a1, a2, ..., an) multiplied by a column (x1, x2, ..., xn) yields a1x1 + a2x2 + ... + anxn. Note that the matrix product can also be represented as A(b1, b2, ..., bn) = (Ab1, Ab2, ..., Abn); that is, the matrix A acts separately on each column of B.
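A quick numpy check of this column-by-column view of the product, with arbitrarily chosen matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

AB = A @ B
# A acts separately on each column of B: AB = (Ab1, Ab2, ...)
cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
assert np.array_equal(AB, cols)
print(AB)
```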
Row reduction is another operation.
It is built from three elementary row operations:
1) multiplying a row by a non-zero scalar,
2) adding a multiple of one row to another,
3) swapping the positions of two rows.
Each of these steps is reversible, so if you go from one state to another you can undo the change; matrices related by these operations are called row-equivalent.
A matrix is said to be in row-echelon form if any rows made entirely of zeroes lie at the bottom of the matrix and the first non-zero entries form a staircase pattern: the first non-zero entry of the (k+1)-th row is to the right of that of the k-th row.
If the matrix is in row-echelon form, the first non-zero entry of each row is called a pivot, and the columns in which pivots appear are called pivot columns. It is always possible to convert a matrix to row-echelon form; the standard algorithm is called Gaussian elimination or row reduction.
The rank of a matrix is the number of pivots in its reduced row-echelon form.
Solving Ax = 0 expresses the pivot variables in terms of the free variables.
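Here is a minimal Python sketch of Gaussian elimination to row-echelon form, written from the description above (the partial pivoting and tolerance are conventional choices, not part of the text):

```python
import numpy as np

def row_echelon(A, tol=1e-12):
    """Reduce a copy of A to row-echelon form; return (R, rank)."""
    R = A.astype(float).copy()
    rows, cols = R.shape
    pivot_row = 0
    for col in range(cols):
        # pick the largest entry at or below pivot_row as the pivot
        pivot = pivot_row + np.argmax(np.abs(R[pivot_row:, col]))
        if abs(R[pivot, col]) < tol:
            continue                                  # no pivot in this column
        R[[pivot_row, pivot]] = R[[pivot, pivot_row]] # swap rows
        for r in range(pivot_row + 1, rows):
            # add a multiple of the pivot row to zero the entry below the pivot
            R[r] -= (R[r, col] / R[pivot_row, col]) * R[pivot_row]
        pivot_row += 1
        if pivot_row == rows:
            break
    return R, pivot_row                               # rank = number of pivots

R, rank = row_echelon(np.array([[1, 2, 3],
                                [2, 4, 6],
                                [1, 0, 1]]))
print(R)
print("rank:", rank)                                  # rank 2 here
```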

Saturday, June 21, 2014


In today's post we take a short break to discuss spatial and temporal control for storage management. First, spatial control. In terms of access to and from disk, sequential access is ten to a hundred times faster than random access, or more. Disk density has been doubling, and bandwidth increases roughly as the square root of density. As a result, storage managers often organize large data so that it can be accessed sequentially. In fact, database management systems have exercised full control over how the database is partitioned on disk.
The best way for a storage manager such as a DBMS to do that is to lay the data out on the disk itself, bypassing the file system. This works especially well when raw disk device addresses correspond to physical proximity. But avoiding the file system has the following drawbacks. First, it requires attention from a DBA and dedicates resources such as an entire disk partition. Second, raw disk access is often OS-specific and introduces portability concerns. Third, logical storage systems such as RAID and SAN have become popular, and because these interpose on and virtualize raw disk addresses, alternatives have to be considered. For example, we can create a single very large file on disk and manage offsets into it ourselves. This is facilitated by the file system and also by virtualized storage managers, and it performs well: the degradation from using a single large file on a commercial system was found to be about 6%.
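A toy sketch of the "large file plus offsets" idea, treating the file as an array of fixed-size pages (the page size and helper names are invented for illustration):

```python
import os

PAGE_SIZE = 8192  # assumed page size

def read_page(fd, page_no):
    # seek to the page's byte offset within the single large file
    os.lseek(fd, page_no * PAGE_SIZE, os.SEEK_SET)
    return os.read(fd, PAGE_SIZE)

def write_page(fd, page_no, data):
    assert len(data) == PAGE_SIZE
    os.lseek(fd, page_no * PAGE_SIZE, os.SEEK_SET)
    os.write(fd, data)
```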
We now look at temporal buffering, which is about when the data gets written rather than where. If the DBMS wants to control when to write, that can be hard with the OS implementing its own buffering mechanisms. One challenge is the difficulty of guaranteeing the ACID transaction promises: the transaction manager cannot guarantee atomic recovery across software and hardware failures without explicitly controlling the timing and ordering of disk writes. For example, writes to the log device must precede the corresponding writes to the database device, as in the write-ahead logging protocol. The second set of concerns with OS buffering is about performance rather than correctness. The file system wants to read ahead speculatively and write behind with delayed batch writes, policies that are poorly suited to a storage manager. A DBMS, by contrast, can predict its own I/O based on future read requests, such as when reading ahead to scan the leaves of a B+ tree. Similarly, when writing, it can control when to flush the log tail based on decisions about locks or I/O throughput. Finally, there may be double buffering and the CPU overhead of memory copies; copies degrade performance by contributing to latency, consuming CPU cycles, and potentially flooding the CPU cache.
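A toy sketch of the write-ahead ordering constraint, forcing the log record to stable storage before the corresponding database page is written (file handles and record formats are made up; a real DBMS does far more):

```python
import os

def apply_update(log_f, db_f, page_offset, page_bytes, log_record):
    # 1. Append the log record and force it to disk first (write-ahead).
    log_f.write(log_record)
    log_f.flush()
    os.fsync(log_f.fileno())   # log must be durable before the data write

    # 2. Only now write the database page; it may be flushed lazily,
    #    since recovery can redo the change from the durable log.
    db_f.seek(page_offset)
    db_f.write(page_bytes)
```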
Courtesy: "Architecture of a Database System" by Hellerstein, Stonebraker, and Hamilton.

Friday, June 20, 2014

In this post, I'll review matrix operations on graphs. When we represent graphs as matrices, we can use mathematics and software tools to find patterns, which is easier than visually analyzing a complicated graph. The matrix representations are usually square, but they can also be three-dimensional, consisting of rows, columns, and levels or slices. Matrices can also be symmetric or asymmetric. In a symmetric matrix, the value in row i, column j is the same as in row j, column i. As an example of an asymmetric matrix, consider social relations where Jack may have a liking for Jill but Jill may not have a liking for Jack.
Operations permitted on a matrix include the following:
Permutation: the rows and columns are rearranged, for example in a symmetric matrix, to find patterns. These reorderings do not affect the pairwise relationships.
Partitioning: the matrix is split into blocks or images.
Embedding: an operation that helps find groups of nodes that participate in different patterns.
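A small numpy check that permuting rows and columns together merely re-labels the nodes and leaves every pairwise relationship intact (graph invented for the example):

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

perm = np.array([2, 0, 1])     # new position -> old node
P = A[np.ix_(perm, perm)]      # permute rows and columns together

new_pos = np.argsort(perm)     # old node -> new position
for i in range(3):
    for j in range(3):
        # the tie between old nodes i and j is unchanged by the re-labeling
        assert A[i, j] == P[new_pos[i], new_pos[j]]
```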

Thursday, June 19, 2014

In the previous post, we described the Chinese Whispers clustering algorithm. We now study the algorithm with two different approaches. The first approach is what we discussed earlier, using small world edge weights. The outcome is similar to the min-cut algorithm, in that dense regions are considered clusters; but unlike min-cut there is no hierarchical clustering, and it instead gives a flat partition. It does not require any threshold as an input parameter and is more efficient. The second approach is based on matrix operations. The algorithm can be considered a special case of Markov Chain Clustering (MCL). MCL is the parallel simulation of all possible random walks up to a finite length on a graph G. Walks will usually terminate in the same cluster they started from; this observation motivates MCL. MCL simulates flow in the graph by repeatedly updating the transition probabilities between all nodes, eventually converging to a transition matrix after k steps. The update is done in two steps, which are then iterated. The first is the expansion step, the matrix multiplication of the graph matrix with the current transition matrix. The second is the inflation step, in which the contrast between small and large transition probabilities is increased and the probabilities are normalized to unity, providing a way to segment the matrix. The inflation step operates column-wise.
The k matrix multiplications of the expansion step carry a significant cost.
In the early iterations, the matrices operated on are dense; but as the iterations proceed, and with the help of a strong inflation operator, the matrices in later steps tend to be sparse. As an optimization, the cost of the inflation step can be lowered by keeping only a few entries per column, even down to the single largest value.
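A compact numpy sketch of the two MCL steps as described, expansion by matrix multiplication and column-wise inflation with renormalization (the inflation exponent r = 2 and the added self-loops are common conventions assumed here, not taken from the text):

```python
import numpy as np

def mcl(adjacency, r=2.0, steps=20):
    A = adjacency.astype(float) + np.eye(len(adjacency))  # self-loops
    T = A / A.sum(axis=0)          # column-stochastic transition matrix
    for _ in range(steps):
        T = T @ T                  # expansion: simulate longer random walks
        T = T ** r                 # inflation: sharpen the contrast
        T = T / T.sum(axis=0)      # renormalize each column to unity
    return T                       # near-converged transition matrix
```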
The Chinese Whispers algorithm can now be described in matrix operations as follows:
Let's start with the identity matrix of dimension n.
For each of the t iterations:
   apply a maxrow operator, which sets all entries of a row to zero except the largest entry, which is set to 1;
   then multiply the resulting matrix with the adjacency matrix of the graph and use the result as the input to the next iteration.
Since the time complexity depends on the number of non-zero entries in the result of the maxrow operation, the worst case, a fully connected graph, matches the time complexity of MCL.
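A direct numpy rendering of those steps on a made-up graph (the multiplication order below is chosen so that each node aggregates its neighbors' class weights):

```python
import numpy as np

def maxrow(D):
    # zero each row except its largest entry, which is set to 1
    out = np.zeros_like(D)
    out[np.arange(len(D)), D.argmax(axis=1)] = 1.0
    return out

def chinese_whispers_matrix(A, t=10):
    D = np.eye(len(A))             # identity: one class per node to start
    for _ in range(t):
        D = A @ maxrow(D)          # sum neighbors' hard class weights
    return D.argmax(axis=1)        # hard label per node

# two triangles joined by a single edge
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
print(chinese_whispers_matrix(A))  # two clusters emerge
```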
We continue our discussion of the Chinese Whispers graph clustering algorithm. It's a bottom-up clustering algorithm in which the nodes start out with distinct class labels and coalesce to the strongest class label in their neighborhood, where the strongest label is the one with the maximal sum of edge weights to the node. The clustering usually proceeds for a small number of iterations. Clustering here is different from clustering vectors because there is no distance metric and consequently no centroid or average; similarity is measured in terms of edges. Chinese Whispers is often used with small world graphs. A small world graph is a kind of graph in which most nodes are not neighbors of each other and yet most nodes can be reached from every other by a small number of hops. As an example of a small world graph, consider a set of nodes on the circumference of a circle, each tied to its nearest neighbors, with a few long-range edges across the circle.
In clustering small world graphs, a random mutation rate is sometimes introduced that assigns new classes, and this is done with decreasing probability. This greatly improves convergence in the early iterations. If we take a graph with eleven nodes arranged as two connected sub-graphs, with a few edges between them forming a small world graph, this technique converges the class labeling in just two iterations, because the mutation acts as a test that facilitates the proper labeling.
The algorithm does not cross component boundaries, because there are no edges between nodes belonging to different components. This means nodes that are not connected by any edges are effectively discarded, which leaves a portion of the nodes unclustered.
We see that the algorithm does not always converge, as in the case of labeling a node with a tie, i.e., a node that could be assigned one class label or another with the same edge weights. This wouldn't happen if the edges in the graph were weighted. In any case, the class labels generally don't change much after a few iterations.
Note that this simple technique takes no parameters and the class labels are hard. If soft labels are required, we can assign a class distribution instead; this can be done as a final step by taking a weighted distribution of the hard labels.
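A sketch of that final soft-labeling step, assuming hard labels have already been produced (for instance by the matrix version above): each node's class distribution is the edge-weight mix of its neighbors' hard labels, normalized to one (the uniform fallback for isolated nodes is my assumption):

```python
import numpy as np

def soft_labels(A, hard):
    n, n_classes = len(A), hard.max() + 1
    H = np.zeros((n, n_classes))
    H[np.arange(n), hard] = 1.0               # one-hot hard labels
    W = A @ H                                 # per-node class weight from neighbors
    W[W.sum(axis=1) == 0] = 1.0 / n_classes   # isolated nodes: uniform
    return W / W.sum(axis=1, keepdims=True)   # rows sum to one
```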