Cluster computing

Sunday, August 28, 2016

We talked about graph extensions for relational data here. We now explore whether such an extension is possible when the entire relational database is in memory.

The challenge with graph abstraction of relational data is that it increases the size of the data with persisted predicates in the RDF description of the framework. Unless the entire RDF model is rendered, the graph operations are not executed.

There are two challenges with this:

1) The entire RDF model is expensive to create both in terms of time and space

2) The graphs have to be warm and available in memory for the duration the graph extension is available.

Graph databases that worked with NoSQL option could make use of the following two attributes:

1) Batch processing of the graph with parallel and offline analytics

2) Partitioning of graphs

Instead if we use a streaming model of the graph and provide graph queries that can operate on a stream of graph, then we can provide approximate answers even when operating entirely in memory.

Some Graph databases do not have a query language that supports streaming operations. They merely stream results.For example, cypher query language does not seem to support streams.

Microsoft Stream Insight supports stream processing

#algorithm discussion continued
Today we also continue reading the flow networks. We were looking at max flow min cut theorem and the properties associated with it. We saw how it applied to Ford Fulkerson method. Each iteration of the method augmented the flow by considering a path in the residual network until there is no more. For each path considered, the residual capacity of that path is found and augmented to the flow network.
How the path is chosen is not mentioned by the method. This is usually done with Breadth First Search
When the BFS is used to choose the augmenting path as a shortest path from s to t in the residual network, where each edge has unit distance, the algorithm is then called Edmonds-Karp algorithm
It can be shown that with this algorithm for all vertices, the shortest path distance in the residual network increases monotonically with each flow augmentation.
The use of BFS also bounds the number of iterations of the Edmond-Karp algorithm. This bound is O(VE) for a graph with V vertices and E edges. This is explained in terms of critical edges. An edge (u,v) is considered critical on a path p if the residual capacity of (u,v) is that of the path. We already know that at least one edge on any path must be critical. When a flow is augmented along the critical edge, that path disappears from the residual network because the flow is already considered. It can only appear again if we consider an augmenting path in the reverse direction. We know that the path earlier was one hop away because we used BFS. If the vertex u becomes unreachable from the source again if ever, by BFS this must be yet another hop away from start giving a total of two hops. Therefore that edge becomes critical at most (v-2)/2 times again because there are at most v/2 times that the edge become critical again.Since there are at most E edges, each can become critical at most V/2 times.

#codingexercise
Find minimum difference between any two elements of an array

int GetMinDiff(List<int> nums)
{
int min = INT_MAX;
for (int i = 0; i < nums.count; i++)
for (int j = i+1; j < nums.count; j++)
if (Math.abs(nums[i]-nums[j]) < min)
min =Math.abs(nums[i]-nums[j]);
return min;
}

Find the next greater element to the right of all elements in an array
void prntNextGreater(List<int> nums)
{
for (int i = 0; i < nums.count; i++)
{
int next = INT_MAX;
for (int j = i+1; j < nums.count; j++)
{
if (nums[i] < nums[j])
{
next = nums[j];
break;
}
}
Console.write("{0}-{1}",nums[i], next);
}
}
#puzzle
There are 25 horses among which you need to find out the fastest 3 horses. You can conduct race among at most 5 to find out their relative speed. At no point you can find out the actual speed of the horse in a race. Find out how many races are required to get the top 3 horses.
The answer is 7. The solver uses a process of exclusion. The first five races gives the best in each group and the overall number 1. The next two races give the next fastest horse each Note the selection of horses spans two groups after the first round by excluding the weak horses.

Cluster computing

Sunday, August 28, 2016

No comments:

Post a Comment