Thursday, March 13, 2014

Today we look at the greedy method for job scheduling with deadlines, together with an optimization. In the baseline algorithm the jobs are first sorted by profit, which takes O(n log n); the main cost is then inserting each job into the current solution J, which we maintain in sorted order of deadline. The worst case occurs when every new job is feasible and has a smaller deadline than every job already in J.
Checking the feasibility of job i means inserting it into a J of size i - 1,
which takes i - 1 steps.
Therefore the total cost of insertion is 1 + 2 + ... + (n - 1) = O(n^2).
The final complexity is the complexity of sorting plus insertion, which comes out to O(n^2). The bottleneck we have is inserting each job in J.
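The quadratic baseline just described can be sketched as follows; the (profit, deadline) tuple representation and the function name are my own:

```python
def greedy_schedule(jobs):
    """The O(n^2) baseline: jobs is a list of (profit, deadline) pairs.
    Sort by decreasing profit, then try to insert each job into the
    current solution J, kept in deadline order."""
    jobs = sorted(jobs, key=lambda j: -j[0])  # decreasing profit
    J = []  # accepted jobs, sorted by deadline
    for job in jobs:
        candidate = sorted(J + [job], key=lambda j: j[1])
        # feasible iff, running one job per unit-time slot in deadline
        # order, every job finishes by its deadline
        if all(pos + 1 <= j[1] for pos, j in enumerate(candidate)):
            J = candidate
    return J
```

With jobs [(100, 2), (10, 1), (15, 2), (27, 1)] this keeps the jobs with profits 100 and 27, the standard answer for that instance.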
This bottleneck can be addressed by assigning each new job a slot once and for all. When job i with deadline di arrives, we find the rightmost free slot at or before di; call this slot alpha-i. We schedule i at alpha-i, and since it is the rightmost available slot we never need to move the job afterwards: every job is scheduled as late as its current constraints allow, no job ever moves to the right, and we never modify the constraints. Time slot i is really the range [i - 1, i]. Let ni be the first free slot at or to the left of i, so ni <= i. Two slots i and j belong to the same set exactly when ni = nj, i.e. both see the same first free slot to their left. So if i < j and ni = nj, all the slots {i, i + 1, ..., j - 1, j} are in one set. For a set k in general, F(k) is the common value ni: the rightmost slot free to the left of (or within) all of k. No job ever needs to be scheduled beyond the largest deadline, so the time slots are 1, 2, ..., D, where D is the maximum deadline over all jobs. Initially all slots are free, so ni = i for all i; the sets are {0}, {1}, {2}, ..., {D} with F({j}) = j. To add a job with deadline d, we compute nd: find the set k such that d belongs to k.
Then nd = F(k), and when F(k) = 0 the job is not feasible. Otherwise we can schedule the new job at time F(k). Now we have to update F(k): the next free slot must be further to the left, so the new value must be at most F(k) - 1. We know some set contains this value, so we just have to merge the sets: merge set k with the set containing F(k) - 1. In total we have n insertions and O(n) disjoint-set (union-find) operations on this data structure.
We can explain this with an example. We have six jobs and their deadlines are 2, 1, 2, 6, 1, 6. The initial sets are K = {0}, {1}, {2}, {3}, {4}, {5}, {6} and F(k) for these is correspondingly 0, 1, 2, 3, 4, 5, 6.
We draw a line of time slots and include slot zero, so we have 0, 1, 2, 3, 4, 5 and 6.
When we look at the first job, its deadline is 2, and 2 is in its own set {2} with F({2}) = 2. Implicitly this means we assign job 1 to time slot 2. To record that slot 2 is no longer available, we merge {2} with the set containing F({2}) - 1 = 1, which is {1}. The sets collapse to {0}, {1, 2}, {3}, {4}, {5}, {6} with F values 0, 1, 3, 4, 5, 6. Job 2 has deadline 1, which lies in {1, 2} with F({1, 2}) = 1, so we schedule job 2 at slot 1 and merge {1, 2} with the set containing F({1, 2}) - 1 = 0, giving {0, 1, 2}. Continuing, d[3] = 2 and 2 belongs to {0, 1, 2}, but F of this set is 0, so job 3 is not feasible and we discard it. d[4] = 6, which belongs to {6}, so we assign job 4 to slot 6 and merge {6} with the set containing F({6}) - 1 = 5, namely {5}, resulting in {5, 6}.
We now have sets {0, 1, 2}, {3}, {4}, {5, 6} with F values 0, 3, 4, 5 respectively. d[5] = 1 and 1 belongs to {0, 1, 2} with F = 0, so job 5 is not feasible. Finally we come to the last job: d[6] = 6, and 6 belongs to {5, 6} with F = 5, so we implicitly schedule job 6 at slot 5 and merge {5, 6} with the set containing slot 4. There are no more jobs. We have picked out jobs 1, 2, 4 and 6, and we have a schedule.
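The walk-through above can be coded with a standard union-find (disjoint-set) structure, storing F at each set's root; the function names are mine:

```python
def schedule_with_deadlines(deadlines):
    """Greedy slot assignment using union-find, as in the example above.
    deadlines[i] is the deadline of job i+1 (jobs assumed already in
    priority order). Returns the list of accepted job numbers."""
    D = max(deadlines)
    parent = list(range(D + 1))  # singleton sets {0}, {1}, ..., {D}
    F = list(range(D + 1))       # F({j}) = j initially; read at roots only

    def find(x):                 # root of the set containing slot x
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    accepted = []
    for job, d in enumerate(deadlines, start=1):
        root = find(d)
        slot = F[root]
        if slot == 0:
            continue             # no free slot before the deadline: reject
        accepted.append(job)     # implicitly schedule this job at `slot`
        # merge with the set containing slot - 1; that set's F takes over
        parent[root] = find(slot - 1)
    return accepted
```

Running it on the deadlines 2, 1, 2, 6, 1, 6 reproduces the example: jobs 1, 2, 4 and 6 are accepted.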

Wednesday, March 12, 2014

We will continue the discussion of the traveling salesman problem. We have covered memoization and branch and bound so far; today we consider NP-completeness. The class P consists of decision (yes/no) problems for which a deterministic polynomial-time algorithm exists. The class NP consists of decision problems for which a non-deterministic polynomial-time algorithm is known. The advantage of a non-deterministic algorithm is that it can guess among many different choices for the solution, and if one of the choices is correct, it always has the magical ability to guess that particular choice.
Note that the traveling salesman problem is not naturally phrased as a yes/no problem. It is stated as: given a weighted undirected graph, find a tour visiting each vertex exactly once, coming back to the start, with minimum total weight. But we can rephrase this as a yes/no problem by asking whether there is a tour of weight w <= k.
Moreover, we can
guess the order in which to tour the vertices, and
check that the total weight is <= k.
Note that the answer depends on the value of k. If no ordering of the vertices has total weight <= k, then no matter how we guess, the total weight will exceed k and we will answer no.
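The "check" half of this guess-and-check pair is a polynomial-time verifier: given a proposed vertex order, confirming its weight against k is cheap. A minimal sketch (the weight-matrix representation and the example weights are my own):

```python
def verify_tsp_tour(w, order, k):
    """Check that `order` is a permutation of all vertices and that the
    closed tour it describes has total weight <= k."""
    n = len(w)
    if sorted(order) != list(range(n)):
        return False                      # does not visit every vertex once
    # sum the edges around the cycle, wrapping back to the start
    total = sum(w[order[i]][order[(i + 1) % n]] for i in range(n))
    return total <= k

# a 4-city example: a unit square with expensive diagonals
w = [[0, 1, 5, 1],
     [1, 0, 1, 5],
     [5, 1, 0, 1],
     [1, 5, 1, 0]]
```

Here verify_tsp_tour(w, [0, 1, 2, 3], 4) answers yes (weight 4), while the diagonal-heavy order [0, 2, 1, 3] does not.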
Now recall that P is a subset of NP: every problem in P is also in NP. For many problems in NP no deterministic polynomial-time algorithm has been found, which is why it remains an open question whether P = NP. What we do know is that there is a class of problems that appear to be the hardest in NP; these are called NP-complete problems, and they lie on the perimeter of the NP space.
We will now look to see whether the TSP decision problem is NP-complete.
We answer the question: is there a TSP tour of weight <= k in G?
We compare this against a problem already known to be NP-complete:
Given a graph G with n vertices, is there a simple path in G of length n - 1, i.e. one visiting every vertex? This well-known NP-complete problem is the Hamiltonian path problem.
Suppose there is a deterministic polynomial-time algorithm TSP(G, k).
Then we show there is a deterministic polynomial-time algorithm HAMPATH(G). If we can accomplish that, then since we stated the Hamiltonian path problem as NP-complete, the TSP decision problem is NP-hard, and being in NP it is NP-complete.
We will now see how to use TSP(G, k) to solve the Hamiltonian path problem. We have to answer the question: is there a path of length n - 1 in a given n-vertex graph G?
We add an extra vertex and connect it to every other vertex with a new edge. This does not take too long. We call the resulting graph G', where every edge (the edges of G and the new ones) has weight 1 and missing edges have weight infinity.
We now write the HAMPATH(G) steps as
   construct G' as above
   return TSP(G', n + 1)
Now G has a path of length n - 1 if and only if G' has a tour of weight <= n + 1.
In other words, the tour spends two edges on the way out of and back into the new vertex, and the remaining n - 1 edges of the tour form the desired path in G.
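The construction of G' can be sketched directly. The TSP oracle is the assumption here; for testing I stand in a brute-force checker, which is only viable on tiny graphs:

```python
from itertools import permutations

INF = float("inf")

def brute_tsp(w, k):
    """Stand-in for the assumed TSP(G', k) oracle: brute-force check
    whether some tour has total weight <= k."""
    n = len(w)
    for p in permutations(range(1, n)):   # fix vertex 0 as the start
        tour = (0,) + p
        total = sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if total <= k:
            return True
    return False

def hampath_via_tsp(adj, tsp_decide=brute_tsp):
    """adj: n x n 0/1 adjacency matrix of G. Build G' by adding one
    extra vertex joined to every vertex; edges of G and the new edges
    get weight 1, missing edges weight infinity. Then G has a
    Hamiltonian path iff G' has a tour of weight <= n + 1."""
    n = len(adj)
    w = [[INF] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                w[i][j] = 1               # edges of G get weight 1
    for i in range(n):
        w[i][n] = w[n][i] = 1             # new vertex n joined to all others
    return tsp_decide(w, n + 1)
```

A path graph 0-1-2 has a Hamiltonian path, while a star with three leaves does not, and the reduction reports exactly that.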

We continue to take a look at the traveling salesman problem today. We can improve the memoization with branch and bound. The idea here is that the recursive call in the earlier memoization step can be very expensive, so we first look up the table to see whether the value was previously computed; if it was, we use it directly, which is fast. If not, instead of immediately making the expensive recursive call, we compute a cheaper lower bound for the subproblem. Only if this lower bound plus the weight of the edge connecting the collected vertices to the new vertex is less than the value of the variable Best computed so far do we perform the regular recursive call.
for all m != 1 in S - {k}
    if M[S - {k}, m] is defined
        Best = min(Best, M[S - {k}, m] + w(m, k))
    else
        temp = LB(S - {k}, m)
        if temp + w(m, k) < Best
            Best = min(Best, Memo(S - {k}, m) + w(m, k))
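A runnable sketch of this pruned memoization follows. The LB function is my own simple stand-in for the unspecified bound in the notes: a path through S uses |S| - 1 edges, each of weight at least the cheapest edge in the graph, which is a valid (if weak) lower bound.

```python
def tsp_bb(w):
    """Memoized TSP with branch-and-bound pruning, following the notes;
    vertex 0 plays the role of vertex 1 in the text."""
    n = len(w)
    w_min = min(w[i][j] for i in range(n) for j in range(n) if i != j)
    M = {}  # memo table: (frozenset of visited vertices, end vertex) -> cost

    def LB(S, m):
        # assumed admissible bound: |S| - 1 edges, each >= w_min
        return (len(S) - 1) * w_min

    def memo(S, k):
        if (S, k) in M:
            return M[(S, k)]
        if len(S) == 2:                   # S == {0, k}: direct edge
            M[(S, k)] = w[0][k]
            return M[(S, k)]
        best = float("inf")
        T = S - {k}
        for m in T:
            if m == 0:
                continue
            if (T, m) in M:               # cheap: already in the table
                best = min(best, M[(T, m)] + w[m][k])
            elif LB(T, m) + w[m][k] < best:
                # recurse only when even the optimistic bound could win
                best = min(best, memo(T, m) + w[m][k])
        M[(S, k)] = best
        return best

    V = frozenset(range(n))
    return min(memo(V, k) + w[k][0] for k in range(1, n))
```

Skipping a branch is safe because LB never exceeds the true subproblem cost, so a branch whose bound already loses to Best cannot improve the answer.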

Tuesday, March 11, 2014

We discuss the traveling salesman problem. In this problem, we have an undirected weighted graph where the vertices represent cities and the weights represent the cost to travel between cities; since the graph is undirected, the cost to travel from one city to another is the same in both directions. The goal of the problem is to minimize the cost of a tour visiting each city exactly once: we start from a city and return back to the same city.
We solve this problem by dynamic programming.

  • For any set S ⊆ V such that 1 ∈ S and k ∈ S, we compute a quantity M[S, k], which represents the minimum cost of a path that starts at vertex 1, visits every vertex in S, and ends at k.
  • M[S, k] = w(1, k) if S = {1, k}; otherwise M[S, k] = min over m ∈ S - {k}, m != 1, of M[S - {k}, m] + w(m, k).

We could also write an equivalent version using memoization:

minTSP(V):
Best = infinity
for k = 2 to n
       Best = min(Best, Memo(V, k) + w(k, 1))
return Best

Memo(S, k):

if M(S, k) is defined
    return M(S, k)

if |S| == 2
    return M(S, k) = w(1, k)

Best = infinity
for all m != 1 in S - {k}
    Best = min(Best, Memo(S - {k}, m) + w(m, k))
return M(S, k) = Best
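The memoized pseudocode above translates almost line for line into Python; vertex 0 plays the role of vertex 1, and M is keyed by (frozenset, end vertex):

```python
def min_tsp(w):
    """Memoized Held-Karp style TSP; w[i][j] is the travel cost."""
    n = len(w)
    M = {}

    def memo(S, k):
        # minimum cost of a path from 0 visiting exactly S, ending at k
        if (S, k) in M:
            return M[(S, k)]
        if len(S) == 2:                   # S == {0, k}: base case
            M[(S, k)] = w[0][k]
        else:
            M[(S, k)] = min(memo(S - {k}, m) + w[m][k]
                            for m in S - {k} if m != 0)
        return M[(S, k)]

    V = frozenset(range(n))
    # close the tour: best path ending at k, plus the edge back to 0
    return min(memo(V, k) + w[k][0] for k in range(1, n))
```

On a unit square with diagonal weight 5 the optimal tour walks the perimeter for a total cost of 4.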

Monday, March 10, 2014

Today we take a break to look at the N-queens problem. This problem asks us to place N queens on an N×N board so that no two queens attack each other. Before we look at the solution, let's first see that for N = 2 and N = 3 there are no solutions, while for larger boards (N >= 4) a solution is possible. So how do we arrive at a solution? A queen attacks any other queen horizontally, vertically and diagonally, so each queen must occupy its own row. If we allowed the queens anywhere on the board, there would be (N² choose N) placements, which is quite large for large N. We can improve on that by putting exactly one queen in each row, which leaves N^N placements to consider. This is much smaller than the previous count.
Next we look at a recursive method where we place a queen on the board and eliminate the cells where the other queens can no longer be placed, then use this to place the remaining queens in a depth-first manner. This strategy is called branch and bound. We set up a loop in which we eliminate the attacked cells and place the next queen. If in each recursion we pick the most promising subproblem and eliminate the subproblems that cannot have a solution, we will have a faster path to the solution.
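A small backtracking sketch of this row-by-row strategy, counting solutions rather than printing boards (function names are mine):

```python
def n_queens(n):
    """Count the placements of n non-attacking queens, one per row.
    A branch is cut as soon as a column or diagonal is already taken."""
    count = 0

    def place(row, cols, diag1, diag2):
        nonlocal count
        if row == n:
            count += 1                    # all rows filled: one solution
            return
        for c in range(n):
            if c in cols or (row - c) in diag1 or (row + c) in diag2:
                continue                  # attacked cell: bound this branch
            place(row + 1, cols | {c}, diag1 | {row - c}, diag2 | {row + c})

    place(0, set(), set(), set())
    return count
```

It confirms there are no solutions for N = 2 or 3, and finds 2 solutions for N = 4 and 92 for N = 8.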

Sunday, March 9, 2014

Splunk modular input for ELMAH.

For many web developers, particularly those who work with the Microsoft .NET stack, ELMAH is a well-known logging module and handler for ASP.NET. What distinguishes it from the Apache log4net module is that it is "completely pluggable and can be dynamically added to a running ASP.NET application without any change to its code", i.e. no log-write instructions are added to or removed from the existing binary of a running application. When ELMAH is dropped into a running application, it can log nearly all unhandled exceptions, provide a web page to remotely view the entire log of recorded exceptions or the full details of a single exception, write to different back-end providers, and send out e-mail notifications.

While we discuss ways to send the output of ELMAH to Splunk via a connector, we consider multiple ideas, such as:
1) enable Splunk to be a first-class back-end provider for loggers such as ELMAH, Enterprise Library and log4net;
2) write an HTTP handler for Splunk on Windows that can transparently send web messages to Splunk;
3) write a Splunk NuGet package that can provide the Splunk SDK to .NET developers;
and
4) write a Splunk connector that directly reads the static and dynamic content of the different web pages that a web application registers.

We will explore these options in upcoming posts; for now we discuss ELMAH. When a request arrives at an IIS web server, IIS examines the extension of the requested resource to decide how to proceed. In addition to processing the extension, IIS provides the ability to write filters that handle various events raised by IIS, such as when a request comes in, when a request is authenticated, and when content is rendered back. When the requested resource is an ASP.NET resource, it is forwarded to the ASP.NET engine. The ASP.NET engine behaves much like IIS in that it knows how to respond to the request and it raises events. But while IIS uses unmanaged code, the ASP.NET engine uses managed classes called handlers and modules.
An HTTP module is one that can tap into the various events raised as the request passes through the HTTP pipeline. One such event is the Error event, and this is the event ELMAH registers to listen for. Many modules can subscribe to many different events. An HTTP handler, by contrast, returns the markup for the requested resource. The ASP.NET engine passes the rendered content to IIS, which forwards it to the client.

Since application servers and database servers are two common production deployments, writing a Splunk handler along the lines of ELMAH would make it easier to collect data than configuring each source for input individually.




We look at establishing lower bounds via decision trees, in particular for comparison algorithms. In the comparison model an algorithm may learn about its input only by comparing pairs of input elements; its behaviour can change only through the outcomes of those comparisons.
In the case of sorting we can show that every comparison algorithm requires Omega(n log n) comparisons, and in the case of searching for an element in a sorted array we can show a lower bound of Omega(log n).
Now we look at decision trees. In a decision tree each node asks a question with a yes or no answer. A comparison algorithm does not modify its input: it compares the data and outputs a result based only on those comparisons, so the actual values do not matter, only the relative ordering. For example, [5, 7, 2] will be treated the same as [6, 9, 1]. Outputs can be seen as permutations of [1..n].
For example, selection sort compares elements of the array by value, and any swaps it performs can be modelled as happening in a local copy of the array rather than the original input.
For an input a1, a2, and a3 with n=3, a selection sort can be seen as a decision tree with the following comparisons
Level 0: a1 < a2
Level 1: a1 < a3 and a2 < a3
Level 2: a2 < a3 and a2  < a1 and a1 < a3 and a2 < a1
yielding possible outcomes as
a1a2a3, a1a3a2, a3a1a2, a2a1a3, a2a3a1, a3a2a1
In general we can say that for a fixed input size of n elements there are f(n) possible outputs,
and the height of the decision tree for comparisons is >= log2(f(n)).
The lower bound for sorting in the comparison model is based on the number of outputs. For n elements, the number of outputs is n!, and these are the leaves of the tree: #leaves >= n!, and this is f(n). So any algorithm A for sorting in the comparison model must make at least log2(n!) comparisons. We know that n! >= (n/e)^n, so
log n! = Omega(n log n).
The worst-case number of comparisons is therefore lower bounded by Omega(n log n).
The theorem is that any comparison-based sorting algorithm requires Omega(n log n) comparisons in the worst case. Additionally, the average-case complexity of any such sorting algorithm is also Omega(n log n).
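The bound can be checked numerically: ceil(log2(n!)) is the minimum worst-case number of comparisons, and it dominates n*log2(n/e), the estimate from n! >= (n/e)^n. A quick sketch (function name is mine):

```python
import math

def comparison_lower_bound(n):
    """ceil(log2(n!)): minimum worst-case comparisons for sorting n
    items, since the decision tree must have at least n! leaves."""
    return math.ceil(math.log2(math.factorial(n)))

# n! >= (n/e)^n gives log2(n!) >= n * log2(n/e) = Omega(n log n)
for n in (4, 16, 64):
    assert comparison_lower_bound(n) >= n * math.log2(n / math.e)
```

For instance, three elements need at least ceil(log2(3!)) = 3 comparisons, matching the selection-sort decision tree above.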