Cluster computing

Thursday, August 13, 2015

#codingexercise
Write a function condense_meeting_times() that takes an array of meeting time ranges and returns an array of condensed ranges.
For example, given [(0, 1), (3, 5), (4, 8), (10, 12), (9, 10)]
return [(0, 1), (3, 8), (9, 12)]
void condense_meeting_time(List<Tuple<int, int>> meetings)
{

Sort(ref meetings, compararer); //comparer sorts meetings by start time, end time

var merged = new Stack<Tuple<int. int>>();

merged.add(meetings[0]);

for (int i = 1; i < meetings.length; i++)
{
var top = merged.Peek();
if (top.Item2 == meetings[i].Item1)
{
top.Item2 = meetings[i].Item2;
merged.Pop();
merged.push(top);
}
else if (top.Item2 < meetings[i].Item1)
{
merged.Push(meetings[i]);
}
else if (top.Item2 < meetings[i].Item2)
{
top.Item2 = meetings[i].Item2;
merged.Pop();
merged.push(top);
}

}

while (merged.empty() == False)
{
var top = merged.Peek();
Console.WriteLine("[{0},{1}]", top.Item1, top.Item2);
merged.pop();
}

}
Today we start the review of a book titled the four disciplines of execution 4DX for short. This is not a strategy or theory book claims the authors - McChesney, Covey and Huling who are themselves Leaders in FranklinCovey - a company that consults on this practice in over 130 countries.
To begin with 4DX talks about the challenges in execution - major initiatives crash with a thud, important initiatives are quietly scuttled, and the overall distraction of the whirlwind - a term used to describe the chores that take a disproportionate amount of time from your goal or cause a randomization of your tasks. This book says the author is not about managing the whirlwind but to effectively meet your wildly important goals (WIG) even in the face of a whirlwind and can help you reclaim as much as 20 percent of your time for your WIG. A WIG is one that can make all the difference. By such a definition, a WIG has to be narrow and focused. A WIG not only gives you direct benefits but also gives you new levels of performance. You can choose it from within the whirlwind or outside. But the following practice will help:
1) no team focuses on more than one or two WIGs
2) the battles you choose must win the war
3) senior leaders can veto but not dictate the goal.
4) All WIGs must have a finish line and a duration.

The above rules help you narrow and find your WIG.

The next set of rules is to act on the lead measures.

A lag measure tells you if you have achieved the goals, a lead measure tells you how likely you are to achieve it.

To define and track a lead measure, the authors cited an example of how the managers of a baseball team determined that the number of runs was the criteria to find the most productive player. Defining such a lead measure makes everyone feel empowered to achieve it.

Then the third discipline is to keep a compelling scorecard. Everyone should know the scorecard at all times. This will let them know whether they are winning or losing. When the scorecard is easy to visualize and updated regularly, the whirlwind will not take it away from you. Great teams must know whether they are winning or losing otherwise they will not know what works and doesn't to win. Everyone gets to see the scorecard and everyone knows what to do to win.
A scorecard should be
1) simple just like a scoreboard in a game
2) it should be prominent just like the scoreboard
3) It should show lead and lag measures
4) and it should tell at a glance whether you are winning or losing

Finally the fourth discipline is creating a cadence of accountability - a frequently recurring cycle of accounting for past performance and planning to move the score forward. Discipline 4 is where execution happens because the team meets at least weekly in a WIG session. The meeting lasts no longer than 20 to 30 minutes, has a set agenda and goes quickly In this meeting, you should hear account reports from everyone, you should review the scorecard to learn from successes and failures, and to clear the path and make new commitments. Blockers for any person must be discovered and removed. Each fresh commitment must meet two standards - it must represent a specific deliverable and it must influence the lead measure.

The authors say that the accountability here is different from the usual organizational one. It's more personal. Instead of saying it is to an outcome, it is something to yourself When this resonates with the peers, it improves the performance of the team dramatically. The WIG session encourages experimentation with fresh ideas. It engages everyone in problem solving and promotes shared learning.

Having discussed what the 4DX is, the authors outline the different stages of behavior change.
Stage 1) Getting clear - this is when the team knows what WIG and 4DX are , 2) Launch stage - this is when the leader and the team get together to focus, 3) adoption This is probably the stage that takes most time When the WIG sessions begin to go smoothly, you know that this stage is being met. 4) Stage 4 - optimization - this is to brainstorm and follow through on efficiencies and Stage 5) Habit stage. This is when you have to make the habits of what has been working for you

Now each of the items in a bit more actionable mention so that you can install them:
Step 1 : Consider the possibilities:
Brainstorm with peer leaders
Brainstorm with your team members
Brainstorm alone
Step 2: Rank by impact
When you are satisfied with your list of candidate Team WIGs, start prioritizing
Step 3: test top ideas -
Is the team aligned to the overall WIG
Is it measurable
who owns the results
who owns the game
Step 4 : Define the WIG
Once you have selected and tested the ideas for high impact team WIGs, define it as follows:
1) begin with a verb
2) define the lag measure
3) keep it simple
4) focus on what, not how

Installing discipline 2) act on the lead measures and this is usually the most difficult
There are three reasons :
Lead measures can be counterintuitive
Lead measures are hard to keep track of
Lead measures often look too simple

To follow through on this discipline, here are the steps again :
1) Consider the possibilities
2) Rank by impact
3) Test top ideas
- Is it predictive
- Is it influencable
- Is it ongoing process
- Is it a leaders game or a team game
- Can it be measured
- Is it worth measuring
4) Define the lead measures
- Are we tracking them or individual performance
- Are we tracking the lead measure daily or weekly
- what is the quantitative standard
- what is the qualitative standard
- does it start with a verb
- is it simple ?

Installing discipline 3
1) Choose a theme
2) design the scoreboard
3) Build the scoreboard
4) Keep it updated

Installing discipline 4
1) demonstrate respect in your language and in your actions
2) reinforce accountability by making it transparent how important the task may be
3) encourage performance by empowering and reinforcement

The final section of the book is about installing 4DX across the organization and not just your team.
1) clarify the overall WIG
2) design the team WIGs and lead measures
3) leader certification by training the leaders separately
4) Team launch - when leaders conduct a launch
5) execution with coaching - leaders typically need experienced coaching
6) Quarterly summits

As we practice this more and more, we will realize that execution is ubiquitous and therefore so is 4DX and you can draw to it to your closer circles and personal life as well.

#codingexercise
Int height ( node root )
{
If (root == null) return 0;
Return max ( height(root.left) , height(root.right)) +1;
}

While Queen Elizabeth has a limited number of types of cake, she has an unlimited supply of each type.
Each type of cake has a weight and a value, stored in tuples with two positions:
1 An integer representing the weight of the cake in kilograms
2 An integer representing the monetary value of the cake in British pounds For example: # weighs 7 kilograms and has a value of 160 pounds (7, 160) # weighs 3 kilograms and has a value of 90 pounds (3, 90) You brought a duffel bag that can hold limited weight, and you want to make off with the most valuable haul possible. Write a function max_duffel_bag_value() that takes an array of cake tuples and a weight capacity, and returns the maximum monetary value the duffel bag can hold.
int fill_knapsack(int Capacity, int [] weights, int[] value, int number)
{
if (number == 0 || Capacity == 0) return 0;
if (weight[n-1] > Capacity) return fill_knapsack(Capacity, weight, value, n-1);
return max(val[n-1] + fill_knapsack(Capacity-weights[n-1], weights, value, n-1), fill_knapsack(Capacity, weights, value, n-1);
}

Wednesday, August 12, 2015

#codingexercise

A turning number is the maximum number in a unimodal array that increases and then decreases. Please write a function (or a method) that finds the index of the turning number in a unimodal array.

For example, the turning number in the array {1, 2, 3, 4, 5, 10, 9, 8, 7, 6} is 10, so its index 5 is the expected output.

Int getTurningIndex(int[] numbers)

{

If (numbers.length < 2)

Return -1;

Int left = 0;

Int right = numbers.length -1;

While (right >= left+1) {

Int mid = (left+right)/2;

If (mid == left ) {

If (numbers[left] > numbers[right])

Return left;

Return -1;

}

If (numbers[mid] > numbers[mid -1] && numbers[mid] > numbers[mid+1])

Return mid;

Else if (numbers[mid] > numbers[mid-1] &&

Numbers[mid] < numbers[mid+1])

Left = mid;

Else

Right = mid;

}

Return -1;

}
The above function assumes a decreasing order means the turning number is the first number and a strictly increasing order has no turning number.
This could be modified to detect the presence of an increasing sequence or return -1.
Also we use a binary search because the numbers are mostly ordered.
#codingexercise

here is a stair with n levels. A frog can jump up one level or two levels at one time on the stair. How many ways are there for the frog to jump from the bottom of the stairs to the top?

For example, there are three choices for the frog to jump up a stair with three levels: (1) it jumps in three steps, one level for each jump; (2) it jumps in two steps, one level for the first jump and two levels for the second jump or (3) it jumps with two steps, two levels for the first step and one level for the last jump.
Can we do tail recursion here ?
If the last set was a single step, then we have a subproblem of n-1
If the last set was a two step, then we have a subproblem of n-2
we define the recursion as :
f(n) = f(n-1)+ f(n-2);
f(0) = 1
f(1) = 1
f(2) = 2
f(3) = 3
and this is a Fibonacci series

Another one:

Rectangles with size 2 x 1 are utilized to cover other rectangles, horizontally or vertically. How many approaches are available to cover a 2 x 8 rectangle with eight 2 x1 rectangles ?

Again using a similar argument as above, if we keep the 2 x 1 vertically to cover the leftmost vertical then we have sub-problem of 2 x 7
If we keep two horizontal 2 x 1 one on top of the other over the leftmost, we have a subproblem of 2 x 6 rectangle.
Let f(n) denote the number of choices to lay the smaller rectangle over a 2 x n rectangle.
Then the recursion is f(n) = f(n-1) + f(n-2) which is again a Fibonacci series.

Tuesday, August 11, 2015

We briefly discuss SUMMONS which is the first example of a multi document summarization system and illustrates one of the stages of text summarization - topic fusion. It tackles news events about a single topic and produces a briefing that merges relevant information about each event and how reports by different newsagencies have evolved over time. Rather than working with raw text SUMMONS reads a database previously built by a template-based message understanding system. Therefore there are two stages: one that takes fulltext as input and fills template slots and then synthesizes a summary and the other that takes summary opertors and a set of rules and with connective phrases smoothens out a summary. While SUMMONS works great with narrow domain texts, it does not scale to internet documents like the ones from a web search query. This was later improved with systems by McKeown 1999 and Barzilay 1999 that identifies the topics first by clustering and then applying decision rules. These systems start by identifying themes in the form of text units or paragraphs which is then clustered based on similarity measures. To compute thesimilarity measures between text units, these are mapped to vectors of features, that include single words weighted by their TF-IDF scores, noun phrases, proper nouns, synsets from the WordNet database and a database of semantic classes of verbs. For each pair of paragraphs, a vector is computed that represents matches ondifferent features. Decision rules that were learned from data are then used to classify each pair of text units either as similar ordissimilar; this in turn feeds a subsequent algorithm that places the most related paragraphs in the same theme. Once themes are identified SUMMONS enters into an information fusion stage where sentences about a theme that should be included in the summary are decided. But instead of picking these sentences, analgorithm is proposed which compares and intersects predicate argument structures of the phrases within each theme to determine which are repeated often to include in a summary. The comparisonalgorithm then traverses these dependency trees recursively adding identical nodes to output tree. Once the summary content is decided, a grammatical text is generated by translating those structures into arguments expected by the FUF/SURGE languagegeneration system.

Courtesy: Survey on Automatic Text Summarization by Das and Martins

#codingexercise

Question: Define a queue in which we can get its maximum number with a function max. In this stack, the time complexity of max, push_back, and pop_front are all O(1).

Answer:

Public class QueueWithMax<T>{

Public class InternalData {

T number {get;set;}

Int index {get;set;}

}

Deque<InternalData> data;

Deque<InternalData> maximums;

Int currentIndex;

Public QueueWithMax(){

currentIndex = 0;

}

Public void push_back(T number) {

While(maximums.empty == false && number >= maximum.back().number)

maximums.pop_back();

InternalData idata = {number, currentIndex};

data.push_back(internalData);

maximums.push_back(internalData);

++currentIndex;

}

Public T pop_front(){

If (maximums.empty())

Throw new exception(“queue is empty”);

If (maximums.front().index == data.front().index)

Maximums.pop_front();

Return Data.pop_front().number;

}

Public T max() const {

If (maximums.empty())

Throw new exception(“queue is empty”);

Return maximums.front().number;

}

Monday, August 10, 2015

#codingexercise

Question: If a number only has factors 2, 3, and 5, it is an ugly number . For example, 6 and 10 are two ugly numbers, but 14 is not because it has a factor of 7. Usually 1 is considered to be the first ugly number. What is the arbitrary kth ugly number? Can we do this efficiently?

Solution:
The straightforward solution repeatedly divides by 2, 3 and 5 each until the remainder is 1 in which case the number is ugly or it is more than one. However, this solution is not efficient since it does the same check for non ugly numbers. Instead we could store previously encountered ugly numbers and use their factors.
int GetUglyNumber(int index)
{
if (index <= 0)
return 0;
var uglyNums = new int[index];
uglyNums[0] = 1;
int nextUglyIndex = 1;
int index2 = 0;
int index3 = 0;
int index5 = 0;
while (nextUglyIndex < index)
{
int min = Math.Min(uglyNums[index2]*2, uglyNums[index3]*3);
min = Math.Min(min, uglyNums[index5]*5);
uglyNums[nextUglyIndex] = min;
while (uglyNums[index2] * 2 <= uglyNums[nextUglyIndex])
++index2;
while (uglyNums[index3] * 3 <= uglyNums[nextUglyIndex])
++index3;
while (uglyNums[index5] * 5 <= uglyNums[nextUglyIndex])
++index5;
++nextUglyIndex;

}
return uglyNums[nextUglyIndex-1];
}

We briefly discuss SUMMONS which is the first example of a multi document summarization system and illustrates one of the stages of text summarization - topic identification and fusion. It tackles news events about a single topic and produces a briefing that merges relevant information about each event and how reports by different news agencies have evolved over time. Rather than working with raw text SUMMONS reads a database previously built by a template based message understanding system. Therefore there are two stages : one that takes fulltext as input and fills template slots and then synthesizes a summary and the other that takes summary opertors and a set of rules and with connective phrases smoothens out a summary.
#codingexercise
Public class compare:IComparer
{
Int compare (int i, int j){
Return i%10 <j%10;
}
}

Saturday, August 8, 2015

Today we conclude reading the Text Summarization with multi-document summarization and evaluating summaries:

When summarizing multiple documents, one has to identify and locate thematic overlaps. One also has to decide what to include of the remainder to overcome potential inconsistencies between documents, and when necessary to arrange events from various sources along a single timeline.

Assuming that all input documents are parsed into templates (whose standardization makes comparison easier), systems like SUMMONs clusters the templates according to their contents and then applies rules to extract items of major import.

To determine what additional material should be included, first identify the units most relevant to the user’s query and then estimate the marginal relevance of all remaining units.

Lastly, some domains lend themselves to simple rules that can even get near perfect results. For examples, news articles covering the same event can be summarized by picking the most recent one in the timeline and taking the first two or three paragraphs.

How can you evaluate the quality of a summary ? This can be both intrinsic and extrinsic – the former determines output quality and the latter measures user assistance in task performance. The intrinsic ones use sentence or phrase recall and precision. A second intrinsic method is to require evaluators who judge based on readability, informative-ness, fluency and coverage. Extrinsic evaluation is easy to motivate.IR methods produce the best query based extracts. Much of the scoring process becomes simplified to IR like recall and precision measures against human extracts. That said the biggest challenge here is that summaries receive different scores with different measures.

There are two basic measures to capture the extent to which a summary meet s the requirements to be smaller than the input text and contain the most important information of the original. These are:

1) Compression ratio : CR = (length S)/(length T)

2) Retention ratio : RR = info in S / info in T, where T is the text and S is the summary.

The latter can be judged in four different ways:

1) The expert game: Ask experts to underline or extract the most interesting or informative fragments of the text.

2) The classification game: This is an extrinsic measure

3) The Shannon game: This is an information theory method that relies on the probability of the reader guessing the message, as say p and the measure is –p log p.

4) The question game: This measure approximates the information content of S by allowing how well it allows readers to answer questions drawn up about T

#codingexercise

Big numbers can be formed if numbers in an array are concatenated together. How do you print the minimum concatenated number of a given array? For example, if the input array is {3, 32, 321}, there are six concatenated numbers and the minimum one is 321323.

There are n! permutations for an array of size n. So the brute force approach would be O(n!)

The key observation here is that sorting doesn’t work but if we change the sort rule in such a way that when sorted this special way, their concatenation yields the smallest number.

Void PrintConcatenations(int number[])

{

String strNumber = new String[number.length];

For (int I = 0; I < number.length; i++)

strNumber[I] = number[i].ToString();

Array.Sort(strNumber, new NumericComparer());

For (int I = 0; I < number.length; i++)

Console.WriteLine(strNumber[I]);

}

Class NumericComparer implements IComparer<String> {

Public int Compare(String num1, String num2) {

String str1 = num1 + num2;

String str2 = num2 + num1;

Return str1.CompareTo(str2);

}

#codingexercise

If two English words have the same characters and the occurrence number of each character is also identical respectively, they are anagrams. The only difference between a pair of anagrams is the order of characters. For example, “silent” and “listen”, “evil” and “live” are two pairs of anagrams. Please implement a function to verify whether two words are a pair of anagrams.

bool IsAnagram(string one, string two)
{
if (one.length == two.length) return false;
var h = new HashTable();
for (int i =0; i<one.length;i++)
InsertHash(ref h, one[i]);
for (int i = 0; i < two.length; i++)
if (DecrementHash(ref h, two[i]) == false)
return false;
return h.IsEmpty();

}

#codingexercise
Given two strings, how do you delete characters contained in the second string from the first string? For example, if all characters in the string “aeiou” are deleted from the string “We are students.”, the result is “W r stdnts.”.
Solution:
Void DeleteCharacters(String input, String characters)
{
Var h = new HashTable();
For (int I =0; I < characters.length; i++)
                H[characters[i]] = 1;
String ret = string.empty;
For (int r = 0; r < input.length; r++)
     If (h[input[r]] != 1)
                Ret += input[r];
Return ret;

}

Next we continue our discussion of Text Summarization by Hovy. We now look at the other two stages of Interpretation and generation.
Topic fusion or interpretation as mentioned earlier is what sets apart summarization systems from abstract type systems. During summarization, the topics identified as important are fused, represented in new terms even using concepts or words not found in the original text. No system can perform interpretation without prior knowledge about the domain. At first glance, template representation is used in information extraction is promising but the difficulty of building and filling such structures is makes large scale summarization impractical at present.
Hahn and Reimer 1999 develop operators that condense knowledge structures in a terminological logic through conceptual abstraction. Till date no parser has been built to produce the knowledge structures.
Interpretation remains blocked by problems of domain knowledge acquisition.
The third major stage in summarization is generation. When the summary content has been created through abstracting and/or information extraction, it exists within the computer as internal notations and thus requires the techniques of natural language generation.

#codingexercise
How to generate Sudoku puzzles for a puzzle book: http://1drv.ms/1K2Yeao