Tuesday, July 14, 2015

Today we continue the review of the book "Little Bets". In investigating what facilitates the successful practice of little bets, a certain way of thinking about failure plays an important role. Successful experimental innovators tend to view failure as both inevitable and instrumental in pursuing their goals. They exhibit a growth mindset and seek out activities that expand their abilities. For example, Pixar's top managers demonstrate a relentless desire to challenge themselves and learn, and they ensure that this attitude trickles through the rest of the company. Ed Catmull of Pixar describes the creative process there as going "from suck to nonsuck". The author also interviewed Frank Gehry, the well-known architect, who demonstrates the same growth mindset: he said he does not feel he has reached a mountaintop but is still climbing toward one. Practicing little bets frees us from the expectation that we should know everything we need to know before we begin. By focusing on doing rather than planning, and by learning the pitfalls of our ideas, we exercise a growth mindset. Pixar relies on a concept called plussing: the idea is to build upon and improve ideas without using judgmental language. Creating an atmosphere where ideas are constantly being plussed maintains a sense of humour and playfulness. Successful humour breaks down power structures that tend to inhibit tighter social bonds and interactions.
The author extols constraints in any situation as guardrails that not only shape and focus problems but also provide clear challenges. Bing Gordon, a cofounder of Electronic Arts, found it useful to break down a project into relatively small problems - a method he called smallifying. This practice has since been widely embraced in the software industry as agile software development. Similarly, Muhammad Yunus, the well-known economics professor, immersed himself in the lives of the poorest before discovering a core problem that economists had ignored.
Learning a little from a lot, and a lot from a little, are among the insights that come with this creativity. Dr. Richard Wiseman found that people who considered themselves lucky in essence created their own luck by interacting with a larger group of people and thereby encountering more insights and opportunities. Similarly, MIT professor Eric von Hippel found that there are cutting-edge users of ideas who form a good pilot group for testing little bets before they go out to a bigger audience. When the audience gives a muted chorus of chuckles, Chris Rock knows that he has found a theme with the makings of a good joke.
Starbucks evolved in a similar manner, carefully adapting to customer feedback. Schultz was initially against non-fat milk, but when customers kept requesting it, he relented. The success of those drinks became an important small win; non-fat milk would grow to account for almost half of Starbucks' lattes and cappuccinos.
Thus experimental innovators use strikingly similar methods inside their work processes.

Monday, July 13, 2015

Today we write a summary of the book "Little Bets" by Peter Sims. This book is about an approach that many successful professionals take: methodically taking small experimental steps. From Beethoven to Edison to Bezos, these achievers practice a set of simple but often counterintuitive experimental methods - failing quickly to learn, trying imperfect ideas, and engaging in highly immersed observation. These methods free them from conventional planning, let them make unexpected connections, and help them perceive invaluable insights.
This book tells us
how to embrace failure as a critical step towards success
harness your curiosity with examples from successful companies
steps to use little bets for possibly bigger outcomes
leverage small wins to improve performance goals
The book mentions examples such as how comedian Chris Rock experiments with many ideas before a small audience before making the final cut, and how Bezos compares developing ideas in new markets to "planting seeds" and "going down blind alleys". At the core of this approach, little bets are concrete actions taken to discover, test and develop ideas that are achievable and affordable. They begin as creative possibilities that get iterated and refined over time. This is an approach that anyone can take.
The steps are
experiment - learn by doing and failing quickly to learn fast
play - a playful and humorous atmosphere relaxes inhibitions
immerse - gather fresh ideas and insights
define - use insights gathered to define specific problems
reorient - be flexible in pursuit of larger goals
iterate - repeat, refine and test frequently
As an example, HP introduced the scientific calculator as a little bet. Today such calculators are ubiquitous.
Prof. Saras Sarasvathy says there are two advantages to this approach: it lets us focus on what we can afford to lose rather than on assumptions about what we can gain, and it develops the means as we progress. Determining what he can afford to lose is exactly what Chris Rock does when going before audiences with rough material. Throughout Pixar's creative process, they rely heavily on what they call plussing - building upon and improving ideas without using judgmental language. What Chris Rock and Pixar show is that little bets can be used to discover, test and develop ideas that are achievable. These examples go to show that little bets can improve overall progress.

#codingexercise
int[] Merge(int[] sorted1, int[] sorted2)
{
    int i = 0, j = 0, k = 0;
    int m = sorted1.Length;
    int n = sorted2.Length;
    var results = new int[m + n];
    while (i < m && j < n)
    {
        if (sorted1[i] < sorted2[j])
        {
            results[k] = sorted1[i];
            i++;
        }
        else
        {
            results[k] = sorted2[j];
            j++;
        }
        k++;
    }
    while (i < m)
    {
        results[k] = sorted1[i];
        i++;
        k++;
    }
    while (j < n)
    {
        results[k] = sorted2[j];
        j++;
        k++;
    }
    return results;
}

// an alternative version that collects into a List
int[] Merge(int[] sorted1, int[] sorted2)
{
    var results = new List<int>();
    int m = 0, n = 0;
    for (int i = 0; i < sorted1.Length + sorted2.Length; i++)
    {
        if (n >= sorted2.Length || (m < sorted1.Length && sorted1[m] < sorted2[n]))
        {
            results.Add(sorted1[m]);
            m++;
        }
        else
        {
            results.Add(sorted2[n]);
            n++;
        }
    }
    return results.ToArray();
}

Sunday, July 12, 2015

Today we discuss a book review of "Data-ism" by Steve Lohr, a Harper Collins publication. Technology has an unimaginable capacity to generate data. This raises concerns about the idolatry of data and the consequent replacement of wisdom with quantification - a phenomenon some have called post-humanist. The author includes descriptions of the possibilities of big data.
He starts off with the example of the McKesson corporation, which distributes one third of all pharmaceuticals in the US to 26,000 customer locations - roughly 240 million pills a day.
The author takes this example because the company accumulates data (pills, prices and shipment miles) that is plentiful, stable and reliable. This has enabled IBM to build a sort of flight simulator for decision making. Presumably the author is talking about online data and data warehousing; he goes on to add that this capability works in two ways - one to provide profit and loss figures for every product and supplier, and another to provide a tool for analytics and prediction.
An outcome of such analytics has been to centralize the distribution of very expensive drugs. Centralization requires expensive air shipments to customers. However, IBM's modeling software predicted that savings in inventory levels for certain drugs would make up for the higher air freight expenses. McKesson tested this with a pilot project and the proposal was vindicated. The software gave McKesson the clarity and confidence to go ahead. This is an example where quantification trumps "best guesses, gut feel, experience and intuition". This replacement of "wisdom" can cause concerns.
The author can be credited with not only highlighting the benefits but as a journalist striving to see all sides of the issues. As an example, he cites Tom Mitchell, the chairman of machine-learning department at Carnegie Mellon who gave an example involving two sentences:
The girl caught the butterfly with the spots.
The girl caught the butterfly with the net.
where humans understand that the spots belong to the butterfly while the net belongs to the girl, but machines miss what can be termed "context".
The role of context can be understood with, say, the number 39, which means nothing on its own; add "Celsius" and it suggests a hot temperature, add a person's name and it suggests an illness.
Context today is said to be achieved in two ways - correlation and association.  Fortunately correlation is not new and data mining has helped here. For example, Walmart found out about ten years earlier that consumers stocked up on strawberry pop tarts and beer before a hurricane.
Another example given is that of Zest Finance that reduces the risk of payday borrowers by including data points such as whether the borrower has a cell phone and if he or she types their names in all upper case. Although it works, not knowing "why" there is a correlation brings up a debate. Lohr says that the authors of Big Data insist that big data overturns the self congratulatory illusion that comes with identifying causal mechanisms.
Data enthusiasts say the "why" can be answered by pairing models with measurements. However, measurements seem to have led us astray, as in the housing financial crisis: analysts leaned on recent data that was plentiful while ignoring earlier financial crises for which the data sets were sparse and messy.
Therefore correlation alone is not sufficient. But intuition has its own problems. The author cites an example from Daniel Kahneman's "Thinking, Fast and Slow", where participants were asked whether a man described as "meek" is more likely to be a librarian or a farmer. While the vast majority replied "librarian", data showed that there were twenty times more farmers than librarians.
Since both man and machine have weaknesses, the author says that each can help remove the other's blind spots. However, he warns against forming a habit where, instead of computers assisting humans, humans end up assisting computers.
Another example of a frontline of this interdependence is a medical center where doctors and data driven software are playing "an information game". Doctors no longer reign supreme over monitoring. Lohr points out that the two can augment each other.

Saturday, July 11, 2015

Today we discuss a book review summary of the book  "How the World sees you" by Sally Hogshead with the motto - Discover your highest value through the science of fascination
There is power in understanding how the world sees you: you appear confident, authentic and ready to make a positive impression. This book gives a step-by-step method to describe yourself in one or two words, which then becomes your tag. The book helps you with some of the terms. Your slogan helps you in all walks of life, from writing a self-introduction on a new job to pitching your profile on LinkedIn.
This book differs from some self-improvement books in that it does not ask you to change but to be more of who you are. It should help you build better relationships and grow your business.
The book focuses on the seven fascination advantages and a discussion of each.
When we are born, we are automatically fascinating to the world with our gestures, but over time we acquire layers that make us boring. For example, we water down our ideas and present them as blandly as possible to avoid attracting criticism, or try to go unnoticed to avoid being seen as unworthy. However, hiding works only temporarily and usually backfires. Sometimes we feel we don't have anything noteworthy, but that only dulls our edges.
As conversations become compressed in a more crowded marketplace, you need to know your strength and your differences. We need to know our values and the highest distinct value.
Likability and leadership are improved with fascination advantages. With such an advantage, when you talk, people will listen and remember you and even anticipate your messages.
One example of why this is important is given by the experiment of Joshua Bell. As a famous violinist, he could command a lot of money for his skills. When he agreed to participate in an arranged experiment to play in a subway station, the throng passed by in their rush. This tells us that no matter how good we are, we need to do more to fascinate others in a highly demanding environment.
There are three deadly threats in a competitive environment:
Distraction - which threatens the connection with others. Typically a listener gives you only the first nine seconds; you have to add value in that golden window.
Competition  - which threatens your ability to differentiate and win. Customers want to try out something different. And you are already different.
Commoditization - which threatens your relationships and loyalty. This can seep into your relationships and erode your connections to the point of being replaced.
You can triumph by adding distinct value: this makes you admired for a noteworthy ability and worth more than you are being paid. You deliver more than would normally be expected. You become preferred even if you are more expensive or less convenient.
Highest Distinct Value has a meaning in each word. You discover it by recollecting what it means to be in fascination - the state of being completely engaged mentally, emotionally and physically.
The Art of Fascination comes with the triumph of adding distinct value over the threats of distraction, competition and commoditization.  The key here is to fully recognize your differences rather than just the strengths. When you communicate, you are either adding value or taking up space. That's why those "Touchpoints" are important. Every touch point should highlight why you are different and better. Maximize value and subtract everything else.
The  Seven Fascination Advantages are
1) the power advantage - leading through authority where the power personalities speak the language of confidence. You will take charge of the conversation because you are decisive and self-assured.
2) the passion advantage - you create warm emotional connections. You are approachable and gregarious. Use reassuring terms such as "you bet"
3) the mystique advantage - This is about thinking before speaking. Mystique personalities speak the language of listening. You see nuances and you think things through.
4) the prestige advantage - Achieving success with higher standards You should speak the language of excellence
5) the alert advantage - You use careful precision and speak details where they matter. Alert personalities are risk-averse.
6) the innovation advantage - You speak the language of creativity. You propose unexpected solutions or come up with a profusion of ideas.
7) the trust advantage - you pay attention to building loyalty over time.
At this point you should list your advantages as primary + secondary to articulate your highest distinct value.

#codingexercise
Given a binary tree implement an iterator that will iterate through its elements.
public interface ITreeEnumerator
{
 bool MoveNext();
 void Reset();
}

public class TreeEnum : ITreeEnumerator
{
   Stack<Node> inorder = new Stack<Node>();
   Node current = null;
   Node root = null;
   public TreeEnum(Node n)
   {
       root = n;
       Leftmost(n);
   }
   public Node Current { get { return current; } }
   private void Traverse()
   {
      current = inorder.Pop();
      Leftmost(current.right);
   }
   private void Leftmost(Node n)
   {
     while (n != null)
     {
              inorder.Push(n);
              n = n.left;
     }
   }
  public bool MoveNext()
  {
      if (inorder.Count > 0)
      {
          Traverse();
          return true;
      }
      return false;
  }
 public void Reset()
 {
   inorder.Clear();
   current = null;
   Leftmost(root);
 }
}

Implementation of the partition method in quicksort (Lomuto scheme, pivot at the end):
int Partition(ref int[] A, int start, int end)
{
    int i = start - 1;
    int pivot = A[end];
    for (int j = start; j < end; j++)
    {
        if (A[j] <= pivot)
        {
            i = i + 1;
            Swap(ref A[i], ref A[j]);
        }
    }
    Swap(ref A[i + 1], ref A[end]);
    return i + 1;
}


Friday, July 10, 2015

Today we discuss a coding problem.
Question: Given a series of numbers from 1 to pow (2, n) on a paper, we fold it in the following way  with the help of two commands
L : Fold the paper from left edge to right edge
R : Fold the paper from right edge to left edge
For n = 1 we have numbers 1,2
These can be folded in two ways L & R
L makes the number 1 come on top of 2
R makes the number 2 come on top of 1
The input is a sequence of n such commands. After the n commands there is exactly one number in each layer, and we have to print the numbers from top to bottom -
{1,2} in this case.
Solution: Note that folding only involves left and right commands
And this is repeated for log N steps
If we maintain a list for each index, folding causes elements to be added onto each index from another index.
In each step these lists on either side of the center come closer until there is only one.
With a data structure as a list of list for current and previous iterations,
We can iterate for log N steps and merge according to whether the command is left or right

// partial code
List<int> Fold(List<int> numbers, string commands)
{
var curr = new List<List<int>>();

foreach (var number in numbers)
{
      curr.Add(new List<int> { number });
}

foreach (var command in commands)
{
       int half = curr.Count / 2;
       if (command == 'L')
       {
            for (int k = 0; k < half; k++)
                Merge(ref curr, k, curr.Count - 1 - k);
            LeftShift(curr, half);
       }
       else
       {
            for (int k = 0; k < half; k++)
                Merge(ref curr, curr.Count - 1 - k, k);
            RightShift(curr, half);
       }
}
return curr[0];
}

void Merge(ref List<List<int>> curr, int src, int dest)
{
// source will have its list reversed and pushed at the front of the list on destination
// source list will be set to null
}
LeftShift and RightShift will trim the left and right null lists.
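The same folding procedure can be sketched end-to-end in Python. This is a sketch assuming the stack-of-layers representation described above; `fold` is an illustrative name, and each position holds a list of layers with index 0 on top.

```python
def fold(n, commands):
    # one stack of layers per position; index 0 is the top layer
    strip = [[v] for v in range(1, 2 ** n + 1)]
    for cmd in commands:
        half = len(strip) // 2
        if cmd == 'L':
            # the left half flips over onto the right half
            for i in range(half):
                strip[-1 - i] = strip[i][::-1] + strip[-1 - i]
            strip = strip[half:]
        else:  # 'R'
            # the right half flips over onto the left half
            for i in range(half):
                strip[i] = strip[-1 - i][::-1] + strip[i]
            strip = strip[:half]
    return strip[0]  # layers from top to bottom
```

For n = 1, fold(1, "L") gives [1, 2] and fold(1, "R") gives [2, 1], matching the description above.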

Another question along similar lines: given two arrays A and B, sort the elements of A that appear in B according to their order in B; the remaining elements of A are then sorted and appended.
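That relative-ordering exercise can be sketched as follows (the helper name is illustrative; this assumes the elements of B are distinct):

```python
from collections import Counter

def sort_by_reference(a, b):
    counts = Counter(a)
    result = []
    for x in b:
        # emit every copy of x in the order B prescribes
        result.extend([x] * counts.pop(x, 0))
    # remaining elements of A, in sorted order
    result.extend(sorted(counts.elements()))
    return result
```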

Question : Given an array where each element is maximum +-k index away from its sorted position, find an algorithm to sort such array. 
Solution : We will assume n = number of elements in array is greater than k.  
 
One solution is to use InsertionSort. It will benefit from the property of the given array because it won't have to traverse the entire length of the array in the worst case. 
The Insertion sort goes like this :  
for int i = 1 to len(A) -1 
   j = i 
   while( j > 0 and A[j-1] > A[j]) 
       swap(A[j] and A[j-1]) 
       j = j - 1 
 
 
Another algorithm cited online is to use a heap. A heap over the whole array would work but gains nothing; the useful observation is that a min-heap of size k+1 suffices, because the smallest remaining element must lie within the next k+1 positions. As elements stream in, we push the next element and pop the minimum, which lands in its final position. 
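For reference, a sliding min-heap of size k+1 is sufficient here, since the smallest remaining element always lies within the next k+1 positions. A sketch:

```python
import heapq

def sort_k_sorted(a, k):
    # maintain a min-heap over a window of k + 1 elements
    heap = a[:k + 1]
    heapq.heapify(heap)
    out = []
    for x in a[k + 1:]:
        out.append(heapq.heappop(heap))  # the popped minimum is in final position
        heapq.heappush(heap, x)
    while heap:
        out.append(heapq.heappop(heap))
    return out
```

This runs in O(n log k), compared to O(nk) for the insertion sort above.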
 
 
Question: Given a string left rotate it by a given number of times. 
Answer: Let us first write it down as it is described. 
StringBuilder Rotate(StringBuilder b, int n) 
while (n > 0) 
  char c = b.removeAt(0) 
  b.append(c) 
  n-- 
return b 
Next the optimal solution would be to split the string at the index and swap the two partitions 
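The split-and-swap idea can be sketched directly in Python, where slicing performs the splice (a sketch; the function name is illustrative):

```python
def rotate_left(s, n):
    n %= len(s)           # rotating by the full length is a no-op
    return s[n:] + s[:n]  # split at index n and swap the two partitions
```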
 
Question: Given a number between 0 and 1000, raise 11 to that number and determine the count of times '1' appears in the result. 
Answer: For small exponents the digits of pow(11, n) match the rows of Pascal's triangle, so one could generate Pascal numbers up to that level and count the ones. However, carries break this correspondence beyond row 4, so the robust approach is to compute the power with big-integer arithmetic and count the '1' digits directly. 
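Since the Pascal-row correspondence only holds while no digit carries occur, a big-integer power is the safer route; a sketch:

```python
def count_ones_in_power_of_11(n):
    # Python integers are arbitrary precision, so 11 ** n is exact
    return str(11 ** n).count('1')
```

For instance, 11 ** 2 = 121 contains two '1' digits.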
 
Question: Given a string of characters x and o, where the x and o are interspersed, find the cheapest way to group all the x together. Each movement of a letter is expensive. 
Answer: pick a target position and shift all the x's together around it. The median of the x positions (rather than the plain average) minimizes the total movement, since the median minimizes a sum of absolute differences. 
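A sketch of the cost computation: after grouping, the x of rank j sits at some start index m plus j, so the cost is the sum of |pos[j] - j - m|, minimized when m is the median of the adjusted positions (the function name is illustrative):

```python
def min_moves_to_group(s):
    pos = [i for i, c in enumerate(s) if c == 'x']
    # after grouping, the x of rank j sits at m + j for some start m;
    # cost = sum |pos[j] - j - m|, minimized at the median of pos[j] - j
    adjusted = sorted(p - j for j, p in enumerate(pos))
    m = adjusted[len(adjusted) // 2]
    return sum(abs(a - m) for a in adjusted)
```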

Question: Write SQL to calculate the number of work days between two dates.
Answer:
SELECT (DATEDIFF(dd, @StartDate, @EndDate)+1) - (DATEDIFF(wk,@StartDate,@EndDate)*2)
-(CASE WHEN DATENAME(dw, @StartDate) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, @EndDate)='Saturday' THEN 1 ELSE 0 END)
 
Question: Design a file search utility
Answer: A depth first traversal with escape for current and parent folder notations.
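A minimal sketch of such a utility in Python: `os.walk` already performs the depth-first traversal and never yields the current ("." ) and parent ("..") notations, so no explicit escape is needed (function name is illustrative):

```python
import fnmatch
import os

def find_files(root, pattern):
    # depth-first traversal of the directory tree under root
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return matches
```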
 

Thursday, July 9, 2015

Today we discuss MapReduce algorithm in Hadoop with reference to their documentation.
MapReduce processes vast amounts of data in parallel on large clusters(thousands of nodes) in a reliable fault-tolerant manner.
A MapReduce job usually splits the input data set into independent chunks which are then processed in a shared nothing model by map tasks. The outputs of the maps are then input to the reduce tasks. The framework stores the input and output of a job in  a filesystem, schedules the processing, monitors the activity and re-executes failed tasks. The framework employs a master slave model for the execution of the map and reduce.
 While the jobs are monitored and completed by the framework, their configuration including the input or output locations are specified by the applications.
At each stage of the processing, the map, combine and reduce steps take <key, value> pairs as input and output. Both the key and value classes implement the Writable interface, and the key classes additionally have to implement the WritableComparable interface.

For example:
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
  public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    // e.g. tokenizes a sentence into words and emits <word, 1> pairs
  }
}

public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    // sums the counts for each word encountered
  }
}
The Mapper splits each line into tokens, which are emitted as key-value pairs of words and counts.
The Reducer sums up the values which represent occurrences for each key.

Together the mapper and the reducer form the payload to the framework and are specified by the applications.
 The Mapper decides on the number of maps based on the input data and blocksize. The Reducer has three primary phases - shuffle, sort and reduce.  Shuffle takes the sorted output of all the mappers as input. Reducer inputs are grouped by keys during sort. Both shuffle and sort occur simultaneously as map outputs are fetched. The output of the reducer is not sorted. A partitioner partitions the key space. It controls the partitioning of the keys of the intermediate map-outputs. By default, a hash function is used to derive the partitions. A reporter reports progress and relevant application-level status messages. An OutputCollector collects data output by the Mapper or the reducer.
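The map-shuffle-reduce flow for the word count example can be simulated outside Hadoop; a Python sketch (the function names are illustrative, not Hadoop APIs):

```python
from collections import defaultdict

def map_phase(lines):
    # emit a <word, 1> pair for every token, as the Mapper does
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_and_sort(pairs):
    # group intermediate values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # sum the occurrence counts for each key
    return {key: sum(values) for key, values in groups.items()}
```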

Wednesday, July 8, 2015

1)  Given two strings find the longest common substring.  We can use dynamic programming for this. 
We define the optimal substructure as follows: 
Let S be a string of length m and T be a string of length n. Recall that pattern matching of strings involves computing prefixes. Here too we consider all pairs of prefixes and, for each pair, compute the length of their longest common suffix. 
This can be written as 
LCSuffix(S1..p, T1..q) = LCSuffix(S1..p-1, T1..q-1) + 1   if S[p] = T[q] 
                       = 0                                 otherwise 
Then we can choose the longest common substring as the maximum of the longest common suffixes, for i ranging from 1 to m and j ranging from 1 to n: 
LCSubstr(S, T) = max over 1 <= i <= m, 1 <= j <= n of LCSuffix(S1..i, T1..j)
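The recurrence translates directly into a dynamic-programming table; a sketch:

```python
def longest_common_substring(s, t):
    m, n = len(s), len(t)
    # lcsuf[i][j] = length of the longest common suffix of s[:i] and t[:j]
    lcsuf = [[0] * (n + 1) for _ in range(m + 1)]
    best, end = 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                lcsuf[i][j] = lcsuf[i - 1][j - 1] + 1
                if lcsuf[i][j] > best:
                    best, end = lcsuf[i][j], i
    return s[end - best:end]
```

This is O(mn) time and space; the suffix-tree approach below improves on it asymptotically.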

We will investigate a suffix tree for solving this. A suffix tree helps arrange the strings as paths from the root to the leaves, with the strings denoted by numbers appearing at the leaves of the suffix tree. The problem then translates to finding out which strings appear below each node. The tree is traversed bottom up (which means a level-order traversal with the results reversed level-wise) and a bit vector is maintained for all possible strings. Alternatively, the lowest common ancestor of two strings can be found.

Suffix tree reminds us of a trie. A prefix tree is also a trie. Both have special terminators on the nodes to signify the end of a string or word.

Another interview question is as follows:

Given a CSV of names and ages of persons, sort it by age. For this we could use a list data structure, which already has a Sort method and can take an IComparer to break ties among duplicates. Duplicate elements can also be stored as lists within the overall list; of course, .NET also provides a tuple collection.
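The same exercise can be sketched in Python, where a key function replaces the IComparer (a sketch assuming a simple name,age layout with no header row):

```python
import csv
from io import StringIO

def sort_people_by_age(csv_text):
    rows = list(csv.reader(StringIO(csv_text)))
    # a stable sort keeps the original order among people of equal age
    rows.sort(key=lambda row: int(row[1]))
    return rows
```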

Another interview question is based on levels of friendship:
Given the input as
A: B C D
D: B E F
E: C F G
This produces the output as
Level 1 : B, C, D
Level 2 : E, F
Level 3 : G

The given example has no cycles in the graph, so starting at the chosen root node we can form a tree and then complete a level-wise traversal to find the successive levels of friendship.

However, the problem may not be restricted to an acyclic graph even if it is directed. Hence we maintain a queue for the current level together with a visited set, so that a person already placed at an earlier level is not added again.
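The level-order traversal with a visited set can be sketched as follows; run on the adjacency list above, it reproduces the expected levels (the function name is illustrative):

```python
def friendship_levels(graph, root):
    # breadth-first traversal; 'seen' guards against cycles
    levels = []
    seen = {root}
    frontier = [root]
    while frontier:
        next_level = []
        for person in frontier:
            for friend in graph.get(person, []):
                if friend not in seen:
                    seen.add(friend)
                    next_level.append(friend)
        if next_level:
            levels.append(next_level)
        frontier = next_level
    return levels
```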