Tuesday, September 26, 2017

Today we continue reviewing the whitepaper "the chutes and ladders of customer identity" from Gigya, a leader in the Customer Identity and Access Management industry. The title borrows from a well-known board game. The whitepaper introduces the delights and the pitfalls of the customer experience with identity.
The chutes for customer identity are demonstrated by
1) annoying ads for past purchases
2) spamming of email inbox with unwanted newsletters
3) more setup and initiation activities prior to transaction
The ladders for customer identity are demonstrated by
1) personalization of ads done right
2) less frequent and more pertinent notifications
3) a more integrated and seamless experience with fewer chores
Most Customer Identity and Access Management solutions need to track customers, and they can only do this with user consent. Some show all the terms and conditions up front in a one-time but overwhelming consent request, while others ask as and when needed, providing a lighter touch.
Similarly, the sign-up process at some sites can seem to take all day to feed every detail into the system, while others merely defer to existing registered partner sites, such as signing in with an email or chat application account. Removing the password requirement is touted as one of the best improvements to customer experience, and consequently the industry has paid a lot of attention to forming the OpenID standard. The standard focuses on the integration between businesses and stores on behalf of the customer. What it leaves unsaid is how marketers can use it to their advantage. A marketer would like to know:
whether the customer arrived via search, campaign or referral
what device they connected with
what was the tipping point that converted them to a customer
what transactions the customer attempted or executed
what are the recommendations that will interest the customer the most
how to make more money from an engaging and mutually beneficial experience for the customer.
what partners and associates are willing to work with the marketers for this customer

#codingexercise
Get the max sum level of an integer binary tree.
We perform level wise traversal of the binary tree and serialize the integers encountered.
Then for every level, we add up the sum and return the level with the highest sum.
The level-wise traversal is done by enqueueing the left and right children of each node in a queue. The levels are demarcated by level indicators, which in our case could be a null node.
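The traversal described above can be sketched as follows. This is only a minimal illustration: the `Node` class and the function name `max_sum_level` are assumptions made for the example, and levels are numbered from 0 at the root.

```python
from collections import deque


class Node:
    """Hypothetical binary tree node used only for this sketch."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def max_sum_level(root):
    """Return (level, total) for the level with the highest sum, or None for an empty tree."""
    if root is None:
        return None
    queue = deque([root, None])  # None is the level indicator
    best_level, best_sum = 0, None
    level, total = 0, 0
    while queue:
        node = queue.popleft()
        if node is None:
            # End of a level: compare its sum with the best seen so far.
            if best_sum is None or total > best_sum:
                best_level, best_sum = level, total
            level, total = level + 1, 0
            if queue:
                queue.append(None)  # mark the end of the next level
            continue
        total += node.value
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return best_level, best_sum
```

Ties are resolved in favour of the shallowest level, which is a choice of this sketch rather than part of the problem statement.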

Monday, September 25, 2017

Today we start reviewing the whitepaper "the chutes and ladders of customer identity" from Gigya, a leader in the Customer Identity and Access Management industry. The title borrows from a well-known board game. The whitepaper introduces the delights and the pitfalls of the customer experience with identity. The narration is about an online store, with the recognition that many of the points in the whitepaper apply equally well to other businesses. Most customers start out on an online store as an unknown visitor. Businesses rely on sending the right message to the right person at the right time and through the right channels. This multi-channel marketing is a ladder in customer experience. Since customers start out from a variety of devices and applications, it is difficult to know what they want. Identity and preferences help mitigate this barrier. Some examples of interactions that are either annoying and frustrating or enjoyable and satisfying are mentioned here. A customer may have bought a pair of jeans a few weeks earlier but continue to see unwanted ads for the same jeans all over the internet. On the other hand, the same customer may see recommendations for other items that pair well with the jeans, much as a personal fashion consultant would suggest. Another example is that of subscriptions to newsletters from the online store that become difficult to manage. On the other hand, a crisp, clear, and infrequent newsletter may give relevant information and improve the experience. Similarly, when the user switches devices, he may lose the settings he had made earlier. On the other hand, his login may be integrated with social networking apps, and the settings become part of a public profile that easily follows the user between apps and devices. Therefore there is a trade-off in the customer touchpoints that can attract or spurn the customer.
If the business captures information about the user as early as possible and in small pieces, a relationship can be established based on the user's preferences. This approach is known as "progressive identity". It offers promotions, coupons, or newsletters to the customer without requiring full registration. It obtains consent from the customer throughout their life-cycle. It enables convenient and centralized profile and preference management. Progressive identity is the notion that a customer is not a boolean known or unknown for a store but an accumulation of transparent and engaging experiences. Customer tracking is required to personalize the customer's experience, but instead of displaying all the terms up front, consent can be managed as the customer progresses through the website, which improves the experience. A lightweight registration, such as an email address in exchange for valuable and personalized content, will be preferred by the customer. The customer will then be enticed by the content to sign up for a full account. The easier we make this registration and the more augmented the authentication methods, the better the customer experience.
#codingexercise
Convert a BST to min heap
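Traverse the BST in order to get a sorted list.
The sorted list, read in level order, already satisfies the min-heap property, so we only need to link the nodes, say with one-based positions:
the left child of the element at position i (index i-1 in the sorted list) is the element at position 2i, if that position is within range
the right child of the element at position i (index i-1 in the sorted list) is the element at position 2i+1, if that position is within range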
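A minimal sketch of these steps follows. It uses zero-based list indices, so the children of the element at index i sit at indices 2i+1 and 2i+2, which is the same rule as the one-based positions above. The `Node` class and the helper names are assumptions made for the example, and the sketch builds new nodes rather than relinking the original ones.

```python
class Node:
    """Hypothetical binary tree node used only for this sketch."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def inorder(root, out):
    """Append the values of the BST to out in sorted order."""
    if root is not None:
        inorder(root.left, out)
        out.append(root.value)
        inorder(root.right, out)


def bst_to_min_heap(root):
    """Return the root of a complete binary tree that satisfies the min-heap property."""
    values = []
    inorder(root, values)
    nodes = [Node(v) for v in values]
    for i, node in enumerate(nodes):
        left, right = 2 * i + 1, 2 * i + 2  # zero-based child indices
        if left < len(nodes):
            node.left = nodes[left]
        if right < len(nodes):
            node.right = nodes[right]
    return nodes[0] if nodes else None


def level_order(root):
    """Return the values of the tree level by level, for inspection."""
    result, frontier = [], [root] if root is not None else []
    while frontier:
        result.extend(n.value for n in frontier)
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    return result
```

For a BST holding 1 through 7, `level_order(bst_to_min_heap(root))` gives the values back in ascending order, and every parent is smaller than its children.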

Sunday, September 24, 2017

We continue to review the slides from Stanford that introduce Natural Language Processing via Vector Semantics. We said that vector representation is useful and opens up new possibilities. We saw that a lookup such as a thesaurus does not help. We were first reviewing co-occurrence matrices. These take many forms, such as the term-document matrix, the word-word matrix, and the word-context matrix. The term-document matrix holds the count of word w in document d, so each document becomes a count vector. The similarity between words in this case merely indicates that they occur in similar documents. If we change the scope from documents to some other text boundary, we have the word-word matrix, and the similarity improves over that in the term-document matrix. A word-context matrix improves this further because it describes the word in terms of its context, which is closer to its meaning and brings out semantic similarity.
Instead of raw co-occurrence, we now consider a different measure of association between two words, called positive pointwise mutual information (PPMI). Pointwise mutual information (PMI) indicates whether two words occur together more often than they would if they were independent. It is defined as the log of their joint probability divided by the product of their individual probabilities. This value can come out negative or positive. The negative values are not helpful: the probabilities are small, on the order of inverse powers of ten, so values below zero carry little significance, and merely knowing that two words are unrelated does not help our analysis. Positive PMI, on the other hand, helps discern whether two words are likely to occur together, so we keep the PMI only when it turns out to be positive. Computing the PPMI is easy with probabilities estimated from the cumulative word occurrence counts.
Today we realize that PMI needs to be weighted because it is biased towards infrequent events. Rare words get a high PMI value, which skews the results when the text has rare words. To make it fair, the PMI is weighted, either by raising the context probabilities or with add-one smoothing. In the first case, the context counts are raised to the power alpha = 0.75 before they are normalized, which increases the probability assigned to rare contexts. In the second case, a smoothing constant is added to the counts before the probabilities are calculated.
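The calculation can be sketched in a few lines. This is a rough illustration rather than the slides' implementation: the function name `ppmi`, the dictionary layout of the counts, and the choice of log base 2 are assumptions made for the example. With `alpha=1.0` it computes plain PPMI, and with `alpha=0.75` it applies the context weighting described above.

```python
import math
from collections import Counter


def ppmi(pair_counts, alpha=1.0):
    """Compute PPMI for every (word, context) pair.

    pair_counts maps (word, context) -> co-occurrence count.
    alpha < 1 raises the context counts to that power before normalizing,
    which gives rare contexts a larger probability.
    """
    total = sum(pair_counts.values())
    word_counts = Counter()
    context_counts = Counter()
    for (word, context), n in pair_counts.items():
        word_counts[word] += n
        context_counts[context] += n
    context_norm = sum(n ** alpha for n in context_counts.values())

    scores = {}
    for (word, context), n in pair_counts.items():
        p_joint = n / total
        p_word = word_counts[word] / total
        p_context = context_counts[context] ** alpha / context_norm
        pmi = math.log2(p_joint / (p_word * p_context))
        scores[(word, context)] = max(0.0, pmi)  # keep only positive PMI
    return scores
```

On a toy table where the context "y" occurs only once, the unweighted score for that pair is clearly positive, while the weighted score with `alpha=0.75` drops, which is the bias correction the weighting is meant to provide.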
#codingexercise
Find all the heroes and the superhero in an integer array. A hero is an element that is greater than all the elements to its right. A superhero is an element that is greater than all the elements to both its left and its right.
This problem can be solved by traversing the array from right to left while keeping track of the current max seen so far. If the next element exceeds the current max, it satisfies the criteria for being a hero and becomes the current max. Any element that does not exceed the current max is not a hero. The final value of the current max is the superhero, provided that value occurs only once in the array.
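A minimal sketch of this scan follows. The function name is hypothetical, and the sketch makes two assumptions that the problem statement leaves open: "greater than" is treated as strictly greater, and the function returns `None` for the superhero when the maximum is repeated or the array is empty.

```python
def heroes_and_superhero(values):
    """Return (heroes in original order, superhero or None)."""
    heroes = []
    current_max = None
    # Scan from the right so that current_max covers everything to the right.
    for value in reversed(values):
        if current_max is None or value > current_max:
            heroes.append(value)
            current_max = value
    heroes.reverse()
    # The overall maximum is a superhero only if nothing else equals it.
    superhero = None
    if values and values.count(current_max) == 1:
        superhero = current_max
    return heroes, superhero
```

The last element is always a hero because nothing lies to its right.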

Saturday, September 23, 2017

We continue to review the slides from Stanford that introduce Natural Language Processing via Vector Semantics. We said that vector representation is useful and opens up new possibilities. We saw that a lookup such as a thesaurus does not help. We were first reviewing co-occurrence matrices. These take many forms, such as the term-document matrix, the word-word matrix, and the word-context matrix. The term-document matrix holds the count of word w in document d, so each document becomes a count vector. The similarity between words in this case merely indicates that they occur in similar documents. If we change the scope from documents to some other text boundary, we have the word-word matrix, and the similarity improves over that in the term-document matrix. A word-context matrix improves this further because it describes the word in terms of its context, which is closer to its meaning and brings out semantic similarity.
Some of these matrices can be very sparse, with zeros covering a majority of the cells. This is quite alright since there are many efficient algorithms for sparse matrices. The size of the window can also be adjusted. The shorter the window, the more syntactic the representation. The longer the window, the more semantic the representation. Instead of raw co-occurrence, we now consider a different measure of association between two words, called positive pointwise mutual information (PPMI). Raw word frequency suffers from being skewed by more frequent and less salient words. Pointwise mutual information (PMI) indicates whether two words occur together more often than they would if they were independent. It is defined as the log of their joint probability divided by the product of their individual probabilities. This value can come out negative or positive. The negative values are not helpful: the probabilities are small, on the order of inverse powers of ten, so values below zero carry little significance, and merely knowing that two words are unrelated does not help our analysis. Positive PMI, on the other hand, helps discern whether two words are likely to occur together, so we keep the PMI only when it turns out to be positive. Computing the PPMI is easy with probabilities estimated from the cumulative word occurrence counts.
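Written out, the definition reads as below. The symbols w for the word and c for the context word, and the choice of base 2 for the logarithm, are conventions assumed here rather than taken from the text above.

```latex
\mathrm{PMI}(w, c) = \log_2 \frac{P(w, c)}{P(w)\,P(c)}
\qquad
\mathrm{PPMI}(w, c) = \max\bigl(\mathrm{PMI}(w, c),\, 0\bigr)
```

When w and c are independent, P(w, c) = P(w) P(c), the ratio is 1, and the PMI is 0, so positive values mean the pair occurs together more often than chance would predict.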
#codingexercise
Count the number of islands in a sparse matrix.
This problem is similar to finding connected components in a graph. In a 2D matrix, every interior cell has eight neighbors, and a depth-first search explores all of them. When a cell is visited, it is marked so that it is not included in a later traversal. The number of depth-first searches started from an unvisited filled cell is the number of connected components, that is, the number of islands.
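One way to sketch this is shown below. The function name and the convention that 1 marks land and 0 marks water are assumptions made for the example, and the depth-first search uses an explicit stack instead of recursion so that large islands do not hit the recursion limit.

```python
def count_islands(grid):
    """Count groups of 1-cells connected through any of the eight neighbors."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    visited = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or visited[r][c]:
                continue
            count += 1  # a new, unvisited island starts here
            visited[r][c] = True
            stack = [(r, c)]
            while stack:
                cr, cc = stack.pop()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] == 1 and not visited[nr][nc]):
                            visited[nr][nc] = True
                            stack.append((nr, nc))
    return count
```

The separate `visited` matrix leaves the input untouched. Marking cells in place would save memory at the cost of modifying the caller's data.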

Friday, September 22, 2017

We continue to review the slides from Stanford that introduce Natural Language Processing via Vector Semantics. We said that vector representation is useful and opens up new possibilities. We saw that a lookup such as a thesaurus does not help.
Stanford NLP has shown there are four kinds of vector models.
A sparse vector representation, where a word is represented in terms of its co-occurrences with other words, using a set of weights for those co-occurrences. This weight is usually based on a metric called mutual information.
A dense vector representation that involves latent semantic analysis, neural networks, or Brown clusters. The dense vector representations all represent a word as a vector of numbers, translating the word into a corresponding point in the vector space. This is called an embedding.
Co-occurrence matrices take many forms, such as the term-document matrix, the word-word matrix, and the word-context matrix. The term-document matrix holds the count of word w in document d, so each document becomes a count vector. The similarity between words in this case merely indicates that they occur in similar documents. If we change the scope from documents to some other text boundary, we have the word-word matrix, and the similarity improves over that in the term-document matrix. A word-context matrix improves this further because it describes the word in terms of its context, which is closer to its meaning and brings out semantic similarity.
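As a small illustration of the word-word case, the counts can be collected with a sliding window over the tokens. The function name, the symmetric window, and the default window size of 2 are assumptions made for this sketch. The counts are kept in a dictionary keyed by word pair, which only stores the non-zero cells.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count how often each (word, neighbor) pair occurs within the window."""
    counts = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own neighbor
                counts[(word, tokens[j])] += 1
    return counts
```

For the tokens `["a", "b", "a"]` with `window=1`, the pair `("a", "b")` is counted twice, once for each occurrence of "a" next to "b".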
Co-occurrence between two words takes two forms: first-order and second-order. First-order co-occurrence is syntagmatic association and second-order co-occurrence is paradigmatic association. The first is based on the words appearing near each other, whereas the second is based on the words having similar neighbors. Note that the vectorization derives from the usage of words, which is why it has become popular. Another way to look at usage is to canonicalize the text into an Esperanto-like language whose relations and syntax are more oriented towards natural language processing. Some work has already begun on improved ontologies that are not restricted to a thesaurus or WordNet, such as FrameNet. All we need to keep in mind here is that there are layers to tackling the problem: usage, vector space, and classification of vectors.

Thursday, September 21, 2017

We continue to review the slides from Stanford that introduce Natural Language Processing via Vector Semantics. We said that vector representation is useful and opens up new possibilities. We saw that a lookup such as a thesaurus does not help.
Stanford NLP has shown there are four kinds of vector models.
A sparse vector representation, where a word is represented in terms of its co-occurrences with other words, using a set of weights for those co-occurrences. This weight is usually based on a metric called mutual information.
A dense vector representation that takes one of the following vector models:
A representation obtained by factorizing the word co-occurrence weights with singular value decomposition, referred to as latent semantic analysis
A neural network based model where the word vectors are learned either by predicting a word from its surrounding words or by predicting the surrounding words from the current word
A set of Brown clusters.
#codingexercise
Find the minimum number of squares whose sum equals to a given number n
We write a few base cases, say up to n = 3.
For n greater than that, we consider each candidate from 4 to n and initialize its number of squares to the candidate itself, since any number can be represented by at most that many unit squares.
Next, for each integer i from 1 whose square does not exceed the candidate, we recursively calculate the minimum number of squares for the candidate minus i squared and add one to that count. We update the minimum as we go through each i. All the results are memoized for easy lookup, so the smallest number of squares ends up in the table entry for n.
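The same recurrence can be sketched bottom-up, filling the table from 1 to n instead of recursing with memoization. The function name is hypothetical, and the sketch lets the loop handle the small cases instead of writing them out separately.

```python
def min_squares(n):
    """Return the minimum number of perfect squares that sum to n."""
    table = [0] * (n + 1)  # table[t] = fewest squares summing to t
    for target in range(1, n + 1):
        best = target  # upper bound: target unit squares
        i = 1
        while i * i <= target:
            # Use one square of side i, then the best answer for the remainder.
            best = min(best, 1 + table[target - i * i])
            i += 1
        table[target] = best
    return table[n]
```

For instance, 12 needs three squares (4 + 4 + 4) even though the greedy choice of 9 would need four (9 + 1 + 1 + 1), which is why every i is tried.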

Wednesday, September 20, 2017

We continue to review the slides from Stanford that introduce Natural Language Processing via Vector Semantics. We said that vector representation is useful and opens up new possibilities. For example, it helps compute the similarity between words: "fast" is similar to "rapid" and "tall" is similar to "height". This can help in question answering, as in "How tall is Mt. Everest?" answered by "The height of Mt. Everest is 29029 feet." Similarity of words also helps with plagiarism detection. If two narratives differ by a few words changed here and there, the similarity of the substituted words should be high because they share the same context. When a number of word vectors are similar, the overall narrative is likely plagiarized.
Word vectors are also useful when the semantics of a word change over time. Words hold their meaning only in the context of the surrounding words. If their usage changes over time, their meaning also changes. Consequently, word similarity may change based on context. The problem with using a thesaurus in this case is that a thesaurus does not exist for every year to determine the similarity between words that mean something today and meant something else yesterday. Moreover, a thesaurus, unlike a dictionary, does not contain all words and phrases, particularly verbs and adjectives.
Therefore, instead of looking up an ontology, we now refer to a distributional model for the meaning of a word, which relies on the context surrounding the given word. A synonym is therefore a choice of words that share the same context and usage. In fact, we interpret the meanings of unknown words by looking at the surrounding words and their context.
Stanford NLP has shown there are four kinds of vector models.
A sparse vector representation, where a word is represented in terms of its co-occurrences with other words, using a set of weights for those co-occurrences. This weight is usually based on a metric called mutual information.
A dense vector representation that takes one of the following vector models:
A representation obtained by factorizing the word co-occurrence weights with singular value decomposition, referred to as latent semantic analysis
A neural network based model where the word vectors are learned either by predicting a word from its surrounding words or by predicting the surrounding words from the current word
A set of Brown clusters.
#codingexercise
Find the maximum water retention in a bar chart
Water is retained over a bar of unit width, between the tallest bars to its left and right, up to a depth equal to the minimum of those two heights minus the height of the current bar. Therefore for each bar we can find the max on the left and the max on the right and calculate the water retained as above. We then accumulate the water retained for each bar across the range of bars. Since we need the max on the left and on the right for every bar, we can precompute them in two separate passes over all the bars.
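The two passes can be sketched as follows. The function name is hypothetical, and the sketch takes the left and right maxima to include the current bar itself, so that the depth is never negative.

```python
def trapped_water(heights):
    """Return the total water retained over bars of unit width."""
    n = len(heights)
    if n == 0:
        return 0
    # First pass: tallest bar at or to the left of each position.
    left_max = [0] * n
    left_max[0] = heights[0]
    for i in range(1, n):
        left_max[i] = max(left_max[i - 1], heights[i])
    # Second pass: tallest bar at or to the right of each position.
    right_max = [0] * n
    right_max[-1] = heights[-1]
    for i in range(n - 2, -1, -1):
        right_max[i] = max(right_max[i + 1], heights[i])
    # Depth over each bar is bounded by the lower of the two walls.
    return sum(min(left_max[i], right_max[i]) - heights[i] for i in range(n))
```

This uses two extra lists of length n. A two-pointer variant can reach the same total in constant extra space, though it is harder to relate back to the per-bar description above.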