Friday, September 15, 2017

Today we continue reviewing U-SQL. It unifies the benefits of SQL with the expressive power of your own code. It is said to work very well with all kinds of data stores: file, object and relational. U-SQL runs in the Azure ecosystem, which has Azure Data Lake Storage as the foundation and the analytics layer over it. The benefit of this storage is that it spans several kinds of data formats and stores.

U-SQL, like T-SQL, provides the important benefits of a query language. First and foremost, there is consistency and familiarity in its usage. The learning curve when moving from T-SQL to U-SQL is not very steep. Moreover, there is a lot of thought behind the syntax. It is context independent and defined in the language of data processing. It is also composable. It is important for a query language to be precise and accurate while remaining convenient for the user. Writeability and readability are treated as separate concerns. All of these are important considerations in U-SQL. In addition to the syntax, the semantics also matter. U-SQL tries to avoid surprises and complexities. Moreover, as a language it is composable for the user and optimizable for the system.
Courtesy U-SQL slide shares

My take on query improvements: https://1drv.ms/w/s!Ashlm-Nw-wnWsFqBcG-mBhjPLbC8

#codingexercise
We maintain a hashtable keyed by the N letters of the search string, with a linked list of positions of occurrence for each letter. As we scan the containing string, we insert each index into the linked list for the letter found at that index.
Next, we take each position in the linked lists as a candidate start, with a max value initialized to that position. For every other letter in the hash table, we find the occurrence that is closest to and after the candidate's position, and update the max found so far. With this max value minus the candidate's position as the offset, we can form a tuple of (start, offset) for every candidate. The tuple with the smallest offset is the answer to return.
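The steps above can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes the exercise is to find the shortest span of the containing string that holds every letter of the search string, and that the search letters are distinct. The function name `smallest_window` and the return shape are hypothetical, and a sorted Python list with binary search stands in for the linked list of positions.

```python
from bisect import bisect_right
from collections import defaultdict


def smallest_window(text, letters):
    """Return (start, offset) of the shortest span holding every letter, or None."""
    wanted = set(letters)  # assumption: the search letters are distinct

    # Hashtable: letter -> ascending list of positions where it occurs.
    positions = defaultdict(list)
    for i, ch in enumerate(text):
        if ch in wanted:
            positions[ch].append(i)

    # If some letter never occurs, no span can contain them all.
    if any(ch not in positions for ch in wanted):
        return None

    best = None
    # Every occurrence of a search letter is a candidate start.
    for start, first in enumerate(text):
        if first not in wanted:
            continue
        end = start  # running max, initialized to the candidate itself
        for other in wanted:
            if other == first:
                continue
            occ = positions[other]
            k = bisect_right(occ, start)  # closest occurrence after start
            if k == len(occ):
                end = None  # this letter does not appear after the candidate
                break
            end = max(end, occ[k])
        if end is None:
            continue
        if best is None or end - start < best[1]:
            best = (start, end - start)
    return best
```

The span itself is `text[start:start + offset + 1]`. With L total recorded positions this does roughly O(L * N * log L) work, which is easy to follow but not the fastest option; a sliding window over the containing string would avoid the repeated searches.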
