Cluster computing

Monday, September 11, 2017

Today we continue reviewing U-SQL.It unifies the benefits of SQL with the expressive power of your own code. This is said to work very well with all kind of data stores – file, object and relational. U-SQL works on the Azure ecosystem which involves the Azure data lake storage as the foundation and the analytics layer over it. The Azure analytics layer consists of both HD Insight and Azure data Lake analytics (HDLA) which target data differently. The HDInsight works on managed Hadoop clusters and allows developers to write map-reduce with open source. The ADLA is native to Azure and enables C#, SQL over job services. We will also recall that Hadoop was inherently batch processing while Microsoft stack allowed streaming as well. The benefit of the Azure storage is that it spans several kinds of data formats and stores. The ADLA has several other advantages over the managed Hadoop clusters in addition to working with a store for the universe. It enables limitless scale and enterprise grade with easy data preparation. The ADLA is built on Apache yarn, scales dynamically and supports a pay by query model. It supports Azure AD for access control and the U-SQL allows programmability like C#.
U-SQL supports big data analytics which generally have the characteristics that they require processing of any kind of data, allow use of custom algorithms, and scale to any size and be efficient.
This lets queries to be written for a variety of big data analytics. In addition, it supports SQL for Big Data which allows querying over structured data Also it enables scaling and parallelization. While Hive supported HiveSQL and Microsoft Scoop connector enabled SQL over big data and Apache Calcite became a SQL Adapter, U-SQL seems to improve the query language itself. It can unify querying over structured and unstructured data. It has declarative SQL and can execute local and remote queries. It increases productivity and agility It brings in features from T-SQL, Hive SQL, and SCOPE which has been Microsoft's internal Big Data language.U-SQL is extensible and it can be extended with C# and .Net
If we look at the pattern of separating query from data source, we quickly see it's no longer just a consolidate of data sources. It is also pushing down the query to the data sources and thus can act as a translator. Projections, filters and joins can now take place where the data resides. This was a design decision that came from the need to support heterogeneous data sources. Moreover, it gives a consistent unified view of the data to the user.

Courtesy : U-SQL slideshare
#codingexercise
given two strings find if one string is a subsequence of another
solution:
check the lengths of the two string.
If the subsequence candidate string is of length zero return true
if the input to be matched with is of length zero, return false
If the last character of both strings match, check recursively with decremented length in both
else check recursively with decremented length in input.

Cluster computing

Monday, September 11, 2017

No comments:

Post a Comment