Cluster computing

Monday, October 22, 2018

We were discussing search over object storage. The query language has traditionally been SQL. Tools like LogParser allow SQL queries to be executed over enumerables. SQL has supported user-defined operators for a while now. These operators add computations that are not available as built-ins. For relational data, they have generally taken the form of user-defined functions or user-defined aggregates. Over enumerable data sets, the SQL that LogParser supports is somewhat limited. Any implementation of a query execution layer over object storage could choose to allow or disallow user-defined operators. They enable computation on, say, user-defined data types that are not restricted to the system-defined types. Such types have been useful with, say, spatial coordinates or geographical data, for easier abstraction and simpler expression of computational logic. For example, vector addition can be done with user-defined data types and user-defined operators.
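To make the vector addition example concrete, here is a minimal sketch in Python rather than SQL. The `Vector` type and the `vector_sum` aggregate are hypothetical names chosen for illustration; they are not part of LogParser or of any particular object storage query layer.

```python
from dataclasses import dataclass
from functools import reduce
import operator


@dataclass(frozen=True)
class Vector:
    """A user-defined data type for two-dimensional coordinates (illustrative)."""
    x: float
    y: float

    def __add__(self, other):
        # User-defined operator: component-wise vector addition.
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)


def vector_sum(rows):
    """A user-defined aggregate: fold '+' over any enumerable of Vector values."""
    return reduce(operator.add, rows, Vector(0.0, 0.0))
```

A query layer that allows user-defined operators would, under this assumption, let an expression such as `a + b` or an aggregate such as `vector_sum` run over each enumerated object without the type being known to the system in advance.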
Aggregation operations have the benefit that they can support both batch and streaming modes. They can therefore operate on large datasets, because they view only a portion of the data at a time. Furthermore, a batch operation can be parallelized across a number of processors. This has generally been the case with Big Data and cluster-mode operations. Until recently, streaming-mode operations were not so common with Big Data. However, streaming conversion of summation-form processing over batch data is now available out of the box in streaming algorithm packages from some public cloud providers. This means that both batch and stream processing can operate on unstructured data. Some other forms of processing are included here.
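As a rough illustration of why summation-form aggregates suit both modes, the sketch below computes a mean from a small mergeable state (a count and a running total). The names `MeanState`, `batch_mean`, and `streaming_mean` are assumptions made for this example and do not come from any cloud provider's package.

```python
from dataclasses import dataclass


@dataclass
class MeanState:
    """Summation-form state: everything needed for the mean is a pair of sums."""
    count: int = 0
    total: float = 0.0

    def update(self, value):
        # Streaming step: fold in one value, seeing only that value.
        self.count += 1
        self.total += value
        return self

    def merge(self, other):
        # Batch step: combine partial states computed on separate partitions.
        return MeanState(self.count + other.count, self.total + other.total)

    def result(self):
        return self.total / self.count if self.count else None


def batch_mean(partitions):
    """Each partition could be handled by a different processor, then merged."""
    merged = MeanState()
    for partition in partitions:
        partial = MeanState()
        for value in partition:
            partial.update(value)
        merged = merged.merge(partial)
    return merged.result()


def streaming_mean(stream):
    """Yield the running mean after each value arrives."""
    state = MeanState()
    for value in stream:
        state.update(value)
        yield state.result()
```

The same `MeanState` serves both paths: the batch path merges per-partition partials, while the streaming path updates one state as values arrive.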
The language may support search expressions as well as user-defined operators. Structured query language works for structured data. Unstructured documents are best served by search operators and expressions. Examples include the search expressions and piped operations used with logs. With statistical and machine learning packages now included under a broader umbrella, universal query language efforts are trying to extend the breadth of existing query languages and standardize them.
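The piped style used with logs can be sketched as a chain of small stages, each consuming the output of the previous one. This is only an illustration in Python: the stage names `search`, `extract`, `count_by`, and the `pipe` helper are hypothetical and do not mirror the syntax of any specific log search product.

```python
from collections import Counter


def search(lines, term):
    """Search expression: keep only the lines that contain the term."""
    return (line for line in lines if term in line)


def extract(lines, position):
    """Pull out one whitespace-separated field from each line, if present."""
    for line in lines:
        fields = line.split()
        if position < len(fields):
            yield fields[position]


def count_by(values):
    """Aggregate stage: count the occurrences of each extracted value."""
    return Counter(values)


def pipe(source, *stages):
    """Feed the source through each stage in order, like 'a | b | c'."""
    for stage in stages:
        source = stage(source)
    return source
```

Under these assumptions, something like `search ERROR | extract field 2 | count` would be written as `pipe(lines, lambda s: search(s, "ERROR"), lambda s: extract(s, 2), count_by)`.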
