Cluster computing: Kusto queries in Azure public cloud

Wednesday, June 30, 2021

Query languages have become as popular as shell scripting among data scientists and analytics professionals. While SQL has been used universally for a wide variety of purposes, it has long been called out for lacking features available in shell scripting, such as data pipelining and a wide range of native commands that can be used in stages between an input and a desired output. Pipelining refers to the intermediary stages of processing in which the output of one operator is taken as the input to another. PowerShell scripting bridged this gap with a wide variety of cmdlets that can be chained for pipelined execution. While PowerShell enjoys phenomenal support in terms of the library of commands from all Azure services, resources, and their managers, it is not always easy to work with all kinds of data, and it places no restrictions on the scope or intent of changes to the data. Kusto addresses this specifically by allowing only read operations on the data and requiring that any editing be done by copying into new tables, rows, and columns. This is the hallmark of Kusto’s design: it separates read-only queries from the control commands that are used to create the structures that hold intermediary data.

Kusto is popular with both Azure Monitor and Azure Data Explorer. A Kusto query is a read-only request to process data and return results, and the request itself is stated in plain text. It uses a data-flow model that is remarkably like the slice-and-dice operators in shell commands. It can work with structured data with the help of tables, rows, and columns, but it is not restricted to schema-based entities; it can also be applied to unstructured data such as telemetry. A query consists of a sequence of statements delimited by semicolons and has at least one tabular query operator. The name of a table is sufficient to stream its rows to a pipe operator, which separates filtering into its own stage with the help of a SQL-like where clause. Sequences of where clauses can be chained to produce a more refined set of rows. A query can be as short as a tabular query operator, a data source, and a transformation. Any use of new tables, rows, and columns requires control commands, which are distinguished from Kusto queries because they begin with a dot character. Separating these control commands helps with the security of the overall data analysis routines: administrators will have less hesitation about allowing Kusto queries to run on their data. Control commands also help to manage entities or discover their metadata. A sample control command is “.show tables”, which lists all the tables in the current database.
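As a minimal sketch of the query shape described above, the following assumes the StormEvents sample table that Microsoft publishes in the Samples database of its public "help" cluster; the table, its column names, and the filter values are illustrative choices, not something taken from this post.

```kusto
// Assumption: StormEvents is a sample table with StartTime, State,
// EventType and DamageProperty columns.
StormEvents                                   // data source: the table name streams its rows
| where StartTime >= datetime(2007-01-01)     // first filtering stage
| where State == "FLORIDA"                    // chained where clause refines the rows further
| project StartTime, EventType, DamageProperty  // keep only the columns of interest
| take 10                                     // return a small sample of the result
```

A control command, by contrast, begins with a dot and is run on its own rather than as part of a query:

```kusto
// Lists the tables in the database currently in context.
.show tables
```

The query only reads from the table, while anything that creates or alters entities has to go through the dot-prefixed commands.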

One of the features of Kusto is its set of immediate-visualization chart operators, which render the data in a variety of plots. The scalar operators that summarize rows of data were already popular because of their equivalence to shell commands, and the ability to project tabular data and summarize it before piping it to a chart operator makes Kusto even more data friendly and popular with data scientists. These visualizations are not restricted to time charts and can include multiple series, cycles, and data types. A join operator can combine two different data sources to make it more convenient to plot and visualize the results. Distributions and percentiles are easy to compute and are often required for time-slice windows involving metrics. Results can be assigned to variables with the help of the let statement, and data from several databases can be virtualized behind a Kusto query.
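The operators mentioned above can be combined in a short sketch. It again assumes the StormEvents sample table from the public "help" cluster; the variable names, the seven-day bin size, and the chosen percentiles are hypothetical and only for illustration.

```kusto
// Assumption: StormEvents sample table with StartTime and DamageProperty columns.
let minDamage = 0;                            // scalar value bound with let
StormEvents
| where DamageProperty > minDamage
| summarize events = count(),
            p50 = percentile(DamageProperty, 50),
            p95 = percentile(DamageProperty, 95)
          by bin(StartTime, 7d)               // one row per seven-day time slice
| render timechart                            // multiple series on one time chart
```

A join between two tabular expressions can be charted in the same way:

```kusto
// Assumption: same sample table; floridaEvents is a hypothetical name.
let floridaEvents = StormEvents
    | where State == "FLORIDA"
    | summarize floridaCount = count() by EventType;
StormEvents
| summarize totalCount = count() by EventType
| join kind=inner floridaEvents on EventType  // combine the two result sets
| project EventType, floridaCount, totalCount
| render columnchart
```

In both cases the summarize stage reduces the rows before they reach the render operator, so the chart receives only the aggregated series.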
