Monday, April 25, 2022

Improving queries part 9


This is a continuation of the best practices in writing Kusto queries introduced here and continued with the log queries for analytics, such as with Azure Monitor Logs. While parts 5, 6, and 7 covered optimizations, best practices, and the management aspects of Kusto queries, this part discusses expressions that are useful in log queries spanning workspaces and apps. Such queries give a system-wide view of the data. The following section describes the translations between SQL queries and Kusto.

Kusto supports a subset of the SQL language. These translations do not need to be worked out by hand: the Kusto keyword explain, when prefixed to a SQL statement, translates it to the corresponding Kusto query.
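For example, prefixing a SQL SELECT with explain (a sketch against the StormEvents sample table from the Kusto help cluster) returns the equivalent KQL:

explain
SELECT COUNT_BIG(*) AS C FROM StormEvents

// returns, approximately:
// StormEvents
// | summarize C = count()
// | project C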

Some operators are now called out with tips for best performance. Selecting data from a SQL table usually involves a row set or a column set. These correspond in KQL to the where operator for filtering rows and the project operator for selecting columns. The rule from the earlier parts of this article about aggressively reducing the size of the data still holds.
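A minimal sketch, again assuming the StormEvents sample table:

StormEvents
| where State == "TEXAS"                         // row filter
| project StartTime, EventType, DamageProperty   // column selection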

The popular test for null data can be expressed with the isnotnull() function. Such data checks should come last, after the query has used the schema to narrow down the result set.
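A sketch of that ordering (column names illustrative):

StormEvents
| where StartTime >= ago(7d)        // schema-based filters first
| where isnotnull(EndLocation)      // data check last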

Comparisons with dates are simplified with operators like ago(1d). A datetime range can be expressed with the <= and >= operators or with between. The time part of the query should be applied earliest in the set of filters.
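For example:

StormEvents
| where StartTime >= ago(1d)
// or an explicit range:
// | where StartTime between (datetime(2007-01-01) .. datetime(2007-06-30))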

The tests for literals and patterns in the data also have equivalents, such as the has, contains, and approximate-match operators.
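A sketch contrasting the two (has matches whole indexed terms and is generally faster than a substring search):

StormEvents
| where EventNarrative has "tornado"      // indexed term match, preferred
| where EventNarrative contains "torn"    // substring match, slower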

The grouping operations are substituted with the summarize operator. The dcount() aggregation uses the HyperLogLog algorithm to estimate distinct counts. These can be computationally expensive, so the data spread should be reduced whenever possible.
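For example:

StormEvents
| summarize Events = count(), DistinctTypes = dcount(EventType) by State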

Column aliases and calculated columns are made possible with the extend operator.
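For example:

StormEvents
| extend DurationMinutes = (EndTime - StartTime) / 1m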

The ordering operators and behavior remain similar.

Packing and unpacking data between JSON and tabular form is made possible with the mv-expand operator and functions such as bag_unpack and extractjson.

Convert to JSON:

T
| extend PackedRecord = pack_all()
| summarize Result = make_list(PackedRecord)

And back with:

let result =
print myDynamicValue = dynamic(<jsonArray>)
| mv-expand myDynamicValue
| evaluate bag_unpack(myDynamicValue);
result

 

The JSONPath notation is supported: ‘$’ refers to the root object, ‘.’ or ‘[]’ to a child object, and ‘[]’ to an array subscript.
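A sketch of a JSONPath lookup with extractjson (the JSON literal is illustrative):

print extractjson("$.hosts[0].name", '{"hosts":[{"name":"web01"}]}')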

Reference: Sample queries

Sunday, April 24, 2022

 Improving queries part 8 

This is a continuation of the best practices in writing Kusto queries introduced here and continued with the log queries for analytics, such as with Azure Monitor Logs. While parts 5, 6, and 7 covered optimizations, best practices, and the management aspects of Kusto queries, this part discusses expressions that are useful in log queries spanning workspaces and apps. Such queries give a system-wide view of the data.

When the data is stored in multiple workspaces and apps, log queries can either specify the workspace and app details explicitly, or they can use resource-context queries that run in the context of a specific resource. It is important to remember that the limit of one hundred resources and workspaces applies to a single query. Workspaces can be referred to by their resource name, qualified name, workspace ID, or Azure resource ID.

The app() expression is used to refer to a workspace-based Application Insights resource and to retrieve data from a specific application in the same resource group, another resource group, or another subscription. An app can show all the requests it has received as well as heartbeats.
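A sketch, using the hypothetical application name fabrikamapp:

app("fabrikamapp").requests
| where timestamp > ago(1h)
| summarize count()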

The workspace() expression is used to refer to a workspace in the same resource group, another resource group, or another subscription. Read access to the workspace is required.
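A sketch, using the hypothetical workspace name contosoretail-it:

workspace("contosoretail-it").Heartbeat
| where TimeGenerated > ago(1h)
| count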

The resource() expression is used in an Azure Monitor query scoped to a resource to retrieve data from other resources. The URL path for the resource from the Azure portal can be used to refer to the resource, and the same holds for the resource group. Read access to the resource is required, and multiple resources can be queried.
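A sketch with placeholder path segments standing in for a real resource ID:

resource("/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachines/<vm-name>").Heartbeat
| count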

 

The Application Insights and Log Analytics workspace schemas differ in their properties. UserId, AppId, and name are found in both.

Availability count, type, duration, message, run location, id, name, and timestamp in Log Analytics have a corresponding itemCount, duration, message, location, id, name, and timestamp in the Application Insights resource properties. Browser, city, client, and country, pertaining to the client, are prefixed by ‘client_’ in the Application Insights resource properties. Custom events count, dimensions, and name also have equivalent properties without the ‘custom’ prefix. The device model and device type are available as client_model and client_type. Exception count, HandledAt, Message, and Type have equivalents in the latter. Operation and OperationName are similarly named. Pageview count, duration, and name have their pageview prefix dropped. Similarly, request count, duration, id, name, and success have their prefix dropped. The role, RoleInstance, and SessionId are also similar. Source system, telemetry type, URL, and user AccountId also have underscored equivalents.

 

Azure Monitor supports cross-service queries between Azure Data Explorer, Application Insights, and Log Analytics. This enables a Kusto cluster and database to be made available to Log Analytics/Application Insights tools and referred to in a cross-service query. The .show functions, .show function {functionName}, and .show database {DatabaseName} schema as json commands are all supported with cross-service queries. This is a really powerful feature because it brings external data sources directly into the tool.
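A sketch of a cross-service query from Log Analytics into Azure Data Explorer using the adx() expression, here against the public help cluster's Samples database:

adx("https://help.kusto.windows.net/Samples").StormEvents
| count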

 

Reference: Sample queries   


Saturday, April 23, 2022

 

Improving queries part 7 

This is a continuation of the best practices in writing Kusto queries introduced here and continued with the log queries for analytics, such as with Azure Monitor Logs. While parts 5 and 6 covered optimizations and best practices, this part discusses the management aspects of Kusto queries.

Kusto databases and clusters pose similar challenges to administration and management as traditional databases. This section enumerates some of the commands available to database admins and monitors to explore usage, track operations, and investigate ingestion failures.

The .show queries command returns a list of queries that have reached a final state. This is available to non-administrator users as well, but they can only see the queries that they launched.
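A sketch that narrows the listing to recent failures (column names follow the command's documented output; treat as illustrative):

.show queries
| where StartedOn > ago(1h) and State == "Failed"
| project StartedOn, Text, Duration, State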

The .show running queries command returns a list of currently executing queries. This can show queries invoked by all users.

The .cancel query command makes a best-effort attempt to cancel a specific running query. A reason phrase can optionally be specified.
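A sketch of the flow; the client activity ID is a hypothetical value of the kind returned by .show running queries, and the with (reason = ...) clause follows the documented optional syntax:

.show running queries

.cancel query "KE.RunQuery;8f70e9ab-958f-4b59-ae02-3cc456b07171" with (reason = "runaway query")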

The .show commands command lists the admin commands that have reached a final state. These commands are available to query for 30 days. The listing includes CPU usage as well as resource utilization for all completed commands.

The .show commands-and-queries combines the information for both commands and queries.

The .show journal command displays the history of metadata operations. The metadata operations can result from a control command that a user executed or that the system invoked. These have consistent and well-formatted data with very few variations between versions. Metadata changes are stored per database on the cluster.

The .show database <databaseName> journal command returns journal for the specific database.
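A sketch that filters the journal for one event type (the database name is hypothetical):

.show database MyDatabase journal
| where Event == "ADD-TABLE"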

The .show operations command displays administrative operations, both running and completed, since the admin node was last elected.

The .show operations command returns a table with all administrative operations executed in the last two weeks, displaying both running and completed operations. It may be run in two modes. The first is log mode, in which all entries in the logs that the user has access to are returned. The second is latest-update mode, in which only the last update for each operation ID provided by the user is returned. Log mode is used when the operation ID is not available, and latest-update mode when it is.
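A sketch of both modes (the GUID is a hypothetical placeholder, and the State column name is illustrative):

.show operations
| where State != "Completed"

.show operations 00000000-0000-0000-0000-000000000000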

The .show ingestion failures command returns a result set that shows failures specific to the data-ingestion control commands, as encountered by the Kusto data engine.
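A sketch that narrows to recent failures (column name illustrative):

.show ingestion failures
| where FailedOn > ago(1d)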

Ingestion metrics and diagnostic logs include errors that are encountered outside of the database engine.

Frequently, some of these queries are executed together for specific diagnostics and troubleshooting cases. They have not been compiled into a published script library because they change hands or are lost in silos. A sample collection of these scripts from past forensics could pave the way for onboarding any new team.

 

Reference: Sample queries 


Friday, April 22, 2022

Improving queries part 6


This is a continuation of the best practices in writing Kusto queries introduced here and continued with the log queries for analytics, such as with Azure Monitor Logs. While part 5 covered optimizations and best practices, this part discusses parsing and other routines along with performance indicators.

 

Parsing text data in Azure Monitor Logs is required when many pieces of information are embedded in a single property. When the data is parsed into multiple properties, a single log entry can participate better in search and aggregation because those operations target the individual properties.

There are several methods and best practices for parsing text data. These methods vary depending on whether the parsing is done at ingestion time or at query time:

Parsing data at collection time allows us to configure custom fields that create new properties in the table. Queries do not have to include any parsing logic and can simply use these properties like any other columns in the table.

The advantages of this method are that it is easier to query the collected data, since we don't need to include parse commands in the query, and it is better for query performance, since the query does not need to do the parsing.

The disadvantages are that the parsing must be defined in advance, the parsing logic can't be changed later, and it only applies to new data. There are fewer parsing options than are available in queries, it increases the latency of data collection, and errors can be difficult to handle.

Parsing at query time avoids modifying the original table.

The advantages of this method are that it applies to any data, including data that has already been collected; the tradeoff is that it makes the queries more complex, as the sketch below shows.
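A minimal sketch of query-time parsing with the parse operator, assuming a hypothetical RawLogs table with a Message column:

RawLogs
| parse Message with "user=" User " action=" Action " result=" Result
| summarize count() by Action, Result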

 

Query performance indicators are available for each query that is executed. These include:

Total CPU, which reflects the overall compute used to process the query across all compute nodes. It represents time used for computing, parsing, and data fetching. All of these actions can take time during query execution, but some can be amortized for later or done upfront once.

 

Data used for processed query, which reflects the overall data that was accessed to process the query. It can change with the size of the target table, the time span used, the filters applied, and the number of columns referenced. Reducing the data aggressively at every stage of the processing can tremendously improve performance.

 

Time span of the processed query, which reflects the gap between the most recent and the earliest data that was accessed to process the query. A time range can also be specified for the query, and it is usually better to specify the time span as the first predicate.

 

Age of processed data, which reflects the gap between now and the oldest data that was accessed to process the query. It highly influences the efficiency of data fetching and should be specified as the first predicate whenever possible.

 

Number of workspaces, which reflects a count of the workspaces read; some may not have been intended but were still part of the query execution.

 

Number of regions, which reflects the number of regions accessed during query processing as a result of the workspaces involved. These multi-region queries are much less efficient, and the performance indicators present only partial coverage.

 

Parallelism, which indicates how much of the work can happen in parallel. This matters for queries with high CPU consumption, and it can vary with the functions and operators used.

 

One example that applies these indicators, putting the time filter first and using term-based string operators (sketched against the Log Analytics Event table, since Heartbeat has no EventData column):

Event
| where TimeGenerated > ago(1h)
| where EventData !has "%SYSTEM32"
| where EventData has "success"

 

Reference: Sample queries