Improving queries part 5

This is a continuation of the best practice in writing Kusto queries as introduced here.

This section focuses on the log queries for Analytics such as with Azure Monitor Logs. Specifically, it refers to the Azure Monitor Logs which are queries written using the same Kusto Query language.

Statements not supported in the Azure Monitor include Alias and Query parameters which are used to send Kusto queries. These were used to prevent SQL Injection attacks but in the context of logs, these are irrelevant. On the other hand, we have new operators for Azure Monitor Logs that include app(), resource() and workspace()

App() retrieves data from specific application insights app and is in the same resource group, another resource group or another subscription. This can help query data across multiple applications in an Application Insights query.

The resource() operator is used to retrieve data from a specific resource as identified by its resource id. Read access is required to the resource and the query will run scoped to this resource.

The Workspace() operator is used to retrieve the data from a specific workspace in the same resource group, another resource group, or another subscription. This can also be used across multiple workspaces in a log query.

Log queries deal a lot with literals. Some of these may have special characters including quotes. These must be escaped.

Some columns are available as standards and these can be leveraged in the queries. Columns such as TenantId, TimeGenerated, _TimeReceived, Type, _ItemId, _ResourceId, _SubscriptionId, _IsBillable, _BilledSize come useful to filter the results, transform into expressions and to summarize the results with the corresponding KQL operator.

Audit queries are not usually included in the logs. These must be enabled from the portal.

Some optimization might be required for log and audit queries due to the size of the data.

Optimized queries will run faster, reduce overall duration of the query execution and will have a smaller chance of rejection. The queries that are recurrent or occurring-in-burst usages such as dashboards, alerts, Logic Apps, and Power BI are most commonly the candidates for optimization.

The query performance monitor can give significant information on Total CPU, Data used for processed query, Time span of the processed query, Age of the processed data, Number of workspaces, Number of regions, and Parallelism.

The filtering of records prior to the usage of compute intensive operators or functions is preferred.

The use of evaluated where clause as opposed to the use of the actual data is discouraged.

Aggregation commands and dimensions in summarize and join must be effective for use because some of them like dcount can use HyperLogLog algorithm. The Join and the summarize commands may be compute intensive for large data. For example, Heartbeat | where Computer startsWith “Production” | summarize count() by ComputerIP is more efficient than specifying extended columns or using contains operator.

Full XML and Json parsing can also be avoided to reduce the CPU usage for the operators.

This holds true for all data in fact since the size of data passed from pipe operator can be quite large.

The unnecessary use of search and union operators must be avoided for similar reasons and filtering can be applied early in the queries to reduce the data size.

Reference: Sample queries

Cluster computing

Thursday, April 21, 2022

Improving queries part 5

No comments:

Post a Comment