Improving
queries part 5
This is a continuation of the best practice in
writing Kusto queries as introduced here.
This section focuses on the log queries for Analytics such as with
Azure Monitor Logs. Specifically, it refers to the Azure Monitor Logs which are
queries written using the same Kusto Query language.
Statements not supported in the Azure Monitor include Alias and
Query parameters which are used to send Kusto queries. These were used to
prevent SQL Injection attacks but in the context of logs, these are irrelevant.
On the other hand, we have new operators for Azure Monitor Logs that include
app(), resource() and workspace()
App() retrieves data from specific application insights app and is
in the same resource group, another resource group or another subscription. This can help query data across multiple
applications in an Application Insights query.
The resource() operator is used to retrieve data from a specific
resource as identified by its resource id. Read access is required to the
resource and the query will run scoped to this resource.
The Workspace() operator is used to retrieve the data from a
specific workspace in the same resource group, another resource group, or
another subscription. This can also be used across multiple workspaces in a log
query.
Log queries deal a lot with literals. Some of these may have
special characters including quotes. These must be escaped.
Some columns are available as standards and these can be leveraged
in the queries. Columns such as TenantId, TimeGenerated, _TimeReceived, Type,
_ItemId, _ResourceId, _SubscriptionId, _IsBillable, _BilledSize come useful to
filter the results, transform into expressions and to summarize the results
with the corresponding KQL operator.
Audit queries are not usually included in the logs. These must be
enabled from the portal.
Some optimization might be required for log
and audit queries due to the size of the data.
Optimized queries will run faster, reduce
overall duration of the query execution and will have a smaller chance of
rejection. The queries that are
recurrent or occurring-in-burst usages such as dashboards, alerts, Logic Apps,
and Power BI are most commonly the candidates for optimization.
The query performance monitor can give
significant information on Total CPU, Data used for processed query, Time span of
the processed query, Age of the processed data, Number of workspaces, Number of
regions, and Parallelism.
The filtering of records prior to the usage of
compute intensive operators or functions is preferred.
The use of evaluated where clause as opposed
to the use of the actual data is discouraged.
Aggregation commands and dimensions in
summarize and join must be effective for use because some of them like dcount can
use HyperLogLog algorithm. The Join and the summarize commands may be compute
intensive for large data. For example, Heartbeat | where Computer startsWith
“Production” | summarize count() by ComputerIP is more efficient than
specifying extended columns or using contains operator.
Full XML and Json parsing can also be avoided
to reduce the CPU usage for the operators.
This holds true for all data in fact since the
size of data passed from pipe operator can be quite large.
The unnecessary use of search and union
operators must be avoided for similar reasons and filtering can be applied
early in the queries to reduce the data size.
Reference: Sample queries
No comments:
Post a Comment