This article continues the series on Overwatch, which can be considered an analytics project over Databricks. Overwatch collects data from multiple sources, such as APIs and cluster logs, then enriches and aggregates that data at little or no cost. This section describes some considerations for deploying Overwatch that may not be obvious from the public documentation but help optimize deployments.
Overwatch deployments must include an Event Hub as well as a storage account. The Event Hub receives the diagnostic data emitted by the target Databricks workspaces. Usually a single Event Hub namespace suffices for an Overwatch deployment, but that namespace will contain 1 to N event hubs, one for each monitored workspace. Once the event hubs and their namespace are created, each workspace must be associated with its event hub; this does not alter a workspace that already exists. The association appears in the workspace's diagnostic settings under the Monitoring section of that instance.
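The one-event-hub-per-workspace convention described above can be sketched as follows. This is only an illustration: the naming convention and the payload shape are assumptions, not the exact Azure diagnostic-settings API schema.

```python
# Sketch: one event hub per monitored workspace inside a shared namespace.
# The "eh-<workspace>" naming convention is an assumption for illustration.

def diagnostic_setting(workspace: str, namespace: str) -> dict:
    """Map a workspace to its dedicated event hub in the shared namespace."""
    return {
        "workspace": workspace,
        "eventHubNamespace": namespace,
        # Assumed convention: the event hub is named after the workspace.
        "eventHubName": f"eh-{workspace}",
    }

settings = [diagnostic_setting(w, "overwatch-ns")
            for w in ["ws-dev", "ws-prod"]]
print([s["eventHubName"] for s in settings])  # ['eh-ws-dev', 'eh-ws-prod']
```

The actual association is made in the Azure portal or via infrastructure-as-code; the point of the sketch is that the namespace is shared while each workspace streams to its own event hub.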
Unlike the Event Hub, which receives the diagnostic data, the storage account is required as a working directory for the Overwatch instance, where it writes out the reports from its calculations. These reports may be in binary format, but the aggregated information on both a DBU-cost basis and an instance-level basis is available to view in two independent tables in the Overwatch database on the workspace where it is deployed. Other artifacts are also stored in this storage account, such as the deployment parameters for Overwatch and its incremental computations, so the entire account can be dedicated to Overwatch as a working directory. Because the storage account is dedicated to Overwatch, the compute logs from the workspaces are also archived here; data locality lets the Overwatch jobs read the logs at minimum cost.
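A possible layout of the dedicated storage account, with the working directory and the archived compute logs side by side, can be sketched as below. The container, account, and folder names are assumptions for illustration; Overwatch itself is configured with its own storage prefix.

```python
# Sketch: building abfss:// URIs on the dedicated ADLS Gen2 account so that
# the Overwatch working directory and the archived logs are co-located.

def abfss_path(container: str, account: str, *parts: str) -> str:
    """Build an abfss:// URI for a path on an ADLS Gen2 storage account."""
    suffix = "/".join(parts)
    return f"abfss://{container}@{account}.dfs.core.windows.net/{suffix}"

account = "overwatchsa"  # hypothetical storage account name
working_dir = abfss_path("overwatch", account, "etl")
cluster_logs = abfss_path("overwatch", account, "cluster-logs", "ws-prod")

print(working_dir)
# abfss://overwatch@overwatchsa.dfs.core.windows.net/etl
```

Keeping both prefixes on the same account and in the same region is what gives the Overwatch jobs cheap, local reads of the logs they process.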
Sending the compute logs to this storage account is another diagnostic setting on the workspace, and it may be an additional one if the logs were already being sent elsewhere, either via an Event Hub or via a different storage account. Separating the logs read by Overwatch from those kept for other purposes helps Overwatch remain performant and reliable by maintaining isolation. The compute logs in this account are read only by Overwatch, so they need not be retained longer than its computations require.
Both the Event Hub and the storage account can be regional, because cross-region data transfer is expensive; deciding what data is sent to Overwatch and keeping it local reduces cost significantly. Rather than trying to eliminate storage costs, it is better to exercise control over what and how much data is sent to Overwatch for its calculations. Having multiple diagnostic settings on the Databricks workspace helps with this.
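Controlling what reaches Overwatch amounts to keeping its diagnostic setting narrow. The sketch below shows the idea as a simple category filter; the category names are illustrative rather than the full Databricks diagnostic-log category list.

```python
# Sketch: a dedicated diagnostic setting forwards only the log categories
# Overwatch consumes, so regional transfer and storage stay small.
# Category names here are illustrative assumptions.

ALL_CATEGORIES = ["clusters", "jobs", "notebook", "accounts", "dbfs", "ssh"]
NEEDED_BY_OVERWATCH = {"clusters", "jobs", "notebook", "accounts"}

def overwatch_categories(available):
    """Keep only the categories the Overwatch deployment actually reads."""
    return [c for c in available if c in NEEDED_BY_OVERWATCH]

print(overwatch_categories(ALL_CATEGORIES))
# ['clusters', 'jobs', 'notebook', 'accounts']
```

A second, independent diagnostic setting can continue to ship the full category set elsewhere, which is exactly the isolation the preceding paragraphs recommend.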
Lastly, it must be noted that cluster logs are different from compute logs: the former are emitted by the clusters that users spin up on a Databricks workspace, while the latter are written out by the Databricks workspace itself. All jobs, whether user jobs or Overwatch jobs, access the data over HTTPS or via mounts. The HTTPS way of accessing data uses a qualifier of the form abfss://<container>@<storage-account>.dfs.core.windows.net, while mounts can be set up via:
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
}

dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/<mount-name>",
    extra_configs = configs)
When a cluster is created, its logging destination must be set to this mount; the setting is found under the advanced configuration section of the cluster.
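The cluster spec fragment that points cluster logs at such a mount can be sketched as below. The mount name is an assumption carried over from the example above; the `cluster_log_conf` key with a `dbfs` destination is the shape used by the Databricks Clusters API.

```python
# Sketch: the fragment of a cluster specification that routes cluster logs
# to a DBFS path backed by the mount. The mount name is a hypothetical value.

import json

cluster_spec_fragment = {
    "cluster_log_conf": {
        "dbfs": {
            # Destination under the mount created with dbutils.fs.mount above.
            "destination": "dbfs:/mnt/<mount-name>/cluster-logs"
        }
    }
}

print(json.dumps(cluster_spec_fragment, indent=2))
```

Setting this in the UI under the advanced options, or in the API payload as shown, ensures the cluster logs land on the storage account that Overwatch reads.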
This concludes the discussion of data capture and analysis by Overwatch deployments.