Overwatch reporting involves cluster configuration, Overwatch configuration, and job runs. The steps outlined in this article are a guide to producing cluster utilization reports from an Azure Databricks instance. It starts with an overview of the concepts and the context in which the steps are performed, follows with a listing of the steps, and closes with running and viewing the Overwatch jobs and reports.
Overwatch can be thought of as an analytics project over Databricks. It collects data from multiple sources, such as APIs and cluster logs, then enriches and aggregates that data, and it comes at little or no cost. The audit logs and cluster logs are the primary data sources, but the cluster logs are crucial for getting cluster utilization data. Overwatch requires a dedicated storage account for these, and a time-to-live must be enabled on it so that retention does not grow and incur unnecessary costs. The cluster logs must not be stored on DBFS directly, but they can reside on an external store. When there are several Databricks workspaces, numbered say 1 to N, each workspace pushes its diagnostic data to Event Hubs and writes its cluster logs to the per-region dedicated storage account. One of the Databricks workspaces is chosen to deploy Overwatch. The Overwatch jobs read the storage account and the Event Hub diagnostic data to create bronze, silver, and gold data pipelines, which can be read from anywhere for the reports.
The steps involved in the Overwatch configuration include the following:
1. Create a storage account.
2. Create an Azure Event Hub namespace.
3. Store the Event Hub connection string in a KeyVault.
4. Enable diagnostic settings in the Databricks instance for the Event Hub.
5. Store the Databricks PAT token in the KeyVault.
6. Create a secret scope (a quick secret-lookup check is sketched after this list).
7. Use the Databricks Overwatch notebook from [link](https://databrickslabs.github.io/overwatch/deployoverwatch/runningoverwatch/notebook/) and replace the parameters.
8. Configure the storage account within the workspace.
9. Create the cluster, add the Maven libraries com.databricks.labs:overwatch and com.microsoft.azure:azure-eventhubs-spark to it, and run the Overwatch notebook.
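As a quick sanity check for steps 3 through 6, the secret scope can be exercised from a notebook cell before running Overwatch. This is a minimal sketch; the scope and key names below are placeholders for whatever was chosen when the KeyVault-backed scope was created.

```scala
// Sanity check (Scala notebook cell): confirm the KeyVault-backed secret
// scope resolves the two secrets Overwatch needs. Scope and key names are
// placeholders; substitute the ones created in steps 3, 5, and 6.
val secretScope  = "overwatch"
val ehConnString = dbutils.secrets.get(secretScope, "eh-connection-string")
val patToken     = dbutils.secrets.get(secretScope, "databricks-pat")

// Secret values are redacted when printed directly; a non-zero length
// is enough to confirm the lookup succeeded.
println(s"Event Hub secret length: ${ehConnString.length}")
println(s"PAT secret length: ${patToken.length}")
```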
A few elaborations to the above steps are worth calling out; otherwise the steps are routine. All the Azure resources can be created with default settings. The connection string for the Event Hub is stored in the KeyVault as a secret. The personal access token, aka PAT token, created from Databricks is also stored in the KeyVault as a secret; it is created from the user settings of the Azure Databricks instance. A scope is created to import the token back from the KeyVault into Databricks. A cluster is created to run the Databricks job, and the two Maven libraries are added to that cluster's libraries. The Logging tab of the advanced options in the cluster's configuration allows us to specify a DBFS location that points to the external storage account created to store the cluster logs (a mount sketch follows this paragraph). The Diagnostic settings blade of the Azure Databricks instance in the Azure portal allows the diagnostic data to be sent to the Event Hub.
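One way to make the external storage account addressable from the cluster's Logging tab is to mount it to DBFS, so that a dbfs:/mnt/... destination resolves to the external store rather than to the DBFS root. A minimal sketch, assuming the storage account access key is kept in the same secret scope; the account, container, and key names are all placeholders:

```scala
// Sketch: mount the dedicated log storage account so the cluster's Logging
// destination (e.g. dbfs:/mnt/cluster-logs) points at external storage.
val storageAccount = "overwatchlogsacct"
val container      = "cluster-logs"

dbutils.fs.mount(
  source = s"wasbs://$container@$storageAccount.blob.core.windows.net/",
  mountPoint = "/mnt/cluster-logs",
  extraConfigs = Map(
    s"fs.azure.account.key.$storageAccount.blob.core.windows.net" ->
      dbutils.secrets.get("overwatch", "log-storage-account-key"))
)
```

With the mount in place, dbfs:/mnt/cluster-logs can be entered as the destination in the Logging tab of each monitored cluster.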
The notebook that defines the Databricks jobs for Overwatch takes the above configuration as parameters, including the DBFS location that is the target for the cluster logs, the Extract-Transform-Load (ETL) database name that stores the tables used for the dashboard, the consumer database name, the secret scope, the secret key for the PAT token, the secret key for the Event Hub, the topic name in the Event Hub, the primordial date from which Overwatch starts collecting, the maximum number of days to bound the data, and the scopes to enable. The sketch after this paragraph shows how these map onto the configuration objects.
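The notebook linked in step 7 is the authoritative version; the sketch below only illustrates how the parameters listed above map onto the configuration objects. It assumes a 0.6/0.7-era Overwatch API, and every literal value is a placeholder:

```scala
import com.databricks.labs.overwatch.utils._

// Every literal below is a placeholder; substitute your own values.
val storagePrefix = "abfss://overwatch@<storage-account>.dfs.core.windows.net/overwatch"
val secretScope   = "overwatch"

val ehConfig = AzureAuditLogEventhubConfig(
  connectionString = dbutils.secrets.get(secretScope, "eh-connection-string"),
  eventHubName = "databricks-diagnostics",               // Event Hub topic name
  auditRawEventsPrefix = s"$storagePrefix/rawEvents"
)

val params = OverwatchParams(
  auditLogConfig = AuditLogConfig(azureAuditLogEventhubConfig = Some(ehConfig)),
  dataTarget = Some(DataTarget(
    Some("overwatch_etl"), Some(s"$storagePrefix/overwatch_etl.db"),  // ETL database
    Some(s"$storagePrefix/data"),
    Some("overwatch"), Some(s"$storagePrefix/overwatch.db"))),        // consumer database
  tokenSecret = Some(TokenSecret(secretScope, "databricks-pat")),
  badRecordsPath = Some(s"$storagePrefix/badRecords"),
  overwatchScope = Some("audit,clusters,clusterEvents,sparkEvents,jobs".split(",")),
  maxDaysToLoad = 60,                                    // bound the data per run
  primordialDateString = Some("2023-01-01")              // when collection starts
)

// Serialize to the compact JSON string the Overwatch jobs consume.
val args = JsonUtils.objToJson(params).compactString
```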
Overwatch provides both summary and drill-down options for understanding the operations of a Databricks instance. It has two primary modes: historical and real-time. It coalesces all the logs produced by Spark and Databricks via a periodic job run and then enriches this data through various API calls. The jobs from the notebook create the configuration string from the OverwatchParams. Most functionality can be realized by instantiating the workspace object with these OverwatchParams, as sketched below. Overwatch provides two tables, the dbuCostDetails table and the instanceDetails table, which can then be used for reports.
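Putting the pieces together, the last cells of the notebook typically instantiate the workspace from the serialized parameters and run the three pipeline layers in order. A sketch, under the same version assumptions as above:

```scala
import com.databricks.labs.overwatch.pipeline.{Initializer, Bronze, Silver, Gold}

// "args" is the compact JSON built from OverwatchParams above.
val workspace = Initializer(args)

// Each layer reads what the previous one produced, so run them in order.
Bronze(workspace).run()
Silver(workspace).run()
Gold(workspace).run()
```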
Result:
Overwatch creates the two tables described above, dbuCostDetails and instanceDetails. These tables can be read from the CLI, SDKs, and a variety of clients.
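For example, once a run completes, the tables can be queried from any Spark session attached to the same metastore; a sketch, assuming the consumer database was named overwatch as in the parameter sketch above:

```scala
// Sketch: the two tables live in the consumer database configured earlier
// ("overwatch" in the sketch above); they are ordinary Delta tables,
// readable from any Spark session attached to the same metastore.
spark.table("overwatch.dbucostdetails").show(truncate = false)
spark.table("overwatch.instancedetails").printSchema()
```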
Reference:
demonstration with deployable IaC: dbx-overwatch.zip