Overwatch implementation involves cluster configuration, Overwatch configuration, and a run of the Overwatch jobs. The steps outlined in this article are a guide to producing cluster utilization reports from an Azure Databricks instance. It starts with an overview of the concepts and the context in which the steps are performed, followed by the steps themselves, and closes with running and viewing the Overwatch jobs and reports. This is a continuation of the previous article on similar topics.
Overwatch can be thought of as an analytics project over Databricks. It collects data from multiple data sources such as APIs and cluster logs, enriches and aggregates that data, and comes with little or no cost. The audit logs and cluster logs are the primary data sources, and the cluster logs are crucial for obtaining cluster utilization data. Overwatch requires a dedicated storage account for these logs, and time-to-live must be enabled on it so that retention does not grow to incur unnecessary costs. The cluster logs must not be stored on DBFS directly but can reside on an external store. When there are multiple Databricks workspaces, numbered say 1 to N, each workspace pushes its diagnostic data to Event Hubs and writes its cluster logs to the dedicated per-region storage account. One of the Databricks workspaces is chosen to deploy Overwatch. The Overwatch jobs read the storage account and the Event Hub diagnostic data to create bronze, silver, and gold data pipelines, which can be read from anywhere for the reports.
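To make that data flow concrete, the following Scala sketch shows how the raw diagnostic events landing in the Event Hub could be inspected from a notebook with the com.microsoft.azure:azure-eventhubs-spark connector, which is the same connector Overwatch itself uses. The secret scope, key, and hub names are placeholders for this walkthrough, not values from the original article.

import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf, EventPosition}

// Placeholder scope/key names; the connection string is stored in the
// KeyVault-backed secret scope during setup.
val connectionString = ConnectionStringBuilder(
    dbutils.secrets.get(scope = "overwatch-akv", key = "eh-conn-string"))
  .setEventHubName("databricks-diagnostics")
  .build

val ehConf = EventHubsConf(connectionString)
  .setStartingPosition(EventPosition.fromStartOfStream)

// Each Event Hubs record carries one diagnostic event in its binary body.
val rawEvents = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()
  .selectExpr("CAST(body AS STRING) AS event")

display(rawEvents)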
The steps involved in the Overwatch configuration include the following:
1. Create a storage account.
2. Create an Azure Event Hub namespace.
3. Store the Event Hub connection string in a KeyVault.
4. Enable diagnostic settings in the Databricks instance for the event hub.
5. Store the Databricks PAT token in the KeyVault.
6. Create a secret scope (a verification sketch follows this list).
7. Use the Databricks Overwatch notebook from the link and replace the parameters.
8. Configure the storage account within the workspace.
9. Create the cluster, add the Maven libraries com.databricks.labs:overwatch and com.microsoft.azure:azure-eventhubs-spark to the cluster, and run the Overwatch notebook.
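As a quick check after step 6, the secrets can be read back from the KeyVault-backed scope in a notebook. The scope and key names below are the same placeholders used throughout this walkthrough.

// Hypothetical scope/key names; replace with the ones created above.
val patToken = dbutils.secrets.get(scope = "overwatch-akv", key = "overwatch-pat")
val ehConnString = dbutils.secrets.get(scope = "overwatch-akv", key = "eh-conn-string")

// Databricks redacts secret values in notebook output, so printing the
// lengths only confirms that retrieval from the scope works.
println(s"PAT token length: ${patToken.length}")
println(s"Event Hub connection string length: ${ehConnString.length}")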
There are a few elaborations to the above steps worth calling out; otherwise the steps are routine. All the Azure resources can be created with default settings. The connection string for the Event Hub is stored in the KeyVault as a secret. The personal access token, aka PAT token, is also stored in the KeyVault as a secret; it is created from the user settings of the Azure Databricks instance. A secret scope is created to import the token back from the KeyVault into Databricks. A cluster is created to run the Databricks job, and the two Maven libraries are added to that cluster's libraries. The logging tab of the advanced options in the cluster's configuration allows us to specify a DBFS location, pertaining to the external storage account we created, to store the cluster logs (a mount sketch follows below). The diagnostic settings entry in the Azure portal navigation for the Azure Databricks instance allows the diagnostic data to be sent to the Event Hub.
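Since the logging tab expects a DBFS path, one way to make the external storage account addressable is to mount it. The sketch below assumes a service principal whose credentials sit in the same placeholder secret scope; the container, account, and tenant values are all placeholders.

// Standard Azure Databricks OAuth configs for mounting ADLS Gen2.
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" ->
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" ->
    dbutils.secrets.get(scope = "overwatch-akv", key = "sp-client-id"),
  "fs.azure.account.oauth2.client.secret" ->
    dbutils.secrets.get(scope = "overwatch-akv", key = "sp-client-secret"),
  "fs.azure.account.oauth2.client.endpoint" ->
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
)

// After mounting, dbfs:/mnt/cluster-logs can be used in the logging tab.
dbutils.fs.mount(
  source = "abfss://cluster-logs@<storage-account>.dfs.core.windows.net/",
  mountPoint = "/mnt/cluster-logs",
  extraConfigs = configs
)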
The notebook that describes the Databricks jobs for Overwatch takes the above configuration as parameters, including the DBFS location targeted by the cluster logs, the Extract-Transform-Load (ETL) database name which stores the tables used for the dashboard, the consumer database name, the secret scope, the secret key for the PAT token, the secret key for the Event Hub, the topic name in the Event Hub, the primordial date from which Overwatch starts collecting, the maximum number of days to bound the data, and the scopes.
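Putting those parameters together, a minimal sketch of the configuration cell, modeled on the Overwatch 0.6.x runner notebooks, might look like the following. The database names, paths, dates, and secret keys are placeholders, and the exact case-class fields can vary between Overwatch releases, so check the notebook that ships with the version you deploy.

import com.databricks.labs.overwatch.utils._

// Placeholder values; replace with the resources created earlier.
val storagePrefix = "abfss://overwatch@<storage-account>.dfs.core.windows.net"
val etlDB = "overwatch_etl"
val consumerDB = "overwatch"
val secretsScope = "overwatch-akv"

// ETL and consumer database targets on the dedicated storage account.
val dataTarget = DataTarget(
  Some(etlDB), Some(s"$storagePrefix/$etlDB.db"), Some(s"$storagePrefix/global_share"),
  Some(consumerDB), Some(s"$storagePrefix/$consumerDB.db")
)

// Audit log source: the Event Hub receiving the diagnostic settings data.
val azureLogConfig = AzureAuditLogEventhubConfig(
  connectionString = dbutils.secrets.get(secretsScope, "eh-conn-string"),
  eventHubName = "databricks-diagnostics",
  auditRawEventsPrefix = s"$storagePrefix/rawEvents"
)

val params = OverwatchParams(
  auditLogConfig = AuditLogConfig(azureAuditLogEventhubConfig = Some(azureLogConfig)),
  dataTarget = Some(dataTarget),
  tokenSecret = Some(TokenSecret(secretsScope, "overwatch-pat")),
  badRecordsPath = Some(s"$storagePrefix/badRecords"),
  overwatchScope = Some(
    "audit,sparkEvents,jobs,clusters,clusterEvents,notebooks,pools,accounts"
      .split(",").toSeq),
  maxDaysToLoad = 60,                       // bound on data loaded per run
  primordialDateString = Some("2023-01-01") // date Overwatch starts from
)

// Serialize the params into the configuration string the jobs consume.
val args = JsonUtils.objToJson(params).compactString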
Overwatch provides both summary and drill-down options to understand the operations of a Databricks instance. It has two primary modes: historical and real-time. It coalesces all the logs produced by Spark and Databricks via a periodic job run and then enriches this data through various API calls. The jobs from the notebook create the configuration string with OverwatchParams, and most functionality can be realized by instantiating the workspace object with these OverwatchParams. Overwatch provides two tables, the dbuCostDetails table and the instanceDetails table, which can then be used for reports.
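To close the loop, the sketch below, again following the 0.6.x runner-notebook pattern, instantiates the workspace from the configuration string built above and runs the bronze, silver, and gold pipelines, after which the two tables can be queried. The Initializer signature differs slightly across Overwatch versions, so treat this as a sketch rather than the exact call for every release.

import com.databricks.labs.overwatch.pipeline.{Initializer, Bronze, Silver, Gold}

// args and consumerDB come from the configuration sketch above.
// (Older releases expect Initializer(Array(args)) instead.)
val workspace = Initializer(args)

// Run the three pipeline layers in order.
Bronze(workspace).run()
Silver(workspace).run()
Gold(workspace).run()

// The consumer database now exposes the tables used for cost reports.
display(spark.table(s"$consumerDB.instanceDetails"))
display(spark.table(s"$consumerDB.dbuCostDetails"))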