Wednesday, July 12, 2023


Overwatch implementation involves cluster configuration, Overwatch configuration, and a run of the Overwatch job. The steps outlined in this article are a guide to realizing cluster utilization reports from an Azure Databricks instance. It starts with some concepts as an overview and the context in which the steps are performed, followed by a listing of the steps, and closes with running the Overwatch jobs and viewing the reports. This is a continuation of the previous article on the same topic.

Overwatch can be thought of as an analytics project over Databricks. It collects data from multiple data sources such as APIs and cluster logs, then enriches and aggregates that data, and it comes at little or no cost. The audit logs and cluster logs are the primary data sources, and the cluster logs are crucial for obtaining cluster utilization data. Overwatch requires a dedicated storage account for these logs, and a time-to-live must be enabled on it so that the retained data does not grow to incur unnecessary costs. The cluster logs must not be stored on the DBFS root directly but can reside on an external store. When there are several Databricks workspaces, numbered say 1 to N, each workspace pushes its diagnostic data to Event Hubs and writes its cluster logs to the per-region dedicated storage account. One of the Databricks workspaces is chosen to deploy Overwatch. The Overwatch jobs read the storage account and the Event Hub diagnostic data to create bronze, silver, and gold data pipelines, which can be read from anywhere for the reports.

The steps involved in the Overwatch configuration include the following:

1. Create a storage account.
2. Create an Azure Event Hub namespace.
3. Store the Event Hub connection string in a Key Vault.
4. Enable diagnostic settings in the Databricks instance for the Event Hub.
5. Store the Databricks PAT token in the Key Vault.
6. Create a secret scope.
7. Use the Databricks Overwatch notebook from the link and replace the parameters.
8. Configure the storage account within the workspace.
9. Create the cluster, add the Maven libraries com.databricks.labs:overwatch and com.microsoft.azure:azure-eventhubs-spark to the cluster, and run the Overwatch notebook.
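Most of these steps are portal clicks, but the secret scope (step 6) can also be created programmatically. As a minimal sketch, the following builds the request body for the Databricks REST API endpoint `POST /api/2.0/secrets/scopes/create` with an Azure Key Vault backend; the scope name, subscription, resource group, and vault names are placeholders, not values from this deployment.

```python
import json

def build_scope_payload(scope_name, keyvault_resource_id, keyvault_dns_name):
    """Build the request body for an Azure Key Vault-backed secret scope."""
    return {
        "scope": scope_name,
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": keyvault_resource_id,
            "dns_name": keyvault_dns_name,
        },
    }

# Placeholder identifiers -- substitute your own subscription, resource
# group, and Key Vault names.
payload = build_scope_payload(
    "overwatch-scope",
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault>",
    "https://<vault>.vault.azure.net/",
)
print(json.dumps(payload, indent=2))

# Sending it requires a workspace URL and an admin PAT token, e.g.:
# requests.post(f"{workspace_url}/api/2.0/secrets/scopes/create",
#               headers={"Authorization": f"Bearer {token}"},
#               json=payload)
```

A Key Vault-backed scope is read-only from Databricks, which is convenient here: both the Event Hub connection string and the PAT token stay in the Key Vault and are merely referenced from the workspace.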

A few of the above steps deserve elaboration; the rest are routine. All the Azure resources can be created with default settings. The connection string for the Event Hub is stored in the Key Vault as a secret. The personal access token, aka PAT token, created from Databricks is also stored in the Key Vault as a secret; it is created from the user settings of the Azure Databricks instance. A secret scope is created to import the token from the Key Vault back into Databricks. A cluster is created to run the Databricks job, and the two Maven libraries are added to that cluster's libraries. The Logging tab of the advanced options in the cluster's configuration allows us to specify a DBFS location backed by the external storage account we created to store the cluster logs. The Diagnostic settings blade of the Azure Databricks instance in the Azure portal allows the diagnostic data to be sent to the Event Hub.
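The Logging tab setting corresponds to the `cluster_log_conf` field of the Databricks Clusters API, so the same cluster can be defined as JSON. A sketch follows; the cluster name, runtime version, node type, and the `dbfs:/mnt/cluster-logs` mount point are assumed values for illustration.

```python
import json

# Hypothetical cluster spec; cluster_log_conf is the API equivalent of the
# Logging tab in the cluster UI.
cluster_spec = {
    "cluster_name": "overwatch-cluster",      # assumed name
    "spark_version": "11.3.x-scala2.12",      # any supported LTS runtime
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "cluster_log_conf": {
        "dbfs": {
            # DBFS path backed by the dedicated external storage account;
            # the mount point name is an assumption
            "destination": "dbfs:/mnt/cluster-logs"
        }
    },
}
print(json.dumps(cluster_spec, indent=2))
```

Driver and executor logs written to this destination are what Overwatch later reads to compute cluster utilization, which is why the destination must point at the dedicated storage account rather than the DBFS root.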

The notebook that describes the Databricks jobs for Overwatch takes the above configuration as parameters, including the DBFS location for the cluster logs target, the Extract-Transform-Load (ETL) database name that stores the tables used for the dashboard, the consumer database name, the secret scope, the secret key for the PAT token, the secret key for the Event Hub connection string, the topic name in the Event Hub, the primordial date from which Overwatch starts collecting, the maximum number of days to bound the data load, and the scopes to collect.
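Those parameters are ultimately serialized into a JSON string that the notebook deserializes into OverwatchParams. The sketch below assembles an equivalent JSON document; the field names follow the Overwatch documentation but may differ between Overwatch versions, and every value here (database names, mount paths, secret keys, dates) is a placeholder rather than a value from this deployment.

```python
import json

# Illustrative Overwatch parameter JSON. Verify field names against the
# Overwatch release you deploy; all values below are assumptions.
overwatch_params = {
    "auditLogConfig": {
        "azureAuditLogEventhubConfig": {
            # The connection string comes from the secret scope, not inline
            "connectionString": "{{secrets/overwatch-scope/eh-conn}}",
            "eventHubName": "overwatch-topic",
            "auditRawEventsPrefix": "dbfs:/mnt/overwatch/raw",
        }
    },
    "dataTarget": {
        "databaseName": "overwatch_etl",      # ETL database for the dashboard tables
        "databaseLocation": "dbfs:/mnt/overwatch/etl/overwatch_etl.db",
        "consumerDatabaseName": "overwatch",  # consumer database for reports
        "consumerDatabaseLocation": "dbfs:/mnt/overwatch/consumer/overwatch.db",
    },
    "tokenSecret": {"scope": "overwatch-scope", "key": "pat-token"},
    "badRecordsPath": "dbfs:/mnt/overwatch/badRecords",
    "overwatchScope": ["audit", "clusters", "clusterEvents", "sparkEvents", "jobs"],
    "maxDaysToLoad": 30,                      # bounds the data per run
    "primordialDateString": "2023-01-01",     # date Overwatch starts from
}
print(json.dumps(overwatch_params, indent=2))
```

The primordial date and the maximum days to load matter on the first run: together they bound how much history the initial pipeline execution ingests.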

Overwatch provides both summary and drill-down options to understand the operations of a Databricks instance. It has two primary modes: historical and real-time. It coalesces all the logs produced by Spark and Databricks via a periodic job run and then enriches this data through various API calls. The job defined in the notebook creates the configuration string from the OverwatchParams, and most functionality can be realized by instantiating the workspace object with these OverwatchParams. Overwatch provides two tables, the dbuCostDetails table and the instanceDetails table, which can then be used for reports.
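Once the pipelines have run, a reporting notebook can query those two tables from the consumer database. A minimal sketch, assuming the consumer database is named `overwatch` as in the earlier configuration; inspect the actual schemas in your deployment before joining or filtering.

```python
# Assumed consumer database name; check your own deployment's configuration.
consumer_db = "overwatch"

dbu_cost_query = f"SELECT * FROM {consumer_db}.dbuCostDetails"
instance_query = f"SELECT * FROM {consumer_db}.instanceDetails"

# Inside a Databricks notebook these strings would run on the cluster, e.g.:
# dbu_costs = spark.sql(dbu_cost_query)
# instances = spark.sql(instance_query)
# display(dbu_costs)  # or join the two on a shared key for a cost report

print(dbu_cost_query)
print(instance_query)
```

Because the gold tables live in an ordinary database, any tool that can reach the workspace's SQL endpoint can build the utilization reports from them.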
