Monday, October 23, 2023

This is a continuation of previous articles on Azure Databricks instances. Azure Databricks is a fast and powerful Apache Spark-based analytics service that makes it easy to rapidly develop and deploy big data analytics and artificial intelligence solutions. One can author models in Azure Databricks and register them with MLflow during experimentation for prediction, drift detection and outlier detection purposes. Databricks monitoring is as complicated as it is important, partly because there are so many destinations for charts and dashboards, some of which overlap. Azure monitoring and its associated dashboard provide job latency, stage latency, task latency, sum of task execution per host, task metrics, cluster throughput, streaming throughput/latency, resource consumption per executor, executor compute-time metrics, and shuffle metrics.

Azure’s cost management dashboard for Databricks also helps track costs on a per-workspace and per-cluster basis and provides a useful set of utilities for administrative purposes. Tags can be added to workspaces as well as clusters.
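As a sketch of how tags support cost tracking, the snippet below aggregates a hypothetical cost export by a cluster tag. The field names and sample rows are assumptions for illustration, not the actual Azure cost export schema:

```python
from collections import defaultdict

# Hypothetical rows from a cost export; the field names are assumed.
cost_rows = [
    {"resource": "cluster-1", "tags": {"team": "data-eng"}, "cost": 120.0},
    {"resource": "cluster-2", "tags": {"team": "analytics"}, "cost": 45.5},
    {"resource": "cluster-3", "tags": {"team": "data-eng"}, "cost": 30.0},
]

def cost_by_tag(rows, tag_key):
    """Sum cost per value of the given tag key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["tags"].get(tag_key, "untagged")] += row["cost"]
    return dict(totals)

print(cost_by_tag(cost_rows, "team"))
# {'data-eng': 150.0, 'analytics': 45.5}
```

The same grouping works for pool or workspace tags by changing `tag_key`, which is why consistent tagging across workspaces and clusters pays off for chargeback reports.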

The Apache Spark UI on a Databricks instance also provides information on what’s happening in your application. The Spark UI reports the input rate for streaming events, processing time, completed batches, batch details, job details, task details and so on. The driver log is helpful for exceptions and print statements, while the executor log is helpful for detecting errant and runaway tasks.

But the most comprehensive set of charts and graphs are provided by Overwatch. 

Overwatch can be considered an analytics project over Databricks. It collects data from multiple sources such as APIs and cluster logs, enriches and aggregates that data, and comes at little or no cost.

Overwatch provides categorized dashboard pages for workspaces, clusters, jobs and notebooks. Metrics available on the workspace dashboard include the daily cluster spend chart, cluster spend on each workspace, DBU cost versus compute cost, cluster spend by type on each workspace, cluster count by type on each workspace, and count of scheduled jobs on each workspace. Metrics available on the cluster dashboard include DBU spend by cluster category, DBU spend by the most expensive cluster per day per workspace, top-spending clusters per day, DBU spend by the top three most expensive clusters, cluster count in each category, cluster node-type breakdown, cluster node-type breakdown by potential, and cluster node potential breakdown by cluster category. Metrics available on the job dashboard include daily cost of jobs, job count by workspace, and jobs running on interactive clusters. Metrics available on the notebook dashboard include data throughput per notebook path, longest-running notebooks, top notebooks returning a lot of data to the UI, Spark action counts, notebooks with the largest records, task count by task type, large-task counts, and count of jobs executing on notebooks.

While these can be an overwhelming number of charts from various sources, a curated list of charts must show:
1. Databricks workload types, in terms of job compute for data engineers, jobs light compute for data analysts, and all-purpose compute.
2. Consumption-based charts, in terms of DBU, virtual machines, public IP addresses, blob storage, managed disks, bandwidth, and so on.
3. Pricing plans, in terms of pay-as-you-go, or reservations of DBU/DBCU for one to three years and/or on a region/duration basis.
4. Tag-based charts, in terms of cluster tags, pool tags, and workspace tags. Tags can propagate to clusters created from pools as well as clusters not created from pools.
5. Cost calculation, in terms of quantity, effective price and effective cost.
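The cost calculation in the last point reduces to effective cost = quantity × effective price, where the effective price reflects the pricing plan. A minimal sketch, using placeholder rates rather than actual Azure pricing:

```python
def effective_cost(dbu_quantity: float, list_price: float, discount: float = 0.0) -> float:
    """Effective cost = quantity * effective price, where the effective
    price applies any reservation discount to the list price."""
    effective_price = list_price * (1.0 - discount)
    return dbu_quantity * effective_price

# Placeholder rates: pay-as-you-go vs. a hypothetical reservation discount.
pay_as_you_go = effective_cost(1000, list_price=0.40)            # about 400.0
reserved = effective_cost(1000, list_price=0.40, discount=0.33)  # about 268.0
```

A curated cost chart would apply this per SKU and per workload type, so that pay-as-you-go and reserved consumption are directly comparable.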

Some of this information can come from the database schema for Overwatch with custom queries such as:
select sku, isActive, any_value(contract_price) * count(*) as cost
from overwatch.`DBUcostdetails`
group by sku, isActive
having isActive = true;
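In plain Python terms, the grouping in that query amounts to the following. The sample rows and the assumed (sku, isActive, contract_price) layout are illustrations, not the actual Overwatch table contents:

```python
from collections import defaultdict

# Assumed shape of overwatch.DBUcostdetails rows: (sku, isActive, contract_price).
rows = [
    ("premium", True, 0.55),
    ("premium", True, 0.55),
    ("standard", True, 0.40),
    ("standard", False, 0.40),
]

# group by sku, isActive
groups = defaultdict(list)
for sku, is_active, price in rows:
    groups[(sku, is_active)].append(price)

# cost = any_value(contract_price) * count(*), keeping only active SKUs
# to mirror "having isActive = true"
cost = {key: prices[0] * len(prices)
        for key, prices in groups.items() if key[1]}
print(cost)  # {('premium', True): 1.1, ('standard', True): 0.4}
```

Since isActive appears in the GROUP BY, the same filter could equally be written as a WHERE clause before grouping; the result is the same.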

These are only some of the ways in which dashboard charts are available. 

 
