Cluster computing

Sunday, January 5, 2025

Setting up a Databricks workspace so that users can run their notebooks, jobs, and Delta Live Tables (DLT) pipelines on serverless compute involves several steps and considerations. Here's an overview of the process and how serverless compute differs from all-purpose compute clusters:

Enabling Serverless Compute

To set up serverless compute:

An account admin must enable the feature in the account console:

Navigate to Settings > Feature enablement
Enable "Serverless compute for workflows, notebooks, and Delta Live Tables"

Ensure your Databricks workspace meets the requirements:

Unity Catalog must be enabled
The workspace must be in a supported region

Types of Serverless Compute

Databricks offers several types of serverless compute:

Serverless compute for notebooks
Serverless compute for jobs
Serverless SQL warehouses
Serverless DLT pipelines
Mosaic AI Model Serving
Mosaic AI Model Training for forecasting

Benefits of Serverless Compute

Serverless compute offers several advantages:

Rapid startup and scaling times
Automatic resource allocation and management
Pay only for compute used
Reduced management overhead
Automatic security patching and upgrades

Differences from All-Purpose Compute Clusters

Serverless compute differs from all-purpose clusters in several ways:

Resource Management: Serverless compute is managed by Databricks, while all-purpose clusters require manual configuration and management
Scaling: Serverless includes a smarter, more responsive autoscaler compared to classic compute
Version Updates: Databricks automatically and safely upgrades serverless compute to the latest versions
Network Isolation: Serverless compute runs within a network boundary for the workspace, with additional security layers
Compute Plane: Serverless runs in a compute layer within Databricks' own cloud account, while classic compute runs in the customer's cloud account
Access Control: All workspace users can use serverless compute without needing cluster creation permissions
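
The resource-management difference is easiest to see in how a job is defined. The following is a minimal sketch, not a definitive implementation. It assumes the Jobs API 2.1 request shape, in which a notebook task runs on serverless compute when its cluster fields are omitted. The job names, notebook path, runtime version, and node type are hypothetical placeholders, and uses_serverless is a helper invented for this example rather than part of any Databricks library.

```python
import json

# Task fields that bind a task to classic compute.
# Assumption: these are the cluster-related keys of a Jobs API 2.1 task.
CLASSIC_COMPUTE_KEYS = ("new_cluster", "existing_cluster_id", "job_cluster_key")


def notebook_task(task_key, notebook_path, new_cluster=None):
    """Build a notebook task dict; pass new_cluster to target classic compute."""
    task = {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
    }
    if new_cluster is not None:
        task["new_cluster"] = new_cluster
    return task


def uses_serverless(task):
    """Hypothetical helper: treat a task with no cluster fields as serverless."""
    return not any(key in task for key in CLASSIC_COMPUTE_KEYS)


# Classic compute: the job author picks the runtime, node type, and size.
# All three values below are placeholders, not recommendations.
classic_job = {
    "name": "nightly-report-classic",
    "tasks": [
        notebook_task(
            "report",
            "/Workspace/Shared/nightly_report",
            new_cluster={
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        )
    ],
}

# Serverless compute: no cluster block at all; Databricks manages the resources.
serverless_job = {
    "name": "nightly-report-serverless",
    "tasks": [notebook_task("report", "/Workspace/Shared/nightly_report")],
}

if __name__ == "__main__":
    for job in (classic_job, serverless_job):
        mode = "serverless" if uses_serverless(job["tasks"][0]) else "classic"
        print(f"{job['name']}: {mode}")
        print(json.dumps(job, indent=2))
```

The serverless definition is shorter because there is nothing to size or version: the same notebook task is submitted without any cluster configuration.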

Security Considerations

When setting up serverless compute:

Be aware that serverless compute for notebooks and jobs has unrestricted internet access by default
Consider configuring network security features for more control
Understand that serverless workloads are executed within multiple layers of isolation for data protection

Usage and Optimization

To optimize serverless compute usage:

Leverage the automatic infrastructure optimization provided by Databricks
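Monitor usage and cost with Databricks' built-in tools, such as the billable usage system table (system.billing.usage); on Azure Databricks, the cost views in the Azure Portal also apply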
Take advantage of the promotional discounts offered at the time of writing (50% for Workflows and DLT, 30% for Notebooks); these are time-limited, so check current pricing before budgeting around them
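
To make the discount figures concrete, here is a small sketch of the arithmetic. It uses only the percentages quoted above; the function name, the workload keys, and the list price per DBU passed in are assumptions for illustration, since actual DBU prices vary by cloud, region, and plan.

```python
# Promotional discounts quoted in this post, in percent of list price.
PROMO_DISCOUNT_PERCENT = {"workflows": 50, "dlt": 50, "notebooks": 30}


def discounted_cost(workload, dbus, list_price_per_dbu):
    """Return the cost of `dbus` DBUs after the promotional discount.

    `list_price_per_dbu` is supplied by the caller; no real price is assumed.
    """
    if workload not in PROMO_DISCOUNT_PERCENT:
        raise ValueError(f"unknown workload: {workload!r}")
    if dbus < 0 or list_price_per_dbu < 0:
        raise ValueError("dbus and list_price_per_dbu must be non-negative")
    percent_paid = 100 - PROMO_DISCOUNT_PERCENT[workload]
    return dbus * list_price_per_dbu * percent_paid / 100


if __name__ == "__main__":
    # Hypothetical list price of 1.00 per DBU, chosen only to keep the math simple.
    for workload in PROMO_DISCOUNT_PERCENT:
        print(workload, discounted_cost(workload, dbus=100, list_price_per_dbu=1.00))
```

At a hypothetical list price of 1.00 per DBU, 100 DBUs would cost 50.00 for Workflows or DLT and 70.00 for Notebooks while the promotion lasts.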

By setting up serverless compute, you can provide users with a more streamlined experience for running notebooks, jobs, and DLT pipelines, while reducing management overhead and potentially lowering costs compared to traditional all-purpose compute clusters.

Reference: previous articles
