Sunday, January 5, 2025

 

Setting up a Databricks workspace so that users can run their notebooks, jobs, and Delta Live Tables (DLT) pipelines on serverless compute involves several steps and considerations. Here's an overview of the process and how serverless compute differs from all-purpose compute clusters:

Enabling Serverless Compute

To set up serverless compute:

  1. An account admin must enable the feature in the account console:
    • Navigate to Settings > Feature enablement
    • Enable "Serverless compute for workflows, notebooks, and Delta Live Tables"
  2. Ensure your Databricks workspace meets the requirements:
    • Unity Catalog must be enabled
    • The workspace must be in a supported region
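
The prerequisites above can be turned into a small pre-flight check. The following is a minimal sketch, not a Databricks API: the `WorkspaceInfo` class, its fields, and the region names are hypothetical placeholders that you would fill in by hand or from your own inventory, and the supported-region set must come from the current Databricks documentation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkspaceInfo:
    """Hypothetical summary of one workspace (not a Databricks SDK type)."""
    name: str
    region: str
    unity_catalog_enabled: bool
    serverless_feature_enabled: bool  # set by an account admin in the account console


def serverless_blockers(ws: WorkspaceInfo, supported_regions: set[str]) -> list[str]:
    """Return the reasons this workspace cannot use serverless compute yet."""
    blockers = []
    if not ws.serverless_feature_enabled:
        blockers.append("account admin has not enabled the serverless feature")
    if not ws.unity_catalog_enabled:
        blockers.append("Unity Catalog is not enabled")
    if ws.region not in supported_regions:
        blockers.append(f"region {ws.region!r} is not in the supported list")
    return blockers


# Placeholder values for illustration only; look up the real region list.
example_regions = {"eastus2", "westeurope"}
example_ws = WorkspaceInfo("analytics-dev", "eastus2", True, False)
for reason in serverless_blockers(example_ws, example_regions):
    print(f"{example_ws.name}: {reason}")
```

An empty list means none of the three prerequisites from the steps above is blocking the workspace.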

Types of Serverless Compute

Databricks offers several types of serverless compute:

  • Serverless compute for notebooks
  • Serverless compute for jobs
  • Serverless SQL warehouses
  • Serverless DLT pipelines
  • Mosaic AI Model Serving
  • Mosaic AI Model Training for forecasting

Benefits of Serverless Compute

Serverless compute offers several advantages:

  • Rapid startup and scaling times
  • Automatic resource allocation and management
  • Pay only for compute used
  • Reduced management overhead
  • Automatic security patching and upgrades

Differences from All-Purpose Compute Clusters

Serverless compute differs from all-purpose clusters in several ways:

  1. Resource Management: Serverless compute is managed by Databricks, while all-purpose clusters require manual configuration and management
  2. Scaling: Serverless includes a smarter, more responsive autoscaler compared to classic compute
  3. Version Updates: Databricks automatically and safely upgrades serverless compute to the latest versions
  4. Network Isolation: Serverless compute runs within a network boundary for the workspace, with additional security layers
  5. Compute Plane: Serverless runs in a compute plane inside Databricks' own cloud account, while classic compute runs in the customer's cloud account
  6. Access Control: All workspace users can use serverless compute without needing cluster creation permissions
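
One practical consequence of these differences is that a job definition for serverless carries no cluster configuration at all. The sketch below builds a job payload as a plain Python dictionary and checks that no task pins classic compute. It assumes, to the best of my understanding of the Jobs API, that a notebook task with none of `new_cluster`, `existing_cluster_id`, or `job_cluster_key` runs on serverless compute in an enabled workspace; verify this against the current API reference before relying on it. The job name and notebook path are made up for illustration.

```python
import json

# Task-level fields that bind a task to classic compute (assumed list).
CLUSTER_FIELDS = ("new_cluster", "existing_cluster_id", "job_cluster_key")


def notebook_task(task_key, notebook_path):
    """Build a notebook task with no cluster fields (intended for serverless)."""
    return {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
    }


def tasks_on_classic_compute(job_spec):
    """Return the task keys that still reference classic compute."""
    return [
        task["task_key"]
        for task in job_spec.get("tasks", [])
        if any(field in task for field in CLUSTER_FIELDS)
    ]


# Hypothetical job: the name and the notebook path are placeholders.
job_spec = {
    "name": "nightly-report",
    "tasks": [notebook_task("build_report", "/Workspace/Shared/build_report")],
}

print(json.dumps(job_spec, indent=2))
print("classic tasks:", tasks_on_classic_compute(job_spec))
```

A check like `tasks_on_classic_compute` is useful when migrating existing job definitions, since a leftover cluster reference would keep a task on classic compute.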

 

Security Considerations

When setting up serverless compute:

  • Be aware that serverless compute for notebooks and jobs has unrestricted internet access by default
  • Consider configuring network security features for more control

  • Understand that serverless workloads are executed within multiple layers of isolation for data protection

Usage and Optimization

To optimize serverless compute usage:

  • Leverage the automatic infrastructure optimization provided by Databricks
  • Monitor performance using built-in tools in the Azure Portal
  • Take advantage of the promotional discounts offered at the time of writing (50% for Workflows and DLT, 30% for Notebooks)
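
To see what the promotional discounts mean for a bill, the arithmetic can be sketched as follows. The discount percentages are the ones quoted above. The DBU count and the list price per DBU are hypothetical inputs, and the sketch assumes the discount is a simple percentage off the list price, which may not match how your invoice is actually computed.

```python
# Promotional discounts quoted in this post, as fractions of list price.
PROMO_DISCOUNT = {
    "workflows": 0.50,
    "dlt": 0.50,
    "notebooks": 0.30,
}


def discounted_cost(workload, dbus, list_price_per_dbu):
    """Estimate cost after the promotional discount for one workload type.

    dbus and list_price_per_dbu are caller-supplied; take real prices
    from your own pricing page or contract.
    """
    if workload not in PROMO_DISCOUNT:
        raise ValueError(f"no promotional discount known for {workload!r}")
    list_cost = dbus * list_price_per_dbu
    return list_cost * (1 - PROMO_DISCOUNT[workload])


# Hypothetical usage: 1,000 DBUs at a made-up list price of 0.40 per DBU.
for workload in PROMO_DISCOUNT:
    cost = discounted_cost(workload, dbus=1000, list_price_per_dbu=0.40)
    print(f"{workload}: {cost:.2f}")
```

Because the discounts are promotional, it is worth keeping them in one table like `PROMO_DISCOUNT` so that estimates are easy to update when the promotion ends.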

By setting up serverless compute, you can provide users with a more streamlined experience for running notebooks, jobs, and DLT queries, while reducing management overhead and potentially lowering costs compared to traditional all-purpose compute clusters.

Reference: previous articles
