Azure Databricks (aka Dbx) and Public Egress IP:
The secure way to deploy an Azure Databricks workspace is with No Public IP
(NPIP) and VNet injection both enabled. This is also the most flexible way of
deploying. When Azure resources receive traffic from Dbx clusters, they must
allow that traffic by IP address or CIDR range, and it can be difficult to
find out the egress IP for Dbx clusters. This article explains how to configure
the egress IP and the IP access restriction rules for traffic that originates
from the Databricks workspace and is destined for Azure resources with access
restrictions.
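To make the "IP address or CIDR range" matching concrete, here is a minimal Python sketch of how an allow list behaves. The function name `is_allowed` and the addresses (taken from the documentation-only ranges) are illustrative assumptions, not values from any real workspace, and an actual Azure resource evaluates its own rules on the service side.

```python
import ipaddress

def is_allowed(source_ip: str, allow_list: list[str]) -> bool:
    """Return True if source_ip matches any single IP or CIDR range in allow_list."""
    ip = ipaddress.ip_address(source_ip)
    # strict=False tolerates entries that have host bits set, e.g. "198.51.100.5/24"
    return any(ip in ipaddress.ip_network(entry, strict=False) for entry in allow_list)

# Hypothetical allow list: one single egress IP and one CIDR range
ALLOW_LIST = ["203.0.113.10", "198.51.100.0/24"]

print(is_allowed("203.0.113.10", ALLOW_LIST))   # exact match on the single IP
print(is_allowed("198.51.100.77", ALLOW_LIST))  # falls inside the CIDR range
print(is_allowed("192.0.2.5", ALLOW_LIST))      # matches neither entry
```

If the cluster egress IP is not stable, the first kind of entry stops matching as soon as the address changes, which is why the rest of this article focuses on pinning the egress IP.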
Egress for a deployment without secure cluster connectivity (SCC, aka NPIP)
and VNet injection is different from that with SCC and VNet injection. Without
SCC/NPIP, there is a control plane NAT IP, and with SCC there is an SCC relay
IP. The relay refers to the tunneling of traffic through a relay in the control
plane for deployments with no public IP (NPIP) and with public and private
subnets. With SCC enabled, the cluster initiates a connection to the SCC relay
during cluster creation over port 443 (HTTPS) and uses a different IP address
than is used for the web application and REST API. Cluster administration tasks
reach the cluster through this tunnel.
With SCC enabled, both workspace subnets are effectively private subnets,
since cluster nodes do not have public IP addresses. The network egress varies
depending on whether the Dbx workspace is deployed in the default managed VNet
or in your own virtual network (aka VNet injection). In the managed network, a
default NAT gateway is automatically created within the resource group
associated with the Databricks workspace. If we use secure cluster connectivity
with VNet injection, then we must ensure that the workspace has a stable egress
public IP using one of the following options:
- Choose an egress load balancer (aka outbound load balancer) by providing
loadBalancerName, loadBalancerBackendPoolName, loadBalancerFrontendConfigName
and loadBalancerPublicIPName in the workspace parameters. The load balancer
configuration is not customizable and is tightly controlled by the Dbx
workspace.
- Choose a NAT gateway and configure the gateway on both of the workspace’s
subnets. Clusters then have a stable egress public IP, and this can be set up
via the portal or IaC.
- Choose an egress firewall if there is complex routing involved. User-defined
routes (UDRs) ensure that network traffic is routed correctly for the
workspace, either directly to the required endpoints or through the egress
firewall. The allowed firewall rules then need to be specified.
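For the NAT gateway option, the point that is easy to miss is that the gateway must be associated with both workspace subnets. The following Python sketch shows one way to express that check. The subnet names, the gateway name and the dictionary shape are all hypothetical, and in practice this mapping would have to be populated from the real network configuration (portal, CLI or IaC state).

```python
from typing import Dict, List, Optional

def subnets_missing_nat(subnets: Dict[str, Optional[str]], nat_gateway_id: str) -> List[str]:
    """Return the names of subnets that are not associated with the expected NAT gateway."""
    return sorted(name for name, attached in subnets.items() if attached != nat_gateway_id)

# Hypothetical mapping of workspace subnet name -> associated NAT gateway (None if absent)
WORKSPACE_SUBNETS = {
    "dbx-public-subnet": "nat-gw-dbx",
    "dbx-private-subnet": None,
}

missing = subnets_missing_nat(WORKSPACE_SUBNETS, "nat-gw-dbx")
print(missing)  # lists the subnets that still need the NAT gateway attached
```

An empty result would indicate that both subnets egress through the same gateway and therefore share its public IP.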
So the simplest approach for VNet injection cases is to use a NAT gateway and
add the public IP address of the NAT gateway to the access restriction rules of
the target Azure resources that must be accessed from the jobs and notebooks
within the workspace.
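As a rough illustration of that last step, the Python sketch below builds an access restriction fragment that allows only the NAT gateway's public IP. The `defaultAction` / `ipRules` shape is modeled on the `networkAcls` property of a storage account and is an assumption here, since other Azure resources use different rule schemas. The helper name `build_network_acls` and the IP address are hypothetical.

```python
import json
from typing import Dict, List

def build_network_acls(egress_ips: List[str]) -> Dict:
    """Build a deny-by-default rule set that allows only the given egress IPs."""
    return {
        "defaultAction": "Deny",
        # One allow rule per egress IP or CIDR range
        "ipRules": [{"value": ip, "action": "Allow"} for ip in egress_ips],
    }

# Hypothetical public IP of the NAT gateway attached to the workspace subnets
NAT_GATEWAY_PUBLIC_IP = "203.0.113.10"

acls = build_network_acls([NAT_GATEWAY_PUBLIC_IP])
print(json.dumps(acls, indent=2))
```

The resulting fragment would still need to be checked against the schema of the specific target resource before being applied through the portal or IaC.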
References:
- https://learn.microsoft.com/en-us/azure/databricks/security/network/secure-cluster-connectivity
- https://learn.microsoft.com/en-us/azure/databricks/security/network/secure-cluster-connectivity#egress-with-vnet-injection
- https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr