Cluster computing

Tuesday, November 14, 2023

Machine Learning workspace aka mlworkspace and Public Egress IP:

The secure way to deploy a Machine Learning workspace is with NPIP (no public IP) and VNet injection both set to true. This is also the most flexible way of deploying. When Azure resources receive traffic from the workspace's compute instances and clusters, they must allow that traffic by IP address or CIDR range, and it can be difficult to find out which egress IP those computes use. This article explains how to configure the egress IP, and how to set the IP access restriction rules on the target Azure resources for traffic originating from the ML workspace.

Egress for a deployment without secure cluster connectivity (SCC, aka NPIP) and VNet injection is different from egress with SCC and VNet injection. The networking section of the workspace itself can be set to one of three public access configurations: 1. allow public access from all networks, 2. allow public access from selected networks, and 3. disabled. With the first and the second, there is a public IP address associated with the workspace; with the third, there is none. When a public IP address is assigned to the workspace, outbound IP traffic is allowed for all compute resources created within the workspace by virtue of that IP address. Otherwise, those computes get their public and private IP addresses from the subnets that they are created in, and these subnets come from the VNet injection.

With NPIP, both compute instances and compute clusters are effectively in private subnets, since they do not have public IP addresses. The network egress then varies depending on whether the workspace is deployed in the default managed VNet or in your own virtual network (VNet injection). In the managed network, a default NAT gateway is automatically created within the resource group associated with the ML workspace. If we use NPIP with VNet injection, then we must ensure that the workspace has a stable egress public IP using one of the following options:

Choose an egress load balancer (aka outbound load balancer) by providing loadBalancerName, loadBalancerBackendPoolName, loadBalancerFrontendConfigName and loadBalancerPublicIPName in the workspace parameters. The load balancer configuration is not customizable and is tightly controlled by the ML workspace.
Choose a NAT gateway and configure it on the subnets used by the compute resources. Compute instances and clusters then have a stable egress public IP, and this can be set up via the portal or IaC.
Choose an egress firewall if there is complex routing involved. User-defined routes (UDRs) ensure that network traffic for the workspace is routed correctly, either directly to the required endpoints or through the egress firewall. The allowed firewall rules will then need to be specified.
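
To make the firewall option more concrete, here is a minimal sketch of how a route table with UDRs decides where a packet goes, assuming the usual longest-prefix-match rule. The route entries, address ranges, and the select_route helper are hypothetical and only for illustration; they are not values the workspace requires.

```python
import ipaddress

# Hypothetical route table for the compute subnet. The prefixes, next-hop
# types, and next-hop address are illustrative assumptions only.
ROUTES = [
    # Default route: send everything else through the egress firewall.
    ("0.0.0.0/0", "VirtualAppliance", "10.0.1.4"),
    # More specific route: reach a required endpoint range directly.
    ("203.0.113.0/24", "Internet", None),
]


def select_route(destination, routes=ROUTES):
    """Return the route whose prefix is the longest match for the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop_type, hop_ip)
        for prefix, hop_type, hop_ip in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The most specific (longest) prefix wins over the default route.
    network, hop_type, hop_ip = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), hop_type, hop_ip


if __name__ == "__main__":
    print(select_route("203.0.113.10"))  # matched by the specific route
    print(select_route("198.51.100.7"))  # falls back to the firewall route
```

In this sketch, traffic to the required endpoint range bypasses the firewall, while all other traffic leaves through it and therefore carries the firewall's public IP.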

So the simplest approach for VNet injection cases is to use a NAT gateway and add the public IP address of the NAT gateway to the access restriction rules of the target Azure resources that must be accessed from the jobs and notebooks within the workspace.
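
As a rough illustration of what such an access restriction rule amounts to, the sketch below turns a NAT gateway public IP into a single-address CIDR entry and checks whether a source address would pass. The IP addresses and the helper names build_allow_rules and is_allowed are assumptions made up for this example; the real rules are configured on each target resource.

```python
import ipaddress


def build_allow_rules(nat_public_ips):
    """Turn each NAT gateway public IP into a single-address CIDR rule."""
    return [str(ipaddress.ip_network(ip)) for ip in nat_public_ips]


def is_allowed(source_ip, allow_rules):
    """Return True if the source IP falls inside any allowed CIDR range."""
    src = ipaddress.ip_address(source_ip)
    return any(src in ipaddress.ip_network(rule) for rule in allow_rules)


if __name__ == "__main__":
    # Hypothetical NAT gateway public IP (from a documentation address range).
    rules = build_allow_rules(["203.0.113.25"])
    print(rules)
    print(is_allowed("203.0.113.25", rules))  # traffic leaving via the NAT gateway
    print(is_allowed("203.0.113.26", rules))  # any other source address
```

Because every compute in the injected subnets leaves through the same NAT gateway address, one rule per target resource is enough in this setup.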
