Friday, June 30, 2023

 

How to enable Unity Catalog for Azure Databricks?

Azure Databricks is an Azure managed service for provisioning Databricks, a platform that unifies data, analytics and AI. Users who started on older versions of Databricks may not yet have migrated to Unity Catalog, which is its centralized administrative module. This article explains how to enable and work with Unity Catalog.

Databricks does not force us to migrate our data into proprietary storage systems to use the platform. Instead, it allows us to integrate the platform with external storage and deploys compute to process the data. We control the integrations and manage permissions. Unity Catalog further extends this relationship by managing permissions for accessing data using SQL syntax from within Azure Databricks.

The primary purpose is integrated access control:

·       Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities across Azure Databricks workspaces. It offers a single place to administer data access policies that apply across all workspaces and personas. It automatically captures user-level audit logs that record access to your data. Unity Catalog also captures lineage data that tracks how data assets are created and used across all languages and personas.

 

·       An Azure managed identity can access external storage on behalf of Unity Catalog users. Managed identities provide an identity for applications to use when they connect to resources that support Azure Active Directory (Azure AD) authentication.

 

Unity Catalog is organized as a hierarchy: the metastore at the top level, followed by catalogs, then schemas, with tables and views at the leaf level. All items are referenced via a three-level namespace in the format catalog.schema.table. The metastore is the top-level container for metadata. Besides the metastore, Unity Catalog also includes a user management module.
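To make the three-level namespace concrete, here is a minimal Java sketch that splits a fully qualified name into its catalog, schema and table parts and builds the GRANT statements a read-only group would typically need. The TableName class, the main.sales.orders table and the data-readers group are hypothetical names chosen for illustration. The statements use the Unity Catalog privileges USE CATALOG, USE SCHEMA and SELECT, and the sketch only assembles strings; it does not run anything against a workspace.

```java
import java.util.List;

// Hypothetical helper: models a Unity Catalog three-level name (catalog.schema.table).
final class TableName {
    final String catalog;
    final String schema;
    final String table;

    TableName(String catalog, String schema, String table) {
        this.catalog = catalog;
        this.schema = schema;
        this.table = table;
    }

    // Splits "catalog.schema.table" into its three parts.
    // Assumption: plain identifiers, no backtick-quoted names containing dots.
    static TableName parse(String fullName) {
        String[] parts = fullName.split("\\.", -1);
        if (parts.length != 3 || parts[0].isEmpty() || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException("Expected catalog.schema.table but got: " + fullName);
        }
        return new TableName(parts[0], parts[1], parts[2]);
    }

    String fullName() {
        return catalog + "." + schema + "." + table;
    }

    // A reader needs access at every level of the hierarchy:
    // the catalog, the schema, and finally the table itself.
    List<String> readGrants(String principal) {
        String p = "`" + principal + "`";
        return List.of(
            "GRANT USE CATALOG ON CATALOG " + catalog + " TO " + p,
            "GRANT USE SCHEMA ON SCHEMA " + catalog + "." + schema + " TO " + p,
            "GRANT SELECT ON TABLE " + fullName() + " TO " + p);
    }

    public static void main(String[] args) {
        TableName name = TableName.parse("main.sales.orders");
        name.readGrants("data-readers").forEach(System.out::println);
    }
}
```

Running main prints the three statements in order, one per level of the hierarchy, which could then be pasted into a notebook or SQL editor attached to a Unity Catalog enabled workspace.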

The steps to set up Unity Catalog are:

1.       Configure a storage container and Azure managed identity with read-write access to it.

2.       Create a metastore

3.       Attach workspaces to the metastore

4.       Add users, groups and service principals to the Azure Databricks account.

Many people struggle to follow these steps because the starting point is hidden behind the user icon on the admin accounts portal, under the menu item “Manage Account”. Once you find this item, it is easy to follow the get-started tutorial to create a metastore and set up Unity Catalog as directed.

The steps to integrate a fresh instance with Azure Data Lake Storage (ADLS) are:

1.       Create an Azure Databricks instance in a vnet.

2.       Create an Azure Databricks (ADB) access connector resource for ADLS.

3.       Use the access connector's managed identity (MI) to access the Unity Catalog root storage account by specifying the access connector ID under Data->Metastore.

4.       Create a storage credential in Unity Catalog for this managed identity.

5.       Set up your data lake storage account with a storage firewall that allows only Optum IPs.

6.       Grant access to this storage account by allowing access from a specific resource type and selecting the Databricks instance.

7.       Set up the storage credential with an external location mapping and access control policies for users and groups in Unity Catalog.
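The last step above can be expressed in SQL once the storage credential exists. The following Java sketch only assembles the statements as strings, assuming the CREATE EXTERNAL LOCATION and GRANT READ FILES syntax of Unity Catalog SQL. The ExternalLocationSql class and every name passed to it (container, storage account, location, credential and group) are hypothetical placeholders, not values from this setup.

```java
// Hypothetical helper that assembles the SQL for mapping a storage credential
// to an external location and granting access to it. It builds strings only.
final class ExternalLocationSql {

    // ADLS Gen2 paths use the abfss:// scheme: container@account.dfs.core.windows.net/path
    static String abfssUrl(String container, String account, String path) {
        return "abfss://" + container + "@" + account + ".dfs.core.windows.net/" + path;
    }

    // Maps a storage path to an existing storage credential.
    static String createLocation(String location, String url, String credential) {
        return "CREATE EXTERNAL LOCATION IF NOT EXISTS " + location
            + " URL '" + url + "'"
            + " WITH (STORAGE CREDENTIAL " + credential + ")";
    }

    // Lets a user or group read files under the external location.
    static String grantReadFiles(String location, String principal) {
        return "GRANT READ FILES ON EXTERNAL LOCATION " + location + " TO `" + principal + "`";
    }

    public static void main(String[] args) {
        // All values below are placeholders for illustration.
        String url = abfssUrl("raw", "examplelake", "landing");
        System.out.println(createLocation("landing_zone", url, "adls_mi_credential"));
        System.out.println(grantReadFiles("landing_zone", "data-engineers"));
    }
}
```

Keeping the URL construction separate from the statement builders makes it easier to reuse the same path when defining external tables later.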

 

Problem Statement: You are given a 0-indexed integer array nums.

Swaps of adjacent elements can be performed on nums.

A valid array meets the following conditions:

·        The largest element (any of the largest elements if there are multiple) is at the rightmost position in the array.

·        The smallest element (any of the smallest elements if there are multiple) is at the leftmost position in the array.

Return the minimum number of swaps required to make nums a valid array.

 

Example 1:

Input: nums = [3,4,5,5,3,1]

Output: 6

Explanation: Perform the following swaps:

- Swap 1: Swap the 3rd and 4th elements, nums is then [3,4,5,3,5,1].

- Swap 2: Swap the 4th and 5th elements, nums is then [3,4,5,3,1,5].

- Swap 3: Swap the 3rd and 4th elements, nums is then [3,4,5,1,3,5].

- Swap 4: Swap the 2nd and 3rd elements, nums is then [3,4,1,5,3,5].

- Swap 5: Swap the 1st and 2nd elements, nums is then [3,1,4,5,3,5].

- Swap 6: Swap the 0th and 1st elements, nums is then [1,3,4,5,3,5].

It can be shown that 6 is the minimum number of swaps required to make a valid array.

Example 2:

Input: nums = [9]

Output: 0

Explanation: The array is already valid, so we return 0.

 

Constraints:

·         1 <= nums.length <= 10^5

·         1 <= nums[i] <= 10^5

Solution:

import java.util.Arrays;

class Solution {
    public int minimumSwaps(int[] nums) {
        int min = Arrays.stream(nums).min().getAsInt();
        int max = Arrays.stream(nums).max().getAsInt();
        int count = 0;

        // Bubble the rightmost occurrence of the largest element to the end.
        int end = nums.length - 1;
        while (nums[end] != max) {
            end--;
        }
        for (int i = end; i < nums.length - 1; i++) {
            swap(nums, i, i + 1);
            count++;
        }

        // Bubble the leftmost occurrence of the smallest element to the front.
        // The scan runs after the first pass because that pass may have shifted it.
        int start = 0;
        while (nums[start] != min) {
            start++;
        }
        for (int j = start; j >= 1; j--) {
            swap(nums, j, j - 1);
            count++;
        }

        return count;
    }

    // Exchanges the elements at positions i and j.
    private void swap(int[] nums, int i, int j) {
        int temp = nums[j];
        nums[j] = nums[i];
        nums[i] = temp;
    }
}
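As an alternative to simulating the swaps, the count can be computed from two indices alone. This is a sketch of a different technique, not the approach used in the solution above: the leftmost smallest element needs as many swaps as its index, the rightmost largest element needs as many swaps as its distance from the end, and if the smallest starts to the right of the largest, the two cross each other once and share one swap. The class name MinimumSwapsByIndex is hypothetical.

```java
// Alternative sketch: count the swaps from index arithmetic without modifying the array.
final class MinimumSwapsByIndex {

    static int minimumSwaps(int[] nums) {
        int minIdx = 0; // leftmost position of the smallest value
        int maxIdx = 0; // rightmost position of the largest value
        for (int i = 1; i < nums.length; i++) {
            if (nums[i] < nums[minIdx]) {
                minIdx = i;  // strict comparison keeps the leftmost minimum
            }
            if (nums[i] >= nums[maxIdx]) {
                maxIdx = i;  // non-strict comparison keeps the rightmost maximum
            }
        }
        // Swaps to bring the minimum to index 0 plus swaps to bring the maximum to the end.
        int swaps = minIdx + (nums.length - 1 - maxIdx);
        // If the minimum starts to the right of the maximum, one swap moves both.
        if (minIdx > maxIdx) {
            swaps--;
        }
        return swaps;
    }

    public static void main(String[] args) {
        System.out.println(minimumSwaps(new int[] {3, 4, 5, 5, 3, 1})); // prints 6
        System.out.println(minimumSwaps(new int[] {9}));                // prints 0
    }
}
```

For Example 1 the indices are minIdx = 5 and maxIdx = 3, so the count is 5 + (6 - 1 - 3) - 1 = 6, matching the expected output. This version makes a single pass and leaves nums untouched.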

 

Input: nums = [3,4,5,5,3,1]

Output: 6

Expected: 6

 

Input: nums = [9]

Output: 0

Expected: 0

 

