Friday, September 22, 2023

 

This is a continuation of the previous articles on Azure Databricks and Overwatch analysis. This section focuses on the role-based access control required for the setup and deployment of Overwatch.

The use of a storage account as a working directory for Overwatch implies that it will need to be accessed from the Databricks workspace. There are two ways to do this: one that uses Azure Active Directory credential passthrough with 'abfss://container@storageaccount.dfs.core.windows.net' name resolution, and another that mounts the remote storage account as a folder on the local file system.
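For example, with credential passthrough enabled on the cluster, the working directory can be addressed directly by its fully qualified abfss:// path. This is a minimal sketch from a Databricks notebook; the container name 'overwatch' and account name 'overwatchsa' are placeholders, not the actual deployment values:

# dbutils and spark are predefined in a Databricks notebook; no imports needed.
# Placeholder container/account names; replace with the actual Overwatch values.
base = "abfss://overwatch@overwatchsa.dfs.core.windows.net"

# Listing a folder resolves the path through the passthrough credentials;
# for dynamically laid-out contents this resolution repeats on every access.
for info in dbutils.fs.ls(f"{base}/deployment"):
    print(info.path, info.size)

# Spark reads work the same way against the fully qualified path
# (assuming the reports are stored in Delta format).
reports_df = spark.read.format("delta").load(f"{base}/reports")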

The direct abfss:// approach requires that the cluster be enabled for Azure Active Directory credential passthrough. It works well for directly resolving the deployment and reports folders, but for contents whose layout is determined dynamically, the resolution is expensive each time. The abfss scheme also fails with a 403 error when certain activities demand tokens. The second approach of mounting, by contrast, is a one-time setup: the mount is set up with a service principal and OAuth tokens obtained from Azure Active Directory, and the mount point then becomes the prefix for all the temporary files and folders.
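A minimal sketch of such a mount from a Databricks notebook follows, assuming the service principal credentials are stored in a Databricks secret scope; the scope name, key names, tenant id, container, and account names are placeholders:

# The OAuth configuration for ABFS points at the service principal and the
# Azure Active Directory token endpoint. All identifiers are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id":
        dbutils.secrets.get(scope="overwatch", key="sp-client-id"),
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="overwatch", key="sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount once; the mount point then prefixes all the temporary files and folders.
dbutils.fs.mount(
    source="abfss://overwatch@overwatchsa.dfs.core.windows.net/",
    mount_point="/mnt/overwatch",
    extra_configs=configs,
)

After this, paths such as /mnt/overwatch/reports resolve without any further token exchange in the notebook code.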

Using Azure Active Directory credentials only works when there are corresponding role assignments and container/blob access control lists. The role assignments for the control plane differ from those for the data plane, so there are roles for both. This separation of roles allows access to certain containers and blobs without necessarily allowing changes to the storage account and container organization or management. With ACLs applied to individual files/blobs and folders/containers, authentication, authorization, and auditing are completely covered and scoped at the finest granularity.
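As an illustration, a data-plane ACL on an individual folder can be applied with the azure-storage-file-datalake SDK. This is only a sketch; the tenant, client, account, container, and object id values are placeholders:

# Applying a fine-grained ACL to a folder, authenticating as a service principal.
# All identifiers below are placeholders.
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<sp-client-id>",
    client_secret="<sp-client-secret>",
)

service = DataLakeServiceClient(
    account_url="https://overwatchsa.dfs.core.windows.net",
    credential=credential,
)

# Grant read/execute on the reports folder to a specific principal without
# granting any control-plane rights on the storage account itself.
directory = service.get_file_system_client("overwatch").get_directory_client("reports")
directory.set_access_control(acl="user:<sp-object-id>:r-x")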

Queries like the following against the storage account's diagnostic logs can then be very helpful:

1. Frequent operations can be queried with:

StorageBlobLogs
| where TimeGenerated > ago(3d)
| summarize count() by OperationName
| sort by count_ desc
| render piechart

2. High-latency operations can be queried with:

StorageBlobLogs
| where TimeGenerated > ago(3d)
| top 10 by DurationMs desc
| project TimeGenerated, OperationName, DurationMs, ServerLatencyMs, ClientLatencyMs = DurationMs - ServerLatencyMs

3. Operations causing the most errors can be queried with:

StorageBlobLogs
| where TimeGenerated > ago(3d) and StatusText !contains "Success"
| summarize count() by OperationName
| top 10 by count_ desc

4. The number of read transactions and the number of bytes read on each container can be queried with:

StorageBlobLogs
| where OperationName == "GetBlob"
| extend ContainerName = split(parse_url(Uri).Path, "/")[1]
| summarize ReadSize = sum(ResponseBodySize), ReadCount = count() by tostring(ContainerName)
