Wednesday, May 14, 2025

  

The following is a list of errors and resolutions frequently encountered during Kubernetes and Airflow setup with Active Directory integration and Single Sign-On on an Azure Kubernetes Service (AKS) instance. This information is hard to find online. 

  1. Unable to get aks-credentials, with a Python import error for the azure.graph module even though the cluster and the resource group are correct. Install the missing Python packages: 

pip3 install azure-graphrbac 
pip3 install msgraph-core 

  2. The az CLI command to run a kubectl command on the cluster fails. Add the required extensions: 

az extension add --name aks-preview 
az extension add --name azure-cli-legacy-auth 
az extension add --name resource-graph 
az extension add --name k8s-extension 

  3. The extensions are present for the az CLI, but commands still fail. Update the extensions: 

az extension update --name aks-preview && az extension update --name k8s-extension 

  4. Unable to get namespaces on the cluster even after a successful login and extension install: 

Run both: 

az aks get-credentials --resource-group <resource-group> --name <aks-cluster> 
kubelogin convert-kubeconfig -l azurecli 

  5. Installation of Airflow fails: 

Get Helm; this is probably the fastest way to do the install. Add the URL to download the Helm chart from Airflow (shown below), or create a HelmRelease. 
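A minimal example, assuming the official Apache Airflow chart repository URL: 

helm repo add apache-airflow https://airflow.apache.org 
helm repo update 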

Create a namespace: 

kubectl create namespace airflow 

  6. The repo exists and the chart is found, but the Airflow install times out: 

Increase the timeout: 

helm install dev-release apache-airflow/airflow --namespace airflow --timeout 60m0s --wait 
 

  7. Diagnose failures: 

Use the following to view the deployment logs or HelmRelease failures: 
kubectl describe helmrelease.helm.toolkit.fluxcd.io/airflow -n airflow 
 
 
For failed releases, uninstall and install again (note that helm uninstall takes the release name, not the chart reference): 
      helm list --all-namespaces --failed 
      helm uninstall dev-release --namespace airflow 
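To pull pod logs directly, assuming the dev-release release name used above (the chart names the component deployments accordingly): 

      kubectl logs deployment/dev-release-webserver -n airflow 
      kubectl logs deployment/dev-release-scheduler -n airflow 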
 

  8. The webserver is inaccessible: 

kubectl port-forward svc/dev-release-webserver 8080:8080 --namespace airflow 
# Command to reset the Airflow metadata database after AD integration 
airflow db reset 
 

  9. Integration with Active Directory or LDAP does not work: 

Modify webserver_config.py. A sample webserver_config.py for LDAP: 
import os 
from flask_appbuilder.security.manager import AUTH_LDAP 
 
basedir = os.path.abspath(os.path.dirname(__file__)) 
WTF_CSRF_ENABLED = True 
AUTH_TYPE = AUTH_LDAP 
AUTH_LDAP_SERVER = 'ldap://your-ldap-server:389' 
AUTH_LDAP_BIND_USER = 'cn=svc_airflow,cn=Managed Service Accounts,dc=testdomain,dc=local' 
AUTH_LDAP_BIND_PASSWORD = 'supersecretpw!' 
AUTH_LDAP_UID_FIELD = 'sAMAccountName' 
AUTH_LDAP_SEARCH = 'ou=TestUsers,dc=testdomain,dc=local' 
AUTH_ROLES_MAPPING = { 
         'cn=Access_Airflow,ou=Groups,dc=testdomain,dc=local':["Admin"], 
         'ou=TestUsers,dc=testdomain,dc=local':["User"] 
} 
AUTH_ROLE_ADMIN = 'Admin' 
AUTH_USER_REGISTRATION = True 
AUTH_USER_REGISTRATION_ROLE = 'Admin' 
AUTH_ROLES_SYNC_AT_LOGIN = True 
AUTH_LDAP_GROUP_FIELD = "memberOf" 
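
One way to ship this file with the official Helm chart is the chart's webserverConfig value (verify the value name against your chart version): 

helm upgrade dev-release apache-airflow/airflow --namespace airflow \
  --set-file webserverConfig=webserver_config.py 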
 

  10. The webserver is accessible but API auth fails: 

    Modify the Airflow ConfigMap to allow API auth with AD integration: 
     
    apiVersion: v1 
    kind: ConfigMap 
    metadata: 
      name: airflow-config 
    data: 
      airflow.cfg: | 
        [api] 
        auth_backends = airflow.api.auth.backend.basic_auth 
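
A quick check, assuming basic auth and the port-forward from step 8 (the credentials are placeholders): 

curl -u admin:admin http://localhost:8080/api/v1/dags 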

Tuesday, May 13, 2025

 These are helpful utilities for image processing using Azure resources:

1. Vectorize images. Sample code and output follow:
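A minimal sketch using the Azure AI Vision multimodal embeddings REST API; the environment variable names, api-version, and image URL are assumptions:

import os
import requests

endpoint = os.getenv("AZURE_AI_VISION_ENDPOINT")
key = os.getenv("AZURE_AI_VISION_API_KEY")

# The retrieval:vectorizeImage operation returns an embedding for the image
# (1024 dimensions for model version 2023-04-15)
response = requests.post(
    f"{endpoint}/computervision/retrieval:vectorizeImage",
    params={"api-version": "2024-02-01", "model-version": "2023-04-15"},
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json={"url": "https://example.com/aerial.jpg"},  # hypothetical image URL
)
response.raise_for_status()
print("Vector embedding:", response.json()["vector"])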

Output: Vector embedding: [-1.0224609, -1.3076172,...

2. Analyze images. Sample code and output follow:
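A minimal sketch with the azure-ai-vision-imageanalysis SDK; the environment variable names and image URL are assumptions:

import os
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint=os.getenv("AZURE_AI_VISION_ENDPOINT"),
    credential=AzureKeyCredential(os.getenv("AZURE_AI_VISION_API_KEY")),
)
result = client.analyze_from_url(
    image_url="https://example.com/aerial.jpg",  # hypothetical image URL
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.DENSE_CAPTIONS, VisualFeatures.TAGS],
    gender_neutral_caption=True,
)
if result.caption is not None:
    print(f"Caption: '{result.caption.text}', Confidence: {result.caption.confidence:.4f}")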

Output:

Image analysis results:

Caption: 'a building with a road and trees', Confidence: 0.5844

{
  "modelVersion": "2023-10-01",
  "captionResult": { "text": "a building with a road and trees", "confidence": 0.5844066143035889 },
  "denseCaptionsResult": {
    "values": [
      { "text": "a building with a road and trees", "confidence": 0.5844066143035889, "boundingBox": { "x": 0, "y": 0, "w": 1920, "h": 1080 } },
      { "text": "a building with a roof and trees", "confidence": 0.5829769968986511, "boundingBox": { "x": 929, "y": 171, "w": 938, "h": 884 } },
      { "text": "a tree shadow on the road", "confidence": 0.6864767074584961, "boundingBox": { "x": 332, "y": 0, "w": 255, "h": 1062 } },
      { "text": "a top view of a building", "confidence": 0.7406209707260132, "boundingBox": { "x": 962, "y": 189, "w": 887, "h": 332 } },
      { "text": "a blurry image of a person's arm", "confidence": 0.7104462385177612, "boundingBox": { "x": 1634, "y": 328, "w": 54, "h": 63 } },
      { "text": "a building with a roof and a road and trees", "confidence": 0.5697128176689148, "boundingBox": { "x": 0, "y": 0, "w": 1890, "h": 1056 } },
      { "text": "a tree in a park", "confidence": 0.6157793402671814, "boundingBox": { "x": 848, "y": 444, "w": 503, "h": 619 } },
      { "text": "a close up of a plant", "confidence": 0.6476104855537415, "boundingBox": { "x": 943, "y": 930, "w": 206, "h": 146 } },
      { "text": "a tree and grass field", "confidence": 0.5954487919807434, "boundingBox": { "x": 4, "y": 0, "w": 319, "h": 1070 } },
      { "text": "a close up of a window", "confidence": 0.7861047387123108, "boundingBox": { "x": 1633, "y": 419, "w": 83, "h": 76 } }
    ]
  },
  "metadata": { "width": 1920, "height": 1080 },
  "tagsResult": {
    "values": [
      { "name": "outdoor", "confidence": 0.9880061149597168 },
      { "name": "building", "confidence": 0.93121337890625 },
      { "name": "urban design", "confidence": 0.9306544065475464 },
      { "name": "map", "confidence": 0.9177150726318359 },
      { "name": "aerial photography", "confidence": 0.8905916213989258 },
      { "name": "intersection", "confidence": 0.8808201551437378 },
      { "name": "junction", "confidence": 0.8713006973266602 },
      { "name": "aerial", "confidence": 0.8662087917327881 },
      { "name": "tree", "confidence": 0.8520137667655945 },
      { "name": "infrastructure", "confidence": 0.8460453748703003 },
      { "name": "house", "confidence": 0.8455849885940552 },
      { "name": "suburb", "confidence": 0.8436774015426636 },
      { "name": "transport corridor", "confidence": 0.841437578201294 },
      { "name": "street", "confidence": 0.7220888137817383 }
    ]
  },
  "objectsResult": {
    "values": [
      { "boundingBox": { "x": 961, "y": 18, "w": 941, "h": 1055 }, "tags": [ { "name": "building", "confidence": 0.551 } ] }
    ]
  },
  "readResult": { "blocks": [] },
  "smartCropsResult": {
    "values": [
      { "aspectRatio": 1.96, "boundingBox": { "x": 80, "y": 135, "w": 1760, "h": 900 } }
    ]
  },
  "peopleResult": {
    "values": [
      { "boundingBox": { "x": 1033, "y": 0, "w": 54, "h": 78 }, "confidence": 0.11555740237236023 },
      { "boundingBox": { "x": 1706, "y": 0, "w": 38, "h": 28 }, "confidence": 0.044786710292100906 },
      { "boundingBox": { "x": 1764, "y": 702, "w": 72, "h": 107 }, "confidence": 0.018947092816233635 },
      { "boundingBox": { "x": 1617, "y": 4, "w": 26, "h": 32 }, "confidence": 0.01635269820690155 },
      { "boundingBox": { "x": 1897, "y": 997, "w": 20, "h": 80 }, "confidence": 0.014565806835889816 },
      { "boundingBox": { "x": 1174, "y": 264, "w": 65, "h": 138 }, "confidence": 0.009904739446938038 },
      { "boundingBox": { "x": 1570, "y": 0, "w": 19, "h": 26 }, "confidence": 0.00963284820318222 },
      { "boundingBox": { "x": 975, "y": 812, "w": 23, "h": 56 }, "confidence": 0.007403235416859388 },
      { "boundingBox": { "x": 1892, "y": 256, "w": 25, "h": 89 }, "confidence": 0.0058165849186480045 },
      { "boundingBox": { "x": 1730, "y": 1006, "w": 92, "h": 71 }, "confidence": 0.005636707879602909 },
      { "boundingBox": { "x": 1003, "y": 0, "w": 49, "h": 28 }, "confidence": 0.005567244254052639 },
      { "boundingBox": { "x": 1006, "y": 0, "w": 64, "h": 60 }, "confidence": 0.00508015975356102 },
      { "boundingBox": { "x": 1788, "y": 672, "w": 72, "h": 102 }, "confidence": 0.004823194816708565 },
      { "boundingBox": { "x": 1878, "y": 943, "w": 39, "h": 134 }, "confidence": 0.00384620507247746 },
      { "boundingBox": { "x": 1063, "y": 249, "w": 49, "h": 126 }, "confidence": 0.003768299473449588 },
      { "boundingBox": { "x": 1791, "y": 991, "w": 115, "h": 86 }, "confidence": 0.003688311204314232 },
      { "boundingBox": { "x": 1743, "y": 438, "w": 45, "h": 77 }, "confidence": 0.0035305204801261425 },
      { "boundingBox": { "x": 1702, "y": 0, "w": 42, "h": 69 }, "confidence": 0.0028348765335977077 },
      { "boundingBox": { "x": 902, "y": 805, "w": 31, "h": 63 }, "confidence": 0.0027336280327290297 },
      { "boundingBox": { "x": 1135, "y": 223, "w": 36, "h": 65 }, "confidence": 0.002365714870393276 },
      { "boundingBox": { "x": 1068, "y": 203, "w": 76, "h": 164 }, "confidence": 0.00231865793466568 },
      { "boundingBox": { "x": 1721, "y": 316, "w": 33, "h": 72 }, "confidence": 0.001977135194465518 },
      { "boundingBox": { "x": 1430, "y": 274, "w": 34, "h": 63 }, "confidence": 0.0019341635052114725 },
      { "boundingBox": { "x": 917, "y": 799, "w": 21, "h": 32 }, "confidence": 0.0017207009950652719 },
      { "boundingBox": { "x": 1722, "y": 976, "w": 58, "h": 101 }, "confidence": 0.0017095959046855569 },
      { "boundingBox": { "x": 1824, "y": 989, "w": 50, "h": 76 }, "confidence": 0.0014758453471586108 },
      { "boundingBox": { "x": 1745, "y": 130, "w": 87, "h": 202 }, "confidence": 0.001272828783839941 },
      { "boundingBox": { "x": 1559, "y": 635, "w": 115, "h": 232 }, "confidence": 0.001130886492319405 },
      { "boundingBox": { "x": 1220, "y": 255, "w": 21, "h": 55 }, "confidence": 0.0010053350124508142 }
    ]
  }
}


Monday, May 12, 2025

 This is a sample to illustrate geolocation verification in aerial images:

import cv2
import numpy as np
import requests

# Function to detect and extract features from the aerial image
def extract_features(image_path):
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(image, None)
    return keypoints, descriptors, image

# Function to match features between images
def match_features(descriptors1, descriptors2):
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors1, descriptors2)
    matches = sorted(matches, key=lambda x: x.distance)  # Sort by match quality
    return matches

# Function to get GPS coordinates using the Google Maps Geocoding API
def get_geolocation(image_name, api_key):
    url = f"https://maps.googleapis.com/maps/api/geocode/json?address={image_name}&key={api_key}"
    response = requests.get(url)
    data = response.json()
    if data["status"] == "OK":
        location = data["results"][0]["geometry"]["location"]
        return location["lat"], location["lng"]
    return None

# Paths to images
aerial_image_path = "aerial_landmark.jpg"
reference_image_path = "reference_satellite.jpg"

# Extract features from both images
keypoints1, descriptors1, image1 = extract_features(aerial_image_path)
keypoints2, descriptors2, image2 = extract_features(reference_image_path)

# Match features
matches = match_features(descriptors1, descriptors2)

# Draw the 50 best matches
output_image = cv2.drawMatches(image1, keypoints1, image2, keypoints2, matches[:50], None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

# Display results
cv2.imshow("Feature Matching", output_image)
cv2.waitKey(0)
cv2.destroyAllWindows()

# Perform geolocation verification
api_key = "YOUR_GOOGLE_MAPS_API_KEY"  # Replace with your API key
location = get_geolocation("Hoover Tower, Stanford University", api_key)
if location:
    print(f"Verified Landmark Coordinates: Latitude {location[0]}, Longitude {location[1]}")
else:
    print("Geolocation verification failed!")


Sunday, May 11, 2025

 The following is a sample of how to index images in Azure AI Search for lexical and vector search.

#! /usr/bin/python

#from azure.ai.vision import VisionClient
from azure.core.credentials import AzureKeyCredential
from azure.core.rest import HttpRequest, HttpResponse
from azure.core.exceptions import HttpResponseError
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from tenacity import retry, stop_after_attempt, wait_fixed
from dotenv import load_dotenv
import json
import requests
import http.client, urllib.parse
import os

load_dotenv()
search_endpoint = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
index_name = os.getenv("AZURE_SEARCH_INDEX_NAME")
search_api_version = os.getenv("AZURE_SEARCH_API_VERSION")
search_api_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")
vision_api_key = os.getenv("AZURE_AI_VISION_API_KEY")
vision_api_version = os.getenv("AZURE_AI_VISION_API_VERSION")
vision_region = os.getenv("AZURE_AI_VISION_REGION")
vision_endpoint = os.getenv("AZURE_AI_VISION_ENDPOINT")
credential = DefaultAzureCredential()
#search_credential = AzureKeyCredential(search_api_key)
vision_credential = AzureKeyCredential(vision_api_key)

# Initialize Azure clients
#vision_client = VisionClient(endpoint=vision_endpoint, credential=AzureKeyCredential(vision_api_key))
search_client = SearchClient(endpoint=search_endpoint, index_name=index_name, credential=credential)
analysis_client = ImageAnalysisClient(vision_endpoint, vision_credential)

# Define SAS URL template
sas_template = "https://saravinoteblogs.blob.core.windows.net/playground/vision/main/main/{file}.jpg?sp=rle&st=2025-05-11T00:36:41Z&se=2025-05-11T08:36:41Z&spr=https&sv=2024-11-04&sr=d&sig=vjCrqWLo3LbmkXwCyIKWtAtFnYO2uBSxEWNgGKbeS00%3D&sdd=3"

# Process images in batches of 100
batch_size = 100
total_images = 2 # 17853  # Adjust this as needed

def get_description(id, image_url):
    result = analyze_image_from_sdk(analysis_client, image_url)
    description = {}
    description["id"] = id
    # Access the results (caption, tags, objects)
    if result.caption:
        print(f"Caption: {result.caption.text}")
        print(f"Caption Confidence: {result.caption.confidence}")
        description["caption"] = f"{result.caption.text}"
        description["caption_confidence"] = result.caption.confidence
    if result.tags:
        print("Tags:")
        tags = []
        for tag in result.tags.list:
            # Build a separate dict so the loop variable is not shadowed
            tagItem = {}
            print(f"  {tag.name}: {tag.confidence}")
            tagItem["name"] = f"{tag.name}"
            tagItem["confidence"] = f"{tag.confidence}"
            tags += [tagItem]
        description["tags"] = tags
    if result.objects:
        print("Objects:")
        objectItems = []
        for obj in result.objects.list:
            objectItem = {}
            # A detected object reports its name and confidence via its first tag
            print(f"  {obj.tags[0].name}: {obj.tags[0].confidence}")
            objectItem["name"] = f"{obj.tags[0].name}"
            objectItem["confidence"] = obj.tags[0].confidence
            if obj.bounding_box:
                print(f"    Bounding Box: {obj.bounding_box}")
                objectItem["bounding_box"] = f"{obj.bounding_box}"
            objectItems += [objectItem]
        description["objects"] = objectItems
    return description

#@retry(stop=stop_after_attempt(5), wait=wait_fixed(1))
def vectorize_image(client, blob_url):
    headers = {
        'Ocp-Apim-Subscription-Key': vision_api_key,
        'Content-Type': 'application/json',
    }
    params = {
        'model-version': '2023-04-15',
        'language': 'en'
    }
    request = HttpRequest(
        method="POST",
        url=f"/retrieval:vectorizeImage?api-version={vision_api_version}",
        json={"url": blob_url},
        params=params,
        headers=headers
    )
    response = client.send_request(request)
    try:
        print(repr(response))
        response.raise_for_status()
        data = response.json()
        print(f"vectorize returned {data}")
        return data.get("vector")  # return just the embedding list for indexing
    except HttpResponseError as e:
        print(str(e))
        return None

#@retry(stop=stop_after_attempt(5), wait=wait_fixed(1))
def get_image_vector(image_path, key, region):
    headers = {
        'Ocp-Apim-Subscription-Key': key,
    }
    params = urllib.parse.urlencode({
        'model-version': 'latest',
    })
    try:
        if image_path.startswith(('http://', 'https://')):
            headers['Content-Type'] = 'application/json'
            body = json.dumps({"url": image_path})
        else:
            headers['Content-Type'] = 'application/octet-stream'
            with open(image_path, "rb") as filehandler:
                body = filehandler.read()

        conn = http.client.HTTPSConnection("img01.cognitiveservices.azure.com", timeout=3)
        conn.request("POST", "/retrieval:vectorizeImage?api-version=2023-04-01-preview&%s" % params, body, headers)
        response = conn.getresponse()
        print(repr(response))
        data = json.load(response)
        print(repr(data))
        conn.close()

        if response.status != 200:
            raise Exception(f"Error processing image {image_path}: {data.get('message', '')}")
        return data.get("vector")
    except (requests.exceptions.Timeout, http.client.HTTPException) as e:
        print(f"Timeout/Error for {image_path}. Retrying...")
        raise

#@retry(stop=stop_after_attempt(5), wait=wait_fixed(1))
def analyze_image(client, blob_url):
    headers = {
        'Ocp-Apim-Subscription-Key': vision_api_key,  # vision key, not the search admin key
        'Content-Type': 'application/json',
    }
    params = {
        'model-version': '2023-04-15',
        'language': 'en'
    }
    request = HttpRequest(
        method="POST",
        url=f"/computervision/imageanalysis:analyze?api-version={vision_api_version}",
        json={"url": blob_url},
        params=params,
        headers=headers
    )
    response = client.send_request(request)
    try:
        response.raise_for_status()
        print(f"analyze returned {response.json()}")
        return response.json()
    except HttpResponseError as e:
        print(str(e))
        return None

def analyze_image_from_sdk(client, blob_url):
    result = client.analyze_from_url(
        image_url=blob_url,
        visual_features=[
            VisualFeatures.TAGS,
            VisualFeatures.OBJECTS,
            VisualFeatures.CAPTION,
            VisualFeatures.DENSE_CAPTIONS,
            VisualFeatures.READ,
            VisualFeatures.SMART_CROPS,
            VisualFeatures.PEOPLE,
        ],  # Mandatory. Select one or more visual features to analyze.
        smart_crops_aspect_ratios=[0.9, 1.33],  # Optional. Relevant only if SMART_CROPS was specified above.
        gender_neutral_caption=True,  # Optional. Relevant only if CAPTION or DENSE_CAPTIONS were specified above.
        language="en",  # Optional. Relevant only if TAGS is specified above. See https://aka.ms/cv-languages for supported languages.
        model_version="latest",  # Optional. Analysis model version to use. Defaults to "latest".
    )
    return result

def vectorize_image_from_sdk(client, blob_url):
    # The image analysis SDK does not expose a vectorize helper,
    # so delegate to the REST call above.
    return vectorize_image(client, blob_url)

for batch_start in range(1, total_images + 1, batch_size):
    vectorized_images = {}
    documents = []

    # Vectorize up to batch_size images at a time
    batch_end = min(batch_start + batch_size, total_images + 1)
    for i in range(batch_start, batch_end):
        file_name = f"{i:06}"
        blob_url = sas_template.format(file=file_name)
        try:
            #response = get_image_vector(blob_url, vision_api_key, "eastus")
            response = vectorize_image(analysis_client, blob_url)
            print(repr(response))
            if response:
                vectorized_images[file_name] = response
                documents += [
                    {"id": file_name, "description": repr(get_description(file_name, sas_template.format(file=file_name))), "vector": response}
                ]
        except Exception as e:
            print(f"Error processing {file_name}.jpg: {e}")

    print(f"Vectorization complete for images {batch_start} to {min(batch_start + batch_size - 1, total_images)}")
    # Upload batch to Azure AI Search
    if len(documents) > 0:
        # search_client.upload_documents(documents)
        print(f"Uploaded {len(documents)} images {batch_start} to {batch_end} to {index_name}.")

print(f"Vectorized images successfully added to {index_name}!")