Wednesday, April 30, 2025

An image processing pipeline can have any number of extensions or operators. It is not limited to proprietary models or techniques. In fact, if there are locations where you have already captured images and labeled the objects of interest, you can plug in your own model to process the next round of images, say from a UAV swarm flight, which will prioritize your predictions during the test flight and route autonomously. This widens the strategy and purpose of developing applications that can leverage this pipeline for their specific use cases. Objects detected using a bring-your-own-model processor can still be registered to a world catalog.


As an example, preprocessing of drone images against a dataset based on 512x512 resolution images of highways, annotated in the Pascal VOC format, could leverage the following transforms:


1. Filters using kernels. A kernel is any matrix A that, when multiplied with another matrix B, transforms B in a way that highlights a certain feature. Finding features in images can be helpful for classification.

2. CNN: A Convolutional Neural Network takes an image and produces a vector based on embeddings derived from its training. Most Landing.AI experiments with images leverage this technique. It applies different kernels across the image and constantly improves these kernels using gradient descent. MobileNet is an example of a model suitable for drone imagery. Another example is YOLOv3, from which most of the runtime was sourced.

3. LSTM: A Long Short-Term Memory neural network uses previous predictions and occurrences as a basis for predicting the current input. This helps with temporal information such as movement.

4. Augmentation: Certain shifts, jitters and rotations applied to images as part of preprocessing before the CNN are covered by this operator, and this can be a great way to normalize all input images to a common standard.

5. Gaussian Blurring: a kernel that can be applied across the image to balance each pixel against its neighbors and thereby make transitions smoother. A 5x5 kernel with a standard deviation of 2 could be an example blurring kernel.

6. Edge detection: comes in very handy for detecting road boundaries, which in turn can help analyze a variety of drone imagery and yield useful information. Canny is one such edge detection algorithm, but you can bring your own.

7. Heat-map: a variety of probability functions can be used to create a probability map of the image in color coding or grayscale so that lighter regions are areas of importance and darker regions are less important.
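As a minimal sketch only (not the pipeline's actual operators), the kernel-filtering, blurring and edge-detection steps above could be chained with OpenCV roughly as follows; the file name drone_frame.png is a placeholder:

import cv2
import numpy as np

# Load a 512x512 drone frame (placeholder path, assumed to exist)
image = cv2.imread("drone_frame.png", cv2.IMREAD_GRAYSCALE)

# 1. Filter with a custom kernel (here a simple sharpening kernel)
kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]], dtype=np.float32)
sharpened = cv2.filter2D(image, -1, kernel)

# 5. Gaussian blurring with a 5x5 kernel and standard deviation 2
blurred = cv2.GaussianBlur(sharpened, (5, 5), sigmaX=2)

# 6. Canny edge detection to highlight road boundaries
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

cv2.imwrite("edges.png", edges)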





Tuesday, April 29, 2025

 Multimodal image search

 The following code snippet shows how multimodal search can be useful for searching images. The images are indexed and searched based on vector embeddings, but the query is text-based.

from dotenv import load_dotenv,dotenv_values

import http.client

import json

import os

import urllib.parse

import requests

from tenacity import retry, stop_after_attempt, wait_fixed


from azure.core.credentials import AzureKeyCredential

from azure.identity import DefaultAzureCredential

from azure.search.documents import SearchClient

from azure.search.documents.indexes import SearchIndexClient

from azure.search.documents.models import (

    RawVectorQuery,

)

from azure.search.documents.indexes.models import (

    ExhaustiveKnnParameters,

    ExhaustiveKnnVectorSearchAlgorithmConfiguration,

    HnswParameters,

    HnswVectorSearchAlgorithmConfiguration,

    SimpleField,

    SearchField,

    SearchFieldDataType,

    SearchIndex,

    VectorSearch,

    VectorSearchAlgorithmKind,

    VectorSearchProfile,

)

from IPython.display import Image, display

load_dotenv()

service_endpoint = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")

index_name = os.getenv("AZURE_SEARCH_INDEX_NAME")

api_version = os.getenv("AZURE_SEARCH_API_VERSION")

key = os.getenv("AZURE_SEARCH_ADMIN_KEY")

aiVisionApiKey = os.getenv("AZURE_AI_VISION_API_KEY")

aiVisionRegion = os.getenv("AZURE_AI_VISION_REGION")

aiVisionEndpoint = os.getenv("AZURE_AI_VISION_ENDPOINT")

credential = AzureKeyCredential(key)

search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)

DIR_PATH = os.getcwd()  # assumed base directory that contains the images folder used below

query_image_path = "images/PIC01.jpeg"

@retry(stop=stop_after_attempt(5), wait=wait_fixed(1))

def get_image_vector(image_path, key, region):

    headers = {

        'Ocp-Apim-Subscription-Key': key,

    }

    params = urllib.parse.urlencode({

        'model-version': '2023-04-15',

    })

    try:

        if image_path.startswith(('http://', 'https://')):

            headers['Content-Type'] = 'application/json'

            body = json.dumps({"url": image_path})

        else:

            headers['Content-Type'] = 'application/octet-stream'

            with open(image_path, "rb") as filehandler:

                image_data = filehandler.read()

                body = image_data

        conn = http.client.HTTPSConnection(f'{region}.api.cognitive.microsoft.com', timeout=3)

        conn.request("POST", "/computervision/retrieval:vectorizeImage?api-version=2023-04-01-preview&%s" % params, body, headers)

        response = conn.getresponse()

        data = json.load(response)

        conn.close()

        if response.status != 200:

            raise Exception(f"Error processing image {image_path}: {data.get('message', '')}")

        return data.get("vector")

    except (requests.exceptions.Timeout, http.client.HTTPException) as e:

        print(f"Timeout/Error for {image_path}. Retrying...")

        raise

vector_query = RawVectorQuery(vector=get_image_vector(query_image_path,

                                                      aiVisionApiKey,

                                                      aiVisionRegion),

                              k=3,

                              fields="image_vector")

def generate_embeddings(text, aiVisionEndpoint, aiVisionApiKey):

    url = f"{aiVisionEndpoint}/computervision/retrieval:vectorizeText"

    params = {

        "api-version": "2023-02-01-preview"

    }

    headers = {

        "Content-Type": "application/json",

        "Ocp-Apim-Subscription-Key": aiVisionApiKey

    }

    data = {

        "text": text

    }

    response = requests.post(url, params=params, headers=headers, json=data)

    if response.status_code == 200:

        embeddings = response.json()["vector"]

        return embeddings

    else:

        print(f"Error: {response.status_code} - {response.text}")

        return None

query = "farm"

vector_text = generate_embeddings(query, aiVisionEndpoint, aiVisionApiKey)

vector_query = RawVectorQuery(vector=vector_text,

                              k=3,

                              fields="image_vector")

# Perform vector search

results = search_client.search(

    search_text=query,

    vector_queries= [vector_query],

    select=["description"]

)

for result in results:

    print(f"{result['description']}")

    display(Image(DIR_PATH + "/images/" + result["description"]))

    print("\n")


Monday, April 28, 2025

 Image processing is made easy with platforms like landing.ai

As an example, the following is an application that counts cars in drone images. The dataset is based on 512x512 resolution images of highways and is annotated in the Pascal VOC format. The model is hosted and can be invoked with a sample web request as follows:

from PIL import Image

from landingai.predict import Predictor

# Enter your API Key

endpoint_id = "11cb6c44-3b6a-4b47-bac9-031826bc80ea"

api_key = "YOUR_API_KEY"

# Load your image

image = Image.open("image.png")

# Run inference

predictor = Predictor(endpoint_id, api_key=api_key)

predictions = predictor.predict(image)
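Since the goal is to count cars, a minimal hedged follow-up could tally the returned predictions; the label_name attribute is an assumption about the SDK's prediction objects and may differ:

# Count predictions whose label looks like a car; label_name is assumed here
car_count = sum(1 for p in predictions if getattr(p, "label_name", "") == "car")
print(f"Detected {car_count} cars out of {len(predictions)} objects")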


It can even be requested through an agentic AI framework as follows:

import requests

url = "https://api.va.landing.ai/v1/tools/agentic-object-detection"

files = {

  "image": open("{{path_to_image}}", "rb")

}

data = {

  "prompts": "{{prompt}}",

  "model": "agentic"

}

headers = {

  "Authorization": "Basic {{your_api_key}}"

}

response = requests.post(url, files=files, data=data, headers=headers)

print(response.json())


For context on the DFCS drone video sensing platform, please check the references.


Sunday, April 27, 2025

 Some more illustrations for drone imagery processing:

import math

# The helpers get_mean_feature, get_recent_pixel, lucas_kanade_optical_flow, create_group,
# create_global_group and delta_least_squares are assumed to be defined elsewhere in the pipeline.

def stable_groups(keypoints, groups, threshold):
    # Assign each keypoint to an existing group when both the descriptor distance and the
    # optical-flow-predicted pixel distance fall below the threshold; otherwise start a new group.
    for kp in keypoints:
        matched = False
        for group in groups:
            mean_feature = get_mean_feature(group)
            recent_pixel = get_recent_pixel(group)
            if kp.feature - mean_feature < threshold and \
               abs(lucas_kanade_optical_flow(recent_pixel) - kp.pixel) < threshold:
                group.add(kp)
                matched = True
                break
        if not matched:
            groups.add(create_group(kp))


def global_groups(stable_groups, global_groups, threshold):
    # Merge stable groups into global groups when the mean descriptors agree and the
    # least-squares position error is below the threshold; otherwise start a new global group.
    for stable_group in stable_groups:
        matched = False
        for global_group in global_groups:
            mean_feature = get_mean_feature(global_group)
            if get_mean_feature(stable_group) - mean_feature < threshold and \
               delta_least_squares(stable_group, global_group):
                global_group.add(stable_group)
                matched = True
                break
        if not matched:
            global_groups.add(create_global_group(stable_group))


def spherical_gps_to_position_n_orientation(gps, frame):
    # Convert a GPS reading and frame metadata into a planar position and heading.
    # return (d.x, d.y, h)  -- placeholder from the original sketch
    pass


def camera_angle(keypoint, resolutionW, resolutionH, field_of_view):
    # Angle of the ray through the keypoint's pixel column relative to the camera axis.
    return math.atan((keypoint.x * math.tan(field_of_view / 2)) / (resolutionW / 2))


def world_coordinates(keypoint, drone_frame):
    # Solve these equations for the world location s = (s.x, s.y, s.h):
    #   1. (di.h - si.h) * tan(theta_i_x) = s.x - di.x
    #   2. (di.h - si.h) * tan(theta_i_y) = s.y - di.y
    # return (s.x, s.y, s.h)
    pass


Saturday, April 26, 2025

 This is an illustration of SIFT feature extraction:

import cv2

sift = cv2.xfeatures2d.SIFT_create()

def compute_one(im):
    return sift.detectAndCompute(im, None)

def compute_sift(frames):
    print('get sift features')
    sift_features = [(None, None) for _ in frames]
    for frame_idx, im in enumerate(frames):
        if im is None or frame_idx % 3 != 0:
            continue
        print('... sift {}/{}'.format(frame_idx, len(frames)))
        keypoints, descs = compute_one(im)
        sift_features[frame_idx] = (keypoints, descs)
    return sift_features


Friday, April 25, 2025

 Drone Imagery Processing

We mentioned that the drone video sensing platform DFCS comprises an image processor, an analytical engine and a drone router, where the vision processor creates vectors for keypoints: tuples of a pixel position and a feature descriptor of the patch around the pixel, which translate to world coordinates and time-lapse information for that location. This article explains some of the tenets of the image processor.

One of the main requirements of the image processor is fast frame alignment. Given that the images could come from any unit of the UAV swarm and from any position, the alignment of video frames is essential for subsequent tasks such as object detection and change-tracking. These three tasks are completed with the help of operators in an image pipeline fed with images from the drones’ sensors. The first flight around the region input by the user itself provides most of the survey of the landscape and brings in images from various vantage points. Most of the images are top-down imagery from this first video.

The frame alignment computes a mapping from each pixel to world coordinates (longitude, latitude, height). Object detection and change-tracking encode the structured information obtained from the images, and machine learning models extract information from the video. Frame alignment efficiently combines GPS and compass readings with image features, and there is no need to compute or stash intermediary or output images from this processing.

SIFT feature extraction derives keypoints in each video frame. Keypoints are then grouped together to describe the same world location, such as a road divider or a chimney, in two phases: first, stable groups are created from keypoints in multiple top-down images in a segment of the video from an aerial flight over the world location; then global groups are created by merging stable groups that describe the same world location. This consolidates all keypoints pertaining to a world location. The video frame is then aligned by matching the SIFT keypoints computed in a single frame against the global groups, and this matching is used to estimate the drone’s position and orientation when it captured the frame. In short, SIFT yields keypoints, grouping yields the keypoints corresponding to the same world location, and frame alignment yields position and orientation.

Grouping is iterative and starts with an empty set. For each frame, a keypoint is matched against an existing group based on two conditions: 1. the distance between the keypoint descriptor and the mean descriptor of the group must lie below a threshold, and 2. the pixel position of the most recent keypoint in the group, when transformed via optical flow, must fall close to that of the keypoint within a small threshold. Closeness is measured by Euclidean distance and the transformation is done with the Lucas-Kanade method. If there is no match, the keypoint becomes a new group with a singleton member. Both existing and new groups are added to the global group.
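To make the frame-alignment step concrete, here is a hedged sketch of estimating the drone’s pose from keypoints matched to global groups whose world coordinates are known. It uses OpenCV’s solvePnP; the camera intrinsics and the matched point arrays below are assumptions for illustration, not values from the platform:

import cv2
import numpy as np

# Assumed inputs: matched global groups with known ground-plane world coordinates (x, y, h=0)
# and the pixel positions of the corresponding keypoints in the current frame.
matched_world_points = np.array([[10.0, 4.0, 0.0],
                                 [12.5, 7.0, 0.0],
                                 [15.0, 2.0, 0.0],
                                 [18.0, 9.0, 0.0]], dtype=np.float32)
matched_pixels = np.array([[256.0, 310.0],
                           [290.0, 200.0],
                           [340.0, 330.0],
                           [400.0, 150.0]], dtype=np.float32)

# Assumed pinhole camera intrinsics for a 512x512 frame (focal length is illustrative).
camera_matrix = np.array([[500.0, 0.0, 256.0],
                          [0.0, 500.0, 256.0],
                          [0.0, 0.0, 1.0]], dtype=np.float32)
dist_coeffs = np.zeros(4)

# solvePnP returns the rotation and translation that map world points into the camera frame,
# which together give the drone's orientation and position for this frame.
ok, rvec, tvec = cv2.solvePnP(matched_world_points, matched_pixels, camera_matrix, dist_coeffs)
if ok:
    rotation, _ = cv2.Rodrigues(rvec)
    camera_position = -rotation.T @ tvec  # camera (drone) position in world coordinates
    print("estimated position:", camera_position.ravel())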

After this aggregation into groups, GPS and compass readings are used to determine the world coordinates of stable groups. To merge stable groups into global groups, the coordinates of a global group are computed as the average across those of its stable groups, and the optical-flow constraint is replaced with a position-estimate similarity constraint that requires the least-squares error to be below a threshold.


Thursday, April 24, 2025

 Leveraging a database of detected objects with standard query operators to build rich drone video sensing applications.

We mentioned that the drone video sensing platform DFCS comprises a vision processor, an analytical engine and a drone router, where the vision processor creates vectors for keypoints: tuples of a pixel position and a feature descriptor of the patch around the pixel, which translate to world coordinates and time-lapse information for that location. While many questions can be answered directly with a search on this vector database or with multimodal search on the selected frames, we also leverage RAG by creating a database of detected objects, which comes in very handy when searching alongside public reviews of those objects, such as parking spaces, from the internet. The aim of this database as a regular structured data source of all detected objects is that we can now leverage standard query operators to build rich UAV swarm sensing applications.

For example,

-- My Position

declare @myposition geography = geography::STGeomFromText('POINT(-0.2173896258649289 51.484376146936256)', 4326)

-- Get Embeddings from OpenAI

declare @e varbinary(8000);

exec dbo.get_embeddings

@model = 'text-embedding-3-small',

@text = 'a place to park a car on Thursday 1-3 pm GMT',

@embedding = @e output;

with cte as

(

select

e.review_id,

vector_distance('cosine', embedding, @e) as distance

from

dbo.review_embeddings e

)

select top(10)

b.id as business_id,

b.name,

r.id as review_id,

r.stars,

@myposition.STDistance(geo_location) as geo_distance,

1-e.distance as similarity

from

cte e

inner join

dbo.reviews r on e.review_id = r.id

inner join

dbo.business b on r.business_id = b.id

where

b.city = 'London'

and

@myposition.STDistance(geo_location) < 5000 -- 5 km

and

regexp_like(cast(b.categories as varchar(1000)), 'Parking|Street')

and

r.stars >= 4

 and

b.reviews > 30

and

json_value(b.custom_attributes, '$."metered"') = 'yes'

order by

distance

go

The direct SQL query above, combined with built-in vector search, allows a traditional web application to be created. Alternatively, the application can query a chatbot with a system message such as “You are an AI assistant that helps people find parking. Give as many details as possible about each parking space such as price. Whenever you respond, please format your answer to make it readable including bullet points.” to define the AI's personality, tone and capabilities, and leverage the detected-objects database for Retrieval Augmented Generation.
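As a hedged sketch of that chatbot pattern (not the platform's actual implementation), the rows returned by the SQL query could be passed as grounding context to a chat completion call; the model name, environment variable and the get_parking_rows helper are assumptions:

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # assumed environment variable

SYSTEM_MESSAGE = (
    "You are an AI assistant that helps people find parking. Give as many details as "
    "possible about each parking space such as price. Whenever you respond, please format "
    "your answer to make it readable including bullet points."
)

def answer_with_rag(question, parking_rows):
    # parking_rows: rows retrieved by the SQL + vector search query above (assumed shape).
    context = "\n".join(str(row) for row in parking_rows)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

# Example usage with rows fetched elsewhere (hypothetical helper):
# print(answer_with_rag("Where can I park on Thursday afternoon?", get_parking_rows()))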


Wednesday, April 23, 2025

 Waypoint selection strategies

The design, development and testing of the waypoint selection and trajectory-forming algorithm was discussed with the assumption that users provide a geographic region that they are interested in observing. The region is then divided into a grid of cells with a user-configurable cell size. Then, acquiring information on the reachability of cells from one another, we create a graph with cells as nodes and adjacencies as edges. This helps us determine waypoints as the set of nodes selected in a topological sort between source and destination. One of the helper libraries for the implementation, therefore, involves the following graph objects (the geom module and EdgePos referenced below come from the platform's geometry helpers and are not shown here; a usage sketch follows after the classes).

class Vertex(object):

    def __init__(self, id, point):

        self.id = id

        self.point = point

        self.in_edges = []

        self.out_edges = []

    def _neighbors(self):

        n = {}

        for edge in self.in_edges:

            n[edge.src] = edge

        for edge in self.out_edges:

            n[edge.dst] = edge

        return n

    def neighbors(self):

        return self._neighbors().keys()

    def __repr__(self):

        return 'Vertex({}, {}, {} in {} out)'.format(self.id, self.point, len(self.in_edges), len(self.out_edges))

class Edge(object):

    def __init__(self, id, src, dst):

        self.id = id

        self.src = src

        self.dst = dst

    def bounds(self):

        return self.src.point.bounds().extend(self.dst.point)

    def segment(self):

        return geom.Segment(self.src.point, self.dst.point)

    def closest_pos(self, point):

        p = self.segment().project(point)

        return EdgePos(self, p.distance(self.src.point))

    def is_opposite(self, edge):

        return edge.src == self.dst and edge.dst == self.src

    def get_opposite_edge(self):

        for edge in self.dst.out_edges:

            if self.is_opposite(edge):

                return edge

        return None

    def is_adjacent(self, edge):

        return edge.src == self.src or edge.src == self.dst or edge.dst == self.src or edge.dst == self.dst

    def orig_id(self):

        if hasattr(self, 'orig_edge_id'):

            return self.orig_edge_id

        else:

            return self.id
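A minimal hedged sketch of how these classes might be used to pick waypoints: build the cell-adjacency graph and run a breadth-first search from the source cell to the destination cell. The Point type and the grid layout are assumptions for illustration:

from collections import namedtuple, deque

Point = namedtuple("Point", ["x", "y"])  # stand-in for the platform's point type

def build_grid_graph(rows, cols, cell_size):
    # One vertex per grid cell; edges connect horizontally and vertically adjacent cells.
    vertices = {}
    edges = []
    for r in range(rows):
        for c in range(cols):
            vid = r * cols + c
            vertices[vid] = Vertex(vid, Point(c * cell_size, r * cell_size))
    for r in range(rows):
        for c in range(cols):
            src = vertices[r * cols + c]
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    dst = vertices[nr * cols + nc]
                    e = Edge(len(edges), src, dst)
                    edges.append(e)
                    src.out_edges.append(e)
                    dst.in_edges.append(e)
    return vertices, edges

def waypoints(source, destination):
    # Breadth-first search over neighbors() yields a shortest cell-to-cell path.
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v is destination:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return list(reversed(path))
        for n in v.neighbors():
            if n not in parent:
                parent[n] = v
                queue.append(n)
    return []

vertices, edges = build_grid_graph(rows=4, cols=4, cell_size=100)
print(waypoints(vertices[0], vertices[15]))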


Tuesday, April 22, 2025

 SIFT feature extraction for drone imageries

SIFT, or Scale-Invariant Feature Transform, is a powerful algorithm used in computer vision for detecting, describing, and matching local features in images. SIFT is designed to identify features that remain consistent across changes in scale, rotation, and illumination. It is applied to drone imageries to compute keypoints in each video frame. A keypoint is a tuple of a pixel position and a feature descriptor that describes the image in a patch around that pixel - a vector representation of the local image region. SIFT matches features between images by comparing their descriptors using metrics like Euclidean distance. For every video frame, SIFT yields a set of keypoints.

The implementation to get SIFT features is as follows:

import cv2

sift = cv2.xfeatures2d.SIFT_create()

def compute_one(im):

        return sift.detectAndCompute(im, None)

def compute_sift(frames):

        print('get sift features')

        sift_features = [(None, None) for _ in frames]

        for frame_idx, im in enumerate(frames):

            if im is None or frame_idx % 3 != 0:

                continue

            print('... sift {}/{}'.format(frame_idx, len(frames)))

            keypoints, descs = compute_one(im)

            sift_features[frame_idx] = (keypoints, descs)

        return sift_features
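To illustrate the descriptor-matching step mentioned above (comparing descriptors with a Euclidean metric), a hedged sketch using OpenCV's brute-force matcher between two frames could look like this; frame_a and frame_b are placeholders:

import cv2

def match_frames(frame_a, frame_b, ratio=0.75):
    # Compute SIFT keypoints and descriptors for both frames.
    kp_a, desc_a = compute_one(frame_a)
    kp_b, desc_b = compute_one(frame_b)

    # Brute-force matcher with L2 (Euclidean) distance, which SIFT descriptors expect.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(desc_a, desc_b, k=2)

    # Lowe's ratio test keeps only matches that are clearly better than the runner-up.
    good = [m for m, n in candidates if m.distance < ratio * n.distance]

    # Each match links a pixel in frame_a to a pixel in frame_b for the same world feature.
    return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt) for m in good]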




Sunday, April 20, 2025

 Continuous indexing

Azure AI Search supports continuous indexing of documents, enabling real-time updates to the search index as new data is ingested. It can connect to various data sources, such as Azure Blob Storage, SQL databases, or Cosmos DB, to ingest documents continuously. Indexers are configured to monitor these sources for changes and update the search index accordingly. The indexer scans the data source for new, updated, or deleted documents. The time taken to index new documents depends on factors like the size of the data, complexity of the schema, and the indexing tier. For large datasets, indexing may take longer, especially if the indexer is resource-starved. Once documents are indexed, they are available for querying. However, query latency can vary based on the size of the index, query complexity, and service tier. The minimum interval for indexer runs is 5 minutes. If this pull from the data source is not fast enough, individual data items can be indexed by pushing them directly to the index using the index client. Both approaches are shown in the code samples below:

from azure.core.credentials import AzureKeyCredential

from azure.identity import DefaultAzureCredential

from azure.mgmt.search import SearchManagementClient

from azure.search.documents import SearchClient

from azure.search.documents.indexes import SearchIndexClient

# Replace with your Azure credentials and configuration
subscription_id = ""
resource_group_name = ""
search_service_name = ""
blob_storage_account_name = ""
blob_container_name = ""
connection_string = ""

# Authenticate using DefaultAzureCredential
credential = DefaultAzureCredential()

# Initialize the Azure Search Management Client
search_client = SearchManagementClient(credential, subscription_id)

# Define the data source
data_source_name = "blob-data-source"
data_source_definition = {
    "type": "AzureBlob",
    "credentials": {
        "connectionString": connection_string
    },
    "container": {"name": blob_container_name}
}

# Create or update the data source in Azure Search
search_client.data_sources.create_or_update(
    resource_group_name=resource_group_name,
    search_service_name=search_service_name,
    data_source_name=data_source_name,
    data_source=data_source_definition)

# Define the index
index_name = "blob-index"
index_definition = {
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String"},
        {"name": "category", "type": "Edm.String"},
        {"name": "sourcefile", "type": "Edm.String"},
        {"name": "metadata_storage_name", "type": "Edm.String"}
    ]
}

# Create or update the index
search_client.indexes.create_or_update(
    resource_group_name=resource_group_name,
    search_service_name=search_service_name,
    index_name=index_name,
    index=index_definition)

# Define the indexer
indexer_name = "blob-indexer"
indexer_definition = {
    "dataSourceName": data_source_name,
    "targetIndexName": index_name,
    "schedule": {
        "interval": "PT5M"  # Run every 5 minutes
    }
}

# Create or update the indexer
search_client.indexers.create_or_update(
    resource_group_name=resource_group_name,
    search_service_name=search_service_name,
    indexer_name=indexer_name,
    indexer=indexer_definition)

print("Configured continuous indexing from Azure Blob Storage to Azure AI Search!")

# Replace with your Azure credentials and configuration
service_name = ""
admin_key = ""

# Initialize the SearchIndexClient
endpoint = f"https://{service_name}.search.windows.net/"
credential = AzureKeyCredential(admin_key)
index_client = SearchIndexClient(endpoint=endpoint, credential=credential)

# Upload documents to the index
def index_document(filename):
    print(f"Indexing document '{filename}' into search index '{index_name}'")
    search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)
    batch = []
    with open(filename, 'r') as fin:
        text = fin.read()
        batch += [text]
    if len(batch) > 0:
        results = search_client.upload_documents(documents=batch)
        succeeded = sum([1 for r in results if r.succeeded])
        print(f"\tIndexed {len(results)} documents, {succeeded} succeeded")

The default rate limit for adding documents to the index varies with service tiers, replicas and partitions. Higher service tiers have higher rate limit. Adding replicas increases query throughput. Adding partitions increases indexing throughput. 1000 documents can be sent in a batch, and batching optimizes throughput and reduces the likelihood of hitting rate limits.
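A hedged sketch of that batching guidance, chunking documents into groups of up to 1000 before calling upload_documents; the shape of the documents list is an assumption:

def upload_in_batches(search_client, documents, batch_size=1000):
    # Azure AI Search accepts up to 1000 documents per upload batch.
    total_succeeded = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        results = search_client.upload_documents(documents=batch)
        total_succeeded += sum(1 for r in results if r.succeeded)
    print(f"Indexed {len(documents)} documents, {total_succeeded} succeeded")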


Saturday, April 19, 2025

 How does DFCS differ from the SkyQuery platform?

DFCS is a UAV-swarm-imagery-driven knowledge base and analytics stack based entirely in the public cloud that can be used to create a trajectory involving waypoints from source to destination over a given landscape. The capability to store and query drone imagery for information that can be used to build a knowledge base for retrieval augmented generation in AI applications is quite generic and shares many requirements with a wide variety of image-querying systems. Most notably, the SkyQuery platform has similar requirements to deal with a large dataset of images and to provide contextual information on queries. SkyQuery is an aerial drone video sensing platform with a high-level programming language that makes it quite suitable for developing long-running sensing applications. SkyQuery performs fast video frame alignment and detection of small objects, which works well for querying with its expressive domain-specific language in which programs specify sensing-analytics-routing loops. It also provides a library of analytical operators to encode these steps. By separating out workflows that can be written using these operators, it allows takeoff, waypoint following and landing to be automated.

Therefore, both DFCS and SkyQuery provide computer vision pipelines and processors to convert drone video data into queryable representations, a way to contextualize queries, and an engine that provides fast responses suitable for issuing routing directives to a UAV swarm, all with the help of programmable interfaces.

But the differences are in the representations used for these datasets and the way they are queried. DFCS leverages AI and vector search while SkyQuery leverages language constructs. Even the image processors are multimodal for DFCS, while SkyQuery leverages cataloguing of output from SIFT feature extractors. The use of retrieval augmented generation in queries makes the query results more meaningful for DFCS, while SkyQuery requires workflows to experiment with their own querying logic. In SkyQuery, objects are referred to with keypoints comprising pixel positions and a feature descriptor, which are then formed into “stable groups”. DFCS, on the other hand, leverages vector search that works well with contextual information presented via spatial coordinates, progress along waypoints and error corrections.

It could be said that DFCS focuses more on the flight path of the UAV swarm and provides error-correction feedback to keep the swarm on course to its destination. It bolsters this with information for humans as well as feedback loops for autonomous flights, and comes with telemetry pipelines that continuously indicate the manner and measure of progress along the trajectory.

By keeping the cataloguing, grouping and querying of objects independent of the vector representations, DFCS facilitates working with third-party datastores, including those that were built to be product catalogs. This helps diversify the methods and means of querying for different purposes rather than being restricted to one form of language. DFCS is polyglot and provides a chatbot-like interface that leverages the state of the art in Retrieval Augmented Generation.

#codingexercise: https://1drv.ms/w/c/d609fb70e39b65c8/Echlm-Nw-wkggNb7JAEAAAABu53rpIuTS5AsMb3lNiM7SQ?e=u6kTma


Friday, April 18, 2025

Telemetry pipelines

 Collected and emitted telemetry data makes the ingestion and processing of sensor data independent of the input to the models used to predict the next orientation. This strategy leans on telemetry pipelines as an effective technology to solve data problems and turn expansive datasets into concise, actionable insights without losing information. Waypoints, trajectory, position on the trajectory, deviations and error corrections are all that is needed, maintained and tracked for the UAV swarm to negotiate obstacles and stay on course to reach the destination from the source. An intelligent telemetry pipeline will demonstrate this five-step approach to maximizing its value:

1. Noise filtering: This involves sifting through data to spotlight the essentials.

2. Long-Term data retention: this involves safeguarding valuable data for future use

3. Event-trimming: This tailors data for optimal analytics so that the raw data is not dictating eccentricities in the charts and graphs.

4. Data condensation: this translates voluminous MELT data into focused metrics

5. Operational Efficiency Boosting: This amplifies operating speed and reliability.

This approach is widely applicable across domains and is also visible in many projects that span Kaggle datasets, open-source repositories such as those on GitHub, and many publications. Emitting to an S3 or S3-compatible storage and calculating the number and size of emitted events indicates the reduction in size compared to the original data and serves as a measure of the effectiveness of using telemetry instead of actual data.

With the metrics emitted for drones, the first step of noise filtering involves removing duplicates, false positives, recurring notifications and superfluous information while registering their frequency for future use. Dissecting data within specific windows, keeping unique events and eliminating excessive repetitions can be offloaded to a dedupe processor, but this step is not limited to that and strives to keep the data as precise and concise as required so as not to lose information while still being good enough for the same analytics.
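A minimal hedged sketch of that windowed dedupe step; the event shape (a dict with a timestamp and a payload) is an assumption:

def dedupe_events(events, window_seconds=60):
    # Keep the first occurrence of each payload within a rolling time window and
    # count the repetitions it absorbed, so frequency is preserved for later analysis.
    last_kept = {}   # payload -> index into kept
    kept = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        key = event["payload"]
        if key in last_kept:
            kept_event = kept[last_kept[key]]
            if event["timestamp"] - kept_event["timestamp"] < window_seconds:
                kept_event["repeat_count"] += 1
                continue
        event = dict(event, repeat_count=1)
        kept.append(event)
        last_kept[key] = len(kept) - 1
    return kept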

Specific datasets and SIEM are indispensable for future needs and come with real-time data refinement requirements. So, leveraging cloud architecture patterns that write to multiple destinations while collecting data from multiple sources, such as a service bus, is a requisite for the second stage. This step could also implement filtering capabilities and journaling that ensure robustness and reliability without loss of fidelity.

The third step is a take on advanced telemetry management with the introduction of concepts like traffic-flow segregation, such as grouping and streamlining. It does involve parsing, but it improves overall performance. Deeper analysis is often better with some transformations.

The fourth step, data condensation, builds on the concept of refinement that proactively prevents another instance of data deluge so that event streams remain manageable and meaningful. The value extends beyond volume reduction, as this approach reduces data processing overheads.

The fifth step is about managing the data and ensuring the speed and reliability of the operations that process this data. With increasing ingestion rates, vectorization and search may lag. Agile, robust solutions that maximize the value derived from the data while keeping costs manageable are required here.

Data accumulation without purposeful action leads to stagnation, and efficient operations aid in streamlining and refining data. Speed and reliability are a function of both.


Thursday, April 17, 2025

 Always pertinent:

Problem: determine if a graph has cycles:

import java.util.*;
import java.lang.*;
import java.io.*;

class Ideone
{
 public static void main (String[] args) throws java.lang.Exception
 {
  // Adjacency matrix of the same graph, kept for reference (zero-initialized by default).
  int[][] m = new int[5][5];
  m[0][1] = 1;
  m[0][2] = 1;
  m[1][0] = 1;
  m[1][3] = 1;
  m[2][0] = 1;
  m[2][3] = 1;
  m[3][1] = 1;
  m[3][2] = 1;
  m[3][4] = 1;
  m[4][3] = 1;
  var vertices = InitializeSingleSource(m, 0);
  // An edge list is used instead of a HashMap because a vertex has several outgoing edges
  // and duplicate keys would otherwise overwrite each other.
  var edges = new ArrayList<Map.Entry<String, String>>();
  edges.add(Map.entry("A", "B"));
  edges.add(Map.entry("A", "C"));
  edges.add(Map.entry("B", "A"));
  edges.add(Map.entry("B", "D"));
  edges.add(Map.entry("C", "A"));
  edges.add(Map.entry("C", "D"));
  edges.add(Map.entry("D", "B"));
  edges.add(Map.entry("D", "C"));
  edges.add(Map.entry("D", "E"));
  edges.add(Map.entry("E", "D"));
  System.out.println(hasCyclesByBellmanFord(vertices, edges));
 }

 private static List<Vertex> InitializeSingleSource(int[][] m, int start) {
  var vertices = new ArrayList<Vertex>();
  for (int i = 0; i < m.length; i++){
   var v = new Vertex();
   v.id = String.valueOf((char)('A' + i));
   v.d = Integer.MAX_VALUE;
   if (i == start) { v.d = 0; }
   v.parent = null;
   vertices.add(v);
  }
  return vertices;
 }

 private static Vertex getVertex(List<Vertex> vertices, String id){
  for (int i = 0; i < vertices.size(); i++){
   if (vertices.get(i).id.equals(id)){
    return vertices.get(i);
   }
  }
  return null;
 }

 // A ->C <-D ->E
 //  ->B->
 private static boolean hasCyclesByBellmanFord(List<Vertex> vertices, List<Map.Entry<String, String>> edges) {
  boolean result = false;
  // Relax every edge |V| - 1 times with unit edge weights.
  for (int i = 0; i < vertices.size() - 1; i++){
   for (var entry: edges) {
    var u = getVertex(vertices, entry.getKey());
    var v = getVertex(vertices, entry.getValue());
    Relax(u, v);
   }
  }
  // Any edge that can still be relaxed indicates a cycle reachable from the source.
  for (var entry: edges) {
   var u = getVertex(vertices, entry.getKey());
   var v = getVertex(vertices, entry.getValue());
   if (u != null &&
    v != null &&
    u.d != Integer.MAX_VALUE &&
    v.d > u.d + 1) {
    result = true;
    return result;
   }
  }
  return result;
 }

 private static void Relax(Vertex u, Vertex v) {
  if (u == null || v == null) { return; }
  if (u.d == Integer.MAX_VALUE) { return; }
  if (v.d > u.d + 1) {
   v.d = u.d + 1;
   v.parent = u;
  }
 }

 static class Vertex {
  public String id;
  public Vertex parent;
  public Integer d;
 }
}


Wednesday, April 16, 2025

 Secure-by-design

The boundary between infrastructure and application engineering is one where the concerns for security play out differently. Application engineering focuses on architecture with boundaries such that some of the resources are considered within the trust boundary. Infrastructure engineering implements security with network perimeter protection and a defense-in-depth strategy such that even internal or hidden-from-the-world resources enjoy a certain level of protection. Neither side can deny the need to upskill and leverage the tools available. Developers must be trained in secure coding and infrastructure engineers must hold them to it. Proficiency in using approved tools and establishing and maintaining effective oversight and administration go hand in hand.

With a sprawl in the digital landscape of resources, tools, frameworks and platforms used to host data and run code, organizations often find it hard to benchmark their security against industry standards. “Embracing security and resilience by design” is indeed a challenge, and progress toward it must be tracked. CISA has published pivotal guidelines on the subject. One technique frequently used is Secure Code Warrior’s “Trust Score” technology, which opens a new frontier of actionable security insights and benchmarking.

Cybersecurity is a discipline, and it is dynamic. While it was a $2 billion industry in the 90s dominated by transaction systems, it is now over $2 trillion, with an insatiable demand for products, services and AI applications. Virtually every company writes code in some way. Organizations have grappled with bringing security upfront into the SDLC amidst cultural resistance and disagreements, while AppSec teams are depleted by a high rate of burnout. Movements like “shift left” have attempted to correct this, but accountability continues to be a sticking point at all levels and scopes. A case in point is the CrowdStrike-introduced defect that affected the airline industry. Oversight and management of software development processes must ensure Secure-by-Design is front of mind and achievable for each deployment.

Some of the tenets include “Provide secure defaults for developers”, with the default route during software development being a “paved road” or “well-lit path”, and “Foster a software developer workforce that understands security”, by training developers on best practices and including security education in the hired skillset. Developers need to be enabled through continuous, precision learning pathways and tools that suit their tech stack, and to share the responsibility for security.


Tuesday, April 15, 2025

 This is a summary of the book titled “The Yellow Pad” written by Robert E. Rubin and published by Penguin Press in 2023. The author is a former Goldman Sachs Executive and US Secretary of the Treasury who discusses how to navigate difficult, controversial decisions saying that he learned it requires adhering to specific principles. He also found that one must always recognize the unpredictable human element, since no event ever unfolds exactly as planned. He brings his knowledge and experience to this framework of principles. A prisoner’s insights made a lasting impact on him. In his work, he learned that risk management demands acknowledging the remotest possibilities. Mental toughness enables strong leaders to overcome bumps in the road and great leaders are curious, authentic, and true to their beliefs. A retrospective view provides clarity and fosters more effective decision-making.

He was impressed by the forthrightness of prisoners about their crimes and the importance of pause, assessment, and weighing the possible repercussions of their actions. He believed that making informed decisions amid intense upheaval requires special skill and discipline. Rubin emphasized the importance of understanding one's emotional biases and regulating them during risky crunch times. He also compared the late 20th century to the time when Vice President Al Gore warned of the dangers of global warming. Rubin and Henry Paulson, former treasury secretary under President George W. Bush, approached the Securities and Exchange Commission to advocate for financial establishments to openly acknowledge the possible costs of global warming. They believed that if more leaders around the world had thought about the issue like Al Gore, the world today would be a safer place.

Mental toughness is crucial for strong leaders to overcome challenges and maintain confidence in their decision-making abilities. They are resilient and embrace optimism, which is important within organizations. Successful leaders are known for their resilience, especially in response to public criticism. They are also known for their energetic curiosity, which leads them to take a skeptical approach and search for answers beyond obvious conclusions. Authenticity is a character trait that serves leaders well, as it allows them to explore the world around them. Being true to oneself requires consistency, even when it means disagreeing with others' opinions. Traditional management often overlooks the human element, as seen in the case of Lawrence Bossidy's management philosophy. Rubin prefers to focus on people's individuality and uniqueness, rather than specific rules and detailed lists of do's and don'ts. Despite their skepticism, leaders like Rubin can be talented and perceptive, making them valuable assets in their organizations.

Good executives prioritize the best interests of their organizations, ignoring personal feelings and fostering empathetic and patient decision-making. They credit their employees for department successes, accept blame when things go wrong, and are open to feedback from everyone. Organizational culture influences employee success, and leaders must avoid deviating from their foundational values. Success accrues upward, reflecting well on the leader. Analyzing past actions helps make informed decisions moving forward. In some cases, carefully considered decisions can still generate negative results. Intellectual openness creates an environment where people can work with leaders to make the best decisions. However, organizations often assign blame, leaving employees without valuable input. Chastising, blaming, or unfairly punishing people for making honest mistakes can fuel an unhealthy culture. Rubin seldom states his negative judgments of anyone's actions in public, as long as the people who made poor decisions undertake unflinching reviews of how their actions led to unsatisfactory outcomes.


Monday, April 14, 2025

 Comparison between CNN-LSTM and Logistic Regression

Deep learning has shown superior performance in object detection and image classification on drone imagery, while logistic regression shows superior prediction with drone telemetry data. Although they serve different purposes, they can be compared on a common use case: predicting deviation from a trajectory and the compensation, based on past and current orientations. The cost function in the CNN-LSTM is a mean squared error. The CNN-LSTM uses the vectorized output of edge-detected and Gaussian-smoothed images captured sequentially from video to predict the drone's next steering angle. But the same data can be emitted as telemetry, along with additional telemetry from edge detections, trajectory and squared errors, and the corresponding vectors can then be run through Logistic Regression as shown below. Sample usage of Logistic Regression:

#! /bin/python

import matplotlib.pyplot as plt

import pandas

import os

here = os.path.dirname(__file__) if "__file__" in locals() else "."

data_file = os.path.join(here, "data", "flight_errors", "data.csv")

data = pandas.read_csv(data_file, sep=",")

# y is the last column and the variable we want to predict. It has a boolean value.

data["y"] = data["y"].astype("category")

print(data.head(2))

print(data.shape)

data["y"] = data["y"].apply(lambda x: 1 if x == 1 else 0)

print(data[["y", "X1"]].groupby("y").count())

try:

    from sklearn.model_selection import train_test_split

except ImportError:

    from sklearn.cross_validation import train_test_split

train, test = train_test_split(data)

import numpy as np

from microsoftml import rx_fast_trees, rx_predict

features = [c for c in train.columns if c.startswith("X")]

model = rx_fast_trees("y ~ " + "+".join(features), data=train)

pred = rx_predict(model, test, extra_vars_to_write=["y"])

print(pred.head())

from sklearn.metrics import confusion_matrix

conf = confusion_matrix(pred["y"], pred["PredictedLabel"])

print(conf)

def train_test_hyperparameter(trainA, trainB, **hyper):

    # Train a model

    features = [c for c in train.columns if c.startswith("X")]

    model = rx_fast_trees("y ~ " + "+".join(features), data=trainA, verbose=0, **hyper)

    pred = rx_predict(model, trainB, extra_vars_to_write=["y"])

    conf = confusion_matrix(pred["y"], pred["PredictedLabel"])

    return (conf[0,0] + conf[1,1]) / conf.sum()

trainA, trainB = train_test_split(train)

hyper_values = [5, 10, 15, 20, 25, 30, 35, 40, 50, 100, 200]

perfs = []

for val in hyper_values:

    acc = train_test_hyperparameter(trainA, trainB, num_leaves=val)

    perfs.append(acc)

    print("-- Training with hyper={0} performance={1}".format(val, acc))

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)

ax.plot(hyper_values, perfs, "o-")

ax.set_xlabel("num_leaves")

ax.set_ylabel("% correctly classified")

tries = max(zip(perfs, hyper_values))

print("max={0}".format(tries))

model = rx_fast_trees("y ~ " + "+".join(features), data=train, num_leaves=tries[1])

pred = rx_predict(model, test, extra_vars_to_write=["y"])

conf = confusion_matrix(pred["y"], pred["PredictedLabel"])

print(conf)
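For the CNN-LSTM side of the comparison, a minimal hedged sketch of a steering-angle regressor with a mean-squared-error cost could look like the following; the input shape (sequences of 10 preprocessed 64x64 grayscale frames) and layer sizes are assumptions, not the production model:

import tensorflow as tf
from tensorflow.keras import layers

def build_cnn_lstm(sequence_length=10, height=64, width=64, channels=1):
    # Each sample is a short sequence of edge-detected, Gaussian-smoothed frames.
    inputs = tf.keras.Input(shape=(sequence_length, height, width, channels))
    # Apply the same small CNN to every frame in the sequence.
    x = layers.TimeDistributed(layers.Conv2D(16, (3, 3), activation="relu"))(inputs)
    x = layers.TimeDistributed(layers.MaxPooling2D((2, 2)))(x)
    x = layers.TimeDistributed(layers.Flatten())(x)
    # LSTM captures the temporal relationship across frames.
    x = layers.LSTM(64)(x)
    # Single regression output: the next steering angle.
    outputs = layers.Dense(1)(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_cnn_lstm()
model.summary()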


Sunday, April 13, 2025

Emerging trends and regulations in UAV swarms

The units in a full-fledged, safe and autonomous swarm are drones, and when the entire swarm is homogeneous, adherence to the Federal Aviation Administration (FAA)'s Small UAS Rule (Part 107) is sufficient. This regulation mandates the following for the drones:

  • Drone Weight: Must weigh less than 55 pounds, including payload.
  • Visual Line of Sight (VLOS): The drone must remain within the operator's unaided visual line of sight.
  • Daylight Operations: Flights are allowed during daylight or twilight (with anti-collision lighting).
  • Maximum Altitude: Cannot exceed 400 feet above ground level unless within 400 feet of a structure.
  • Maximum Speed: Limited to 100 mph (87 knots).
  • Airspace Restrictions: Operations in controlled airspace require prior FAA authorization.
  • No Flying Over People: Unless they are directly involved in the operation.
  • No Moving Vehicles: Cannot operate from a moving vehicle unless in a sparsely populated area.
  • Weather Visibility: Minimum visibility of 3 miles from the control station.
  • Pilot Certification: Operators must hold a Remote Pilot Certificate or be supervised by someone who does.
  • Registration: All drones must be registered with the FAA.

For example, Amazon's drone delivery system, known as Prime Air, is designed to deliver packages weighing up to 5 pounds within 30 minutes. The drones are fully electric and incorporate advanced aerospace standards to ensure safety and reliability, such as the Part 135 Air Carrier Certificate from the FAA as well as FAA Part 107. The drones are equipped with a sophisticated sense-and-avoid system that enables them to detect and navigate around obstacles, both static (like chimneys) and dynamic (like other aircraft). This system uses proprietary algorithms for object detection and decision-making, ensuring safe operations even in unexpected situations. The algorithms leverage a diverse suite of object detection technologies to identify obstacles and adjust flight paths accordingly. During the delivery descent, the drones can detect and avoid smaller obstacles like trampolines or clotheslines that might not be visible in satellite imagery. An automated drone-management system is being developed to plan the flight paths, ensure safe distances between the aircraft and other aircraft in the area, and comply with all aviation regulations.

The autonomous drone delivery system features a deep learning autonomous drone model built using CNN-LSTM algorithms. It includes functionalities like online purchasing, drone delivery processing, and real-time location tracking. CNN-LSTM algorithms combine Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks to handle tasks involving spatial and temporal data. CNNs are excellent for extracting spatial features from data, such as images or spectrograms. They use convolutional layers to identify patterns like edges, textures, or shapes. LSTMs are a type of Recurrent Neural Network (RNN) designed to capture temporal dependencies in sequential data. They excel at learning long-term relationships, making them ideal for tasks like time-series analysis or speech recognition. CNN layers process spatial data to extract features. These features are then passed to LSTM layers, which analyze the temporal relationships between them. This combination allows the model to understand both spatial and temporal aspects of the data, making it highly effective for tasks like video analysis, activity recognition, and speech emotion detection. This technique can help with generating textual descriptions of video sequences, identifying actions in a sequence of images and classifying emotions from audio spectrograms.

Amazon's CNN-LSTM predictor makes use of Gaussian and edge-detection preprocessing functions from image processing libraries for Steering Angle Dataset exploration. The YOLOv3 bounding-box architecture is used to find bounding boxes of cars, people, and trees in the image dataset. These bounding boxes were used by their probability model to calculate the probability of collision. Weight determination functions were used to calculate the probability of colliding with any given object. A pilot script is used to fly the drone.
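As an illustrative, hedged sketch only (not the actual probability model described above), a naive collision weight could be derived from detected bounding boxes by combining each detection's area, confidence and a per-class weight; the box format (x1, y1, x2, y2, confidence, label) is an assumption:

def collision_probability(boxes, frame_width, frame_height, weights=None):
    # Naive illustration: each detection contributes according to the fraction of the
    # frame it covers, its detection confidence, and a per-class weight.
    weights = weights or {"car": 1.0, "person": 1.5, "tree": 0.8}
    frame_area = float(frame_width * frame_height)
    risk = 0.0
    for (x1, y1, x2, y2, confidence, label) in boxes:
        area_fraction = max(0.0, (x2 - x1)) * max(0.0, (y2 - y1)) / frame_area
        risk += weights.get(label, 1.0) * confidence * area_fraction
    # Clamp to [0, 1] so it can be read as a rough probability.
    return min(1.0, risk)

# Example: two YOLOv3-style detections on a 512x512 frame
boxes = [(100, 200, 220, 300, 0.9, "car"), (300, 250, 340, 330, 0.7, "person")]
print(collision_probability(boxes, 512, 512))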


Saturday, April 12, 2025

 Emerging Trends of AI in Autonomous Business: 


The digital business era is maturing, with industry-leading enterprises seeking the next technology-enabled business growth curve. Autonomous business is a style of business partly governed and majority-operated by self-learning software agents, providing smart products and services to machine-customer-prevalent markets in a programmable economy. Executive leaders should factor autonomous business concepts into their long-range business strategy cycle, identify early "land grabs" needed to secure a competitive foothold, and pay attention to the possible arrival of machine customers in markets. The concept of autonomous business is still emerging, and its contours may assume a different market term in the future. It is characterized by a style of business partly governed and majority-operated by self-learning software agents, providing smart products and services to machine-customer-prevalent markets. Examples of autonomous business include fingerprint recognition door locks, people-tracking camera drones, voice assistants and chatbots. 


Operating in a programmable economy involves organizations trading with customers and other entities via blockchain decentralized ledgers, using smart contracts and digital tokens for value exchanges. This evolution of autonomous business will not be fundamentally dehumanizing; it may lead to a four-day workweek in advanced economies, but not to mass unemployment and societal crises. The definition of autonomous business is indicative rather than absolute, and it follows from prior evolutionary stages of digital and information-technology-enabled business capability and strategic value focus. Autonomous business builds on the prior phases, which will continue to grow and add value, even if their progress rate slows as autonomous business matures. It will rely heavily on golden thread business technology capacities, such as composability, that have helped weave the previous eras and continue to evolve and advance. 


The next era, "metaversal business," is expected to emerge from the integration of people into cyberspace, blurring the boundary between humans and machines. However, widespread deep immersion in cyberspace is unlikely, and direct interfacing is unlikely before the 2040s. Autonomous business will become a significant macro business technology concept in industries like mining, financial services, automotive, aerospace, defense, smart cities, medicine, research, higher education, and entertainment. It will depend on technologies that are already available and rapidly advancing, and will involve machine-controlled operations, augmented governance, and interaction with customers and other businesses through blockchain-enabled mechanisms. The programmable economy, based on distributed and decentralized digital resources, supports the production and consumption of goods and services, enabling innovation, entrepreneurship, and value exchange among humans and machines. 


#codingexercise: https://1drv.ms/w/c/d609fb70e39b65c8/EYMCYvb9NRtOtcJwdXRDUi0BVzUEyGL-Rz2NKFaKj6KLgA?e=fBM3eo

Friday, April 11, 2025

 #codingexercise

Problem: A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> ... -> sk such that:

Every adjacent pair of words differs by a single letter.

Every si for 1 <= i <= k is in wordList. Note that beginWord does not need to be in wordList.

sk == endWord

Given two words, beginWord and endWord, and a dictionary wordList, return all the shortest transformation sequences from beginWord to endWord, or an empty list if no such sequence exists. Each sequence should be returned as a list of the words [beginWord, s1, s2, ..., sk].

 

Example 1:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]

Output: [["hit","hot","dot","dog","cog"],["hit","hot","lot","log","cog"]]

Explanation: There are 2 shortest transformation sequences:

"hit" -> "hot" -> "dot" -> "dog" -> "cog"

"hit" -> "hot" -> "lot" -> "log" -> "cog"


Example 2:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]

Output: []

Explanation: The endWord "cog" is not in wordList, therefore there is no valid transformation sequence.


 

Constraints:

1 <= beginWord.length <= 5

endWord.length == beginWord.length

1 <= wordList.length <= 500

wordList[i].length == beginWord.length

beginWord, endWord, and wordList[i] consist of lowercase English letters.

beginWord != endWord

All the words in wordList are unique.

The sum of all shortest transformation sequences does not exceed 10^5.

import java.util.*;
import java.util.stream.Collectors;

class Solution {

    public List<List<String>> findLadders(String beginWord, String endWord, List<String> wordList) {
        List<List<String>> results = new ArrayList<List<String>>();
        var s = new HashSet<String>(wordList);
        var result = new ArrayList<String>();

        // Depth-first search enumerates every transformation sequence; the
        // shortest ones are filtered out below.
        combine(beginWord, endWord, s, results, result);

        var minOpt = results.stream().filter(x -> x.get(0).equals(beginWord)).mapToInt(x -> x.size()).min();
        if (minOpt.isPresent()) {
            var min = minOpt.getAsInt();
            results = results.stream().filter(x -> x.size() == min).collect(Collectors.toList());
        }

        return results;
    }

    private static void combine(String top, String endWord, HashSet<String> s, List<List<String>> results, List<String> result)
    {
        if (top.equals(endWord)) {
            return;
        }
        result.add(top);
        char[] chars = top.toCharArray();
        for (int i = 0; i < chars.length; i++)
        {
            char temp = chars[i];
            for (char c = 'a'; c <= 'z'; c++)
            {
                if (c == temp) {
                    continue; // the unchanged word is not a transformation
                }
                chars[i] = c;
                String candidate = new String(chars);
                // Extend the path only with unvisited dictionary words.
                if (s.contains(candidate) && !result.contains(candidate)) {
                    var clone = new ArrayList<String>(result);
                    if (candidate.equals(endWord)) {
                        clone.add(candidate);
                        results.add(clone);
                    } else {
                        combine(candidate, endWord, s, results, clone);
                    }
                }
            }
            chars[i] = temp; // restore the original character before moving on
        }
        result.remove(top);
    }
}
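
The solution above enumerates candidate ladders depth-first and keeps only the shortest ones at the end. For comparison, here is a minimal Python sketch (not part of the original solution) of the usual breadth-first approach: expand one level of words at a time, record the parents of each word on a shortest path, and backtrack from endWord once it is reached.

from collections import defaultdict
from string import ascii_lowercase

def find_ladders(begin_word, end_word, word_list):
    words = set(word_list)
    if end_word not in words:
        return []
    parents = defaultdict(set)     # word -> set of predecessors on a shortest path
    level = {begin_word}
    found = False
    while level and not found:
        words -= level             # never revisit a word reached at an earlier level
        next_level = defaultdict(set)
        for word in level:
            for i in range(len(word)):
                for c in ascii_lowercase:
                    candidate = word[:i] + c + word[i + 1:]
                    if candidate in words:
                        next_level[candidate].add(word)
        found = end_word in next_level
        parents.update(next_level)
        level = set(next_level)
    if not found:
        return []
    def backtrack(word):
        if word == begin_word:
            return [[begin_word]]
        return [path + [word] for p in parents[word] for path in backtrack(p)]
    return backtrack(end_word)

print(find_ladders("hit", "cog", ["hot","dot","dog","lot","log","cog"]))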

Test cases:

1.
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: [["hit","hot","dot","dog","cog"],["hit","hot","lot","log","cog"]]
Expected: [["hit","hot","dot","dog","cog"],["hit","hot","lot","log","cog"]]

2.
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: []
Expected: []

Thursday, April 10, 2025

 The following script can be used to convert the manuscript of a book into its corresponding audio production.

Option 1: individual chapters

import azure.cognitiveservices.speech as speechsdk

def batch_text_to_speech(text, output_filename):
    # Azure Speech Service configuration
    speech_key = "<use-your-speech-key>"
    service_region = "eastus"

    # Configure speech synthesis
    speech_config = speechsdk.SpeechConfig(
        subscription=speech_key,
        region=service_region
    )

    # Set output format to MP3
    speech_config.set_speech_synthesis_output_format(
        speechsdk.SpeechSynthesisOutputFormat.Audio48Khz192KBitRateMonoMp3
    )
    speech_config.speech_synthesis_voice_name = "en-US-BrianMultilingualNeural"

    # Create audio config for file output
    audio_config = speechsdk.audio.AudioOutputConfig(filename=output_filename)

    # Create speech synthesizer
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config,
        audio_config=audio_config
    )

    # Split text into chunks if needed (optional)
    # text_chunks = split_large_text(text)

    # Synthesize text
    result = synthesizer.speak_text_async(text).get()
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print(f"Audio synthesized to {output_filename}")
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print(f"Speech synthesis canceled: {cancellation_details.reason}")
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print(f"Error details: {cancellation_details.error_details}")

def split_large_text(text, max_length=9000):
    return [text[i:i+max_length] for i in range(0, len(text), max_length)]

# Convert each chapter file (1.txt, 2.txt, ...) into a matching mp3
for i in range(1, 100):
    input_filename = f"{i}.txt"
    print(input_filename)
    with open(input_filename, "r") as fin:
        large_text = fin.read()
        print(str(len(large_text)) + " " + input_filename.replace("txt", "mp3"))
        batch_text_to_speech(large_text, input_filename.replace("txt", "mp3"))
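
If a single chapter is too long for one synthesis call, the split_large_text helper above can be applied first. A small illustrative variation of the loop body (the chunked file-naming scheme here is an assumption):

        chunks = split_large_text(large_text)
        for n, chunk in enumerate(chunks, start=1):
            batch_text_to_speech(chunk, input_filename.replace(".txt", f"_{n}.mp3"))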

Option 2. Whole manuscript:

import json
import time
import uuid

import requests

# Azure Speech Service batch synthesis configuration
endpoint = "https://eastus.api.cognitive.microsoft.com/texttospeech/batchsyntheses/JOBID?api-version=2024-04-01"
api_key = "<your_api_key>"

headers = {
    "Content-Type": "application/json",
    "Ocp-Apim-Subscription-Key": api_key
}

def synthesize_text(inputs):
    body = {
        "inputKind": "PlainText",  # or SSML
        "synthesisConfig": {
            "voice": "en-US-BrianMultilingualNeural",
        },
        # Replace with your custom voice name and deployment ID if you want to use a custom voice.
        # Multiple voices are supported; a mixture of custom voices and platform voices is allowed.
        # An invalid voice name or deployment ID will be rejected.
        "customVoices": {
            # "YOUR_CUSTOM_VOICE_NAME": "YOUR_CUSTOM_VOICE_ID"
        },
        "inputs": inputs,
        "properties": {
            "outputFormat": "audio-48khz-192kbitrate-mono-mp3"
        }
    }
    response = requests.put(endpoint.replace("JOBID", str(uuid.uuid4())), headers=headers, json=body)
    if response.status_code < 400:
        jobId = f'{response.json()["id"]}'
        return jobId
    else:
        raise Exception(f"Failed to start batch synthesis job: {response.text}")

def get_synthesis(job_id: str):
    url = f"https://eastus.api.cognitive.microsoft.com/texttospeech/batchsyntheses/{job_id}?api-version=2024-04-01"
    while True:
        response = requests.get(url, headers=headers)
        if response.status_code < 400:
            status = response.json()["status"]
            if "Succeeded" in status:
                return response.json()
            else:
                print(f"batch synthesis job is still running, status [{status}]")
                time.sleep(5)  # Wait for 5 seconds before checking again
        else:
            raise Exception(f"Failed to get batch synthesis job: {response.text}")

def get_text(file_path):
    with open(file_path, "r") as file:
        file_contents = file.read()
    print(f"Length of text: {len(file_contents)}")
    return file_contents

if __name__ == "__main__":
    # Collect all chapter files (1.txt, 2.txt, ...) into a single batch request
    inputs = []
    for i in range(1, 100):
        input_file_name = f"{i}.txt"
        print(input_file_name)
        document_text = get_text(input_file_name)
        inputs += [{"content": document_text}]

    jobId = synthesize_text(inputs)
    print(jobId)

    # Get audio result
    audio = get_synthesis(jobId)
    print("Result:")
    print(audio)

#Codingexercise: https://1drv.ms/w/c/d609fb70e39b65c8/EV8iyT_-kuVCp1f6IVela_0BRuHHSQwBqNnng7Ztz4cQaA?e=ZHpPON


Wednesday, April 9, 2025

 This is a summary of the book titled “Crash Landing” written by Liz Hoffman and published by Crown in 2023. When the pandemic hit, it dwarfed the 2008 recession. Leaders had to act fast. Billions of dollars changed hands. Some companies made money while others barely endured. Hoffman provides intimate portraits of the leaders who navigated these times, telling the inside story of how some companies survived an economy on the brink. Many, such as America’s airline industry, were blindsided by the pandemic. By the time they could grapple with the reality, the crisis was already a major one. When money stopped flowing, companies borrowed. The US government threw its vast financial firepower at the crisis, but the economy that survived was no longer the one that existed before the pandemic.

In late January 2020, the financial elite at the World Economic Forum in Davos, Switzerland, were unaware of the potential impact of COVID-19 on the global economy. The US economy had experienced 10 consecutive years of growth and had a record high in 2019 corporate profits. However, the American economy was particularly vulnerable to a health crisis due to stagnant wages, reduced workers' benefits, and lack of surplus funds. The virus was already affecting Taiwan and Japan, and it would soon appear in Europe and North America. The airline industry, which had enjoyed a champagne decade, was also vulnerable to the virus. In 2020, the globalized world was not interconnected by land or sea, but by air. Direct flights reached twice as many cities as they had 20 years earlier. A 35-year-old man returning from China in mid-January became America's first reported case of COVID-19, unaware of the potentially deadly virus.

In March 2020, the world faced a major crisis due to the COVID-19 pandemic. American executives and Wall Street bankers were slower than businesses in other countries to take the situation seriously. Because the virus spreads through the air and initially resembles an ordinary flu, widespread social distancing followed. Despite rigorous lockdowns, COVID-19 quickly crossed out of China, leading to the closure of Disney's Shanghai park and of McDonald's, Starbucks, Delta, and Hilton operations in China. The world's greatest economy shut down, and Wall Street's financial markets experienced panic. Bill Ackman, founder and CEO of Pershing Square Capital Management, believed the virus might be difficult to control in the US, leading to massive unemployment and civil unrest. Investors began feeling spooked, and stock values fell. The Federal Reserve intervened with an interest rate cut, but the markets remained open. On March 11, 2020, the World Health Organization declared COVID-19 a global pandemic, leading to stock declines, sports leagues suspending play, and Disney parks closing.

The COVID-19 pandemic severely impacted the travel industry, leading to a significant drop in revenue per available room, a crucial financial metric in the hotel industry. Hilton, a major hotel chain, had barely survived the 2008 financial collapse and now faced the 2020 crisis. The pandemic exposed the dangers of a financial playbook that had become the default in corporate boardrooms over the previous two decades. Hilton's leaders drew down its $1.75 billion line of credit, worried that the banks themselves could go under. The 2020 financial meltdown differed from the 2008 crisis and was not as severe. Wall Street traders were uneasy, and the sudden need to work from home worsened volatility. Bank reforms had limited Wall Street's activities and freedoms, leading to a decline in productivity and a dramatic fall in the S&P 500.

The US government and airline executives faced financial challenges during the COVID economic crisis, aiming to avoid bankruptcy and a complete meltdown. After negotiations, Congress pursued a multibillion-dollar payroll relief package, leading to major hedge funds selling bonds and Airbnb spending billions on COVID refunds. The number of Americans with COVID increased exponentially, and banks borrowed billions. The economy that survived the pandemic is not the one that crashed headlong into it, but it did not fall into depression. The pandemic created value, such as improved telecommunications infrastructure and higher pay for essential workers. However, inflation and interest rates rose, making life difficult for ordinary people. While the pandemic wasn't a total disaster, it required a careful balance between swift action and making the right choices.

#codingexercise: https://1drv.ms/w/c/d609fb70e39b65c8/Echlm-Nw-wkggNYlIwEAAAABD8nSsN--hM7kfA-W_mzuWw?e=BQczmz 


Tuesday, April 8, 2025

 Lessons from storage engineering for Knowledge bases and RAGs.

Data at rest and in transit are chunks of binaries that make sense only when additional layers are built to process them, store them, ETL them, or return them as query results. Storage engineering has a rich tradition of building such datastores, as databases and data warehouses, and even making them virtual and hosted in the cloud. Vector databases, although they are the authority on embeddings and semantic similarity, do not operate independently; they must be part of a system that serves as a data platform and often spans multiple and hybrid data sources for best results. The old and the new worlds can enter a virtuous feedback loop that improves the use of the new datastores.

Take Facebook Presto, for example, as a success story in bridging structured and unstructured social networking data. Developed as an open-source distributed SQL query engine, it revolutionized data analytics by enabling seamless querying across structured and unstructured data sources. Presto's ability to perform federated queries allowed users to join and analyze data from diverse sources, such as the Hadoop Distributed File System (HDFS), Apache Cassandra, and relational databases, in real time. This unified approach eliminated the need for multiple specialized tools, bridging the gap between structured and unstructured data. Presto's architecture, optimized for low query latency, employed in-memory processing and pipelined execution, significantly reducing end-to-end latency compared to traditional systems like Hive. Its scalability and flexibility made it a valuable tool for handling petabyte-scale datasets.
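
As a rough illustration of that federation, the following Python sketch uses the presto-python-client to run one query across two catalogs; the coordinator host, catalog, schema, and table names (hive.web.page_views, mysql.crm.users) are assumptions for the example, not real endpoints.

import prestodb

# Connect to a (hypothetical) Presto coordinator
conn = prestodb.dbapi.connect(
    host="presto.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="web",
)
cur = conn.cursor()

# One query joins HDFS-backed Hive data with a relational table in MySQL
cur.execute("""
    SELECT u.country, count(*) AS views
    FROM hive.web.page_views v
    JOIN mysql.crm.users u ON v.user_id = u.id
    GROUP BY u.country
    ORDER BY views DESC
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)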

Drawing parallels to technologies that work with structured and vector data, vector databases emerge as a compelling counterpart. These databases are designed to store and retrieve high-dimensional vectors, which are mathematical representations of objects. By mapping structured data into vector space, vector databases facilitate similarity searches and enable AI algorithms to retrieve relevant information efficiently. For example, Milvus, a popular vector database, supports vectorizing structured data and querying it for advanced analytics. This process involves converting structured data into numerical vectors using machine learning models, allowing for nuanced analysis and pattern detection.

Both Presto and vector databases share a common goal: unifying disparate data types for seamless analysis. Some examples of vector databases include:

• Milvus: Milvus is an open-source vector database designed for managing large-scale vector data. It supports hybrid searches, combining structured metadata with vector similarity queries, making it ideal for applications like recommendation systems and AI-driven analytics.

• Weaviate: Weaviate is another open-source vector database that integrates structured data with vector embeddings. It offers semantic search capabilities and allows users to query data using natural language prompts.

• Redis (Redis-Search and Redis-VSS): Redis has extensions for vector search that enable hybrid queries, combining structured data with vector-based similarity searches. It's optimized for high-speed lookups and real-time applications.

• Qdrant: Qdrant is a vector database that supports hybrid queries, allowing structured filters alongside vector searches. It is designed for scalable and efficient AI applications.

Azure Cosmos DB stands out as a versatile database service that integrates vector search capabilities alongside its traditional NoSQL and relational database functionalities. When compared with the above list, here’s how it stands out:

• Hybrid Data Support: Like Milvus, Weaviate, Redis, and Qdrant, Azure Cosmos DB supports hybrid queries, combining structured data with vector embeddings. This makes it suitable for applications requiring both traditional database operations and vector-based similarity searches.

• Integrated Vector Store: Azure Cosmos DB allows vectors to be stored directly within documents alongside schema-free data. This colocation simplifies data management and enhances the efficiency of vector-based operations, a feature that aligns with the capabilities of vector databases.

• Scalability and Performance: Azure Cosmos DB offers automatic scalability and single-digit millisecond response times, ensuring high performance at any scale. This is comparable to the optimized performance of vector databases like Redis and Milvus.

• Vector Indexing: Azure Cosmos DB supports advanced vector indexing methods, such as DiskANN-based quantization, enabling efficient and accurate vector searches. This is similar to the indexing techniques used in specialized vector databases.

• AI Integration: Azure Cosmos DB is designed to support AI-driven applications, including natural language processing, recommendation systems, and multi-modal searches. This aligns with the use cases of vector databases like Weaviate and Qdrant.

While Azure Cosmos DB provides robust vector search capabilities, it also offers the flexibility of a general-purpose database, making it a compelling choice for organizations looking to unify structured, unstructured, and vector data within a single platform.
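
As a rough illustration of such a hybrid query, the following Python sketch runs a vector similarity search with a structured filter against a hypothetical Cosmos DB NoSQL container; the account, database, container, and field names are assumptions, and the query vector would normally come from an embeddings model rather than a hard-coded list.

from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<your-account>.documents.azure.com:443/", credential="<your-key>")
container = client.get_database_client("catalog").get_container_client("products")

# Toy three-dimensional embedding; real embeddings have hundreds or thousands of dimensions.
query_vector = [0.1, 0.2, 0.3]

# Hybrid query: a structured filter on category plus vector similarity ordering
query = (
    "SELECT TOP 5 c.id, c.category, "
    "VectorDistance(c.embedding, @queryVector) AS score "
    "FROM c WHERE c.category = @category "
    "ORDER BY VectorDistance(c.embedding, @queryVector)"
)
results = container.query_items(
    query=query,
    parameters=[
        {"name": "@queryVector", "value": query_vector},
        {"name": "@category", "value": "outdoor"},
    ],
    enable_cross_partition_query=True,
)
for item in results:
    print(item["id"], item["score"])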

#codingexercise: https://1drv.ms/w/c/d609fb70e39b65c8/Echlm-Nw-wkggNYlIwEAAAABD8nSsN--hM7kfA-W_mzuWw?e=BQczmz 


Sunday, April 6, 2025

 Problem #1:

A stream of arbitrary integers arrives in no particular order and without duplicates. The rank of each integer is determined by the number of smaller integers before and after it, up to the current position. Write a method to get the rank of the current integer in an efficient way.

e.g.: 7, 1, 3, 9, 5, 8

Solution:

A sorted list of the values seen so far is maintained, both to compute the rank of a new value (the count of smaller values already seen) and to bump the ranks of larger earlier values as new, smaller values arrive.

using System;
using System.Collections.Generic;
using System.Linq;

public class Test
{
    // Value -> rank; SortedList keeps the keys in ascending order.
    private static SortedList<int, int> itemRankPair = new SortedList<int, int>();

    public static void Main()
    {
        var items = new List<int>(){7, 1, 3, 9, 5, 8};
        for (int i = 0; i < items.Count; i++)
        {
            var item = items[i];
            Console.WriteLine("Item={0}", item);
            if (itemRankPair.ContainsKey(item) == false)
            {
                // Rank on arrival = number of smaller items seen so far.
                itemRankPair.Add(item, GetRank(item));
            }
            Console.WriteLine();
            // Every earlier item that is larger gains one smaller successor.
            for (int j = 0; j < i; j++)
            {
                int k = items[j];
                if (k >= item)
                {
                    itemRankPair[k] += 1;
                    Console.WriteLine("item={0}, Rank={1}", k, itemRankPair[k]);
                }
            }
            Console.WriteLine();
        }
        // Final ranks: for each item, the count of smaller items anywhere in the stream.
        foreach (var k in itemRankPair.Keys.ToList()){
            Console.WriteLine("item={0}, Rank={1}", k, itemRankPair[k]);
        }
    }

    private static int GetRank(int n)
    {
        int rank = 0;
        foreach (var key in itemRankPair.Keys.ToList())
        {
            if (key < n)
            {
                rank++;
            }
        }
        return rank;
    }
}
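
For comparison, a minimal Python sketch of the same idea using the standard-library bisect module: keeping the seen values in a sorted list makes the rank at arrival a binary search, and the final rank of each value is simply its index in the fully sorted list.

import bisect

def stream_ranks(items):
    seen = []                                                # values observed so far, kept sorted
    for value in items:
        rank_on_arrival = bisect.bisect_left(seen, value)    # smaller values seen so far
        bisect.insort(seen, value)
        print(f"Item={value} rank on arrival={rank_on_arrival}")
    # Final rank = number of smaller values anywhere in the stream (no duplicates assumed)
    for value in items:
        print(f"item={value}, Rank={bisect.bisect_left(seen, value)}")

stream_ranks([7, 1, 3, 9, 5, 8])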

# Azure infrastructure-as-code solution: IaCResolutionsPart274.docx

Saturday, April 5, 2025

 These are the steps to create an AI agent. As discussed earlier, AI agents come in various types and forms, but they can be quite capable. From design and implementation through test and deployment, an agent must be built to its specific requirements, make the right selections of model and knowledge base, and remain pertinent and accurate, while avoiding pitfalls such as bias, hallucinations, and risks to safety, security and privacy.

The first step is to draw up the requirements, which involves 1. identifying the problem, 2. prompt engineering, and 3. determining user interaction. For example, scheduling meetings, answering common questions, and generating creative content require different approaches. Clear instructions that guide the agent’s behavior, outline what it should do in different scenarios, and stay specific while leaving flexibility to handle parameters will help with prompts. The way the user interacts with the agent, such as through a chat interface, is also important.

The second step is to choose the right model. This is a critical step in building an effective AI agent. The models have their respective advantages, such as GPT-4 from OpenAI for advanced NLP, LLaMA 3 by Meta for its efficiency and adaptability, and Google’s PaLM 2 for handling multilingual tasks. Models like Meta’s LLaMA are open and offer customization options, while OpenAI’s closed, hosted GPT-4 comes with support, maintenance and ease of use. A less complex model such as GPT-3 might also suffice for the requirements.

The performance metrics of different models, such as accuracy, response time, scalability, and the ability to handle concurrent requests, are important to align with the requirements. Consider customization options when you must fine-tune a model for your specific tasks, especially when there is a domain-specific language.

Check whether the model can easily be integrated with existing systems and tools, especially regarding API support, environments, and external databases, web services or interfaces. There may also be cost implications for the various models, and some effort required for experimentation and iteration.

The third step involves enabling tools, such as information retrieval, web browsing and function calling, along with configuring specific settings for each tool, testing its integration, and adhering to security best practices and observability.

The fourth step is extending capabilities via custom functions, such as summarization or report generation. This may involve writing code, testing the implementation, integrating via endpoints or webhooks, defining parameters and configuring invocations, and then testing and optimizing.
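
To make the fourth step concrete, here is a minimal sketch, assuming an OpenAI-style chat completions API with tool calling; the summarize_report function, its parameter schema, and the model name are hypothetical placeholders rather than a required design.

import json
from openai import OpenAI

def summarize_report(report_id: str, max_words: int = 100) -> str:
    # Hypothetical custom function: in practice this would fetch and summarize a stored report.
    return f"Summary of {report_id} in at most {max_words} words."

tools = [{
    "type": "function",
    "function": {
        "name": "summarize_report",
        "description": "Summarize a stored report by its identifier.",
        "parameters": {
            "type": "object",
            "properties": {
                "report_id": {"type": "string"},
                "max_words": {"type": "integer"},
            },
            "required": ["report_id"],
        },
    },
}]

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize report R-42 briefly."}],
    tools=tools,
)
message = response.choices[0].message
if message.tool_calls:                     # the agent chose to invoke the custom function
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(summarize_report(**args))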

As with all software, some best practices apply. AI agents are only as good as their data, which must be comprehensive and free of bias. They should provide transparency in the form of references, statistics, or an enumeration of thought, action and observation; enhance security with robust access control and a defense-in-depth strategy; avoid tasks that hinge on common sense, open-ended reasoning, or deep contextual understanding; and strike a balance between requiring constant human supervision and none at all.

#codingexercise: 

https://1drv.ms/w/c/d609fb70e39b65c8/Echlm-Nw-wkggNYlIwEAAAABD8nSsN--hM7kfA-W_mzuWw?e=f29Tjt