Tuesday, May 5, 2026

 CAS4Drones:

Content‑addressable storage for aerial imagery is a mature topic. We extend it as a practical lever for turning a high‑volume livestream into a tractable, cost‑aware analytic stream. The idea is to replace raw frame retention with a content‑fingerprinting layer that lets the pipeline treat visually redundant frames as the same “object” for downstream processing, and then to use that deduplicated stream to drive importance sampling, selective perception, and observability events. Two technical families make this work in practice: fast perceptual fingerprints for cheap, near‑real‑time deduplication, and richer deep‑feature hashing for semantic deduplication when scene semantics matter. Both feed the same operational pattern: compute a compact signature per frame, cluster or threshold those signatures to identify repeats, score novelty relative to recent history, and promote only the frames that cross a novelty threshold into expensive perception or archival storage.

The first stage is perceptual hashing because it is cheap, robust to small compression and alignment differences, and easy to index. Unlike standard cryptographic hashes, where a single changed pixel produces a completely different hash, perceptual hashes such as dHash or pHash generate a compact digital fingerprint that remains stable even if the image is slightly rotated, compressed, or shifted. That stability suits a nadir camera on a drone flying straight edges: most consecutive frames are near‑duplicates and should collapse to the same fingerprint. A simple operational rule is to compute a 64–128 bit pHash per frame and use Hamming distance as the similarity metric: near‑duplicates (frames with high overlap) are those whose hashes differ by fewer bits than a clustering threshold. In practice, we pick the Hamming threshold empirically from a small labeled set of flights; values that work for nadir imagery are typically small (e.g., 2–8 bit differences on a 64‑bit hash) because the viewpoint is stable.
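
A minimal sketch of this first stage, assuming frames arrive as java.awt.BufferedImage and a 64‑bit dHash; the 9×8 downscale and the example threshold are illustrative defaults rather than tuned values.

import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public final class PerceptualHash {

    // Difference hash (dHash): downscale to 9x8 grayscale, then set one bit per
    // horizontally adjacent pixel pair depending on whether brightness increases.
    // Small shifts, recompression, or mild rotation rarely flip many bits.
    public static long dHash(BufferedImage frame) {
        BufferedImage small = new BufferedImage(9, 8, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = small.createGraphics();
        g.drawImage(frame, 0, 0, 9, 8, null);
        g.dispose();

        long hash = 0L;
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                int left = small.getRaster().getSample(x, y, 0);
                int right = small.getRaster().getSample(x + 1, y, 0);
                hash = (hash << 1) | (left < right ? 1L : 0L);
            }
        }
        return hash;
    }

    // Hamming distance between two 64-bit fingerprints.
    public static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    // Frames within a small Hamming threshold (e.g., 2-8 bits on a 64-bit hash for
    // stable nadir imagery) are treated as the same content-addressed object.
    public static boolean isNearDuplicate(long a, long b, int threshold) {
        return hammingDistance(a, b) <= threshold;
    }
}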

That cheap layer buys us two things. First, it collapses the vast majority of frames along straight edges into a single representative per short interval, which immediately reduces compute and storage cost. Second, it produces a stream of deduplication events—“new fingerprint”, “repeat fingerprint”, “fingerprint expired”—that are perfect observability primitives. Those events are deterministic, small, and easy to correlate with other telemetry (frame index, FlightID, altitude, inferred ground speed). They become the low‑latency signals an agent or rule engine uses to decide whether to run heavier perception.
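
The deduplication events themselves can be small, deterministic records. One possible shape is sketched below; the field and enum names are chosen here for illustration, not taken from an existing schema.

// One possible shape for a deterministic CAS event; the enum values mirror the
// "new fingerprint", "repeat fingerprint", and "fingerprint expired" events above,
// plus the "NovelFrame" promotion event used later in the pipeline.
public record CasEvent(
        String flightId,
        long frameIndex,
        Type type,
        long pHash,
        double altitudeMeters,
        double groundSpeedMps,
        long timestampMillis) {

    public enum Type { NEW_FINGERPRINT, REPEAT_FINGERPRINT, FINGERPRINT_EXPIRED, NOVEL_FRAME }
}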

Semantic sensitivity requires something more. Two frames can be visually similar yet differ in the presence of a new object or a subtle scene change that matters for coverage. Deep hashing or CLIP‑style embeddings address this case. A practical hybrid pipeline computes both a pHash and a compact deep descriptor per sampled frame. The pHash is used for immediate deduplication and eventing; the deep descriptor is used for semantic clustering and importance scoring on a slower cadence (for example, every N seconds or when a pHash change is observed). Deep descriptors are clustered with density‑aware algorithms such as HDBSCAN so that the system can identify persistent semantic clusters (e.g., “building cluster”, “water cluster”, “open field cluster”) and detect when a frame belongs to a new semantic cluster even if its pHash is close to a previous one.

Operationally, we perform importance sampling with CAS. For each incoming frame compute pHash and a small motion proxy (mean optical flow or translation vector). If the pHash matches the most recent representative within the Hamming threshold and motion is within the expected range for the edge, mark the frame as redundant and emit a low‑priority “repeat” event. If the pHash is new or the motion proxy indicates a directional change, compute the deep descriptor and evaluate a novelty score against a short‑term memory buffer of recent descriptors. The novelty score can be a weighted combination of descriptor distance, motion direction change, and semantic histogram drift. If the novelty score exceeds a configured threshold, promote the frame for full perception (object detection, high‑resolution stitching, Vision‑LLM analysis) and emit a high‑priority “NovelFrame” event into the observability pipeline. The observability agent then correlates that event with other telemetry—dependency calls, inference latencies, catalog insertions—and can trigger verification steps or human review if needed.
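
A sketch of that per‑frame decision, assuming the pHash, a normalized motion‑direction‑change proxy, and a deep‑descriptor supplier are provided by upstream stages; the weights, thresholds, and buffer size are placeholders to be tuned per mission, and semantic histogram drift is omitted for brevity.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

public final class ImportanceSampler {

    private final int hammingThreshold;         // e.g., 2-8 bits on a 64-bit hash
    private final double noveltyThreshold;      // configured per mission
    private final Deque<float[]> recentDescriptors = new ArrayDeque<>(); // short-term memory buffer
    private Long lastRepresentativeHash;        // null until the first frame is seen

    public ImportanceSampler(int hammingThreshold, double noveltyThreshold) {
        this.hammingThreshold = hammingThreshold;
        this.noveltyThreshold = noveltyThreshold;
    }

    /**
     * Returns true when the frame should be promoted to full perception.
     * motionDirectionChange is assumed normalized to [0, 1]; the deep descriptor is
     * computed (via the supplier) only when the cheap checks do not mark a repeat.
     */
    public boolean process(long pHash, double motionDirectionChange, Supplier<float[]> descriptorSupplier) {
        boolean repeat = lastRepresentativeHash != null
                && Long.bitCount(pHash ^ lastRepresentativeHash) <= hammingThreshold
                && motionDirectionChange < 0.1;          // expected range for a straight edge (placeholder)
        if (repeat) {
            return false;                                 // emit a low-priority "repeat" event here
        }
        lastRepresentativeHash = pHash;

        float[] descriptor = descriptorSupplier.get();    // heavier embedding, computed only now
        double descriptorDistance = minDistanceToRecent(descriptor);
        // Novelty score: weighted combination of descriptor distance and motion change;
        // the weights are illustrative.
        double novelty = 0.7 * descriptorDistance + 0.3 * motionDirectionChange;
        remember(descriptor);
        return novelty > noveltyThreshold;                // emit a high-priority "NovelFrame" event here
    }

    private double minDistanceToRecent(float[] d) {
        double best = 1.0;                                // treat an empty buffer as maximally novel
        for (float[] r : recentDescriptors) {
            best = Math.min(best, cosineDistance(d, r));
        }
        return best;
    }

    private void remember(float[] d) {
        recentDescriptors.addLast(d);
        if (recentDescriptors.size() > 64) {              // buffer length is a placeholder
            recentDescriptors.removeFirst();
        }
    }

    private static double cosineDistance(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-9);
    }
}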

The design can be tightened further. First, use a sliding composite window for memory: keep a short, high‑resolution buffer (seconds) for pHash and motion checks and a longer, lower‑resolution buffer (tens of seconds to minutes) for semantic descriptors. This mirrors the composite window idea used in streaming clustering: short windows catch transient noise, long windows capture persistent regimes. Second, make thresholds adaptive: compute baseline Hamming and descriptor distances per flight segment and scale thresholds by a small factor to tolerate environmental variability (lighting, wind). Third, attach deterministic metadata to every CAS event—FlightID, frame index, altitude, estimated ground speed, pHash value, descriptor cluster id—so that downstream agents and auditors can reproduce decisions. Deterministic event generation is essential for verification: the agent’s reasoning can be stochastic, but the underlying CAS events must be reproducible.
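
A sketch of the composite‑window memory and the adaptive threshold, assuming millisecond timestamps; the window lengths and scale factor are examples of the kinds of values described above, not recommendations.

import java.util.ArrayDeque;
import java.util.Deque;

public final class CompositeMemory {

    private record Sample(long timestampMillis, long pHash, float[] descriptor) {}

    private final Deque<Sample> shortWindow = new ArrayDeque<>(); // seconds: pHash and motion checks
    private final Deque<Sample> longWindow = new ArrayDeque<>();  // tens of seconds to minutes: descriptors
    private final long shortWindowMillis;
    private final long longWindowMillis;

    public CompositeMemory(long shortWindowMillis, long longWindowMillis) {
        this.shortWindowMillis = shortWindowMillis;
        this.longWindowMillis = longWindowMillis;
    }

    public void add(long now, long pHash, float[] descriptor) {
        Sample s = new Sample(now, pHash, descriptor);
        shortWindow.addLast(s);
        longWindow.addLast(s);
        evict(shortWindow, now - shortWindowMillis);
        evict(longWindow, now - longWindowMillis);
    }

    private static void evict(Deque<Sample> window, long cutoff) {
        while (!window.isEmpty() && window.peekFirst().timestampMillis() < cutoff) {
            window.removeFirst();
        }
    }

    // Adaptive threshold: scale a per-flight-segment baseline distance by a small factor
    // to tolerate environmental variability (lighting, wind), as described above.
    public static double adaptiveThreshold(double baselineDistance, double scaleFactor) {
        return baselineDistance * scaleFactor; // e.g., scaleFactor around 1.2-1.5 (illustrative)
    }
}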

CAS events are high‑value observability signals. They are compact, explainable, and correlate directly with mission semantics: long runs of “repeat” events indicate stable edges; bursts of “NovelFrame” events indicate corners or scene transitions. Those event patterns can be formalized as inflection signatures: a corner is a short burst where pHash churn increases, motion direction changes beyond a threshold, descriptor novelty spikes, and the rate of “NovelFrame” events exceeds a local baseline. An agent can implement a simple rule that requires co‑occurrence of at least two of these signals within a small temporal window to declare a corner, which reduces false positives while preserving recall, as sketched below.
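
The co‑occurrence rule can be as small as the sketch below, where each boolean is assumed to have already been evaluated over the same small temporal window; requiring two of the four signals is the illustrative choice from the text.

public final class CornerDetector {

    /**
     * Declare a corner only when at least two of the inflection signals co-occur
     * within the evaluation window, trading a little recall for fewer false positives.
     */
    public static boolean isCorner(boolean pHashChurnSpike,
                                   boolean motionDirectionChange,
                                   boolean descriptorNoveltySpike,
                                   boolean novelFrameRateAboveBaseline) {
        int signals = 0;
        if (pHashChurnSpike) signals++;
        if (motionDirectionChange) signals++;
        if (descriptorNoveltySpike) signals++;
        if (novelFrameRateAboveBaseline) signals++;
        return signals >= 2;
    }
}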

Cost and importance sampling are tightly coupled. Treat the cost of full perception as a budgeted resource and use CAS‑driven novelty scores to allocate it. For example, define a per‑mission budget of heavy inferences (N per flight hour) and spend it on the top‑N novel frames as ranked by the novelty score. Track TCO per square mile and TCO per analytic query as mission metrics and expose them in dashboards; correlate them with corner detection coverage to quantify the trade‑off between cost and mission completeness. Because corners are high‑value for tiling and mosaicking, we can bias the sampling policy to favor frames that are both novel and temporally spaced to maximize geometric coverage.
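
One way the budgeted allocation could look, assuming each candidate frame carries a novelty score and a frame index; the per‑hour budget and minimum spacing are mission parameters, not fixed values.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class InferenceBudget {

    public record Candidate(long frameIndex, double noveltyScore) {}

    /**
     * Spend a per-mission budget of heavy inferences on the top-N novel frames, while
     * enforcing a minimum temporal spacing so promoted frames also spread out
     * geometrically for tiling and mosaicking.
     */
    public static List<Candidate> promote(List<Candidate> candidates,
                                          int budgetPerHour,
                                          long minFrameSpacing) {
        List<Candidate> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble(Candidate::noveltyScore).reversed());

        List<Candidate> promoted = new ArrayList<>();
        for (Candidate c : ranked) {
            if (promoted.size() >= budgetPerHour) break;
            boolean spaced = promoted.stream()
                    .allMatch(p -> Math.abs(p.frameIndex() - c.frameIndex()) >= minFrameSpacing);
            if (spaced) promoted.add(c);
        }
        return promoted;
    }
}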

Evaluation is straightforward. Measure deduplication rate (fraction of frames collapsed by pHash), corner recall (fraction of ground‑truth corners with at least one promoted frame within ±K frames), precision of promoted frames (fraction that are true positives), and cost savings (reduction in heavy inference calls). Use a small labeled corpus of rectangular flights to tune Hamming and novelty thresholds, then validate on held‑out flights with different altitudes and ground textures.
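
A small sketch of these metrics, assuming ground‑truth corner frame indices and the set of promoted frame indices are available; the ±K tolerance mirrors the definition above.

import java.util.Set;

public final class EvaluationMetrics {

    /** Fraction of frames collapsed by pHash deduplication. */
    public static double deduplicationRate(long totalFrames, long retainedFrames) {
        return totalFrames == 0 ? 0.0 : 1.0 - (double) retainedFrames / totalFrames;
    }

    /** Fraction of ground-truth corners with at least one promoted frame within +/-K frames. */
    public static double cornerRecall(Set<Long> groundTruthCorners, Set<Long> promotedFrames, long k) {
        if (groundTruthCorners.isEmpty()) return 1.0;
        long hit = groundTruthCorners.stream()
                .filter(c -> promotedFrames.stream().anyMatch(p -> Math.abs(p - c) <= k))
                .count();
        return (double) hit / groundTruthCorners.size();
    }

    /** Fraction of promoted frames that are true positives under the same +/-K tolerance. */
    public static double precision(Set<Long> groundTruthCorners, Set<Long> promotedFrames, long k) {
        if (promotedFrames.isEmpty()) return 0.0;
        long truePositive = promotedFrames.stream()
                .filter(p -> groundTruthCorners.stream().anyMatch(c -> Math.abs(p - c) <= k))
                .count();
        return (double) truePositive / promotedFrames.size();
    }
}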

CAS for aerial livestreams is a practical, auditable mechanism for importance sampling. Perceptual hashes provide a cheap, deterministic first pass; deep descriptors provide semantic sensitivity; both feed an observability fabric of structured events that agents use to make selective, cost‑aware decisions. The result is a pipeline that reduces compute and storage, preserves the frames that matter for coverage and corner detection, and produces a transparent evidence trail for verification and cost analysis.


Monday, May 4, 2026

 This is a summary of a book titled “Lead With (un)Common Sense: Simple truths great leaders live by — that most leaders miss” written and self-published by David Mead in 2025. The book argues that leadership is far less about authority, titles, or technical expertise than most people assume. Instead, it is grounded in something both simpler and more demanding: who a leader is as a human being and how consistently they live out their values in everyday actions.

Mead begins by challenging the conventional image of an effective leader. Many aspiring leaders focus heavily on developing operational skills—setting goals, managing performance, and driving results. While those capabilities are undeniably important, organizations often elevate them at the expense of something more fundamental: the human side of leadership. True leadership, Mead suggests, requires “dual mastery”—a careful balance between hard skills and soft skills. Leaders must be competent, but they must also be compassionate, principled, and trustworthy.

This view leads to a broader and more meaningful definition of leadership. Rather than seeing it as a position of authority, Mead frames leadership as the daily practice of building one’s character so that one’s influence enables others to thrive. Influence, in this sense, does not come from credentials or hierarchy. People do not follow leaders simply because of their title; they follow those they trust—leaders who demonstrate honesty, humility, and genuine humanity in their actions.

Trust, therefore, becomes the cornerstone of effective leadership. Mead emphasizes that leaders who rely on power or control may achieve short-term gains, but they rarely inspire lasting commitment. When there is a gap between what leaders say and what they do, employees quickly notice. Over time, these inconsistencies erode trust, leaving teams disengaged and unmotivated. People may comply with such leaders, but they will not bring their full energy, creativity, or loyalty to their work.

Research cited in the book reinforces this point. A study by FMI Consulting found that a leader’s effectiveness is driven primarily by character and a focus on others, accounting for the vast majority of what makes a leader successful. Traits such as emotional maturity, self-awareness, empathy, and curiosity far outweigh commonly prized attributes like charisma or intelligence. Leadership, in other words, is not a mysterious formula but a deeply human endeavor rooted in integrity and care for others.

Living in alignment with one’s values is essential to building this trust. Mead underscores that values alone are meaningless if they are not reflected in behavior. Employees and customers alike look for consistency between what leaders claim to stand for and how they actually make decisions. When leaders act in ways that contradict their stated principles—especially during times of pressure or crisis—the damage to credibility can be swift and lasting. A leader who cannot be trusted, Mead notes, is simply someone issuing instructions, not truly leading.

Of course, no leader is perfect. Mead acknowledges that even well-intentioned individuals sometimes fall short of their ideals. The real test of leadership lies not in flawless behavior but in how leaders respond when they recognize a misalignment. Self-aware leaders notice these gaps early, acknowledge their mistakes, and take meaningful steps to correct them. By doing so, they reinforce rather than weaken trust.

Modern work environments introduce additional challenges. In remote and hybrid settings, for example, employees have fewer opportunities to observe their leaders’ behavior firsthand. This makes transparency and communication even more critical. Leaders must be deliberate in explaining their decisions and demonstrating consistency, as silence or ambiguity can quickly give rise to doubt and mistrust.

Another central theme of the book is humility. Far from being a weakness, humility is presented as one of a leader’s greatest strengths. Humble leaders focus on the growth and success of others rather than on their own ego. They acknowledge their limitations, remain open to new ideas, and actively seek input from those closest to the work. This openness not only strengthens relationships but also leads to better decision-making and more innovative teams.

At the same time, humility requires confidence. It means being secure enough to admit when a strategy is not working and to change course when necessary. Leaders who cling to their own expertise or insist on being the smartest person in the room can stifle creativity and hinder progress. By contrast, those who create space for others to contribute foster environments where people feel valued and empowered.

Mead argues that leadership grounded in humanity has a profound impact. When leaders genuinely care about their employees as people—not just as resources—workplaces become places where individuals want to show up and do their best. This sense of belonging and respect transforms compliance into commitment, strengthens collaboration, and drives sustained performance.


Saturday, May 2, 2026

 Continuous Replication and network connectivity in Azure for databases.

Problem statement: when Azure creates a replica for an Azure MySQL server that has connectivity only through a private endpoint, it does not create the replica with another private endpoint, yet it still replicates the database snapshot from the primary. Does the replica need a private endpoint to facilitate automatic continuous replication?

Solution:

Even experts land on opposite sides of this question because replication traffic and operational requirements are easy to conflate. The short answer is no — you do not need to create a private endpoint on the replica for replication to function.

When Azure creates a read replica for a MySQL Flexible Server that is reachable only through a private endpoint, the replication traffic never flows through your VNet, your private endpoint, or any customer‑visible network surface. The private endpoint only governs how your clients reach the server. It does not govern how Azure’s internal control plane and data plane communicate with the managed MySQL instances. Azure MySQL Flexible Server is built on a managed compute fabric where the primary and replica servers live inside the same Azure-managed network boundary, even if they are in different regions. The replication channel is established entirely inside that boundary, using Azure’s internal service network, not your VNet. That means the replication protocol — which is MySQL’s native asynchronous binlog-based replication — is carried over an internal, non-customer-routable link. The wire traffic never touches your private endpoint, so the existence or absence of a private endpoint on the replica is irrelevant to the replication channel.

The initial snapshot is not copied through your private endpoint either. Azure uses an internal storage-layer snapshot mechanism to seed the replica. This is not a logical dump and not a network copy through your VNet. It is a block-level clone operation inside Azure’s storage fabric. Because the snapshot is taken and materialized inside the managed service boundary, there is no scenario in which Azure would need to traverse your private endpoint to hydrate the replica.

Once the replica is seeded, continuous replication begins. MySQL’s binlog replication requires the replica to connect to the primary’s replication endpoint. In a self-managed MySQL deployment, that would require network reachability between the two servers. But in Azure’s managed service, the replication endpoint is exposed only inside Azure’s internal network. The primary and replica are placed in a topology where they can reach each other without ever touching customer VNets. Azure enforces isolation at the service boundary, not by routing replication traffic through customer-controlled network constructs. This is why the private endpoint is irrelevant to replication: the private endpoint is a consumer-facing ingress point, not a service-to-service communication path.

The opposing view — that replication should require a private endpoint on the replica — cannot hold because it would imply that Azure routes internal service traffic through customer VNets, which would violate Azure’s network isolation model, break multi-tenant guarantees, and create circular dependencies where replication availability depends on customer-managed routing, NSGs, firewalls, or DNS. Azure’s managed database services are explicitly designed so that internal operations, including replication, backups, failover, and patching, are independent of customer networking. If replication depended on your private endpoint, a misconfigured NSG or DNS zone could break Azure’s ability to maintain replicas, which would contradict the service’s reliability guarantees.

If you inspect the replica’s network configuration, you will see that Azure does not create a private endpoint for it unless you explicitly request one for client access. Replication still works. If you delete the private endpoint on the primary, replication still works. If you isolate your VNet completely, replication still works. The only consistent explanation is that replication is not using your private endpoints at all.

So the answer is no, you do not need to add a private endpoint to the replica. Replication is an internal Azure operation that bypasses customer networking entirely, and the architecture of the service makes the opposite scenario impossible without breaking Azure’s isolation and reliability guarantees. However, you will need a private endpoint for client connections to the replica, just as you do for the primary; that is an operational requirement for some deployments, not a prerequisite for replication.


Friday, May 1, 2026

 Minimum Operations to Make Array Non Decreasing

You are given an integer array nums of length n.

In one operation, you may choose any subarray nums[l..r] and increase each element in that subarray by x, where x is any positive integer.

Return the minimum possible sum of the values of x across all operations required to make the array non-decreasing.

An array is non-decreasing if nums[i] <= nums[i + 1] for all 0 <= i < n - 1.

Example 1:

Input: nums = [3,3,2,1]

Output: 2

Explanation:

One optimal set of operations:

• Choose subarray [2..3] and add x = 1 resulting in [3, 3, 3, 2]

• Choose subarray [3..3] and add x = 1 resulting in [3, 3, 3, 3]

The array becomes non-decreasing, and the total sum of chosen x values is 1 + 1 = 2.

Example 2:

Input: nums = [5,1,2,3]

Output: 4

Explanation:

One optimal set of operations:

• Choose subarray [1..3] and add x = 4 resulting in [5, 5, 6, 7]

The array becomes non-decreasing, and the total sum of chosen x values is 4.

Constraints:

• 1 <= n == nums.length <= 10^5

• 1 <= nums[i] <= 10^9

class Solution {
    public long minOperations(int[] nums) {
        // Adding x to a suffix shifts every later element equally, so it never changes the
        // gaps between later neighbors. The minimum total x is therefore the sum of all
        // descents nums[i] - nums[i+1] where nums[i] > nums[i+1]. Summing directly avoids
        // the O(n^2) suffix updates of the naive approach and the int overflow they can
        // cause when values approach 10^9.
        long sum = 0;
        for (int i = 0; i + 1 < nums.length; i++) {
            if (nums[i] > nums[i + 1]) {
                sum += (long) nums[i] - nums[i + 1];
            }
        }
        return sum;
    }
}

Test cases:

Case 1:

nums=[3,3,2,1]

Expected: 2

Actual: 2

Case 2:

nums=[5,1,2,3]

Expected: 4

Actual: 4


Thursday, April 30, 2026

 This is a summary of a book titled “Wait, You Need It When?!?: The Essential Guide to Time Management, Productivity, and Powerful Habits That Get Things Done” written by Peter Economy and published by Career Press in 2026. This book argues that time is the one resource you can never replenish, yet many people treat it as if it were infinite. The result is a workday filled with drift: low-value tasks, constant interruptions, and habits that quietly consume hours. One estimate suggests employees spend about 51% of the workday on tasks that add little value, while social media, email checking, and unnecessary meetings further erode focus. The author stresses that this isn’t merely an efficiency issue; it is a life-management issue. “Money you can get more of, belongings come and go, but once you’ve burned through a particular piece of time, you can never retrieve it….There’s no going back, only forward.”

When time management breaks down, the consequences show up everywhere. Individually, it can mean rushed or sloppy work, missed deadlines, and fewer opportunities to grow. For organizations, it translates into productivity losses, lower quality, delayed delivery, and higher turnover. The damage can ripple outward to customers when follow-through falters, and to colleagues who may feel they are compensating for someone else’s disorganization. The author also highlights a less visible cost: when work expands to fill evenings and weekends, personal relationships and basic self-care are often the first to be squeezed out, leaving people both less present at home and less effective at work.

To regain control, the book emphasizes making deliberate choices about attention and priorities. That starts with ranking tasks by importance and urgency, setting goals that are challenging but realistic, and then translating those goals into small, actionable steps. It also means protecting concentration by eliminating distractions, delegating where appropriate, and using breaks strategically so focus can recover before it collapses. Practical tactics—like scheduling blocks of uninterrupted time for demanding work, tracking how you actually spend your hours, and learning to say no to nonessential requests—create the conditions for consistent progress. He encourages mindfulness as well: noticing the patterns that sabotage your intentions and staying flexible enough to adapt when circumstances change.

Because time feels different depending on what you’re doing, the author recommends building awareness of your subjective experience of it. Meaningful work can make hours pass quickly, while monotonous tasks can feel endless; stress and feeling “behind” can warp your sense of the day. A brief reset—such as a short mindfulness practice—can reduce the sensation of rushing and help you return to the present, where better choices are easier to make.

The author calls for a “serious business mindset”—a purpose-driven attitude that builds credibility and keeps your efforts aligned with your goals. One concrete way to support that mindset is to design a workspace that signals focus. Ergonomic tools, lighting and noise adjustments, and an organized layout all reduce friction. Even small environmental choices matter: research cited in the book suggests that the freedom to personalize a workspace can raise productivity, while plants can provide a modest boost; clutter, by contrast, makes sustained attention harder. He also notes that productivity is not simply a function of longer hours. Regular breaks and clear boundaries protect both performance and work-life balance, and they prevent others from assuming you are available at all times.

Interruptions are especially costly because each shift of attention has a recovery price; the book cites an average of 23 minutes and 15 seconds to fully return to a task after an interruption. To reduce that tax, he advises setting expectations with colleagues by blocking deep-work periods and clearly communicating when you will and won’t be reachable. Technology can reinforce these boundaries through “do not disturb” settings and website blockers, while collaboration tools can replace meetings that don’t require real-time discussion. Physical cues—like closing a door or using headphones—can help others recognize focus time. Just as important is practicing single-tasking: scheduling one to three hours for a single priority rather than bouncing between demands, and keeping “digital hygiene” strong by unsubscribing from unwanted lists, turning off nonessential notifications, and maintaining an orderly file system.

Sustained performance, the book suggests, comes from routines that balance structure with adaptability. By identifying your peak energy windows and building time blocks around them, you can create consistency without becoming rigid. Techniques like the Pomodoro method—working in focused 20- to 30-minute intervals followed by short breaks, with a longer break after several rounds—provide a simple rhythm that prevents burnout while keeping momentum. Goal setting, too, should be both disciplined and flexible. The author highlights the CLEAR framework (Collaborative, Limited, Emotional, Appreciable, Refinable), which encourages seeking input, keeping goals to a manageable number, tying them to what genuinely matters to you, breaking them into milestones you can recognize and celebrate, and refining them as conditions evolve.

Daily to-do lists play an important supporting role by freeing mental bandwidth and making priorities explicit. To make lists actionable rather than overwhelming, he draws on David Allen’s Getting Things Done approach: capture everything that demands attention, clarify the next action and desired outcome, organize tasks in a system that fits your contexts and deadlines, reflect regularly to delete, delegate, or reprioritize, and then engage with the items that will have the greatest impact. The same respect for time applies to meetings. With a significant portion of meetings viewed as ineffective and many running longer than an hour, the book recommends clarifying purpose, using a timed agenda, limiting attendance to the people who can decide or contribute meaningfully, and ending with clear action items and follow-up dates. Finally, he connects productivity to intrinsic motivation: when your work aligns with values, passions, and purpose, focus becomes easier to sustain. He encourages experimentation—trying new classes, volunteering, or networking in inspiring spaces—and reflecting on what energizes you, because “As long as you’re still living and breathing, you can do something different. So if you need to make a change, don’t hesitate: The time is now.”


Wednesday, April 29, 2026

 What can Confluent tell us about video sensing applications?

Confluent’s Streaming Data platform is a cloud-native, fully managed event streaming system built on Apache Kafka but rearchitected from the ground up for elastic scalability, real-time processing, and enterprise-grade governance. At its heart, the platform turns raw data in motion into reliable, governed data products that power real-time applications, analytics, and AI.

The Foundation: KORA, Confluent’s Cloud-Native Kafka Engine

Everything starts with KORA, Confluent’s custom-engineered version of Apache Kafka. Unlike traditional Kafka deployments, KORA is designed for a multi-tenant, serverless cloud architecture. It delivers millions of messages per second with sub-10ms latency and guarantees 99.99% uptime through multi-availability-zone clustering. Topics are partitioned across brokers for horizontal scale and fault tolerance, and producers and consumers are fully decoupled—meaning you can add or evolve services without breaking dependencies.

Storage That Scales: Tiered Storage Architecture

One of KORA’s most powerful innovations is its three-tier storage system, which replaces Kafka’s traditional single-layer local-disk storage:

• Hot tier (memory/SSD): Stores recent data for ultra-low-latency access.

• Warm tier (local SSD cache): Handles intermediate retention.

• Cold tier (cloud object storage like S3, GCS, or Azure Blob): Provides infinite, cost-effective retention.

After data segments are flushed, they’re automatically moved to colder, cheaper storage while metadata is tracked internally. This separation of compute and storage lets you scale each independently and retain data for months or years at a fraction of the cost—something vanilla Kafka can’t do efficiently.

Governance and Quality: Schema Registry and Stream Governance

To keep streaming data trustworthy, Confluent includes a centralized Schema Registry that manages Avro, Protobuf, and JSON Schema with strict compatibility rules (backward, forward, full, or none). This ensures producers and consumers stay in sync even as schemas evolve.

Built on top is the Stream Governance suite, which delivers three critical capabilities:

1. Stream Quality: Enforces data contracts with schema validation and business rule checks.

2. Stream Catalog: Provides data discovery with tagging and rich business metadata.

3. Stream Lineage: Maps end-to-end event flows, showing exactly where data comes from and where it goes.

Together, these tools turn chaotic data streams into governed, high-quality data products.

Connectors: Plug-and-Play Data Integration

Confluent ships with 120+ pre-built Kafka connectors for databases, data warehouses, cloud services, and more. These source and sink connectors abstract away the complexity of data integration. You can also apply transformations on the fly using Single Message Transformations (SMTs), making it easy to clean, enrich, or reformat data as it moves through the platform.

Stream Processing: Real-Time Computation at Scale

For real-time computation, the platform supports multiple processing engines:

• Apache Flink®: A powerful engine for stateful stream processing with automatic schema evolution handling.

• Kafka Streams: A lightweight client library for building stream processing applications with a processor topology of source, processor, and sink nodes. It uses a depth-first processing strategy and partition-based state stores, avoiding backpressure issues.

• ksqlDB: A streaming SQL engine that lets you query and transform data using familiar SQL syntax.

• Tableflow: Creates materialized views for real-time analytics.

The platform even supports LLM and ML model inference directly inside stream processing, enabling streaming agents that can invoke external tools—bringing AI capabilities into real-time data pipelines.

Multi-Cloud, Hybrid, and Geo-Replication

Confluent is built for modern cloud realities:

• Cluster Linking enables geo-replication across clusters and clouds.

• Multi-cloud support includes native integration with S3, GCS, and Azure Blob, plus a Bring-Your-Own-Cloud (BYOC) option.

• Networking security includes private linking, VPC peering, and end-to-end in-transit encryption.

• The architecture supports Kappa architecture, unifying operational and analytical workloads in a single pipeline.

This flexibility lets you run Confluent consistently across AWS, GCP, Azure, or on-premises environments.

How Data Flows Through the Platform

Imagine this journey:

1. Producers send events into Kafka topics, which are partitioned distributed logs.

2. Schema Registry validates each event against its schema.

3. Data lands in tiered storage, automatically moving from hot to cold as it ages.

4. Connectors pull data in from or push data out to external systems.

5. Flink, Kafka Streams, or ksqlDB process the data in real time.

6. Processed data flows to consumer applications, data warehouses, analytics dashboards, or AI models.

Because producers and consumers are decoupled, you can add, remove, or scale any part of this pipeline without disrupting the rest.
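
To make the decoupling concrete, here is a minimal Java producer that publishes a frame‑level event to a topic using the standard Kafka client; the broker address, topic name, key, and payload are placeholders. Consumers, connectors, or stream processors can be attached to the same topic later without any change on this side.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public final class FrameEventProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                 // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // The producer only knows the topic; downstream consumers, connectors, or
        // stream processors can be added or evolved independently.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String key = "flight-001/frame-42";                           // illustrative key
            String value = "{\"event\":\"NovelFrame\",\"frameIndex\":42}"; // illustrative payload
            producer.send(new ProducerRecord<>("drone.frame.events", key, value));
            producer.flush();
        }
    }
}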

Why Confluent Stands Out

Compared to vanilla Kafka, Confluent delivers:

• Tiered storage for infinite retention at low cost.

• Auto-scaling that’s 30× faster than manual Kafka rebalancing.

• Built-in governance with Schema Registry and Stream Governance.

• Fully managed operations with a 99.99% uptime SLA.

• Multiple processing engines: Flink + ksqlDB + Kafka Streams, not just one.

In short, Confluent’s Streaming Data platform transforms the challenge of managing real-time data into a seamless, governed, and scalable experience—enabling event-driven architectures, real-time analytics, and AI applications powered by high-quality, trusted data in motion.

What this architecture suggests for AI agents, specifically for drone video sensing applications

AI agents are usually arranged in one of the following patterns:

• Automatic Query Decomposition by one agent co-ordinating with other agents to invoke each of the queries incurring token costs in parallel per agent.

• Lambda processing or function app agents: scaling to the workload for predefined routines on a task-by-task basis.

• Reasoning agent: forming a breakdown of step-by-step tasks for execution and query response reconstitution.

• Model Context Protocol enabled Agents: for agents to independently reach each other for fulfilment.

• Grounding Agents: with connectivity to online or specific data sources or services.

What the Confluent architecture suggests is to perform this on an event-by-event basis with a perpetual agent, as follows:

package com.sms.event;

/**

 * This represents an observable notification.

 * @param <T> The type of event that is to be observed.

 */

public interface Notifier<T> {

    /**

     * Attach a listener for notification type T.

     * @param listener This is the listener.

     *

     */

    void subscribe(final Listener<T> listener);

    /**

     * Detach a listener.

     */

    void unsubscribe();

    /**

     * finished notifying.

     */

    void onCompleted();

    /**

     * regular event processing.

     */

    void onNext(T notification);

    /**

     * failed event processing.

     */

    void onError(Throwable exception);

}

package com.sms.event;

/**

 * Listener interface for receiving notifications.

 * @param <T> Notification type.

 */

public interface Listener<T> {

    /**

     * Attach a notifier for notification type T.

     * @param notifier This is the notifier.

     *

     */

    void subscribe(final Notifier<T> notifier);

    /**

     * Detach a notifier.

     */

    void unsubscribe();

    /**

     * finished notifying.

     */

    void onCompleted();

    /**

     * regular event processing.

     */

    void onNext(T notification);

    /**

     * failed event processing.

     */

    void onError(Throwable exception);

}

package com.sms.event;

import java.util.concurrent.Executors;

import java.util.concurrent.ExecutorService;

import java.util.Map;

import java.util.HashMap;

import javax.annotation.concurrent.GuardedBy;

import lombok.Data;

import lombok.Synchronized;

import lombok.extern.slf4j.Slf4j;

/**

 * Equivalent of a message broker.

 * @param <T> Type of notification.

 */

@Slf4j

public class NotificationSystem<T extends Notification> {

     @GuardedBy("$lock")

    private final Map<String, Notifier<T>> notifierMap = new HashMap<String, Notifier<T>>();

    private final Map<String, Listener<T>> listenerMap = new HashMap<String, Listener<T>>();

    private final ExecutorService executorService = Executors.newFixedThreadPool(1);

    @SuppressWarnings({ "unchecked", "rawtypes" })

    @Synchronized

    public void addListener(final String type,

                            final Listener<T> listener) {

        if (!isListenerPresent(listener)) {

            listenerMap.put(type, listener);

        }

    }

    /**

     * This method will notify listeners.

     *

     * @param notification Notification.

     * @param <T> Type of notification.

     */

    @Synchronized

    public void notify(final T notification) {

        String type = notification.getClass().getSimpleName();

        Listener<T> listener = listenerMap.get(type);

        if (listener == null) {

            log.warn("No listener registered for notification type: {}", type);

            return;

        }

        log.info("Executing listener of type: {} for notification: {}", type, notification);

        executorService.submit(() -> {

            try {

                listener.onNext(notification);

            } catch (Throwable ex) {

                listener.onError(ex);

            }

        });

    }

    @Synchronized

    public void removeListener(final String type, final Listener<T> listener) {

        listenerMap.remove(type);

    }

    private boolean isListenerPresent(final Listener<T> listener) {

        return listenerMap.values().stream().anyMatch(le -> le.equals(listener));

    }

    @SuppressWarnings({ "unchecked", "rawtypes" })

    @Synchronized

    public void addNotifier(final String type,

                            final Notifier<T> notifier) {

        if (!isNotifierPresent(notifier)) {

            notifierMap.put(type, notifier);

        }

    }

    @Synchronized

    public void removeNotifier(final String type, final Notifier<T> notifier) {

        notifierMap.remove(type);

    }

    public boolean isNotifierPresent(final Notifier<T> notifier) {

        return notifierMap.values().stream().anyMatch(n -> n.equals(notifier));

    }

    public boolean isSubscriberPresent(final Listener<T> listener) {

        return listenerMap.values().stream().anyMatch(l -> l.equals(listener));

    }

}
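
A brief usage sketch of the classes above, showing how a perpetual agent could register a listener and push frame events one at a time; FrameNotification and the handler body are hypothetical, and Notification is assumed to be a marker interface defined alongside these classes.

package com.sms.event;

public class AgentWiring {

    // Hypothetical event type; Notification is the marker interface expected by
    // NotificationSystem but not shown in the snippets above.
    static class FrameNotification implements Notification {
        final long frameIndex;
        FrameNotification(long frameIndex) { this.frameIndex = frameIndex; }
        @Override public String toString() { return "FrameNotification{" + frameIndex + "}"; }
    }

    public static void main(String[] args) {
        NotificationSystem<FrameNotification> system = new NotificationSystem<>();

        // Register a listener keyed by the notification's simple class name, matching
        // how notify(...) looks listeners up.
        system.addListener("FrameNotification", new Listener<FrameNotification>() {
            @Override public void subscribe(Notifier<FrameNotification> notifier) { }
            @Override public void unsubscribe() { }
            @Override public void onCompleted() { }
            @Override public void onNext(FrameNotification n) {
                System.out.println("Processing " + n);   // invoke perception/agent logic here
            }
            @Override public void onError(Throwable ex) { ex.printStackTrace(); }
        });

        // The perpetual agent pushes events one at a time, Confluent-style; the fixed
        // thread pool inside NotificationSystem keeps the process alive.
        system.notify(new FrameNotification(42));
    }
}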


Tuesday, April 28, 2026

 This is a summary of a book titled “Trust Agents: Using the Web to Build Influence, Improve Reputation, and Earn Trust” written by Chris Brogan and Julien Smith and published by Wiley in 2009. People have become less trusting, and public skepticism toward institutions runs high. In this environment, traditional marketing and polished corporate messaging don’t build confidence; they often deepen suspicion. The authors argue that the web—because it is connective, searchable, and radically transparent—offers companies a different path. Instead of trying to control the message or hide imperfections, organizations can earn credibility by showing up as real participants in online communities. The people who make this work are what Brogan and Smith call “trust agents”: individuals who represent a business without acting like salespeople, who trade pressure for presence, and who build influence by being useful and genuine. For implementers of the OAuth protocol, this relates to bringing an audience over from third-party websites.

A trust agent’s influence comes from understanding a core shift: online, people don’t want to be “managed” by brands; they want to be cared for by humans. The authors stress that effective participants are not infiltrators who join groups to extract value, and they are not loud promoters trying to “convert” every interaction into a transaction. They are power users of modern web tools—blogs, feeds, social networks, audio and video platforms—but the tools matter less than the approach. The web is described as a gigantic lever: once you publish something helpful publicly, it can continue to reach new people long after you press “post,” and one thoughtful answer can save you from repeating the same response in countless private emails. Over time, that visible generosity becomes reputation, and reputation becomes trust.

To act with credibility, the book says, you first have to listen. Brogan and Smith recommend building a “listening station” so you can understand what online communities already believe about your company and your competitors—what they praise, what they distrust, and what questions keep resurfacing. Their 2009 instructions are anchored in the tools of that moment (Google services, feed readers, and blog search engines like Technorati), but the underlying practice is timeless: set up a system that continuously surfaces mentions of your organization, your products, and the themes your customers care about. The goal is not surveillance for its own sake; it is awareness. Only by paying attention can you participate in ways that feel responsive rather than performative.

Once you can “see the map” of what people are saying, you can begin to contribute. The authors emphasize that the content you create online—whether posts, videos, podcasts, or simple comments—has durability. Because it remains discoverable, it can keep answering questions and demonstrating your expertise long after the moment has passed. This is where social capital forms: when you repeatedly help people solve problems, clarify confusing topics, or point them toward useful resources, the community starts to recognize you as someone worth listening to. That recognition is not merely popularity; it is a kind of stored goodwill you can draw on later when you need to introduce an idea, request feedback, or rally people around a project.

From there, Brogan and Smith organize the trust agent’s mindset into six interlocking principles. Each principle is less a rigid rule than a way of behaving that makes trust more likely to form in public spaces where anyone can evaluate you. Together they encourage experimentation, belonging, leverage, relationship-building, empathy, and collective action—skills that turn the web from a broadcasting channel into a place where influence is earned.

The first principle, “Make Your Own Game,” argues that the internet rewards those willing to challenge industry habits. Online you can set new terms, reach audiences directly, and bypass gatekeepers who once controlled distribution. The book highlights musicians who rewrote the rules: the Arctic Monkeys built momentum through MySpace, and Radiohead experimented with a pay-what-you-want release that still generated massive sales. These examples illustrate the broader point: trust agents don’t wait for permission. They watch what the community values, take smart risks, and create approaches that feel fresh rather than formulaic.

To support that spirit of experimentation, the authors borrow a framework from Douglas Rushkoff: treating culture—and the web—as a kind of game you can learn, hack, and even redesign. At first you “play,” learning the norms and feedback signals of your space: links, comments, followers, revenue, and the general sentiment people express in public. Then you begin to “cheat,” not by being dishonest but by thinking laterally—finding unusual, effective ways to use familiar tools or sell familiar offerings. Finally, you may move into “programming,” building something new entirely and discovering its rules through trial, error, and persistence. In the trust agent’s world, that willingness to learn and iterate becomes a visible marker of competence and confidence.

The second principle, “One of Us,” focuses on belonging. Trust online is rarely granted to outsiders who sound like advertisements, and it is quickly withdrawn from anyone who appears self-serving. The book points to an early and influential example: Microsoft employee Robert Scoble, who blogged candidly about his company—even criticizing products. That openness helped him gain standing in technical communities, not because he was perfect, but because he was plainly real. Brogan and Smith connect this to the “trust equation” described in The Trusted Advisor: credibility, reliability, and intimacy raise trust, while self-orientation lowers it. Online, these factors still apply, but they are shaped by what other people publicly say about you, by the consistency of your visible actions over time, and by the surprising power of “verbal intimacy” in a world with fewer nonverbal cues.

The third principle, the “Archimedes Effect,” explains how the web turns small efforts into outsized outcomes. Like a lever, online platforms amplify reputation, relationships, and time: a single introduction can connect networks, and a single well-placed resource can help thousands. Yet the authors warn that leverage collapses the moment you treat your audience as targets. Trust agents serve as helpful gatekeepers for their communities, curating information, connecting people, and staying focused on long-term value rather than short-term selling.

Using that leverage well requires what the authors call “multicapitalism”: the ability to recognize different forms of value—money, attention, credibility, access, goodwill—and to exchange them intelligently. They offer Donald Trump as an example of turning one kind of capital into another: wealth into visibility, visibility into new ventures. For a trust agent, the more ethical version of this is building a presence online, meeting people in person when possible, and then sustaining the relationship with ongoing online touches. Over time, those repeated, generous interactions become the compounding force behind influence.

The fourth principle, “Agent Zero,” describes a particular kind of network position. Trust agents often sit at the hub of conversations, not because they demand attention, but because they continuously connect people and ideas. They comment, respond, congratulate, and share—quickly and sincerely. They use their network to solve problems, introduce collaborators, and spotlight other people’s work. Ironically, by staying out of the spotlight and acting with a service mindset, they become highly visible in the way that matters: as dependable human links within a community.

The fifth principle, “Human Artist,” is about interpersonal skill—especially empathy, observation, and respect for social norms. Brogan and Smith argue that trust agents succeed because they are good at reading the room, even when “the room” is a comment thread, a forum, or a fast-moving social feed. They take time to learn which communities matter to them, what those people value, and what behavior is considered acceptable. They listen before they speak, match the tone of the space, and follow a web-friendly version of the Golden Rule: treat online contacts the way you would want to be treated. Most importantly, they resist the temptation to market to new online friends. In community settings, aggressive selling is often treated as a violation, and it can damage reputation faster than any single mistake.

The sixth principle, “Build an Army,” highlights the web’s ability to coordinate people at scale. With platforms such as wikis, review sites, and social networks, trust agents can gather large groups around a shared purpose, helping them collaborate in ways that were once impractical. Wikipedia is an obvious example of crowdsourcing’s potential, but the authors also point to corporate efforts that succeed when they prioritize participation over persuasion. General Motors’ GMNext.com, for instance, gave customers wiki-style tools and space to share stories about vehicles they loved. The initiative worked precisely because GM didn’t treat the community like a pipeline for hard sales; it treated it as a place where customers could express identity and enthusiasm in their own words—marketing that feels credible because it isn’t forced.

In the final pages, the advice becomes practical and immediate: show up where your communities already gather, and communicate more than you think you need to. Join relevant networks, build a base of contacts, and don’t be overly cautious about connecting with people you haven’t met yet—online, relationships often begin as lightweight interactions that deepen over time. Use tools like Twitter (and today’s equivalents) to learn what people care about in real time. Comment thoughtfully on blogs and forums, answer questions, and “check in” regularly so your presence is steady rather than sporadic. The authors’ challenge is simple: aim to become the best communicator the web has ever seen, not by talking the most, but by listening well, contributing generously, and earning trust one visible interaction at a time.


Monday, April 27, 2026

 Azure Web App Logging

An Azure Web App can log in two broad ways: locally on the app host for quick troubleshooting, or externally through Azure Monitor diagnostic settings for longer-lived and downstream analytics use. The best choice depends on what matters more: speed and simplicity, or durability, integration, and centralized operations.

Logging options

Local logging writes logs to the App Service file system, where you can download them or access them over FTPS. This is the lightest-weight option for development and short investigations, and Azure App Service supports FTPS-only mode so you can avoid plain FTP; if you are using file-system logging, a common optimization is to keep retention at 0 days and size quota around 35 MB so you do not accumulate unnecessary storage or incur avoidable cost on the app resource.

Diagnostic settings send logs to a Storage account, Event Hub, or Log Analytics. This is the better fit when you need centralized retention, querying, or forwarding to operational tools such as Splunk through Event Hub or another ingestion pipeline, but it can generate meaningful storage and ingestion volume depending on how verbose the selected log categories are.

Practical trade-offs

Local file-system logging is usually faster to access and easier for developers because the logs sit close to the app and can be pulled immediately. The downside is that it is not designed for long-term retention or enterprise-scale observability, and the footprint should be kept intentionally small so it does not compete with the app for space or create unnecessary overhead.

Diagnostic settings are better for compliance, analytics, and cross-team access because they move data out of the app into durable Azure services. The trade-off is cost and volume: app logs, HTTP logs, and platform logs can grow quickly, and sending all categories to Storage or Event Hub increases both ingestion and downstream processing costs, especially if a SIEM such as Splunk also charges for indexed volume.

Blob storage option

Sending logs to Azure Blob Storage is often the middle ground between local-only logs and a full streaming pipeline. Compared with keeping logs on the app host, blob storage gives you better retention, easier central access, and stronger separation of duties; compared with Event Hub, it is simpler and usually cheaper for archive-style retention, but less suitable for real-time operational forwarding.

From a security perspective, blob storage is preferable when you want to restrict access with managed identities, RBAC, and private networking rather than exposing the app host file system or broadly granting FTPS access. In general, the more external the log destination, the better your control plane story becomes, but the more important it is to secure identities, network paths, and storage permissions.

Cost impact

When logging is turned on for all log types, the monthly cost increases in two places: the App Service side and the destination side. On the app side, local logging can consume file-system quota and operational overhead, while external logging can add Azure Monitor, Storage, Event Hub, and downstream SIEM costs; in practice, the biggest cost driver is usually log volume rather than the mere act of enabling logging.

A full “everything on” configuration can become expensive if verbose application logs, HTTP logs, and platform diagnostics are all emitted continuously. The right way to manage cost is to limit categories to what is actually needed, reduce verbosity in production, and set retention policies that match the business need instead of defaulting to indefinite collection.

Premium tier considerations

If the app service plan is upgraded to the lowest Premium tier, turning on logging through diagnostic settings is generally a better production pattern than relying on only local file logging. Premium gives more headroom for performance-sensitive workloads, but logging still adds CPU, I/O, and network overhead, especially if the destination is remote and every write must be exported out of the app path.

The main security concern is not the Premium tier itself, but the expanded data flow: logs may contain request paths, headers, identifiers, or exception details, so access to the destination must be tightly limited. The main performance concern is bursty log generation, which can increase latency if the app spends too much time serializing and exporting log data rather than serving requests.

Dev and ops access

A good pattern is to optimize for both developer and operational needs by splitting access modes. Developers can use local logs or near-real-time access for low-latency troubleshooting and faster iteration, while operations teams consume the same data centrally with read-only access, least privilege, and controlled retention in Storage, Event Hub, or a SIEM pipeline.

This reduces friction because developers get interactive access without waiting on a downstream pipeline, while operations gets governed, durable visibility with auditability and restricted permissions. In practice, that usually means keeping local logs small and temporary, and pushing only the logs needed for production observability into centralized destinations.

Recommendations

Azure’s general direction for App Service logging is to use local logs for short-lived troubleshooting, diagnostic settings for durable monitoring, and secure transport and access controls for anything beyond the app host. FTPS should be limited to FTPS-only or disabled when not needed, detailed error pages should not be exposed to clients in production, and logging categories should be scoped narrowly to reduce cost and noise.

A popular policy posture is:

• Keep local file-system logs small, temporary, and developer-focused.

• Use diagnostic settings for production retention and centralized monitoring.

• Route only necessary categories to Storage or Event Hub.

• Restrict destination access with least privilege and private connectivity where possible.

• Treat log content as sensitive operational data and control retention accordingly.

Sunday, April 26, 2026

 Continued from previous article 


Some replicas are asynchronous by nature and are called observers. They do not join the in-sync replica set or become a partition leader, but they can be promoted to restore availability to the partition and allow producers to produce data again. Connected clusters might span distinct geographic regions and usually involve linking between the clusters. Linking is an extension of the replica-fetching protocol that is inherent to a single cluster. A link contains all the connection information necessary for the destination cluster to connect to the source cluster. A topic on the destination cluster that fetches data over the cluster link is called a mirror topic. The mirror may keep the same or a prefixed name, and it syncs configurations, consumer offsets, and access control lists while copying data byte for byte.

Managed broker services complete the value delivered to the business beyond standalone broker deployments by automating cluster sizing, over-provisioning, failover design, and infrastructure management. They are known to raise availability to a 99.99% uptime service-level agreement. Often, they involve a replicator, which is a worker that executes a connector and its tasks to co-ordinate data streaming between source and destination broker clusters. A replicator has a source consumer that consumes the records from the source cluster and then passes these records to the Connect framework. The Connect framework has a built-in producer that then produces these records to the destination cluster. It might also have dedicated clients to propagate metadata updates to the destination cluster.

In a geographically distributed replication for business continuity and disaster recovery, the primary region has the active cluster that the producers and consumers write to and read from, and the secondary region has read-only clusters with replicated topics for read only consumers. It is also possible to configure two clusters to replicate to each other so that both of them have their own sets of producers and consumers but even in these cases, the replicated topic on either side will only have read-only consumers. Fan-in and Fan-out are other possible arrangements for such replication.

Disaster recovery almost always involves a failover of the primary active cluster to a secondary cluster. When disaster strikes, the maximum amount of data, usually measured in time, that can be lost after a recovery is called the Recovery Point Objective, and replication minimizes it. The targeted duration until the service level is restored to the expectations of the business process is referred to as the Recovery Time Objective. Recovery brings the system back to operational mode. Cost, business requirements, use cases, and regulatory and compliance requirements mandate this replication, and the considerations made for data in motion during replication often stand out as best practice for the overall solution.

One of the toughest challenges in data engineering has been the diversity of stacks, platforms, products, and logic, to the detriment of smooth operations, business continuity, and disaster recovery. The problem stems from the dichotomy between assets and debt. When developers spend their time writing to, say, SQL Edge, they later incur a greater debt when moving to an open-source stack because data operations proliferate with very little curation. That is why planning for all the Ops considerations is just as necessary at design time as the feature itself.


#codingexercise: CodingExercise-04-26-2026.docx

Saturday, April 25, 2026

 (Continued from previous article)

When these IoT resources are shared, the isolation model, impact on scaling performance, state management, and security of the IoT resources become complex. Scaling resources helps meet the changing demand from a growing number of consumers and an increase in the amount of traffic. We might need to increase the capacity of the resources to maintain an acceptable performance rate. Scaling depends on the number of producers and consumers, payload size, partition count, egress request rate, and usage of IoT Hub capture, schema registry, and other advanced features. When additional IoT capacity is provisioned or a rate limit is adjusted, the multitenant solution can perform retries to overcome transient request failures, as sketched below. When the number of active users reduces or there is a decrease in traffic, the IoT resources could be released to reduce costs. Data isolation depends on the scope of isolation. When the storage for IoT data is a relational database server, the IoT solution can make use of IoT Hub. Varying levels and scope of sharing of IoT resources demand simplicity from the architecture. Patterns such as the deployment stamp pattern, the IoT resource consolidation pattern, and the dedicated IoT resources pattern help optimize operational cost and management with little or no impact on usage.
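
A minimal retry-with-backoff sketch for those transient failures follows; the operation, attempt count, and delays are illustrative assumptions rather than values from any particular SDK, which would typically inspect the error to distinguish throttling from permanent failures.

import java.util.concurrent.Callable;

public class TransientRetry {
    /** Retries an operation with exponential backoff, assuming the caller can
     *  distinguish transient failures (e.g., throttling) from permanent ones. */
    public static <T> T withRetries(Callable<T> operation, int maxAttempts, long initialDelayMs)
            throws Exception {
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e;                      // assumed transient; a real client would inspect the error
                if (attempt == maxAttempts) break;
                Thread.sleep(delay);           // back off before the next attempt
                delay *= 2;                    // exponential growth, optionally capped and jittered
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative usage: send a telemetry batch that may be throttled while capacity scales out.
        String result = withRetries(() -> "sent", 5, 200);
        System.out.println(result);
    }
}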

Edge computing relies heavily on asynchronous backend processing. Some form of message broker becomes necessary to maintain ordering between events and to support retries and dead-letter queues. The storage for the data must follow data-partitioning guidance where partitions can be managed and accessed separately. Horizontal, vertical, and functional partitioning strategies must be suitably applied; a horizontal-partitioning sketch follows below. In the analytics space, a typical scenario is to build solutions that integrate data from many IoT devices into a comprehensive data-analysis architecture to improve and automate decision making.
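
The sketch below illustrates horizontal partitioning by device ID, assuming a fixed partition count; the class name and partition count are placeholders, not part of any specific product.

public class DevicePartitioner {
    private final int partitionCount;

    public DevicePartitioner(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    /** Horizontal (sharding) partitioning: the same device always lands in the same
     *  partition, preserving per-device ordering and letting partitions be managed
     *  and accessed independently. */
    public int partitionFor(String deviceId) {
        return Math.floorMod(deviceId.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        DevicePartitioner p = new DevicePartitioner(8);
        System.out.println(p.partitionFor("vehicle-042")); // stable partition assignment per device
    }
}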

Event Hubs, blob storage, and IoT hubs can collect data on the ingestion side, while the analyzed results are distributed via alerts and notifications, dynamic dashboarding, data warehousing, and storage/archival. The fan-out of data to different services is itself a value addition, but the ability to transform events into processed events also generates more possibilities for downstream usages, including reporting and visualizations.

One of the main considerations for data pipelines with ingestion capabilities for IoT-scale data is the business continuity and disaster recovery scenario, which is achieved with replication. A broker stores messages in a topic, which is a logical group of one or more partitions. The broker guarantees message ordering within a partition and provides a persistent, log-based storage layer whose append-only logs inherently preserve message order. Deploying brokers over more than one cluster introduces geo-replication to address disaster recovery strategies.

Each partition is associated with an append-only log, so messages appended to the log are ordered by time and carry important offsets: the first available offset in the log, the high watermark (the offset of the last message that was successfully written and committed to the log by the brokers), and the log end offset, where the most recent message was written, which can exceed the high watermark. When a broker goes down, durability and availability must be addressed with replicas. Each partition has multiple replicas that are evenly distributed, but one replica is elected as the leader and the rest are followers. The leader is where all produce and consume requests go, and followers replicate writes from the leader.

A pull-based replication model is the norm for brokers, where dedicated fetcher threads periodically pull data between broker pairs. Each replica is a byte-for-byte copy of the others, which makes this replication offset preserving. The number of replicas is determined by the replication factor. The leader maintains a list called the in-sync replica (ISR) set; messages are committed by the leader only after all replicas in the ISR set have replicated the message. Global availability demands that brokers are deployed with different deployment modes. Two popular deployment modes are 1) a single cluster that stretches over multiple data centers or regions and 2) a federation of connected clusters.
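
A minimal model of that commit rule, assuming in-memory replica log-end offsets and illustrative naming, is sketched below: the leader can only advance the high watermark to the minimum log end offset across the ISR, so a record is committed once every in-sync replica has replicated past it.

import java.util.Map;

public class IsrCommitModel {
    /** Models the leader's commit rule: offsets below the minimum log end offset
     *  of the in-sync replica (ISR) set are considered committed. */
    public static long highWatermark(Map<String, Long> isrLogEndOffsets) {
        // The high watermark cannot exceed the slowest in-sync replica.
        return isrLogEndOffsets.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(0L);
    }

    public static void main(String[] args) {
        // Leader has written up to offset 120; ISR followers have fetched to 118 and 115.
        Map<String, Long> isr = Map.of("leader", 120L, "follower-1", 118L, "follower-2", 115L);
        System.out.println(highWatermark(isr)); // 115: offsets up to here are committed
    }
}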


Thursday, April 23, 2026

 Data in motion – IoT solution and data replication

The transition of data from edge sensors to the cloud is a data engineering pattern that does not always get a proper resolution with the boilerplate event-driven architectural design proposed by the public clouds, because much of the fine-tuning is left to the choice of the resources, event hubs, and infrastructure involved in the streaming of events. This article explores the design and data-in-motion considerations for an IoT solution, beginning with an introduction to the public cloud's proposed design, the choices between products, and the considerations for handling and tuning distributed, real-time data streaming systems, with particular emphasis on data replication for business continuity and disaster recovery. A sample use case is continuous events for geospatial analytics in fleet management, whose data can include driverless-vehicle weblogs.

Event Driven architecture consists of event producers and consumers. Event producers are those that generate a stream of events and event consumers are ones that listen for events. The right choice of architectural style plays a big role in the total cost of ownership for a solution involving events.

The scale out can be adjusted to suit the demands of the workload and the events can be responded to in real time. Producers and consumers are isolated from one another. IoT requires events to be ingested at very high volumes. The producer-consumer design has scope for a high degree of parallelism since the consumers are run independently and in parallel, but they are tightly coupled to the events. Network latency for message exchanges between producers and consumers is kept to a minimum. Consumers can be added as necessary without impacting existing ones.

Some of the benefits of this architecture include the following: The publishers and subscribers are decoupled. There are no point-to-point integrations. It's easy to add new consumers to the system. Consumers can respond to events immediately as they arrive. They are highly scalable and distributed. There are subsystems that have independent views of the event stream.

Some of the challenges faced with this architecture include the following: Event loss can occur, so any scenario that needs guaranteed delivery poses a challenge, and IoT traffic mandates guaranteed delivery. Some scenarios also require that events are processed in exactly the order they arrive, or exactly once. Each consumer type typically runs in multiple instances for resiliency and scalability, which can pose a challenge if the processing logic is not idempotent or the events must be processed in order.
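
A minimal sketch of one way to keep processing idempotent follows, assuming each event carries a unique ID and using an in-memory seen-set as a stand-in for the durable store a real multi-instance consumer would share.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentHandler {
    // In a real system this would be a durable store (database, cache) shared by all instances.
    private final Set<String> processedEventIds = ConcurrentHashMap.newKeySet();

    /** Processes an event at most once per event ID, so redeliveries and
     *  duplicate deliveries across consumer instances are harmless. */
    public void handle(String eventId, Runnable businessLogic) {
        if (!processedEventIds.add(eventId)) {
            return; // already processed: drop the duplicate
        }
        businessLogic.run();
    }

    public static void main(String[] args) {
        IdempotentHandler handler = new IdempotentHandler();
        handler.handle("evt-123", () -> System.out.println("processed"));
        handler.handle("evt-123", () -> System.out.println("processed")); // ignored duplicate
    }
}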

The benefits and the challenges suggest some of these best practices. Events should be lean and mean and not bloated. Services should share only IDs and/or a timestamp. Large data transfer between services is an antipattern. Loosely coupled event driven systems are best.

IoT Solutions can be proposed either with an event driven stack involving open-source technologies or via a dedicated and optimized storage product such as a relational engine that is geared towards edge computing. Either way capabilities to stream, process and analyze data are expected by modern IoT applications. IoT systems vary in flavor and size. Not all IoT systems have the same certifications or capabilities.


Wednesday, April 22, 2026

 Derived metrics in observability pipelines for Inflection signatures

If we assume an immovable, straight-down (nadir) camera with no pitch, yaw, roll, or zoom, the geometry of the problem simplifies in a way that is almost ideal for defining observability metrics. The drone’s motion is now the primary source of variation across frames: translation along straight edges, and a change in translation direction at corners. That means we can design metrics that are explicitly sensitive to changes in planar motion and scene displacement while being largely invariant to viewpoint distortions. Those metrics can be computed per frame or per short window, aggregated over time, and then reintroduced into the observability pipeline as custom events that act as “inflection hints” for downstream agents.

The starting point is to treat each frame as a node in a temporal sequence with associated observability features. With a nadir camera, the dominant effect of motion is a shift of the ground texture in the image plane. Along a straight edge, this shift is approximately constant in direction and magnitude (modulo speed variations), while at a corner, the direction of shift changes. We can capture this with a simple but powerful family of metrics based on inter-frame displacement. For each pair of consecutive frames, we compute a dense or block-based optical flow field and summarize it into a mean flow vector and a dispersion measure. The mean flow magnitude reflects how fast the ground is moving under the camera; the mean flow direction reflects the direction of travel. The dispersion (e.g., standard deviation of flow vectors) reflects local inconsistencies due to parallax, moving objects, or noise.

Over straight edges, we expect the mean flow direction to be stable and the dispersion to be relatively low and slowly varying. At corners, the mean direction will rotate over a short sequence of frames, and dispersion may spike as the motion field transitions. This gives us three basic observability metrics per frame or per window: average flow magnitude, average flow direction, and flow dispersion. These can be logged as metrics in the observability pipeline and then aggregated over sliding windows to produce higher-level signals: direction stability (e.g., variance of direction over the last N frames), magnitude stability, and dispersion anomalies.

Because the camera is fixed in orientation, we can also exploit frame differencing and spatial alignment more aggressively. For example, we can compute a global translational alignment between consecutive frames using phase correlation or template matching. The resulting translation vector is a robust proxy for the drone’s planar motion. Again, along straight edges, the translation vector’s direction is stable; at corners, it rotates.
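
A minimal sketch of those per-frame metrics is below, assuming grayscale frames as 2D arrays and a simple block-matching displacement search as a crude stand-in for full optical flow; the block size, search radius, and class names are illustrative.

import java.util.ArrayList;
import java.util.List;

public class MotionMetrics {

    /** Per-frame-pair motion summary: mean displacement magnitude/direction and dispersion. */
    public static class Summary {
        public final double meanMagnitude, meanDirectionDeg, dispersion;
        Summary(double mag, double dir, double disp) {
            meanMagnitude = mag; meanDirectionDeg = dir; dispersion = disp;
        }
    }

    /** Estimates per-block translation between two grayscale frames with an exhaustive
     *  SAD search, then summarizes the resulting flow field. */
    public static Summary estimate(double[][] prev, double[][] next, int block, int search) {
        int h = prev.length, w = prev[0].length;
        List<double[]> flow = new ArrayList<>();
        for (int y = search; y + block + search <= h; y += block) {
            for (int x = search; x + block + search <= w; x += block) {
                double best = Double.MAX_VALUE;
                int bestDx = 0, bestDy = 0;
                for (int dy = -search; dy <= search; dy++) {
                    for (int dx = -search; dx <= search; dx++) {
                        double sad = 0;
                        for (int by = 0; by < block; by++)
                            for (int bx = 0; bx < block; bx++)
                                sad += Math.abs(prev[y + by][x + bx] - next[y + by + dy][x + bx + dx]);
                        if (sad < best) { best = sad; bestDx = dx; bestDy = dy; }
                    }
                }
                flow.add(new double[]{bestDx, bestDy});
            }
        }
        if (flow.isEmpty()) return new Summary(0, 0, 0);
        // Mean flow vector gives direction of travel; mean magnitude approximates ground speed.
        double sumX = 0, sumY = 0, sumMag = 0;
        for (double[] v : flow) { sumX += v[0]; sumY += v[1]; sumMag += Math.hypot(v[0], v[1]); }
        int n = flow.size();
        double meanMag = sumMag / n;
        double meanDir = Math.toDegrees(Math.atan2(sumY / n, sumX / n));
        // Dispersion: standard deviation of per-block magnitudes (parallax, moving objects, noise).
        double var = 0;
        for (double[] v : flow) {
            double d = Math.hypot(v[0], v[1]) - meanMag;
            var += d * d;
        }
        return new Summary(meanMag, meanDir, Math.sqrt(var / n));
    }
}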

Tuesday, April 21, 2026

 Smallest stable index:

You are given an integer array nums of length n and an integer k.

For each index i, define its instability score as max(nums[0..i]) - min(nums[i..n - 1]).

In other words:

• max(nums[0..i]) is the largest value among the elements from index 0 to index i.

• min(nums[i..n - 1]) is the smallest value among the elements from index i to index n - 1.

An index i is called stable if its instability score is less than or equal to k.

Return the smallest stable index. If no such index exists, return -1.

Example 1:

Input: nums = [5,0,1,4], k = 3

Output: 3

Explanation:

• At index 0: The maximum in [5] is 5, and the minimum in [5, 0, 1, 4] is 0, so the instability score is 5 - 0 = 5.

• At index 1: The maximum in [5, 0] is 5, and the minimum in [0, 1, 4] is 0, so the instability score is 5 - 0 = 5.

• At index 2: The maximum in [5, 0, 1] is 5, and the minimum in [1, 4] is 1, so the instability score is 5 - 1 = 4.

• At index 3: The maximum in [5, 0, 1, 4] is 5, and the minimum in [4] is 4, so the instability score is 5 - 4 = 1.

• This is the first index with an instability score less than or equal to k = 3. Thus, the answer is 3.

Example 2:

Input: nums = [3,2,1], k = 1

Output: -1

Explanation:

• At index 0, the instability score is 3 - 1 = 2.

• At index 1, the instability score is 3 - 1 = 2.

• At index 2, the instability score is 3 - 1 = 2.

• None of these values is less than or equal to k = 1, so the answer is -1.

Example 3:

Input: nums = [0], k = 0

Output: 0

Explanation:

At index 0, the instability score is 0 - 0 = 0, which is less than or equal to k = 0. Therefore, the answer is 0.

Constraints:

• 1 <= nums.length <= 100

• 0 <= nums[i] <= 10^9

• 0 <= k <= 10^9

class Solution {

    public int firstStableIndex(int[] nums, int k) {

        long[] scores = new long[nums.length];

        for (int i = 0; i < nums.length; i++) {

            int max = Integer.MIN_VALUE;

            int min = Integer.MAX_VALUE;

            for (int j = 0; j <= i; j++) {

                if (nums[j] > max) {

                    max = nums[j];

                }

            }

            for (int j = i; j < nums.length; j++) {

                if (nums[j] < min) {

                    min = nums[j];

                }

            }

            // System.out.println("max="+max+"&min="+min);

            scores[i] = (long) max - min;

        }

        // Return the first index whose instability score is within the allowed bound.

        for (int i = 0; i < scores.length; i++) {

            if (scores[i] <= k) {

                return i;

            }

        }

        return -1;

    }

}

Test cases:

Case 1:

Input

nums =

[5,0,1,4]

k =

3

Output

3

Expected

3

Case 2:

Input

nums =

[3,2,1]

k =

1

Output

-1

Expected

-1

Case 3:

Input

nums =

[0]

k =

0

Output

0

Expected

0

#Codingexercise: Codingexercise-04-21-2026.docx

Today's article: Derived Metrics 

Sunday, April 19, 2026

 Longest Balanced Substring After One Swap

You are given a binary string s consisting only of characters '0' and '1'.

A string is balanced if it contains an equal number of '0's and '1's.

You can perform at most one swap between any two characters in s. Then, you select a balanced substring from s.

Return an integer representing the maximum length of the balanced substring you can select.

Example 1:

Input: s = "100001"

Output: 4

Explanation:

• Swap "100001". The string becomes "101000".

• Select the substring "101000", which is balanced because it has two '0's and two '1's.

Example 2:

Input: s = "111"

Output: 0

Explanation:

• Choose not to perform any swaps.

• Select the empty substring, which is balanced because it has zero '0's and zero '1's.

Constraints:

• 1 <= s.length <= 10^5

• s consists only of the characters '0' and '1'

class Solution {

    public int longestBalanced(String s) {
        int n = s.length();
        int totalOnes = 0;
        for (int i = 0; i < n; i++) {
            if (s.charAt(i) == '1') totalOnes++;
        }
        int totalZeros = n - totalOnes;
        int max = 0;
        // A window [i..j] can be selected as balanced after at most one swap if either
        // it is already balanced, or its surplus of one character (by exactly two) can be
        // traded with an opposite character that exists outside the window.
        for (int i = 0; i < n; i++) {
            int count0 = 0, count1 = 0;
            for (int j = i; j < n; j++) {
                if (s.charAt(j) == '1') count1++; else count0++;
                int len = j - i + 1;
                if (len <= max) continue;
                if (count0 == count1) {
                    // Already balanced; no swap needed.
                    max = len;
                } else if (count0 == count1 + 2 && totalOnes - count1 > 0) {
                    // Swap a surplus '0' inside the window with a '1' outside it.
                    max = len;
                } else if (count1 == count0 + 2 && totalZeros - count0 > 0) {
                    // Swap a surplus '1' inside the window with a '0' outside it.
                    max = len;
                }
            }
        }
        return max;
    }

}

Test cases:

Case 1:

Input

s =

"100001"

Output

4

Expected

4

Case 2:

Input

s =

"111"

Output

0

Expected

0


Saturday, April 18, 2026

 Detecting structural transitions in continuous visual data streams is a foundational challenge in online video analytics, particularly when the underlying physical process exhibits long periods of repetitive behavior punctuated by brief but critical inflection events. This paper introduces a principled framework for inflection point detection in streaming aerial imagery, motivated by the practical requirement of identifying the four corner events in a drone’s rectangular survey flight path using only the video stream itself, without reliance on GPS, IMU, or external telemetry. The problem is challenging because the majority of the flight consists of highly repetitive, low variation frames captured along straight edges of the rectangle, while the corner events—though visually distinct—occur over a short temporal span and must be detected with 100% recall to ensure the integrity of downstream spatial reasoning tasks such as survey tiling, mosaic alignment, and trajectory reconstruction.

We propose an online clustering and evolution analysis framework inspired by the principles of Ocean (ICDE 2024), which models the streaming feature space using a composite window and tracks the lifecycle of evolving clusters representing stable orientation regimes of the drone. Each frame is transformed into a compact orientation–motion embedding, derived from optical flow based dominant motion direction, homography based rotation cues, and low dimensional CNN features capturing scene layout stability. These embeddings form a continuous stream over which we maintain a set of micro clusters that summarize local density, cohesion, and temporal persistence. The straight line segments of the flight correspond to long lived, high cohesion clusters with stable centroids and minimal drift, while the corners manifest as abrupt transitions in cluster membership, density, and orientation statistics. We formalize these transitions as cluster lifetime inflection points, defined by a conjunction of (i) a sharp change in the dominant orientation component, (ii) a rapid decay in the density of the current cluster, and (iii) the emergence of a new cluster with increasing density and decreasing intra cluster variance.
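
A minimal sketch of that conjunction test, assuming the per-window cluster statistics are already computed upstream and using illustrative field names and thresholds, is shown below.

public class InflectionDetector {

    /** Summary statistics of a cluster over the current composite window (illustrative fields). */
    public static class ClusterStats {
        public final double dominantOrientationDeg; // dominant motion/orientation of the cluster
        public final double density;                // current density estimate
        public final double densityTrend;           // signed change in density per window step
        public final double intraClusterVariance;   // cohesion of the cluster
        public ClusterStats(double o, double d, double dt, double v) {
            dominantOrientationDeg = o; density = d; densityTrend = dt; intraClusterVariance = v;
        }
    }

    /** Flags a cluster-lifetime inflection point when (i) the dominant orientation shifts sharply,
     *  (ii) the current cluster's density decays rapidly, and (iii) an emerging cluster gains
     *  density with shrinking intra-cluster variance. Thresholds are illustrative assumptions. */
    public static boolean isInflection(ClusterStats current, ClusterStats emerging,
                                       double minAngularShiftDeg, double maxDecayRate) {
        double shift = Math.abs(angularDifference(current.dominantOrientationDeg,
                                                  emerging.dominantOrientationDeg));
        boolean orientationShift = shift >= minAngularShiftDeg;
        boolean currentDecaying = current.densityTrend <= -maxDecayRate;
        boolean emergingGrowing = emerging.densityTrend > 0
                && emerging.intraClusterVariance < current.intraClusterVariance;
        return orientationShift && currentDecaying && emergingGrowing;
    }

    /** Smallest signed difference between two angles in degrees. */
    private static double angularDifference(double a, double b) {
        double d = (b - a) % 360.0;
        if (d > 180.0) d -= 360.0;
        if (d < -180.0) d += 360.0;
        return d;
    }
}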

A key contribution of this work is a thresholding strategy that differentiates true corner events from background repetitive conformance. By modeling the temporal evolution of cluster statistics within a sliding composite window, we derive adaptive thresholds that remain robust to noise, illumination changes, and minor camera jitter while guaranteeing that any genuine orientation transition exceeding a minimal angular displacement is detected. We prove that under mild assumptions about the smoothness of motion along straight edges and the bounded duration of corner rotations, the proposed method achieves perfect recall of all four corners. Extensive conceptual analysis demonstrates that even if the drone’s speed varies, the camera experiences minor vibrations, or the rectangular path is imperfectly executed, the cluster lifetime inflection signature remains uniquely identifiable.

This framework provides a generalizable foundation for online structural change detection in video streams, applicable beyond drone navigation to domains such as autonomous driving, robotic inspection, and surveillance analytics. The corner detection use case serves as a concrete and rigorous anchor for the methodology, ensuring that the proposed approach is both theoretically grounded and practically verifiable. The resulting system is capable of selecting the exact frames corresponding to the four corners from the continuous first person video stream, even when the full tiling of the survey area is not attempted, thereby satisfying the validation requirements of real world aerial analytics pipelines.


Friday, April 17, 2026

 Problem 2:


 Sides of a triangle


You are given a positive integer array sides of length 3.


Determine if there exists a triangle with positive area whose three side lengths are given by the elements of sides.


If such a triangle exists, return an array of three floating-point numbers representing its internal angles (in degrees), sorted in non-decreasing order. Otherwise, return an empty array.


Answers within 10^-5 of the actual answer will be accepted.


Example 1:


Input: sides = [3,4,5]


Output: [36.86990,53.13010,90.00000]


Explanation:


You can form a right-angled triangle with side lengths 3, 4, and 5. The internal angles of this triangle are approximately 36.869897646, 53.130102354, and 90 degrees respectively.


Example 2:


Input: sides = [2,4,2]


Output: []


Explanation:


You cannot form a triangle with positive area using side lengths 2, 4, and 2.


Constraints:


• sides.length == 3


• 1 <= sides[i] <= 1000


class Solution {


    public double[] internalAngles(int[] sides) {


        Arrays.sort(sides);


        if (sides[0] + sides[1] > sides[2] &&


            sides[1] + sides[2] > sides[0] &&


            sides[0] + sides[2] > sides[1]) {


            double A = angleFromSides(sides[1], sides[2], sides[0]);


            double B = angleFromSides(sides[0], sides[2], sides[1]);


            double C = angleFromSides(sides[0], sides[1], sides[2]);


            double[] angles = {A, B, C};


            Arrays.sort(angles); // non-decreasing order


            return angles;


        } else {


            return new double[0];


        }


    }


    private static double angleFromSides(int side1, int side2, int opposite) {


        double numerator = (side1 * side1) + (side2 * side2) - (opposite * opposite);


        double denominator = 2.0 * side1 * side2;


        double cosValue = numerator / denominator;


        // Numerical safety: clamp to [-1, 1]


        cosValue = Math.max(-1.0, Math.min(1.0, cosValue));


        return Math.toDegrees(Math.acos(cosValue));


    }


}


Test cases:


Case 0:


Input


sides =


[3,4,5]


Output


[36.86990,53.13010,90.00000]


Expected


[36.86990,53.13010,90.00000]


Case 1:


Input


sides =


[2,4,2]


Output


[]


Expected


[]



Thursday, April 16, 2026

 Problem 1: Find the degree of each vertex.

You are given a 2D integer array matrix of size n x n representing the adjacency matrix of an undirected graph with n vertices labeled from 0 to n - 1.

• matrix[i][j] = 1 indicates that there is an edge between vertices i and j.

• matrix[i][j] = 0 indicates that there is no edge between vertices i and j.

The degree of a vertex is the number of edges connected to it.

Return an integer array ans of size n where ans[i] represents the degree of vertex i.

Example 1:

       1

      / \

     0--2

Input: matrix = [[0,1,1],[1,0,1],[1,1,0]]

Output: [2,2,2]

Explanation:

• Vertex 0 is connected to vertices 1 and 2, so its degree is 2.

• Vertex 1 is connected to vertices 0 and 2, so its degree is 2.

• Vertex 2 is connected to vertices 0 and 1, so its degree is 2.

Thus, the answer is [2, 2, 2].

Example 2:

   0 --- 1

       2

Input: matrix = [[0,1,0],[1,0,0],[0,0,0]]

Output: [1,1,0]

Explanation:

• Vertex 0 is connected to vertex 1, so its degree is 1.

• Vertex 1 is connected to vertex 0, so its degree is 1.

• Vertex 2 is not connected to any vertex, so its degree is 0.

Thus, the answer is [1, 1, 0].

Example 3:

Input: matrix = [[0]]

Output: [0]

Explanation:

There is only one vertex and it has no edges connected to it. Thus, the answer is [0].

Constraints:

• 1 <= n == matrix.length == matrix[i].length <= 100

• matrix[i][i] == 0

• matrix[i][j] is either 0 or 1

• matrix[i][j] == matrix[j][i]

class Solution {

    public int[] findDegrees(int[][] matrix) {

        int[] degree = new int[matrix.length];

        for (int i = 0; i < matrix.length; i++) {

            degree[i] = 0;

            for (int j = 0; j < matrix[0].length; j++) {

                if (matrix[i][j] == 1) {

                    degree[i] += 1;

                }

            }

        }

        return degree;

    }

}

Test cases:

Case 0:

Input

matrix =

[[0,1,1],[1,0,1],[1,1,0]]

Output

[2,2,2]

Expected

[2,2,2]

Case 1:

Input

matrix =

[[0,1,0],[1,0,0],[0,0,0]]

Output

[1,1,0]

Expected

[1,1,0]

Case 2:

Input

matrix =

[[0]]

Output

[0]

Expected

[0]


Wednesday, April 15, 2026

 This is a summary of the book titled “The Transformation Myth: Leading Your Organization through Uncertain Times (Management on the Cutting Edge)” written by Anh Nguyen Phillips, Rich Nanda, Jonathan R. Copulsky and Gerald C Kane and published by MIT Press in 2023. The book traces how the COVID‑19 pandemic exposed the fragility of long‑standing organizational assumptions while simultaneously revealing how disruption can become a catalyst for renewal. It argues that the companies that adapted most effectively were those that treated the crisis not as an interruption to be endured but as an inflection point demanding experimentation, reflection and long‑term reinvention. As the authors note, many leaders responded the way clinicians do when confronting acute and chronic conditions, trying rapid fixes where necessary while also laying the groundwork for more durable transformation. This shift in mindset—away from waiting for normalcy to return and toward embracing uncertainty as a space for opportunity—anchors the book’s central claim that growth‑oriented organizations are better positioned to navigate upheaval. As one line in the book puts it, “Leaders and organizations with a growth mindset will be better positioned to cope with disruption.”

From this foundation, the narrative emphasizes that clarity of purpose, values and mission becomes indispensable when teams face ambiguity. Purpose gives people a reason to stay engaged; values ensure that decisions remain principled even under pressure; and mission provides direction when circumstances are shifting too quickly for detailed plans to hold. The authors pair this with a call for rigorous scenario planning, urging leaders to examine long‑term trends, map uncertainties and guard against biases such as the “status quo bias” or the “bandwagon effect,” both mentioned explicitly in the text. By exploring multiple plausible futures and identifying “no regrets moves,” optional bets and transformative opportunities, organizations can avoid being blindsided by change.

The book also stresses that technology alone does not drive transformation; rather, it is the ecosystem of people, partners and capabilities that determines whether digital tools actually solve meaningful problems. Cloud computing becomes a vivid example of this principle, described as a flexible “Lego set” that allows companies to scale, pivot and innovate without heavy fixed investments. Data and machine learning similarly offer advantages only when paired with thoughtful questions, strong data literacy and a culture that values insight over infrastructure.

Finally, the book argues that crises reshape leaders as much as organizations. They heighten empathy, sharpen awareness of customer needs and reveal how deeply habits shape human behavior. As the book notes, “Crises have a way of bringing people together,” and the leaders who rose to the moment during the pandemic did so through authenticity, transparency and a willingness to experiment boldly. The authors conclude that disruption, while destabilizing, can leave organizations more resilient and leaders more human if they approach uncertainty with curiosity, discipline and a commitment to continuous learning.


Tuesday, April 14, 2026

 This is a runbook for migrating a GenAI workload, comprising the AKS server and langfuse namespaces, from one region to another:

1. Step 1: export rg with aztfexport:

#!/usr/bin/env bash

set -euo pipefail

# ---- CONFIG ----

SOURCE_SUBSCRIPTION_ID="<SOURCE_SUBSCRIPTION_ID>"

SOURCE_RG="<SOURCE_GENAI_RG>" # e.g., rg-genai-aks

SOURCE_LOCATION="<SOURCE_REGION>" # e.g., westus2

TARGET_SUBSCRIPTION_ID="<TARGET_SUBSCRIPTION_ID>"

TARGET_LOCATION="eastus2"

SUFFIX="eus2"

TARGET_RG="${SOURCE_RG}-${SUFFIX}"

EXPORT_DIR="./tfexport-${SOURCE_RG}-${SUFFIX}"

# ---- EXPORT FROM SOURCE ----

az account set --subscription "${SOURCE_SUBSCRIPTION_ID}"

mkdir -p "${EXPORT_DIR}"

echo "Exporting all resources from ${SOURCE_RG} using aztfexport..."

aztfexport group \

  --resource-group "${SOURCE_RG}" \

  --output-directory "${EXPORT_DIR}" \

  --append

echo "Export complete: ${EXPORT_DIR}"

# ---- CREATE TARGET RG ----

az account set --subscription "${TARGET_SUBSCRIPTION_ID}"

echo "Creating target RG ${TARGET_RG} in ${TARGET_LOCATION}..."

az group create \

  --name "${TARGET_RG}" \

  --location "${TARGET_LOCATION}" \

  --output none

# ---- REWRITE TF FOR DR ----

echo "Rewriting Terraform for ${TARGET_LOCATION} and -${SUFFIX} names..."

find "${EXPORT_DIR}" -type f -name "*.tf" | while read -r FILE; do

  # Change region

  sed -i "s/\"${SOURCE_LOCATION}\"/\"${TARGET_LOCATION}\"/g" "${FILE}"

  # Append suffix to resource names (simple heuristic; review before apply)

  sed -i -E "s/(name *= *\"[a-zA-Z0-9_-]+)\"/\1-${SUFFIX}\"/g" "${FILE}"

  # Retarget RG references

  sed -i "s/\"${SOURCE_RG}\"/\"${TARGET_RG}\"/g" "${FILE}"

done

echo "Rewrite done. Review ${EXPORT_DIR} and then:"

echo " cd ${EXPORT_DIR}"

echo " terraform init && terraform apply"

2. Step 2: migrate namespaces and workloads:

#!/usr/bin/env bash

set -euo pipefail

# ---- CONFIG ----

SOURCE_SUBSCRIPTION_ID="<SOURCE_SUBSCRIPTION_ID>"

TARGET_SUBSCRIPTION_ID="<TARGET_SUBSCRIPTION_ID>"

SRC_AKS_RG="<SRC_AKS_RG>"

SRC_AKS_NAME="<SRC_AKS_NAME>"

DST_AKS_RG="<DST_AKS_RG>"

DST_AKS_NAME="<DST_AKS_NAME>"

# Namespaces to exclude (system)

EXCLUDE_NS_REGEX="^(kube-system|kube-public|kube-node-lease|gatekeeper-system|azure-arc|default)$"

# ---- GET CONTEXTS ----

echo "Getting kubeconfig for source AKS..."

az account set --subscription "${SOURCE_SUBSCRIPTION_ID}"

az aks get-credentials -g "${SRC_AKS_RG}" -n "${SRC_AKS_NAME}" --overwrite-existing

SRC_CONTEXT=$(kubectl config current-context)

echo "Getting kubeconfig for destination AKS..."

az account set --subscription "${TARGET_SUBSCRIPTION_ID}"

az aks get-credentials -g "${DST_AKS_RG}" -n "${DST_AKS_NAME}" --overwrite-existing

DST_CONTEXT=$(kubectl config current-context)

echo "Source context: ${SRC_CONTEXT}"

echo "Destination context: ${DST_CONTEXT}"

echo ""

# ---- EXPORT NAMESPACES & WORKLOADS FROM SOURCE ----

EXPORT_DIR="./aks-migration-eus2"

mkdir -p "${EXPORT_DIR}"

kubectl config use-context "${SRC_CONTEXT}"

echo "Exporting namespaces and workloads from source cluster..."

NAMESPACES=$(kubectl get ns -o jsonpath='{.items[*].metadata.name}')

for NS in ${NAMESPACES}; do

  if [[ "${NS}" =~ ${EXCLUDE_NS_REGEX} ]]; then

    echo "Skipping system namespace: ${NS}"

    continue

  fi

  NS_DIR="${EXPORT_DIR}/${NS}"

  mkdir -p "${NS_DIR}"

  echo "Exporting namespace: ${NS}"

  # Namespace definition

  kubectl get ns "${NS}" -o yaml > "${NS_DIR}/namespace.yaml"

  # Core workload types (adjust as needed)

  for KIND in deployment statefulset daemonset service configmap secret ingress cronjob job; do

    kubectl get "${KIND}" -n "${NS}" -o yaml > "${NS_DIR}/${KIND}.yaml" || true

  done

done

echo "Export complete: ${EXPORT_DIR}"

# ---- APPLY TO DESTINATION CLUSTER ----

kubectl config use-context "${DST_CONTEXT}"

echo "Applying namespaces and workloads to destination cluster..."

for NS in ${NAMESPACES}; do

  if [[ "${NS}" =~ ${EXCLUDE_NS_REGEX} ]]; then

    continue

  fi

  NS_DIR="${EXPORT_DIR}/${NS}"

  if [[ ! -d "${NS_DIR}" ]]; then

    continue

  fi

  echo "Creating namespace: ${NS}"

  kubectl apply -f "${NS_DIR}/namespace.yaml" || true

  for KIND in deployment statefulset daemonset service configmap secret ingress cronjob job; do

    FILE="${NS_DIR}/${KIND}.yaml"

    if [[ -s "${FILE}" ]]; then

      echo "Applying ${KIND} in ${NS}"

      kubectl apply -n "${NS}" -f "${FILE}"

    fi

  done

done

echo "AKS namespace/workload migration complete."

3. Step 3: find storage accounts with aks subnet allows and migrate data

#!/usr/bin/env bash

set -euo pipefail

# ---- CONFIG ----

SOURCE_SUBSCRIPTION_ID="<SOURCE_SUBSCRIPTION_ID>"

TARGET_SUBSCRIPTION_ID="<TARGET_SUBSCRIPTION_ID>"

SRC_AKS_RG="<SRC_AKS_RG>"

SRC_AKS_NAME="<SRC_AKS_NAME>"

SUFFIX="eus2"

# ---- GET AKS VNET/SUBNETS ----

az account set --subscription "${SOURCE_SUBSCRIPTION_ID}"

AKS_INFO=$(az aks show -g "${SRC_AKS_RG}" -n "${SRC_AKS_NAME}")

# For Azure CNI with custom VNet:

VNET_SUBNET_IDS=$(echo "${AKS_INFO}" | jq -r '.agentPoolProfiles[].vnetSubnetId' | sort -u)

echo "AKS subnets:"

echo "${VNET_SUBNET_IDS}"

echo ""

# ---- FIND MATCHING STORAGE ACCOUNTS ----

echo "Finding storage accounts whose network rules allow these subnets..."

STORAGE_ACCOUNTS=$(az storage account list --query "[].id" -o tsv)

MATCHED_SA=()

for SA_ID in ${STORAGE_ACCOUNTS}; do

  SA_NAME=$(basename "${SA_ID}")

  SA_RG=$(echo "${SA_ID}" | awk -F/ '{print $5}')

  RULES=$(az storage account network-rule list \

    --account-name "${SA_NAME}" \

    --resource-group "${SA_RG}" 2>/dev/null || echo "")

  if [[ -z "${RULES}" ]]; then

    continue

  fi

  for SUBNET_ID in ${VNET_SUBNET_IDS}; do

    if echo "${RULES}" | jq -e --arg sn "${SUBNET_ID}" '.virtualNetworkRules[]?.virtualNetworkResourceId == $sn' >/dev/null 2>&1; then

      echo "Matched storage account: ${SA_NAME} (RG: ${SA_RG}) for subnet: ${SUBNET_ID}"

      MATCHED_SA+=("${SA_ID}")

      break

    fi

  done

done

MATCHED_SA_UNIQ=($(printf "%s\n" "${MATCHED_SA[@]}" | sort -u))

echo ""

echo "Matched storage accounts:"

printf "%s\n" "${MATCHED_SA_UNIQ[@]}"

echo ""

# ---- COPY DATA TO DR STORAGE ACCOUNTS ----

for SA_ID in "${MATCHED_SA_UNIQ[@]}"; do

  SA_NAME=$(basename "${SA_ID}")

  SA_RG=$(echo "${SA_ID}" | awk -F/ '{print $5}')

  TARGET_SA_NAME="${SA_NAME}${SUFFIX}"

  echo "Processing storage account:"

  echo " Source: ${SA_NAME} (RG: ${SA_RG})"

  echo " Target: ${TARGET_SA_NAME}"

  echo ""

  # Source key

  az account set --subscription "${SOURCE_SUBSCRIPTION_ID}"

  SRC_KEY=$(az storage account keys list \

    --account-name "${SA_NAME}" \

    --resource-group "${SA_RG}" \

    --query "[0].value" -o tsv)

  SRC_CONN="DefaultEndpointsProtocol=https;AccountName=${SA_NAME};AccountKey=${SRC_KEY};EndpointSuffix=core.windows.net"

  # Target key

  az account set --subscription "${TARGET_SUBSCRIPTION_ID}"

  # Adjust RG derivation if needed

  TARGET_SA_RG="<TARGET_RG_FOR_${TARGET_SA_NAME}>"

  TGT_KEY=$(az storage account keys list \

    --account-name "${TARGET_SA_NAME}" \

    --resource-group "${TARGET_SA_RG}" \

    --query "[0].value" -o tsv)

  TGT_CONN="DefaultEndpointsProtocol=https;AccountName=${TARGET_SA_NAME};AccountKey=${TGT_KEY};EndpointSuffix=core.windows.net"

  # List containers in source

  az account set --subscription "${SOURCE_SUBSCRIPTION_ID}"

  CONTAINERS=$(az storage container list \

    --connection-string "${SRC_CONN}" \

    --query "[].name" -o tsv)

  for CONT in ${CONTAINERS}; do

    echo "Copying container: ${CONT}"

    SRC_URL="https://${SA_NAME}.blob.core.windows.net/${CONT}"

    TGT_URL="https://${TARGET_SA_NAME}.blob.core.windows.net/${CONT}"

    azcopy copy "${SRC_URL}" "${TGT_URL}" --recursive=true

    echo "Completed copy for container ${CONT}"

  done

done

echo "Storage data copy to DR accounts complete."