Monday, May 18, 2026

 

Further deliverables for Drone Video Sensing Analytics (DVSA)

Orientation and House Rules

This section establishes the conventions that govern the entire article. DVSA means Drone Video Sensing Analytics; the acronym refers to the software described in CEJMLSubmitDVSA.docx, which is the authoritative primary source. Citations follow APA 7 for academic literature and use [GH:org/repo] shorthand for GitHub repositories. DVSA is an end-to-end AI/ML pipeline that ingests drone video at scale, enriches individual frames with spatial metadata, indexes them in vector databases, and exposes the result to analysts and autonomous agents through natural-language retrieval. That pipeline, from raw telemetry to queryable geospatial intelligence, is the unifying thread through every subsequent research direction.

Thirteen Research Themes

This article condenses the programme into thirteen interconnected research themes, each framed as an operational question. The headline findings are these.

On ingestion, the finding is that event-driven micro-batch streaming (Kafka or Kinesis Video Streams) with idempotent writes and SHA-256/pHash deduplication is the only architecture that scales to terabytes per day without duplicating downstream embedding costs.

On GPS-less localisation, the finding is that a three-stage cascade (Visual-Inertial Odometry for relative pose, Structure-from-Motion for geo-registration, and orthophoto refinement for sub-5-metre absolute accuracy) succeeds even in GPS-denied or GPS-spoofed environments.

On importance sampling, the finding is that a five-filter cascade removes 65–80% of frames before embedding with less than 2% degradation in retrieval recall, making the pipeline economically viable.

On vector databases and RAG, the finding is that hybrid dense-plus-sparse search over a 10-million-frame corpus achieves under 50 ms P95 latency on Qdrant, and that RAG layers ground LLM answers in indexed frame metadata rather than hallucinated content.

On agentic retrieval, the finding is that Plan-and-Execute agents outperform ReAct on multi-hop geospatial queries, reaching 83% task completion on three-hop benchmarks versus 71% for ReAct.

On pixel-to-GPS mapping, the finding is that accurate mapping is the linchpin of any downstream geospatial query, because errors here propagate through every retrieval and reasoning step.

On observability, the finding is that semantic drift, measured as the cosine distance between a rolling embedding centroid and a fixed reference centroid, is the most important pipeline-health signal and should be monitored from day one.

On edge vs cloud, the finding is that tiered deployment (lightweight inference on the drone, CLIP embedding on a Jetson edge node, Qdrant and LLM agents in the cloud) reduces WAN bandwidth by 99.5% while keeping alert-query latency below 1.5 seconds.

On security, the finding is that frame-level ACLs, AES-256 encryption, Open Policy Agent attribute-based access control, and differential-privacy noise injection together satisfy both GDPR and sovereign data requirements.

Chapters 1–4: Ingestion, Localisation, and Sampling

Chapter 1 motivates DVSA with scale figures: annual drone shipments exceeding 10 million units and enterprise fleets generating petabyte-scale video archives. The core problem is labelled the semantic gap: raw frames are binary objects with no queryable meaning. Four research questions are posed, targeting an ingestion cost below $0.001 per frame, GPS-less localisation accurate to under 5 metres, hybrid retrieval under 100 ms at P95, and agentic task completion above 85%.

Chapter 2's ingestion finding is that batch upload is insufficient: it delivers high latency, lacks streaming enrichment, and cannot attach per-frame telemetry atomically. The selected architecture — Kafka partitioned by drone_id, with three parallel consumer groups handling provenance writing, frame extraction, and deduplication respectively — achieves 4.2 GB/minute throughput, 38-second P95 end-to-end ingest latency, and verified exactly-once semantics under chaos testing. The provenance schema stores both raw GPS (which may be null) and inferred coordinates (filled later by the localisation pipeline), ensuring a complete audit trail regardless of GPS availability. Two-stage deduplication (SHA-256 exact, then pHash Hamming-distance near-duplicate via Redis Bloom filter) achieves a 67% dedup hit rate on surveillance hover missions, directly reducing downstream GPU costs.
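The two-stage deduplication can be sketched as follows. This is a minimal, self-contained illustration, not the dvsa_ingest_worker.py implementation: a Python set stands in for the Redis Bloom filter, near-duplicates are found by a linear scan over kept hashes, and the pHash variant used here (a DCT over a 32x32 nearest-neighbour downsample, keeping the 8x8 low-frequency block and thresholding at its median) is one common formulation rather than necessarily DVSA's. The class name `TwoStageDeduper` and the Hamming threshold of 8 bits are hypothetical.

```python
import hashlib
import numpy as np

def _dct_matrix(n):
    """Unnormalised DCT-II basis: row k holds cos(pi * (2i + 1) * k / (2n))."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * (2 * i + 1) * k / (2 * n))

def phash(gray, hash_size=8, sample=32):
    """64-bit perceptual hash of a 2-D grayscale array."""
    h, w = gray.shape
    ys = np.linspace(0, h - 1, sample).astype(int)  # nearest-neighbour downsample
    xs = np.linspace(0, w - 1, sample).astype(int)
    small = gray[np.ix_(ys, xs)].astype(float)
    C = _dct_matrix(sample)
    low = (C @ small @ C.T)[:hash_size, :hash_size].flatten()
    median = np.median(low[1:])  # median excludes the DC term, which tracks brightness
    bits = "".join("1" if v > median else "0" for v in low)
    return int(bits, 2)

def hamming(a, b):
    return bin(a ^ b).count("1")

class TwoStageDeduper:
    def __init__(self, max_hamming=8):
        self.seen_sha256 = set()   # stand-in for the Redis Bloom filter
        self.kept_phashes = []     # linear scan; a real system would index these
        self.max_hamming = max_hamming

    def check(self, frame_bytes, gray):
        """Classify a frame as 'exact_duplicate', 'near_duplicate' or 'new'."""
        digest = hashlib.sha256(frame_bytes).hexdigest()
        if digest in self.seen_sha256:  # stage 1: exact duplicate
            return "exact_duplicate"
        self.seen_sha256.add(digest)
        ph = phash(gray)                 # stage 2: perceptual near-duplicate
        if any(hamming(ph, p) <= self.max_hamming for p in self.kept_phashes):
            return "near_duplicate"
        self.kept_phashes.append(ph)
        return "new"
```

A uniform brightness shift changes every byte (so SHA-256 differs) but leaves the AC coefficients, and therefore the hash bits, essentially unchanged, which is exactly the hover-mission case the second stage is meant to catch.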

Chapter 3's localisation finding is that VIO alone is insufficient for 10-metre accuracy targets: drift accumulates at ~0.5% of distance travelled, reaching 10 m on a 2 km flight. Stage 2 (OpenSfM reconstruction with SuperPoint+SuperGlue feature matching, which outperforms SIFT by 34% on low-texture scenes) reduces error to 8–15 m. Stage 3 (ICP registration against georeferenced orthophotos from USGS, Mapbox, or operator-generated sources) achieves 3.2–6.1 m across all tested terrain types. Each frame is also enriched with non-coordinate metadata: altitude AGL, sun elevation, ground sample distance, weather conditions, land cover class, and administrative area.
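Stage 3 registers the reconstruction against a georeferenced orthophoto with ICP. The sketch below shows the core of point-to-point ICP in 2D, assuming matching features have already been extracted from both the reconstruction and the orthophoto as planar (east, north) coordinates. A production pipeline would work in 3D, use a k-d tree for the neighbour search, and reject outlier correspondences; the function names `kabsch_2d` and `icp_2d` are hypothetical.

```python
import numpy as np

def kabsch_2d(src, dst):
    """Least-squares rotation R and translation t with R @ src_i + t ~= dst_i."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # reject reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    return R, dst_c - R @ src_c

def icp_2d(src, ref, max_iter=50, tol=1e-10):
    """Align point set `src` to reference `ref`; returns (R, t, rms residual)."""
    R_total, t_total = np.eye(2), np.zeros(2)
    current = src.astype(float).copy()
    prev_err = np.inf
    for _ in range(max_iter):
        # Brute-force nearest neighbours: fine for a sketch, quadratic at scale.
        d2 = ((current[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = float(np.sqrt(d2[np.arange(len(current)), nn].mean()))
        if abs(prev_err - err) < tol:
            break
        prev_err = err
        R, t = kabsch_2d(current, ref[nn])
        current = current @ R.T + t
        # Compose the incremental update with the accumulated transform.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, err
```

ICP only converges to the right answer when the initial misalignment is small relative to feature spacing, which is why it is the last stage of the cascade, after SfM has already brought the error down to metres.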

Chapter 4's importance-sampling finding is quantitative and decisive: a five-filter cascade (exact dedup, near-dedup, scene change classification, quality scoring, object-of-interest boost) reduces frames to 20–35% of raw volume while maintaining Recall@10 of 0.93–0.97 across all tested mission profiles. The scene-change classifier is a fine-tuned MobileNetV3-Small running at 850 frame-pairs/second on an A10G GPU. Object-of-interest frames — those flagged by a YOLOv8n edge detector — bypass deduplication entirely, guaranteeing that no event-containing frame is dropped.
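A minimal sketch of the cascade follows, assuming the scene-change flag, quality score, and object-of-interest flag have already been computed upstream by the respective models. How the filters interact is an assumption here: object-of-interest frames bypass every other filter (the chapter states this for deduplication), and a scene-change flag exempts a frame from near-duplicate removal. The `Frame` fields and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    frame_id: int
    sha256: str         # exact content hash
    phash: int          # 64-bit perceptual hash
    scene_change: bool  # hypothetical output of the scene-change classifier
    quality: float      # hypothetical quality score in [0, 1]
    has_object: bool    # hypothetical object-of-interest flag from the edge detector

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def importance_sample(frames: List[Frame], max_hamming: int = 6,
                      min_quality: float = 0.4) -> List[Frame]:
    """Return the frames that survive the cascade, in input order."""
    kept: List[Frame] = []
    seen_sha = set()
    kept_phashes: List[int] = []
    for f in frames:
        if f.has_object:
            # Object-of-interest boost: never drop an event-containing frame.
            kept.append(f)
            seen_sha.add(f.sha256)
            kept_phashes.append(f.phash)
            continue
        # Filter 1: exact deduplication on the content hash.
        if f.sha256 in seen_sha:
            continue
        seen_sha.add(f.sha256)
        # Filter 2: near-duplicate removal by pHash Hamming distance, unless
        # filter 3 (scene change) marks the frame as a scene boundary.
        near_dup = any(hamming(f.phash, p) <= max_hamming for p in kept_phashes)
        if near_dup and not f.scene_change:
            continue
        # Filter 4: drop low-quality frames (blur, glare, and so on).
        if f.quality < min_quality:
            continue
        kept.append(f)
        kept_phashes.append(f.phash)
    return kept
```

Ordering matters for cost: the cheap hash checks run first, so the classifier and quality scores are only consulted for frames that are not already duplicates.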

Chapters 5–7: Retrieval, Agents, and Spatial Reasoning

Chapter 5's vector-database finding is a four-way benchmark at 10 million vectors. FAISS achieves the lowest latency (12 ms P95) but offers no persistence or metadata filtering, making it unsuitable as a standalone production store. Qdrant, selected for production, achieves 38 ms P95 with native geospatial payload filtering pushed into the HNSW graph traversal — a critical advantage over post-retrieval Python filtering. Pinecone adds zero-ops management but exceeds $800/month at scale and lacks sovereign deployment. pgvector is adequate for development corpora below 2 million vectors. Five embedding models are benchmarked: CLIP ViT-L/14 is selected as the primary frame encoder for its joint image-text space, which enables text-query retrieval without requiring prior caption generation. The RAG pipeline — CLIP text encoding of the query, Qdrant retrieval of top-20 frames, context augmentation with spatial metadata, GPT-4o grounded answer generation — is fully documented with working Python code.
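To make hybrid dense-plus-sparse retrieval concrete, here is a small in-memory sketch that applies the geospatial filter before ranking (rather than to the results afterwards), ranks candidates by cosine similarity and by a precomputed keyword score, and fuses the two rankings with reciprocal rank fusion (RRF, k = 60). It stands in for Qdrant's filtered HNSW search and does not use Qdrant's API; RRF is one common fusion method, not necessarily the one DVSA uses, and all function names are hypothetical.

```python
import math
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def rrf(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

def hybrid_geo_search(query_vec, frame_vecs, keyword_scores, locations,
                      center, radius_m, top_k=20):
    # 1. Geospatial pre-filter: only frames inside the radius become candidates.
    cands = [f for f, (lat, lon) in locations.items()
             if haversine_m(center[0], center[1], lat, lon) <= radius_m]
    # 2. Dense ranking by cosine similarity to the (e.g. CLIP) query embedding.
    q = np.asarray(query_vec, dtype=float)
    q = q / np.linalg.norm(q)

    def cosine(f):
        v = np.asarray(frame_vecs[f], dtype=float)
        return float(v @ q / np.linalg.norm(v))

    dense = sorted(cands, key=lambda f: -cosine(f))
    # 3. Sparse ranking by keyword score; candidates with no keyword hit are omitted.
    sparse = sorted((f for f in cands if keyword_scores.get(f, 0) > 0),
                    key=lambda f: -keyword_scores[f])
    # 4. Fuse the two rankings.
    return rrf([dense, sparse])[:top_k]
```

Because the filter runs first, a frame outside the search area can never crowd relevant frames out of the top-k, which is the failure mode of post-retrieval filtering that the chapter warns about.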

Chapter 6's agentic retrieval finding is that no single agent pattern dominates across all query complexities. Function calling achieves the best simple-query performance (98% one-hop, 3.8 s average). Plan-and-Execute achieves the best multi-hop performance (83% three-hop, but at 6.1 s). ReAct with GPT-4o sits in between; ReAct with Llama-3-70B degrades sharply on complex queries (58% three-hop). The recommended production design hybridises function calling for single-step queries with Plan-and-Execute triggered automatically when the planner detects more than two required tool calls. The agent tool suite spans eight specialised functions, from semantic frame search and object counting to heatmap generation and Visual Question Answering via GPT-4o Vision.
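The recommended routing rule and a Plan-and-Execute runner can be sketched as below. The threshold follows the chapter (more than two required tool calls triggers Plan-and-Execute). The `"$i"` convention for passing an earlier step's result, the function names, and the toy tools in the usage example are hypothetical stand-ins for the eight DVSA tools and an LLM-generated plan.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Dict[str, Any]]  # (tool name, keyword arguments)

def choose_strategy(plan: List[Step], max_direct_calls: int = 2) -> str:
    """Route to Plan-and-Execute when the plan needs more than two tool calls."""
    return "plan_and_execute" if len(plan) > max_direct_calls else "function_calling"

def execute_plan(plan: List[Step], tools: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Run steps in order; an argument written as "$i" is replaced by step i's result."""
    results: List[Any] = []
    for name, args in plan:
        resolved = {
            key: results[int(val[1:])] if isinstance(val, str) and val.startswith("$") else val
            for key, val in args.items()
        }
        results.append(tools[name](**resolved))
    return results

# Usage with toy tools standing in for frame search, counting, and summarising.
tools = {
    "search_frames": lambda query: ["f1", "f2", "f3"],
    "count_objects": lambda frames, label: 2 * len(frames),
    "summarise": lambda count: f"{count} objects",
}
plan = [
    ("search_frames", {"query": "red vehicles near the bridge"}),
    ("count_objects", {"frames": "$0", "label": "vehicle"}),
    ("summarise", {"count": "$1"}),
]
print(choose_strategy(plan), execute_plan(plan, tools)[-1])
```

Making the plan explicit before execution is what lets the planner count the required tool calls and pick a strategy up front.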

Chapter 7's pixel-to-GPS finding is architectural: for altitudes below 500 m AGL, a flat-earth projection using camera intrinsics, drone altitude, and gimbal pitch/roll/yaw provides acceptable accuracy; above 500 m or on sloped terrain, full orthorectification against a DEM via ray-marching is required.
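The flat-earth case can be sketched with a pinhole camera model. The sketch back-projects the pixel into a ray, rotates the ray into a local east-north-up frame using the gimbal angles, and intersects it with the ground plane at the drone's altitude. The angle conventions are assumptions: pitch is negative below the horizon, so -90 degrees is nadir; yaw is clockwise from north; angles are applied yaw, then pitch, then roll. The small-offset conversion from metres to degrees is also an approximation. Per the chapter, this is only appropriate below roughly 500 m AGL over flat terrain.

```python
import math
import numpy as np

EARTH_RADIUS_M = 6_378_137.0  # WGS-84 semi-major axis

def _rx(a):
    c, s = math.cos(a), math.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def _rz(a):
    c, s = math.cos(a), math.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Camera axes (x right, y down, z forward) expressed in local east-north-up
# coordinates for a level camera facing north.
_CAM_TO_ENU = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, -1.0, 0.0]])

def pixel_to_ground_offset(u, v, fx, fy, cx, cy, altitude_m,
                           pitch_deg, yaw_deg, roll_deg=0.0):
    """Flat-earth projection of pixel (u, v) to an (east, north) offset in metres."""
    ray_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    R = (_rz(-math.radians(yaw_deg)) @ _CAM_TO_ENU
         @ _rx(math.radians(pitch_deg)) @ _rz(math.radians(roll_deg)))
    ray = R @ ray_cam
    if ray[2] >= 0:
        raise ValueError("pixel ray does not reach the ground plane")
    t = altitude_m / -ray[2]  # ray length at which it has descended altitude_m
    return t * ray[0], t * ray[1]

def offset_to_latlon(lat0, lon0, east_m, north_m):
    """Small-offset (equirectangular) conversion from metres to degrees."""
    lat = lat0 + math.degrees(north_m / EARTH_RADIUS_M)
    lon = lon0 + math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat0))))
    return lat, lon
```

For example, at 100 m altitude with the camera at nadir and heading north, a pixel one focal length right of the principal point lands 100 m east of the drone; pitching the camera to -45 degrees puts the principal point 100 m north.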

Chapters 8–10: Operations, Deployment, and Trust

Chapter 8's observability finding is that CPU and memory metrics are necessary but not sufficient for a semantic pipeline. The dvsa_ Prometheus metric namespace defines six purpose-built metric families: ingestion throughput and lag, localisation accuracy histograms and failure counters, embedding throughput and queue depth, retrieval latency and recall, agent task-completion rate and cost-per-query, and — most critically — semantic drift score. Drift is measured as the cosine distance between a rolling 1,000-embedding centroid and a fixed reference centroid established at index build time. A drift threshold of 0.15 triggers a Slack alert; 0.25 triggers automatic re-embedding of the affected time window. The canary query system submits ten pre-labelled queries every five minutes; if recall@10 falls below 0.85 for two consecutive intervals, Alertmanager fires. This combination means model staleness is detected within hours rather than weeks.
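The drift score and the canary rule can be sketched as follows. Window size and thresholds come from the chapter; the class and function names are hypothetical. In production the returned action would feed the dvsa_ Prometheus metrics and Alertmanager rather than a return value. The chapter does not say whether the thresholds are inclusive, so the sketch treats them as inclusive.

```python
from collections import deque
import numpy as np

class DriftMonitor:
    """Cosine distance between a rolling embedding centroid and a fixed reference."""

    def __init__(self, reference_centroid, window=1000, alert_at=0.15, reembed_at=0.25):
        self.reference = np.asarray(reference_centroid, dtype=float)
        self.window = deque(maxlen=window)  # keeps only the most recent embeddings
        self.alert_at, self.reembed_at = alert_at, reembed_at

    def add(self, embedding):
        self.window.append(np.asarray(embedding, dtype=float))

    def drift(self):
        if not self.window:
            return 0.0
        centroid = np.stack(list(self.window)).mean(axis=0)
        cos = centroid @ self.reference / (
            np.linalg.norm(centroid) * np.linalg.norm(self.reference))
        return 1.0 - float(cos)

    def action(self):
        d = self.drift()
        if d >= self.reembed_at:
            return "reembed"  # re-embed the affected time window
        if d >= self.alert_at:
            return "alert"    # e.g. notify Slack
        return "ok"

def canary_should_fire(recall_history, floor=0.85, consecutive=2):
    """True when the last `consecutive` canary recall@10 values are all below `floor`."""
    recent = recall_history[-consecutive:]
    return len(recent) == consecutive and all(r < floor for r in recent)
```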

Chapter 9's edge-vs-cloud finding resolves a false dichotomy into a four-tier architecture: lightweight INT8-quantised inference runs on the drone (YOLOv8n at 45 fps, pHash at 200+ fps, scene classifier at 60 fps); CLIP ViT-L/14 FP16 embedding runs on a Jetson AGX Orin edge node shared among five drones; Qdrant global search and LLM agent reasoning run in the cloud. This tiering reduces WAN data volume from 2.5 GB/drone-hour (raw upload) to 12 MB/drone-hour (embeddings and metadata only), a 99.5% bandwidth reduction. For time-critical alert queries, the edge-cached Qdrant instance delivers responses in 0.3–1.5 seconds versus 8–15 seconds for a cloud-only path.
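The 99.5% figure follows directly from the two per-drone volumes. The check below assumes decimal units (1 GB = 1,000 MB); the helper name is hypothetical.

```python
def bandwidth_reduction(raw_mb, uplink_mb):
    """Fraction of WAN volume saved by sending uplink_mb instead of raw_mb."""
    return 1.0 - uplink_mb / raw_mb

RAW_MB_PER_DRONE_HOUR = 2.5 * 1000   # raw video upload, 2.5 GB
TIERED_MB_PER_DRONE_HOUR = 12.0      # embeddings and metadata only

reduction = bandwidth_reduction(RAW_MB_PER_DRONE_HOUR, TIERED_MB_PER_DRONE_HOUR)
print(f"{reduction:.1%}")  # 99.5%
```

With binary units (2.5 GiB = 2,560 MiB) the result still rounds to 99.5%, so the headline number is not sensitive to that choice.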

Chapter 10's security finding is comprehensive but practically grounded. The threat model identifies four actors: external attackers, insiders, GPS spoofers (who corrupt the spatial index), and prompt-injection attackers who embed directives in video metadata to manipulate the agent layer. Each is mitigated concretely. AES-256-GCM at rest with customer-managed KMS keys and mTLS in transit covers the first two. VIO cross-validation with configurable discrepancy thresholds (default: 15 m) detects spoofed GPS. SQL schema allowlists and input sanitisation guard against injection. GDPR compliance is operationalised as five specific mechanisms: purpose limitation enforced at flight-plan registration, importance sampling as data minimisation, cascading deletion across the provenance database, S3, and the Qdrant vector store, lat-lon-time frame search for data subject access requests, and automated retention-limit alerts. Differential privacy (Laplace mechanism, ε=1.0) protects aggregate analytics. Full lineage graphs track every transformation from raw frame to archived artefact, enabling compliance audits, targeted model-upgrade re-processing, and complete erasure cascades.
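Two of these mitigations reduce to a few lines each. The sketch below flags a GPS fix whose great-circle distance from the VIO-derived position exceeds the default 15 m threshold, and applies the Laplace mechanism to a count query with ε = 1.0. The sensitivity of 1 assumes each data subject changes the count by at most one; the function names are hypothetical.

```python
import math
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def gps_spoof_suspected(gps_fix, vio_estimate, threshold_m=15.0):
    """Flag a GPS fix that disagrees with the VIO position by more than threshold_m."""
    return haversine_m(*gps_fix, *vio_estimate) > threshold_m

def laplace_count(true_count, epsilon=1.0, sensitivity=1.0, rng=None):
    """Laplace mechanism: add zero-mean noise with scale sensitivity / epsilon."""
    rng = np.random.default_rng() if rng is None else rng
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)
```

Smaller ε means a larger noise scale and stronger privacy; at ε = 1.0 and sensitivity 1, each released count carries noise with standard deviation of about 1.4.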

Chapters 11–13: Cost, Code, and Validation

Chapter 11's cost finding is its most actionable number: $0.00138 per indexed frame over a three-year, 50-drone fleet baseline, with a 3-year TCO of $371,431. The largest single cost driver is engineering labour (0.5 FTE, $225,000 over three years), not compute or storage. Storage (S3 tiered to Glacier) costs $54,000 over three years for 750 TB of raw archive. GPU embedding is negligible ($3,942 over three years via API). LLM agent queries at 500/day cost $13,689 over three years via the GPT-4o API. Sensitivity analysis shows that edge inference is the most consequential lever: eliminating it adds $180,000 in WAN bandwidth costs over three years, so the $22,000 Jetson CAPEX pays for itself in under five months at a straight-line $5,000 per month of avoided bandwidth cost.
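The chapter's headline figures can be cross-checked with simple arithmetic. The sketch assumes the $180,000 of avoided WAN cost accrues evenly across the 36 months; the chapter does not say how that cost is distributed over time. The constant names are illustrative.

```python
TCO_USD = 371_431
COST_PER_FRAME_USD = 0.00138
ENGINEERING_USD = 225_000
JETSON_CAPEX_USD = 22_000
AVOIDED_WAN_USD = 180_000   # over three years, if edge inference is kept
MONTHS = 36

implied_frames = TCO_USD / COST_PER_FRAME_USD
labour_share = ENGINEERING_USD / TCO_USD
# Straight-line assumption: avoided WAN cost accrues evenly each month.
payback_months = JETSON_CAPEX_USD / (AVOIDED_WAN_USD / MONTHS)

print(f"implied indexed frames: {implied_frames / 1e6:.0f} million")  # 269 million
print(f"engineering share of TCO: {labour_share:.0%}")               # 61%
print(f"Jetson payback: {payback_months:.1f} months")                # 4.4 months
```

A fleet whose bandwidth use ramps up unevenly would see a different payback period than this straight-line estimate.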

Chapter 12 functions as a consolidated engineering reference. It catalogues 35+ Python packages and GitHub repositories across six categories — ingestion/streaming, localisation/spatial, computer vision/embeddings, vector databases, LLM/RAG/agent frameworks, and observability/operations — each with minimum version, role, and link. Three fully annotated production-ready code examples are provided: dvsa_ingest_worker.py (Kafka consumer with exact and near-dedup, Prometheus instrumentation, and embedding queue push); dvsa_embed_worker.py (batched CLIP ViT-L/14 embedding with Qdrant upsert and throughput gauging); and dvsa_tools.py (LangChain @tool-decorated search_frames function with hybrid Qdrant query and geospatial filter construction).

Chapter 13 validates the entire pipeline through three real-world case studies and ten implementation lessons. The power line survey case study (12 DJI Matrice 350 RTK drones, 800 km of transmission lines) reduced analysis time from three weeks to four hours, saving an estimated $280,000 per survey cycle. The GPS-denied mountain rescue case study (4 fixed-wing UAVs, 40 km² alpine search area) located missing hikers 2.5 hours into the search, a 59% reduction from the 6.1-hour historical average, using VINS-Mono VIO, OpenSfM reconstruction, and a semantic colour-and-terrain query. The multi-season agricultural survey case study (8 multispectral drones, 2,400 ha) used cross-collection spatial joins and a CLIP-proxy NDVI metric to identify the highest-stress field with agronomist-confirmed accuracy, reducing the intervention area from 2,400 ha to 180 ha. The lessons distilled from all deployments are prescriptive; among them: provenance must be attached atomically at ingest; semantic drift must be monitored from day one; orthophotos must be refreshed quarterly; ReAct must not be used alone for multi-hop queries; embeddings must carry model-version metadata; and GPS telemetry must never be assumed trustworthy.

Conclusion:

The DVSA deliverable constitutes a complete, reproducible, and empirically validated specification for production-grade drone video intelligence at scale — from the first byte off a drone sensor to a natural-language answer grounded in georeferenced frame metadata.


Sunday, May 17, 2026

AI safety and security are primary concerns for emerging GenAI applications. Organizations treat defense in depth as the preferred path to stronger security for AI solutions. They also solicit feedback from security researchers through programs such as AI red teaming and bug bounties, to the benefit of their customers. The following section outlines other best practices; these are advisory, not mandatory.

As GenAI applications become popular productivity tools, the pace of AI releases and adoption must be matched by improvements to existing SecOps techniques. Security-first processes that detect and respond to AI risks and threats effectively rest on visibility, zero critical risks, democratization, and prevention. The risks in question include data poisoning, which alters training data so that predictions become erroneous; model theft, in which proprietary AI models are copied or misappropriated in violation of intellectual property rights; adversarial attacks, which craft inputs that make a model hallucinate; model inversion attacks, which send queries designed to exfiltrate training data; and supply chain vulnerabilities, which exploit weaknesses in the AI supply chain.

The following best practices apply these SecOps techniques to mitigate the risks:

Achieving full visibility by eliminating shadow AI, that is, AI use that is unauthorized or unaccounted for. An AI bill of materials helps here, as does network configuration that permits access only to allow-listed GenAI providers and software. Employees must also be trained in a security-first mindset.

Protecting both training and inference data by discovering the data and classifying it by security criticality, encrypting it at rest and in transit, sanitizing or masking sensitive information, configuring data loss prevention policies, and maintaining a full view of the data, including its origin and lineage.

Securing access to GenAI models by setting up authentication and rate limiting for API usage, restricting access to model weights, and allowing only authorized users to start model training and deployment pipelines.

Using built-in LLM guardrails: content filtering to automatically remove or flag inappropriate or harmful content, abuse detection mechanisms to uncover and mitigate model misuse, and temperature settings to tune output randomness to the desired level of predictability.

Detecting and removing AI risks and attack paths by continuously scanning AI models for vulnerabilities; verifying that all systems and components have the latest patches to close known vulnerabilities; scanning for malicious models; assessing AI misconfigurations, effective permissions, network resources, exposed secrets, and sensitive data to detect attack paths; regularly auditing access controls to enforce proper authorization and least privilege; and providing context and remediation guidance around AI risks so that attack paths to models can be removed proactively.

Monitoring for anomalies by applying detection and analytics to both inputs and outputs, detecting suspicious behavior in pipelines, tracking unexpected spikes in latency and other system metrics, and supporting regular security audits and assessments.

Setting up incident response by defining processes for isolation, backup, traffic control, and rollback, integrating with SecOps tools, and maintaining an AI-focused incident response plan.
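Three of the controls listed above, masking sensitive data, authenticating and rate-limiting API access, and watching for latency spikes, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The regex patterns, the API-key scheme, the token-bucket parameters, and the z-score threshold are all illustrative choices rather than a recommended configuration, and every class and function name is hypothetical. Production systems would rely on a dedicated DLP service, an API gateway, and existing monitoring tooling.

```python
import re
import statistics
import time
from collections import deque

# --- Data protection: mask sensitive values (illustrative patterns only) ---
_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]

def mask_sensitive(text):
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text

# --- Access: API-key authentication plus token-bucket rate limiting ---
class TokenBucket:
    def __init__(self, rate_per_s, capacity, clock):
        self.rate, self.capacity, self.clock = rate_per_s, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

class ApiGate:
    def __init__(self, api_keys, rate_per_s=1.0, burst=5, clock=time.monotonic):
        self.api_keys = set(api_keys)
        self.rate, self.burst, self.clock = rate_per_s, burst, clock
        self.buckets = {}

    def check(self, api_key):
        if api_key not in self.api_keys:
            return "unauthenticated"
        if api_key not in self.buckets:
            self.buckets[api_key] = TokenBucket(self.rate, self.burst, self.clock)
        return "ok" if self.buckets[api_key].allow() else "rate_limited"

# --- Monitoring: flag latency spikes with a rolling z-score ---
class SpikeDetector:
    def __init__(self, window=50, z_threshold=3.0, min_samples=10):
        self.samples = deque(maxlen=window)
        self.z_threshold, self.min_samples = z_threshold, min_samples

    def observe(self, value):
        spike = False
        if len(self.samples) >= self.min_samples:
            data = list(self.samples)
            mean, std = statistics.fmean(data), statistics.pstdev(data)
            spike = value > mean if std == 0 else (value - mean) / std > self.z_threshold
        self.samples.append(value)
        return spike
```

The injectable `clock` keeps the rate limiter deterministic under test; in service code the default monotonic clock avoids surprises from wall-clock adjustments.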

In this way, existing SecOps practices built on the well-known STRIDE threat model and the Assets, Activity Matrix, and Actions chart are extended with enhancements and techniques specific to GenAI.


Saturday, May 16, 2026

The current phase of the AI agent economy is defined by a tension between undeniable productivity gains and uneven monetization, a pattern made clear in recent industry reviews. Across tens of thousands of surveyed users, the strongest signal is that AI is already expanding the amount and type of work individuals can complete. Users report “substantially more productive” outcomes, with 48 percent citing expanded scope of work and 40 percent citing faster execution. These gains are real, measurable, and broadly distributed, yet they do not automatically translate into durable revenue for the companies building these systems. The market is now shifting from hype-driven visibility to a more sober evaluation of where AI actually changes operating leverage.

Commercial traction is emerging most clearly in enterprise environments where workflows are frequent, outcomes are quantifiable, and cost structures are well understood. Customer support illustrates this dynamic: organizations with high ticket volumes and predictable service metrics can immediately measure the impact of automation on cost per interaction. Even modest deflection rates of 20 to 50 percent materially improve margins at scale, making support automation one of the earliest reliable revenue categories. Similar logic applies to sales and revenue operations, where AI agents that automate CRM updates, summarize calls, or draft follow‑ups increase productive selling hours without increasing headcount. In engineering and internal operations, the value proposition is even more direct because skilled labor is expensive and capacity constrained. Tools that reduce debugging time or accelerate documentation by even 20 to 40 percent can outperform many back‑office use cases despite smaller user counts.

The reviews emphasize that Southeast Asia’s SME landscape may represent an underappreciated opportunity. Small and medium enterprises in the region often operate with lean teams and fragmented systems, making AI agents for invoicing, scheduling, multilingual messaging, and collections immediately valuable. These are environments where owner‑level productivity gains translate directly into willingness to pay. The broader pattern is consistent: enterprises pay for AI when it improves labor efficiency, shortens cycles, or generates measurable operating returns.

At the same time, the labor implications are complex. Productivity gains do not necessarily reduce anxiety about job security. The survey shows that roughly one‑fifth of respondents fear displacement, with early‑career workers expressing the highest concern. One article reports that “users who reported the largest speed gains… were also among the most concerned about job loss”. This creates a two‑speed labor market in which junior and repetitive tasks are automated first, potentially compressing the traditional pipeline through which future managers and specialists develop. The next phase of value creation may therefore come not from replacing workers but from enabling one skilled employee to manage the output of multiple AI systems.

Where hype outpaces revenue, the pattern is equally clear. Consumer‑facing general agents attract attention and experimentation, but retention is inconsistent and pricing power is weak. As foundation models improve, standalone wrappers with limited differentiation face increasing pressure. Products with high inference costs but low willingness to pay may show strong usage while generating weak margins. The market increasingly rewards repeat usage, clear ROI, and defensible workflow integration rather than viral adoption.

From an investor perspective, the next winners may appear less glamorous but more economically durable. Metrics such as fast payback periods, high usage frequency, low churn, expansion revenue, proprietary data loops, and strong margins are the most reliable signals of long‑term value. Products embedded deeply into CRM, ERP, ticketing, finance, or operational systems create switching costs that general assistants cannot match. Vertical AI in healthcare administration, legal review, finance operations, logistics, and industrial workflows may therefore outperform broader consumer‑oriented tools.

The survey also indicates that most of AI’s current surplus accrues to individuals rather than institutions: around 70 percent of respondents say the primary beneficiary of AI productivity is “me,” while only about 10 percent point to employers or clients. This suggests that adoption is still user‑led rather than enterprise‑captured. Historically, technologies such as search, social platforms, and cloud software followed similar trajectories: utility emerged first, monetization matured later. The next stage of the AI agent economy will depend on converting personal productivity gains into enterprise budgets through workflow integration, measurable outcomes, and recurring value.


Friday, May 15, 2026

 Drone Survey Area reconstitution:

Problem statement:

Aerial images extracted from a drone video are sufficient to reconstitute the survey area: selecting a subset of them yields a mosaic that fully covers the area, without any knowledge of the drone's flight path. Write a Python implementation that places selected input images on the tiles of a grid so as to increase the likelihood that the grid matches the overall survey area.

Solution:

The following is a visual survey approximation, not a georeferenced orthomosaic. Without GPS/EXIF or camera poses from the previous example, the script cannot know the true ground positions, so the grid is an informed montage rather than a mathematically correct map.

Usage:

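pip install opencv-python numpy

Run the script from the folder that contains the extracted frames. It writes street_grid_mosaic.jpg there, along with temp-road-* and temp-skel-* debug images for each frame.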

Code:

#!/usr/bin/env python3

import cv2

import numpy as np

from pathlib import Path

import math

def detect_road_like_mask(img):

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    gray = cv2.GaussianBlur(gray, (5, 5), 0)

    edges = cv2.Canny(gray, 40, 120)

    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))

    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel, iterations=2)

    dilated = cv2.dilate(closed, kernel, iterations=1)

    return (dilated > 0).astype(np.uint8) * 255

def skeletonize(mask):

    mask = (mask > 0).astype(np.uint8)

    skel = np.zeros_like(mask)

    element = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))

    temp = mask.copy()

    while True:

        eroded = cv2.erode(temp, element)

        opened = cv2.dilate(eroded, element)

        temp2 = cv2.subtract(temp, opened)

        skel = cv2.bitwise_or(skel, temp2)

        temp = eroded.copy()

        if cv2.countNonZero(temp) == 0:

            break

    return skel

def border_signature(skel):

    h, w = skel.shape

    return (

        skel[0, :], # top

        skel[-1, :], # bottom

        skel[:, 0], # left

        skel[:, -1], # right

    )

def border_similarity(a, b):

    if a.shape != b.shape:

        return 0

    return np.sum((a > 0) & (b > 0))

def compute_pairwise_border_scores(skeletons):

    N = len(skeletons)

    borders = [border_signature(s) for s in skeletons]

    scores = {}

    for i in range(N):

        for j in range(N):

            if i == j:

                continue

            scores[(i, j)] = {

                "up": border_similarity(borders[i][0], borders[j][1]),

                "down": border_similarity(borders[i][1], borders[j][0]),

                "left": border_similarity(borders[i][2], borders[j][3]),

                "right": border_similarity(borders[i][3], borders[j][2]),

            }

    return scores

def filter_redundant_frames(skeletons, overlap_threshold=0.75):

    N = len(skeletons)

    keep = [True] * N

    for i in range(N):

        if not keep[i]:

            continue

        si = skeletons[i]

        if si is None or si.size == 0:

            keep[i] = False

            continue

        si = si > 0

        for j in range(i + 1, N):

            if not keep[j]:

                continue

            sj = skeletons[j]

            if sj is None or sj.size == 0:

                keep[j] = False

                continue

            sj = sj > 0

            inter = np.sum(si & sj)

            union = np.sum(si | sj)

            if union == 0:

                continue

            iou = inter / union

            if iou > overlap_threshold:

                keep[j] = False

    return keep

def solve_directional_grid(N, scores, min_adj_score=20, direction_bias=1.5):

    G = int(math.ceil(math.sqrt(N)))

    grid = [[None for _ in range(G)] for _ in range(G)]

    used = set()

    grid[0][0] = 0

    used.add(0)

    for r in range(G):

        for c in range(G):

            if r == 0 and c == 0:

                continue

            best_tile = None

            best_score = -1

            for t in range(N):

                if t in used:

                    continue

                score = 0

                if r > 0 and grid[r - 1][c] is not None:

                    above = grid[r - 1][c]

                    vertical_score = scores.get((above, t), {}).get("down", 0)

                    score += vertical_score * direction_bias

                if c > 0 and grid[r][c - 1] is not None:

                    left = grid[r][c - 1]

                    horizontal_score = scores.get((left, t), {}).get("right", 0)

                    score += horizontal_score * direction_bias

                if score > best_score:

                    best_score = score

                    best_tile = t

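            # a cell with no placed neighbour has nothing to match against, so seed it

            # with the next unused tile instead of leaving a hole that blanks the rest of the row

            has_neighbor = (r > 0 and grid[r - 1][c] is not None) or (c > 0 and grid[r][c - 1] is not None)

            if best_tile is None or (has_neighbor and best_score < min_adj_score):

                grid[r][c] = None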

            else:

                grid[r][c] = best_tile

                used.add(best_tile)

            if len(used) == N:

                return grid

    return grid

def build_grid_mosaic(images, grid):

    H, W = images[0][1].shape[:2]

    G = len(grid)

    canvas = np.zeros((G * H, G * W, 3), dtype=np.uint8)

    for r in range(G):

        for c in range(G):

            idx = grid[r][c]

            if idx is None:

                continue

            name, img = images[idx]

            y0, y1 = r * H, (r + 1) * H

            x0, x1 = c * W, (c + 1) * W

            canvas[y0:y1, x0:x1] = img

    return canvas

def mosaic_street_grid(folder, out_path="grid_mosaic.jpg"):

    folder = Path(folder)

    images = []

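    out_name = Path(out_path).name

    for p in sorted(folder.iterdir()):

        # skip debug masks and earlier mosaics written by previous runs

        if p.name.startswith("temp-") or p.name == out_name:

            continue

        if p.suffix.lower() in [".jpg", ".jpeg", ".png"]:

            img = cv2.imread(str(p))

            if img is None:

                print(f"[WARN] Could not read {p.name}; skipping")

                continue

            images.append((p.name, img))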

    if not images:

        raise RuntimeError("No images found")

    # normalize all images to the size of the first one

    base_h, base_w = images[0][1].shape[:2]

    norm_images = []

    for name, img in images:

        h, w = img.shape[:2]

        if (h, w) != (base_h, base_w):

            img = cv2.resize(img, (base_w, base_h), interpolation=cv2.INTER_AREA)

        norm_images.append((name, img))

    images = norm_images

    skeletons = []

    for name, img in images:

        road_mask = detect_road_like_mask(img)

        skel = skeletonize(road_mask)

        skeletons.append(skel)

        cv2.imwrite(str(folder / f"temp-road-{name}"), road_mask)

        # skeleton pixels are 0/1; scale to 0/255 so the debug image is visible

        cv2.imwrite(str(folder / f"temp-skel-{name}"), skel * 255)

    valid_images = []

    valid_skeletons = []

    for (name, img), skel in zip(images, skeletons):

        if skel is None:

            print(f"[WARN] Skeleton for {name} is None — skipping")

            continue

        if skel.size == 0:

            print(f"[WARN] Skeleton for {name} is empty — skipping")

            continue

        if len(skel.shape) != 2:

            print(f"[WARN] Skeleton for {name} has invalid shape {skel.shape} — skipping")

            continue

        valid_images.append((name, img))

        valid_skeletons.append(skel)

    images = valid_images

    skeletons = valid_skeletons

    if len(skeletons) == 0:

        raise RuntimeError("All skeletons were invalid — nothing to process.")

    keep_mask = filter_redundant_frames(skeletons)

    images = [img for img, k in zip(images, keep_mask) if k]

    skeletons = [sk for sk, k in zip(skeletons, keep_mask) if k]

    scores = compute_pairwise_border_scores(skeletons)

    grid = solve_directional_grid(len(images), scores)

    mosaic = build_grid_mosaic(images, grid)

    cv2.imwrite(out_path, mosaic)

    return mosaic

if __name__ == "__main__":

    mosaic_street_grid(".", "street_grid_mosaic.jpg")


Thursday, May 14, 2026

 Drone Survey Area reconstitution:

Problem statement:

Aerial images extracted from a drone video are sufficient to reconstitute the survey area: selecting a subset of them yields a mosaic that fully covers the area, without any knowledge of the drone's flight path. Write a Python implementation that places selected input images on the tiles of a grid so as to increase the likelihood that the grid matches the overall survey area.

Solution:

The following is a visual survey approximation, not a georeferenced orthomosaic. Without GPS/EXIF or camera poses from the previous example, the script cannot know the true ground positions, so the grid is an informed montage rather than a mathematically correct map.

Usage:

pip install pyodm

docker run -p 3000:3000 opendronemap/nodeodm --test

Code:

#!/usr/bin/env python3

from pathlib import Path

import cv2

import numpy as np

import math

import shutil

import sys

def list_images(folder):

    exts = {".jpg", ".jpeg"}

    # compare case-insensitively so .Jpg, .JPEG, etc. are also picked up

    files = [p for p in Path(folder).iterdir() if p.suffix.lower() in exts]

    return sorted(files, key=lambda p: p.name)

def make_detector():

    try:

        return cv2.SIFT_create()

    except Exception:

        return cv2.ORB_create(4000)

def detect(detector, img):

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    return detector.detectAndCompute(gray, None)

def match_score(des1, des2, use_sift=True):

    if des1 is None or des2 is None:

        return 0

    if use_sift:

        matcher = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=40))

        matches = matcher.knnMatch(des1, des2, k=2)

    else:

        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)

        matches = matcher.knnMatch(des1, des2, k=2)

    good = 0

    for pair in matches:

        if len(pair) < 2:

            continue

        m, n = pair

        if m.distance < 0.75 * n.distance:

            good += 1

    return good

def overlap_score(img1, img2, detector):

    kp1, des1 = detect(detector, img1)

    kp2, des2 = detect(detector, img2)

    use_sift = hasattr(cv2, "SIFT_create") and detector.__class__.__name__.lower().find("sift") >= 0

    return match_score(des1, des2, use_sift=use_sift)

def choose_grid(n, aspect=1.0):

    best = None

    for rows in range(1, n + 1):

        cols = math.ceil(n / rows)

        score = abs((cols / rows) - aspect)

        waste = rows * cols - n

        cand = (score, waste, abs(rows - cols), rows, cols)

        if best is None or cand < best:

            best = cand

    return best[3], best[4]

def fit_tile(img, tile_w, tile_h, pad=8, bg=(255, 255, 255)):

    h, w = img.shape[:2]

    scale = min((tile_w - 2 * pad) / w, (tile_h - 2 * pad) / h)

    nw, nh = max(1, int(round(w * scale))), max(1, int(round(h * scale)))

    resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_AREA)

    canvas = np.full((tile_h, tile_w, 3), bg, dtype=np.uint8)

    x = (tile_w - nw) // 2

    y = (tile_h - nh) // 2

    canvas[y:y+nh, x:x+nw] = resized

    return canvas

def build_montage(folder, max_tiles=30, tile_w=360, tile_h=240, pad=8):

    folder = Path(folder).resolve()

    files = list_images(folder)

    if not files:

        raise ValueError("No JPG images found.")

    imgs = []

    for p in files:

        im = cv2.imread(str(p))

        if im is not None:

            imgs.append((p, im))

    if not imgs:

        raise ValueError("Could not read any images.")

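    detector = make_detector()

    n = min(len(imgs), max_tiles)

    used = imgs[:n]

    # Detect features once per image rather than once per pair

    feats = [detect(detector, im)[1] for _, im in used]

    use_sift = "sift" in type(detector).__name__.lower()

    scores = np.zeros((n, n), dtype=int)

    for i in range(n):

        for j in range(i + 1, n):

            s = match_score(feats[i], feats[j], use_sift=use_sift)

            scores[i, j] = scores[j, i] = s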

    remaining = set(range(1, n))

    order = [0]

    while remaining:

        last = order[-1]

        nxt = max(remaining, key=lambda j: (scores[last, j], -j))

        order.append(nxt)

        remaining.remove(nxt)

    rows, cols = choose_grid(n, aspect=1.0)

    while len(order) < rows * cols:

        order.append(None)

    montage = np.full((rows * tile_h, cols * tile_w, 3), 255, dtype=np.uint8)

    for idx in range(rows * cols):

        r = idx // cols

        c = idx % cols

        x0, y0 = c * tile_w, r * tile_h

        cv2.rectangle(montage, (x0, y0), (x0 + tile_w - 1, y0 + tile_h - 1), (230, 230, 230), 1)

        item_idx = order[idx]

        if item_idx is None:

            continue

        p, img = used[item_idx]

        tile = fit_tile(img, tile_w, tile_h, pad=pad)

        montage[y0:y0 + tile_h, x0:x0 + tile_w] = tile

        label = p.stem[:34]

        cv2.putText(

            montage,

            label,

            (x0 + 10, y0 + tile_h - 12),

            cv2.FONT_HERSHEY_SIMPLEX,

            0.5,

            (20, 20, 20),

            1,

            cv2.LINE_AA,

        )

    out_dir = folder / "montage_output"

    out_dir.mkdir(exist_ok=True)

    out_path = out_dir / f"{folder.name}_grid_montage.png"

    cv2.imwrite(str(out_path), montage)

    same_folder_copy = folder / out_path.name

    shutil.copy2(out_path, same_folder_copy)

    return str(same_folder_copy)

if __name__ == "__main__":

    if len(sys.argv) < 2:

        print("Usage: python grid_montage.py /path/to/folder")

        sys.exit(1)

    print(build_montage(sys.argv[1]))
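The script fills the grid row by row, so the last image of one row and the first image of the next, which are consecutive in the greedy chain and therefore likely to overlap, land at opposite edges of the grid. Survey flights are often flown in a back-and-forth lawnmower pattern, and a serpentine (boustrophedon) fill that reverses direction on every other row keeps consecutive chain entries adjacent. A minimal sketch, assuming the chain order produced by build_montage; `serpentine_cells` is a hypothetical helper, not part of the script.

```python
from typing import List, Tuple


def serpentine_cells(n: int, cols: int) -> List[Tuple[int, int]]:
    """Return (row, col) cells for chain positions 0..n-1 in boustrophedon order.

    Even rows run left to right, odd rows right to left, so chain neighbours
    that wrap onto a new row stay vertically adjacent.
    """
    cells = []
    for idx in range(n):
        r, c = divmod(idx, cols)
        if r % 2 == 1:
            c = cols - 1 - c  # reverse direction on odd rows
        cells.append((r, c))
    return cells
```

In build_montage, the cell for order[idx] would then be serpentine_cells(rows * cols, cols)[idx] instead of (idx // cols, idx % cols). Whether this helps depends on how closely the greedy chain follows the actual flight.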


Wednesday, May 13, 2026

 Drone Survey Area reconstitution:

Problem statement:

Aerial images extracted from a drone video are sufficient to reconstitute the survey area: a selection of those images can be arranged into a mosaic that fully covers it. This method does not require knowledge of the drone's flight path. Write a Python implementation that places selected input images on the tiles of a grid so as to maximize the likelihood that the grid matches the overall survey area.

Solution:

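The following implementation assumes that the images carry GPS/EXIF metadata and delegates mosaicking to OpenDroneMap: it submits the images to a NodeODM server through the pyodm client, downloads the results, and copies the orthomosaic back into the input folder.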
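Frames extracted from a video usually carry no GPS EXIF tags. If per-frame positions are available from flight telemetry, OpenDroneMap also accepts an image geolocation file (geo.txt): a first line naming the coordinate system, followed by one whitespace-separated line per image with the image name, x (longitude for EPSG:4326), y (latitude) and optionally altitude. A minimal sketch of producing that text, assuming telemetry has already been matched to frame file names; `format_geo_txt` and the sample values are hypothetical, and how the file is handed to ODM (for example its geo option, or uploading it alongside the images) should be checked against the ODM documentation for the version in use.

```python
from typing import Iterable, Optional, Tuple

# (image_name, longitude, latitude, altitude or None)
Row = Tuple[str, float, float, Optional[float]]


def format_geo_txt(rows: Iterable[Row], srs: str = "EPSG:4326") -> str:
    """Build the text of an ODM-style image geolocation file."""
    lines = [srs]  # first line: coordinate reference system
    for name, lon, lat, alt in rows:
        fields = [name, f"{lon:.7f}", f"{lat:.7f}"]
        if alt is not None:
            fields.append(f"{alt:.2f}")
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"
```

The result can be written next to the frames with Path("geo.txt").write_text(format_geo_txt(rows)).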

Usage:

pip install pyodm

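docker run -p 3000:3000 opendronemap/nodeodm

(Adding --test starts NodeODM in test mode, where no commands are sent to OpenDroneMap; omit it to produce a real orthomosaic.)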

Code:

#! /usr/bin/python

from pathlib import Path

import shutil

import sys

from pyodm import Node, exceptions

def find_images(input_folder: Path):

    exts = {".jpg", ".jpeg", ".JPG", ".JPEG"}

    images = sorted([str(p) for p in input_folder.iterdir() if p.suffix in exts])

    return images

def pick_orthomosaic_file(results_dir: Path):

    candidates = []

    for ext in ("*.tif", "*.tiff", "*.png", "*.jpg", "*.jpeg"):

        candidates.extend(results_dir.rglob(ext))

    preferred = []

    for p in candidates:

        s = str(p).lower()

        if "orthophoto" in s or "orthomosaic" in s or "odm_orthophoto" in s:

            preferred.append(p)

    if preferred:

        preferred.sort(key=lambda p: (0 if p.suffix.lower() in [".tif", ".tiff"] else 1, len(str(p))))

        return preferred[0]

    if candidates:

        candidates.sort(key=lambda p: (0 if p.suffix.lower() in [".tif", ".tiff"] else 1, len(str(p))))

        return candidates[0]

    return None

def reconstruct_mosaic(input_folder: str, node_url="localhost", node_port=3000):

    input_path = Path(input_folder).resolve()

    if not input_path.exists() or not input_path.is_dir():

        raise FileNotFoundError(f"Folder not found: {input_path}")

    images = find_images(input_path)

    if len(images) < 3:

        raise ValueError("Need at least 3 overlapping drone images for a meaningful mosaic.")

    output_dir = input_path / "odm_results"

    output_dir.mkdir(parents=True, exist_ok=True)

    node = Node(node_url, port=node_port)

    options = {

        "auto-boundary": True,

        "crop": 0,

        "fast-orthophoto": True,

        "skip-post-processing": False,

        "orthophoto-resolution": 5,

        "use-exif": True,

        "optimize-disk-space": True,

    }

    try:

        # Query the node inside the try block so an unreachable NodeODM

        # is reported through the NodeConnectionError handler below

        print(node.info())

        task = node.create_task(images, options)

        print("Task created:", task.uuid)

        task.wait_for_completion()

        task.download_assets(str(output_dir))

        orthomosaic = pick_orthomosaic_file(output_dir)

        if orthomosaic is None:

            raise FileNotFoundError("No orthomosaic file was produced by ODM.")

        final_name = input_path / f"{input_path.name}_orthomosaic{orthomosaic.suffix.lower()}"

        shutil.copy2(orthomosaic, final_name)

        print(f"Orthomosaic saved to: {final_name}")

        return str(final_name)

    except exceptions.NodeConnectionError as e:

        raise RuntimeError(f"Cannot connect to NodeODM at {node_url}:{node_port}. Error: {e}")

    except exceptions.TaskFailedError as e:

        raise RuntimeError(f"ODM task failed: {e}")

if __name__ == "__main__":

    if len(sys.argv) < 2:

        print("Usage: python odm_mosaic.py /path/to/drone_images")

        sys.exit(1)

    reconstruct_mosaic(sys.argv[1])

References: compare to previous article: 

Tuesday, May 12, 2026

 Drone Survey Area reconstitution:

Problem statement:

Aerial images extracted from a drone video are sufficient to reconstitute the survey area: a selection of those images can be arranged into a mosaic that fully covers it. This method does not require knowledge of the drone's flight path. Write a Python implementation that places selected input images on the tiles of a grid so as to maximize the likelihood that the grid matches the overall survey area.

Solution:

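The following implementation does not take the flight path as input; it estimates one from the overlap between consecutive frames. Frames are read in filename order, so file names must sort in capture order (for example, zero-padded frame numbers). The implementation:

1. estimates a 2D motion vector between frame i and frame i+1 by phase correlation, i.e. how far the drone moved between the two frames;
2. integrates those motions along the timeline to obtain an approximate 2D position for each frame;
3. centers and rotates those positions with PCA and rescales them so the path fills a roughly rectangular footprint (eigenvector signs are arbitrary, so the layout may come out mirrored);
4. snaps the positions to a 2D grid, where collisions are possible because several frames can land in the same cell;
5. builds a mosaic whose layout reflects the estimated flight path rather than mere visual similarity clustering.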
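To make the sign convention behind step 1 concrete, here is a plain-NumPy version of phase correlation. It is an illustrative re-implementation, not OpenCV's cv2.phaseCorrelate, which additionally refines the peak to sub-pixel accuracy. It normalizes the cross-power spectrum to keep only the phase, inverts it, and reads the displacement off the location of the correlation peak. The returned (dx, dy) is how far the image content moved from the first image to the second, so the camera moved by the negative of it, which is what the line positions[i + 1] = positions[i] - delta encodes.

```python
from typing import Tuple

import numpy as np


def phase_correlate_np(a: np.ndarray, b: np.ndarray) -> Tuple[int, int]:
    """Return the integer (dx, dy) shift that carries the content of a onto b."""
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    cross = fb * np.conj(fa)
    cross /= np.abs(cross) + 1e-12  # keep phase only
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = a.shape
    # Peaks past the midpoint correspond to negative shifts (circular wrap).
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dx), int(dy)
```

For example, shifting a random 64x64 image down by 5 rows and left by 3 columns with np.roll yields (-3, 5); the camera would then be integrated as having moved by (3, -5).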

Code:

#! /usr/bin/python

import os

import math

import cv2

import numpy as np

from typing import List, Tuple

# ---------------------------------------------------------

# 1. Load and preprocess images (sorted by filename)

# ---------------------------------------------------------

def load_images_sorted(folder: str,

                       max_images: int = None,

                       target_size: Tuple[int, int] = (512, 512)) -> List[np.ndarray]:

    files = sorted(os.listdir(folder))

    imgs = []

    for fname in files:

        path = os.path.join(folder, fname)

        if not os.path.isfile(path):

            continue

        img = cv2.imread(path, cv2.IMREAD_COLOR)

        if img is None:

            continue

        img = cv2.resize(img, target_size, interpolation=cv2.INTER_AREA)

        imgs.append(img)

        if max_images is not None and len(imgs) >= max_images:

            break

    if not imgs:

        raise ValueError("No valid images found in folder")

    return imgs

# ---------------------------------------------------------

# 2. Estimate translation between consecutive frames

# using phase correlation (overlap-based)

# ---------------------------------------------------------

def estimate_translation(img1: np.ndarray, img2: np.ndarray) -> np.ndarray:

    """

    Estimate 2D translation from img1 to img2 using phase correlation.

    Returns a 2D vector (dx, dy) in pixels.

    """

    # Convert to grayscale float32

    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY).astype(np.float32)

    g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY).astype(np.float32)

    # Optional: apply Hanning window to reduce edge effects

    h, w = g1.shape

    win = cv2.createHanningWindow((w, h), cv2.CV_32F)

    g1w = g1 * win

    g2w = g2 * win

    shift, response = cv2.phaseCorrelate(g1w, g2w)

    dx, dy = shift # note: phaseCorrelate returns (dx, dy)

    return np.array([dx, dy], dtype=np.float32)

def accumulate_positions(images: List[np.ndarray]) -> np.ndarray:

    """

    For a sequence of images, estimate relative translations and

    integrate them to get approximate 2D positions.

    """

    N = len(images)

    positions = np.zeros((N, 2), dtype=np.float32)

    for i in range(N - 1):

        delta = estimate_translation(images[i], images[i + 1])

        # phaseCorrelate returns the apparent shift of the image content from

        # img1 to img2; the camera moves the opposite way, so we subtract it.

        positions[i + 1] = positions[i] - delta

    return positions # shape (N, 2)

# ---------------------------------------------------------

# 3. Normalize and straighten the path (PCA)

# ---------------------------------------------------------

def normalize_positions(positions: np.ndarray) -> np.ndarray:

    """

    Center, rotate (PCA), and scale positions into [0,1]x[0,1].

    """

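    # A single frame has no path to straighten (np.cov needs two samples)

    if positions.shape[0] < 2:

        return np.zeros_like(positions)

    # Center

    mean = positions.mean(axis=0)

    X = positions - mean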

    # PCA for rotation

    cov = np.cov(X.T)

    eigvals, eigvecs = np.linalg.eigh(cov)

    # Sort eigenvectors by descending eigenvalue

    order = np.argsort(eigvals)[::-1]

    R = eigvecs[:, order]

    X_rot = X @ R # rotate

    # Normalize to [0,1]

    min_xy = X_rot.min(axis=0)

    max_xy = X_rot.max(axis=0)

    span = np.maximum(max_xy - min_xy, 1e-6)

    X_norm = (X_rot - min_xy) / span

    return X_norm # shape (N, 2), in [0,1]

# ---------------------------------------------------------

# 4. Snap positions to a grid

# ---------------------------------------------------------

def choose_grid_shape(N: int) -> Tuple[int, int]:

    """

    Choose a roughly rectangular grid for N images.

    """

    rows = int(math.floor(math.sqrt(N)))

    cols = int(math.ceil(N / rows))

    if rows * cols < N:

        cols += 1

    return rows, cols

def snap_to_grid(pos_norm: np.ndarray,

                 grid_rows: int,

                 grid_cols: int) -> List[Tuple[int, int]]:

    """

    Map normalized positions in [0,1]^2 to integer grid cells.

    Multiple images can land in the same cell; that's allowed.

    """

    N = pos_norm.shape[0]

    assignments = []

    for i in range(N):

        x, y = pos_norm[i]

        # x -> col, y -> row

        c = int(np.clip(x * grid_cols, 0, grid_cols - 1))

        r = int(np.clip(y * grid_rows, 0, grid_rows - 1))

        assignments.append((r, c))

    return assignments

# ---------------------------------------------------------

# 5. Build a mosaic for visualization

# ---------------------------------------------------------

def build_mosaic(images: List[np.ndarray],

                 assignments: List[Tuple[int, int]],

                 grid_rows: int,

                 grid_cols: int,

                 tile_size: Tuple[int, int] = (256, 256)) -> np.ndarray:

    """

    Visual mosaic: each grid cell shows the *last* image assigned to it.

    (You can change this to average or small multiples if you want.)

    """

    tile_w, tile_h = tile_size

    mosaic_h = grid_rows * tile_h

    mosaic_w = grid_cols * tile_w

    mosaic = np.zeros((mosaic_h, mosaic_w, 3), dtype=np.uint8)

    for img, (r, c) in zip(images, assignments):

        tile = cv2.resize(img, (tile_w, tile_h), interpolation=cv2.INTER_AREA)

        y0 = r * tile_h

        x0 = c * tile_w

        mosaic[y0:y0+tile_h, x0:x0+tile_w, :] = tile

    return mosaic

# ---------------------------------------------------------

# 6. High-level function

# ---------------------------------------------------------

def layout_drone_tour_by_overlap(folder: str,

                                 max_images: int = None,

                                 base_size: Tuple[int, int] = (512, 512)) -> np.ndarray:

    """

    1) Load sequential frames from folder.

    2) Estimate frame-to-frame translations via phase correlation.

    3) Integrate to get 2D positions along the flight path.

    4) Straighten and normalize the path with PCA.

    5) Snap to a grid and build a mosaic.

    """

    images = load_images_sorted(folder, max_images=max_images, target_size=base_size)

    positions = accumulate_positions(images)

    pos_norm = normalize_positions(positions)

    grid_rows, grid_cols = choose_grid_shape(len(images))

    print(f"Grid shape: {grid_rows} x {grid_cols}")

    assignments = snap_to_grid(pos_norm, grid_rows, grid_cols)

    mosaic = build_mosaic(images, assignments, grid_rows, grid_cols,

                          tile_size=(256, 256))

    return mosaic

if __name__ == "__main__":

    # Requirements:

    # pip install opencv-python numpy

    folder = "."

    mosaic = layout_drone_tour_by_overlap(folder, max_images=None)

    cv2.imwrite("drone_path_layout.png", mosaic)

    print("Saved drone_path_layout.png")
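snap_to_grid allows several frames to share a cell, and build_mosaic then shows only the last of them, so some cells stay black while others hide frames. A simple alternative, sketched here as an assumption rather than part of the method above, assigns frames one at a time to the nearest free cell center, which guarantees one frame per cell as long as there are at least as many cells as frames. `snap_to_grid_unique` is a hypothetical name; it takes the same normalized positions that normalize_positions returns.

```python
from typing import List, Tuple

import numpy as np


def snap_to_grid_unique(pos_norm: np.ndarray,
                        grid_rows: int,
                        grid_cols: int) -> List[Tuple[int, int]]:
    """Greedily place each frame in the nearest unoccupied grid cell.

    pos_norm holds (x, y) in [0, 1]; x maps to columns and y to rows,
    matching snap_to_grid. Requires grid_rows * grid_cols >= len(pos_norm).
    """
    if grid_rows * grid_cols < len(pos_norm):
        raise ValueError("grid has fewer cells than frames")
    # Cell centers in the same normalized coordinate frame as pos_norm.
    rr, cc = np.meshgrid(np.arange(grid_rows), np.arange(grid_cols), indexing="ij")
    centers = np.stack([(cc.ravel() + 0.5) / grid_cols,
                        (rr.ravel() + 0.5) / grid_rows], axis=1)
    free = np.ones(len(centers), dtype=bool)
    assignments = []
    for x, y in pos_norm:
        d = np.hypot(centers[:, 0] - x, centers[:, 1] - y)
        d[~free] = np.inf  # occupied cells are never chosen again
        k = int(np.argmin(d))
        free[k] = False
        assignments.append(divmod(k, grid_cols))  # flat index -> (row, col)
    return assignments
```

Because frames are placed in capture order, later frames may be pushed further from their estimated position; the trade-off is that every frame stays visible in the mosaic.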