Saturday, May 16, 2026

The current phase of the AI agent economy is defined by a tension between undeniable productivity gains and uneven monetization, a pattern made clear in recent industry reviews. Across tens of thousands of surveyed users, the strongest signal is that AI is already expanding the amount and type of work individuals can complete. Users report “substantially more productive” outcomes, with 48 percent citing expanded scope of work and 40 percent citing faster execution. These gains are real, measurable, and broadly distributed, yet they do not automatically translate into durable revenue for the companies building these systems. The market is now shifting from hype-driven visibility to a more sober evaluation of where AI actually changes operating leverage.

Commercial traction is emerging most clearly in enterprise environments where workflows are frequent, outcomes are quantifiable, and cost structures are well understood. Customer support illustrates this dynamic: organizations with high ticket volumes and predictable service metrics can immediately measure the impact of automation on cost per interaction. Even modest deflection rates of 20 to 50 percent materially improve margins at scale, making support automation one of the earliest reliable revenue categories. Similar logic applies to sales and revenue operations, where AI agents that automate CRM updates, summarize calls, or draft follow‑ups increase productive selling hours without increasing headcount. In engineering and internal operations, the value proposition is even more direct because skilled labor is expensive and capacity constrained. Tools that reduce debugging time or accelerate documentation by even 20 to 40 percent can outperform many back‑office use cases despite smaller user counts.

The reviews emphasize that Southeast Asia’s SME landscape may represent an underappreciated opportunity. Small and medium enterprises in the region often operate with lean teams and fragmented systems, making AI agents for invoicing, scheduling, multilingual messaging, and collections immediately valuable. These are environments where owner‑level productivity gains translate directly into willingness to pay. The broader pattern is consistent: enterprises pay for AI when it improves labor efficiency, shortens cycles, or generates measurable operating returns.

At the same time, the labor implications are complex. Productivity gains do not necessarily reduce anxiety about job security. The survey shows that roughly one‑fifth of respondents fear displacement, with early‑career workers expressing the highest concern. One article notes that “users who reported the largest speed gains… were also among the most concerned about job loss”. This creates a two‑speed labor market in which junior and repetitive tasks are automated first, potentially compressing the traditional pipeline through which future managers and specialists develop. The next phase of value creation may therefore come not from replacing workers but from enabling one skilled employee to manage the output of multiple AI systems.

Where hype outpaces revenue, the pattern is equally clear. Consumer‑facing general agents attract attention and experimentation, but retention is inconsistent and pricing power is weak. As foundation models improve, standalone wrappers with limited differentiation face increasing pressure. Products with high inference costs but low willingness to pay may show strong usage while generating weak margins. The market increasingly rewards repeat usage, clear ROI, and defensible workflow integration rather than viral adoption.

From an investor perspective, the next winners may appear less glamorous but more economically durable. Metrics such as fast payback periods, high usage frequency, low churn, expansion revenue, proprietary data loops, and strong margins are the most reliable signals of long‑term value. Products embedded deeply into CRM, ERP, ticketing, finance, or operational systems create switching costs that general assistants cannot match. Vertical AI in healthcare administration, legal review, finance operations, logistics, and industrial workflows may therefore outperform broader consumer‑oriented tools.

The survey data also reinforce that the majority of AI’s current surplus accrues to individuals rather than institutions. Around 70 percent of respondents say the primary beneficiary of AI productivity is “me,” while only about 10 percent point to employers or clients. This suggests that adoption is still user‑led rather than enterprise‑captured. Historically, technologies such as search, social platforms, and cloud software followed similar trajectories: utility emerged first, monetization matured later. The next stage of the AI agent economy will depend on converting personal productivity gains into enterprise budgets through workflow integration, measurable outcomes, and recurring value.


Friday, May 15, 2026

 Drone Survey Area reconstitution:

Problem statement:

Aerial drone images extracted from a drone video are sufficient to reconstitute the survey area: a selection of images can be assembled into a mosaic that fully covers the area. This method requires no knowledge of the drone's flight path. Write a Python implementation that places selections from the input onto the tiles of a grid to increase the likelihood of a match with the overall survey area.

Solution:

The following produces a visual survey approximation, not a georeferenced orthomosaic. Without the GPS/EXIF data or camera poses used in the earlier example, the script cannot know the true ground positions, so the grid is an informed montage rather than a mathematically correct map.

Usage:

pip install opencv-python numpy

Code:

#!/usr/bin/env python3

import cv2

import numpy as np

from pathlib import Path

import math

def detect_road_like_mask(img):

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    gray = cv2.GaussianBlur(gray, (5, 5), 0)

    edges = cv2.Canny(gray, 40, 120)

    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))

    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel, iterations=2)

    dilated = cv2.dilate(closed, kernel, iterations=1)

    return (dilated > 0).astype(np.uint8) * 255

def skeletonize(mask):

    mask = (mask > 0).astype(np.uint8)

    skel = np.zeros_like(mask)

    element = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))

    temp = mask.copy()

    while True:

        eroded = cv2.erode(temp, element)

        opened = cv2.dilate(eroded, element)

        temp2 = cv2.subtract(temp, opened)

        skel = cv2.bitwise_or(skel, temp2)

        temp = eroded.copy()

        if cv2.countNonZero(temp) == 0:

            break

    return skel

def border_signature(skel):

    h, w = skel.shape

    return (

        skel[0, :], # top

        skel[-1, :], # bottom

        skel[:, 0], # left

        skel[:, -1], # right

    )

def border_similarity(a, b):

    if a.shape != b.shape:

        return 0

    return np.sum((a > 0) & (b > 0))

def compute_pairwise_border_scores(skeletons):

    N = len(skeletons)

    borders = [border_signature(s) for s in skeletons]

    scores = {}

    for i in range(N):

        for j in range(N):

            if i == j:

                continue

            scores[(i, j)] = {

                "up": border_similarity(borders[i][0], borders[j][1]),

                "down": border_similarity(borders[i][1], borders[j][0]),

                "left": border_similarity(borders[i][2], borders[j][3]),

                "right": border_similarity(borders[i][3], borders[j][2]),

            }

    return scores

def filter_redundant_frames(skeletons, overlap_threshold=0.75):

    N = len(skeletons)

    keep = [True] * N

    for i in range(N):

        if not keep[i]:

            continue

        si = skeletons[i]

        if si is None or si.size == 0:

            keep[i] = False

            continue

        si = si > 0

        for j in range(i + 1, N):

            if not keep[j]:

                continue

            sj = skeletons[j]

            if sj is None or sj.size == 0:

                keep[j] = False

                continue

            sj = sj > 0

            inter = np.sum(si & sj)

            union = np.sum(si | sj)

            if union == 0:

                continue

            iou = inter / union

            if iou > overlap_threshold:

                keep[j] = False

    return keep

def solve_directional_grid(N, scores, min_adj_score=20, direction_bias=1.5):

    G = int(math.ceil(math.sqrt(N)))

    grid = [[None for _ in range(G)] for _ in range(G)]

    used = set()

    grid[0][0] = 0

    used.add(0)

    for r in range(G):

        for c in range(G):

            if r == 0 and c == 0:

                continue

            best_tile = None

            best_score = -1

            for t in range(N):

                if t in used:

                    continue

                score = 0

                if r > 0 and grid[r - 1][c] is not None:

                    above = grid[r - 1][c]

                    vertical_score = scores.get((above, t), {}).get("down", 0)

                    score += vertical_score * direction_bias

                if c > 0 and grid[r][c - 1] is not None:

                    left = grid[r][c - 1]

                    horizontal_score = scores.get((left, t), {}).get("right", 0)

                    score += horizontal_score * direction_bias

                if score > best_score:

                    best_score = score

                    best_tile = t

            if best_score < min_adj_score:

                grid[r][c] = None

            else:

                grid[r][c] = best_tile

                used.add(best_tile)

            if len(used) == N:

                return grid

    return grid

def build_grid_mosaic(images, grid):

    H, W = images[0][1].shape[:2]

    G = len(grid)

    canvas = np.zeros((G * H, G * W, 3), dtype=np.uint8)

    for r in range(G):

        for c in range(G):

            idx = grid[r][c]

            if idx is None:

                continue

            name, img = images[idx]

            y0, y1 = r * H, (r + 1) * H

            x0, x1 = c * W, (c + 1) * W

            canvas[y0:y1, x0:x1] = img

    return canvas

def mosaic_street_grid(folder, out_path="grid_mosaic.jpg"):

    folder = Path(folder)

    images = []

    for p in sorted(folder.iterdir()):

        if p.suffix.lower() in [".jpg", ".jpeg", ".png"]:

            img = cv2.imread(str(p))

            if img is None:

                continue

            images.append((p.name, img))

    if not images:

        raise RuntimeError("No images found")

    # normalize all images to the size of the first one

    base_h, base_w = images[0][1].shape[:2]

    norm_images = []

    for name, img in images:

        h, w = img.shape[:2]

        if (h, w) != (base_h, base_w):

            img = cv2.resize(img, (base_w, base_h), interpolation=cv2.INTER_AREA)

        norm_images.append((name, img))

    images = norm_images

    skeletons = []

    for name, img in images:

        road_mask = detect_road_like_mask(img)

        skel = skeletonize(road_mask)

        skeletons.append(skel)

        cv2.imwrite(str(folder / f"temp-road-{name}"), road_mask)

        cv2.imwrite(str(folder / f"temp-skel-{name}"), skel)

    valid_images = []

    valid_skeletons = []

    for (name, img), skel in zip(images, skeletons):

        if skel is None:

            print(f"[WARN] Skeleton for {name} is None — skipping")

            continue

        if skel.size == 0:

            print(f"[WARN] Skeleton for {name} is empty — skipping")

            continue

        if len(skel.shape) != 2:

            print(f"[WARN] Skeleton for {name} has invalid shape {skel.shape} — skipping")

            continue

        valid_images.append((name, img))

        valid_skeletons.append(skel)

    images = valid_images

    skeletons = valid_skeletons

    if len(skeletons) == 0:

        raise RuntimeError("All skeletons were invalid — nothing to process.")

    keep_mask = filter_redundant_frames(skeletons)

    images = [img for img, k in zip(images, keep_mask) if k]

    skeletons = [sk for sk, k in zip(skeletons, keep_mask) if k]

    scores = compute_pairwise_border_scores(skeletons)

    grid = solve_directional_grid(len(images), scores)

    mosaic = build_grid_mosaic(images, grid)

    cv2.imwrite(out_path, mosaic)

    return mosaic

if __name__ == "__main__":

    mosaic_street_grid(".", "street_grid_mosaic.jpg")


Thursday, May 14, 2026

 Drone Survey Area reconstitution:

Problem statement:

Aerial drone images extracted from a drone video are sufficient to reconstitute the survey area: a selection of images can be assembled into a mosaic that fully covers the area. This method requires no knowledge of the drone's flight path. Write a Python implementation that places selections from the input onto the tiles of a grid to increase the likelihood of a match with the overall survey area.

Solution:

The following produces a visual survey approximation, not a georeferenced orthomosaic. Without the GPS/EXIF data or camera poses used in the earlier example, the script cannot know the true ground positions, so the grid is an informed montage rather than a mathematically correct map.

Usage:

pip install opencv-python numpy

Code:

#!/usr/bin/env python3

from pathlib import Path

import cv2

import numpy as np

import math

import shutil

import sys

def list_images(folder):

    exts = {".jpg", ".jpeg", ".JPG", ".JPEG"}

    files = [p for p in Path(folder).iterdir() if p.suffix in exts]

    return sorted(files, key=lambda p: p.name)

def make_detector():

    try:

        return cv2.SIFT_create()

    except Exception:

        return cv2.ORB_create(4000)

def detect(detector, img):

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    return detector.detectAndCompute(gray, None)

def match_score(des1, des2, use_sift=True):

    if des1 is None or des2 is None:

        return 0

    if use_sift:

        matcher = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=40))

        matches = matcher.knnMatch(des1, des2, k=2)

    else:

        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)

        matches = matcher.knnMatch(des1, des2, k=2)

    good = 0

    for pair in matches:

        if len(pair) < 2:

            continue

        m, n = pair

        if m.distance < 0.75 * n.distance:

            good += 1

    return good

def overlap_score(img1, img2, detector):

    kp1, des1 = detect(detector, img1)

    kp2, des2 = detect(detector, img2)

    use_sift = hasattr(cv2, "SIFT_create") and detector.__class__.__name__.lower().find("sift") >= 0

    return match_score(des1, des2, use_sift=use_sift)

def choose_grid(n, aspect=1.0):

    best = None

    for rows in range(1, n + 1):

        cols = math.ceil(n / rows)

        score = abs((cols / rows) - aspect)

        waste = rows * cols - n

        cand = (score, waste, abs(rows - cols), rows, cols)

        if best is None or cand < best:

            best = cand

    return best[3], best[4]

def fit_tile(img, tile_w, tile_h, pad=8, bg=(255, 255, 255)):

    h, w = img.shape[:2]

    scale = min((tile_w - 2 * pad) / w, (tile_h - 2 * pad) / h)

    nw, nh = max(1, int(round(w * scale))), max(1, int(round(h * scale)))

    resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_AREA)

    canvas = np.full((tile_h, tile_w, 3), bg, dtype=np.uint8)

    x = (tile_w - nw) // 2

    y = (tile_h - nh) // 2

    canvas[y:y+nh, x:x+nw] = resized

    return canvas

def build_montage(folder, max_tiles=30, tile_w=360, tile_h=240, pad=8):

    folder = Path(folder).resolve()

    files = list_images(folder)

    if not files:

        raise ValueError("No JPG images found.")

    imgs = []

    for p in files:

        im = cv2.imread(str(p))

        if im is not None:

            imgs.append((p, im))

    if not imgs:

        raise ValueError("Could not read any images.")

    detector = make_detector()

    n = min(len(imgs), max_tiles)

    used = imgs[:n]

    scores = np.zeros((n, n), dtype=int)

    for i in range(n):

        for j in range(i + 1, n):

            s = overlap_score(used[i][1], used[j][1], detector)

            scores[i, j] = scores[j, i] = s

    remaining = set(range(1, n))

    order = [0]

    while remaining:

        last = order[-1]

        nxt = max(remaining, key=lambda j: (scores[last, j], -j))

        order.append(nxt)

        remaining.remove(nxt)

    rows, cols = choose_grid(n, aspect=1.0)

    while len(order) < rows * cols:

        order.append(None)

    montage = np.full((rows * tile_h, cols * tile_w, 3), 255, dtype=np.uint8)

    for idx in range(rows * cols):

        r = idx // cols

        c = idx % cols

        x0, y0 = c * tile_w, r * tile_h

        cv2.rectangle(montage, (x0, y0), (x0 + tile_w - 1, y0 + tile_h - 1), (230, 230, 230), 1)

        item_idx = order[idx]

        if item_idx is None:

            continue

        p, img = used[item_idx]

        tile = fit_tile(img, tile_w, tile_h, pad=pad)

        montage[y0:y0 + tile_h, x0:x0 + tile_w] = tile

        label = p.stem[:34]

        cv2.putText(

            montage,

            label,

            (x0 + 10, y0 + tile_h - 12),

            cv2.FONT_HERSHEY_SIMPLEX,

            0.5,

            (20, 20, 20),

            1,

            cv2.LINE_AA,

        )

    out_dir = folder / "montage_output"

    out_dir.mkdir(exist_ok=True)

    out_path = out_dir / f"{folder.name}_grid_montage.png"

    cv2.imwrite(str(out_path), montage)

    same_folder_copy = folder / out_path.name

    shutil.copy2(out_path, same_folder_copy)

    return str(same_folder_copy)

if __name__ == "__main__":

    if len(sys.argv) < 2:

        print("Usage: python grid_montage.py /path/to/folder")

        sys.exit(1)

    print(build_montage(sys.argv[1]))


Wednesday, May 13, 2026

 Drone Survey Area reconstitution:

Problem statement:

Aerial drone images extracted from a drone video are sufficient to reconstitute the survey area: a selection of images can be assembled into a mosaic that fully covers the area. This method requires no knowledge of the drone's flight path. Write a Python implementation that places selections from the input onto the tiles of a grid to increase the likelihood of a match with the overall survey area.

Solution:

The following implementation assumes that the images have GPS/EXIF metadata and leverages OpenDroneMap to create a mosaic.

Usage:

pip install pyodm

docker run -p 3000:3000 opendronemap/nodeodm --test

Code:

#!/usr/bin/env python3

from pathlib import Path

import shutil

import sys

from pyodm import Node, exceptions

def find_images(input_folder: Path):

    exts = {".jpg", ".jpeg", ".JPG", ".JPEG"}

    images = sorted([str(p) for p in input_folder.iterdir() if p.suffix in exts])

    return images

def pick_orthomosaic_file(results_dir: Path):

    candidates = []

    for ext in ("*.tif", "*.tiff", "*.png", "*.jpg", "*.jpeg"):

        candidates.extend(results_dir.rglob(ext))

    preferred = []

    for p in candidates:

        s = str(p).lower()

        if "orthophoto" in s or "orthomosaic" in s or "odm_orthophoto" in s:

            preferred.append(p)

    if preferred:

        preferred.sort(key=lambda p: (0 if p.suffix.lower() in [".tif", ".tiff"] else 1, len(str(p))))

        return preferred[0]

    if candidates:

        candidates.sort(key=lambda p: (0 if p.suffix.lower() in [".tif", ".tiff"] else 1, len(str(p))))

        return candidates[0]

    return None

def reconstruct_mosaic(input_folder: str, node_url="localhost", node_port=3000):

    input_path = Path(input_folder).resolve()

    if not input_path.exists() or not input_path.is_dir():

        raise FileNotFoundError(f"Folder not found: {input_path}")

    images = find_images(input_path)

    if len(images) < 3:

        raise ValueError("Need at least 3 overlapping drone images for a meaningful mosaic.")

    output_dir = input_path / "odm_results"

    output_dir.mkdir(parents=True, exist_ok=True)

    node = Node(node_url, port=node_port)

    print(node.info())

    options = {

        "auto-boundary": True,

        "crop": 0,

        "fast-orthophoto": True,

        "skip-post-processing": False,

        "orthophoto-resolution": 5,

        "use-exif": True,

        "optimize-disk-space": True,

    }

    try:

        task = node.create_task(images, options)

        print("Task created:", task.info().task_id)

        task.wait_for_completion()

        task.download_assets(str(output_dir))

        orthomosaic = pick_orthomosaic_file(output_dir)

        if orthomosaic is None:

            raise FileNotFoundError("No orthomosaic file was produced by ODM.")

        final_name = input_path / f"{input_path.name}_orthomosaic{orthomosaic.suffix.lower()}"

        shutil.copy2(orthomosaic, final_name)

        print(f"Orthomosaic saved to: {final_name}")

        return str(final_name)

    except exceptions.NodeConnectionError as e:

        raise RuntimeError(f"Cannot connect to NodeODM at {node_url}:{node_port}. Error: {e}")

    except exceptions.TaskFailedError as e:

        raise RuntimeError(f"ODM task failed: {e}")

if __name__ == "__main__":

    if len(sys.argv) < 2:

        print("Usage: python odm_mosaic.py /path/to/drone_images")

        sys.exit(1)

    reconstruct_mosaic(sys.argv[1])

References: compare to the previous article (Tuesday, May 12, 2026).

Tuesday, May 12, 2026

 Drone Survey Area reconstitution:

Problem statement:

Aerial drone images extracted from a drone video are sufficient to reconstitute the survey area: a selection of images can be assembled into a mosaic that fully covers the area. This method requires no knowledge of the drone's flight path. Write a Python implementation that places selections from the input onto the tiles of a grid to increase the likelihood of a match with the overall survey area.

Solution:

The following implementation uses the overlap between consecutive frames to estimate a 2D motion vector (how the drone moved between frame i and i+1). It integrates those motions along the timeline to get approximate 2D positions for each frame, then rotates and normalizes those positions so the path becomes a clean, roughly rectangular footprint. Finally, it snaps the positions to a 2D grid (with possible collisions: some frames can land in the same cell) and builds a mosaic image whose layout reflects the actual flight path far more closely than visual-similarity clustering would.

Code:

#!/usr/bin/env python3

import os

import math

import cv2

import numpy as np

from typing import List, Tuple

# ---------------------------------------------------------

# 1. Load and preprocess images (sorted by filename)

# ---------------------------------------------------------

def load_images_sorted(folder: str,

                       max_images: int = None,

                       target_size: Tuple[int, int] = (512, 512)) -> List[np.ndarray]:

    files = sorted(os.listdir(folder))

    imgs = []

    for fname in files:

        path = os.path.join(folder, fname)

        if not os.path.isfile(path):

            continue

        img = cv2.imread(path, cv2.IMREAD_COLOR)

        if img is None:

            continue

        img = cv2.resize(img, target_size, interpolation=cv2.INTER_AREA)

        imgs.append(img)

        if max_images is not None and len(imgs) >= max_images:

            break

    if not imgs:

        raise ValueError("No valid images found in folder")

    return imgs

# ---------------------------------------------------------

# 2. Estimate translation between consecutive frames

# using phase correlation (overlap-based)

# ---------------------------------------------------------

def estimate_translation(img1: np.ndarray, img2: np.ndarray) -> np.ndarray:

    """

    Estimate 2D translation from img1 to img2 using phase correlation.

    Returns a 2D vector (dx, dy) in pixels.

    """

    # Convert to grayscale float32

    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY).astype(np.float32)

    g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY).astype(np.float32)

    # Optional: apply Hanning window to reduce edge effects

    h, w = g1.shape

    win = cv2.createHanningWindow((w, h), cv2.CV_32F)

    g1w = g1 * win

    g2w = g2 * win

    shift, response = cv2.phaseCorrelate(g1w, g2w)

    dx, dy = shift # note: phaseCorrelate returns (dx, dy)

    return np.array([dx, dy], dtype=np.float32)

def accumulate_positions(images: List[np.ndarray]) -> np.ndarray:

    """

    For a sequence of images, estimate relative translations and

    integrate them to get approximate 2D positions.

    """

    N = len(images)

    positions = np.zeros((N, 2), dtype=np.float32)

    for i in range(N - 1):

        delta = estimate_translation(images[i], images[i + 1])

        # We accumulate the *negative* of the shift because phaseCorrelate

        # tells us how to move img2 to align with img1.

        positions[i + 1] = positions[i] - delta

    return positions # shape (N, 2)

# ---------------------------------------------------------

# 3. Normalize and straighten the path (PCA)

# ---------------------------------------------------------

def normalize_positions(positions: np.ndarray) -> np.ndarray:

    """

    Center, rotate (PCA), and scale positions into [0,1]x[0,1].

    """

    # Center

    mean = positions.mean(axis=0)

    X = positions - mean

    # PCA for rotation

    cov = np.cov(X.T)

    eigvals, eigvecs = np.linalg.eigh(cov)

    # Sort eigenvectors by descending eigenvalue

    order = np.argsort(eigvals)[::-1]

    R = eigvecs[:, order]

    X_rot = X @ R # rotate

    # Normalize to [0,1]

    min_xy = X_rot.min(axis=0)

    max_xy = X_rot.max(axis=0)

    span = np.maximum(max_xy - min_xy, 1e-6)

    X_norm = (X_rot - min_xy) / span

    return X_norm # shape (N, 2), in [0,1]

# ---------------------------------------------------------

# 4. Snap positions to a grid

# ---------------------------------------------------------

def choose_grid_shape(N: int) -> Tuple[int, int]:

    """

    Choose a roughly rectangular grid for N images.

    """

    rows = int(math.floor(math.sqrt(N)))

    cols = int(math.ceil(N / rows))

    if rows * cols < N:

        cols += 1

    return rows, cols

def snap_to_grid(pos_norm: np.ndarray,

                 grid_rows: int,

                 grid_cols: int) -> List[Tuple[int, int]]:

    """

    Map normalized positions in [0,1]^2 to integer grid cells.

    Multiple images can land in the same cell; that's allowed.

    """

    N = pos_norm.shape[0]

    assignments = []

    for i in range(N):

        x, y = pos_norm[i]

        # x -> col, y -> row

        c = int(np.clip(x * grid_cols, 0, grid_cols - 1))

        r = int(np.clip(y * grid_rows, 0, grid_rows - 1))

        assignments.append((r, c))

    return assignments

# ---------------------------------------------------------

# 5. Build a mosaic for visualization

# ---------------------------------------------------------

def build_mosaic(images: List[np.ndarray],

                 assignments: List[Tuple[int, int]],

                 grid_rows: int,

                 grid_cols: int,

                 tile_size: Tuple[int, int] = (256, 256)) -> np.ndarray:

    """

    Visual mosaic: each grid cell shows the *last* image assigned to it.

    (You can change this to average or small multiples if you want.)

    """

    tile_w, tile_h = tile_size

    mosaic_h = grid_rows * tile_h

    mosaic_w = grid_cols * tile_w

    mosaic = np.zeros((mosaic_h, mosaic_w, 3), dtype=np.uint8)

    for img, (r, c) in zip(images, assignments):

        tile = cv2.resize(img, (tile_w, tile_h), interpolation=cv2.INTER_AREA)

        y0 = r * tile_h

        x0 = c * tile_w

        mosaic[y0:y0+tile_h, x0:x0+tile_w, :] = tile

    return mosaic

# ---------------------------------------------------------

# 6. High-level function

# ---------------------------------------------------------

def layout_drone_tour_by_overlap(folder: str,

                                 max_images: int = None,

                                 base_size: Tuple[int, int] = (512, 512)) -> np.ndarray:

    """

    1) Load sequential frames from folder.

    2) Estimate frame-to-frame translations via phase correlation.

    3) Integrate to get 2D positions along the flight path.

    4) Straighten and normalize the path with PCA.

    5) Snap to a grid and build a mosaic.

    """

    images = load_images_sorted(folder, max_images=max_images, target_size=base_size)

    positions = accumulate_positions(images)

    pos_norm = normalize_positions(positions)

    grid_rows, grid_cols = choose_grid_shape(len(images))

    print(f"Grid shape: {grid_rows} x {grid_cols}")

    assignments = snap_to_grid(pos_norm, grid_rows, grid_cols)

    mosaic = build_mosaic(images, assignments, grid_rows, grid_cols,

                          tile_size=(256, 256))

    return mosaic

if __name__ == "__main__":

    # Requirements:

    # pip install opencv-python numpy

    folder = "."

    mosaic = layout_drone_tour_by_overlap(folder, max_images=None)

    cv2.imwrite("drone_path_layout.png", mosaic)

    print("Saved drone_path_layout.png")


Monday, May 11, 2026

Continued from previous post...

 Note to software engineers:

An AI system’s lifetime begins long before the first line of code is written, and Article 50’s transparency obligations shape that lifetime from the earliest prototype to the final shutdown. Engineers must think of transparency not as a late‑stage compliance patch but as a design constraint that grows in importance as the system matures. The guidelines make this clear when they say that providers must “develop and design the AI system in such a way that the natural persons concerned are informed they are interacting with an AI system,” a line that signals that transparency is a design‑time responsibility, not a deployment‑time afterthought.

In the prototype phase, engineers are still exploring feasibility, but this is the moment when the system’s eventual interaction patterns, content‑generation capabilities, and biometric or emotional inference pathways are first conceived. Even though research‑only prototypes are exempt, the guidelines warn that the exemption disappears the moment the system or its outputs leave the research context. Engineers must therefore architect prototypes with the assumption that transparency features will eventually be required. This means choosing model architectures that can support watermarking or provenance metadata, designing interaction flows that can accommodate disclosure messages, and avoiding early design choices that make later transparency impossible or brittle. For agentic systems, the guidelines explicitly note that if the provider cannot reliably determine when the agent will interact with natural persons, the agent must disclose itself in all likely interactions. Engineers must therefore design agent frameworks with built‑in disclosure hooks from day one.
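
As a concrete illustration, the sketch below makes the disclosure hook a structural part of an agent wrapper, so that a disclosure always precedes the first user-facing message in a conversation. The class, method names, and session handling are illustrative assumptions, not a real framework API.

DISCLOSURE = "You are interacting with an AI system."

class DisclosingAgent:
    def __init__(self, generate_reply):
        # generate_reply: any callable str -> str producing the agent's answer
        self.generate_reply = generate_reply
        self.disclosed_sessions = set()

    def respond(self, session_id: str, user_message: str) -> str:
        reply = self.generate_reply(user_message)
        if session_id not in self.disclosed_sessions:
            # surface the disclosure at the first interaction, then remember it
            self.disclosed_sessions.add(session_id)
            return f"{DISCLOSURE}\n\n{reply}"
        return reply

agent = DisclosingAgent(lambda msg: f"Echo: {msg}")
print(agent.respond("s1", "Hello"))  # first reply carries the disclosure
print(agent.respond("s1", "Again"))  # later replies do not repeat it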

The guidelines require that AI‑generated or manipulated content be “marked in a machine‑readable format and detectable as artificially generated or manipulated,” and that the technical solutions be “effective, interoperable, robust and reliable.” Those phrases are deceptively simple; in practice they mean that every layer of your system — storage, services, and user interfaces — must participate in preserving, propagating, and exposing these signals. If any layer drops the signal, the entire chain fails.

In the backend storage layer, marking begins as metadata, provenance, or embedded signatures. Engineers must treat marking as a first‑class property of the content object, not an afterthought. If the system stores images, videos, audio, or text, the marking must be embedded in a way that survives format conversions, compression, and distribution. For images and video, this may mean cryptographic watermarks, metadata fields, or fingerprint hashes stored alongside the asset. For text, it may mean structured provenance metadata or embedded markers that do not alter meaning. The storage system must support immutable provenance fields, versioning, and auditability, because the guidelines expect markings to be robust against tampering. A backend that strips metadata, rewrites files, or normalizes formats without preserving markings becomes a compliance liability. Engineers must therefore design storage schemas that treat marking as part of the content’s identity, ensuring that every read, write, transform, or replication operation preserves it. This includes object stores, relational databases, distributed file systems, and content delivery caches. Even internal transformations — transcoding, resizing, chunking — must be marking‑aware.
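
To make that concrete, here is a minimal sketch of marking as a first-class, immutable property of a stored content object. The field names and the sidecar-metadata convention are assumptions for illustration, not a prescribed schema.

import hashlib
import time
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: provenance fields cannot be mutated after creation
class ProvenanceRecord:
    content_sha256: str  # fingerprint of the stored bytes
    ai_generated: bool   # marking flag, set once at write time
    generator: str       # which system produced the content
    created_at: float

def store(content: bytes, ai_generated: bool, generator: str):
    prov = ProvenanceRecord(
        content_sha256=hashlib.sha256(content).hexdigest(),
        ai_generated=ai_generated,
        generator=generator,
        created_at=time.time(),
    )
    # persist content and provenance together, e.g. object store plus sidecar metadata
    return content, prov

def verify(content: bytes, prov: ProvenanceRecord) -> bool:
    # any transform that rewrites bytes without updating provenance is detected here
    return hashlib.sha256(content).hexdigest() == prov.content_sha256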

In the middle‑tier business services, marking becomes a routing and policy problem. These services orchestrate content flows, apply business logic, and integrate with external systems, and they must propagate marking metadata faithfully. A service that generates content must attach markings at creation time; a service that manipulates content must determine whether the manipulation is substantial enough to require marking, because the guidelines distinguish between minor edits and semantic changes. A service that aggregates or composes content must merge markings without losing fidelity. Business logic must enforce that any content leaving the system — through APIs, feeds, notifications, or exports — carries its marking intact. Detection services must be exposed as callable APIs so that downstream systems, partners, or users can verify authenticity. Middle‑tier engineers must also design for adversarial conditions: markings may be intentionally removed, corrupted, or spoofed, so services must validate markings, detect inconsistencies, and log anomalies. Because the guidelines require interoperability, services must support open standards for provenance and watermarking rather than proprietary formats that cannot be consumed by others. Middle‑tier systems must also enforce policy boundaries: if content is destined for a context where disclosure is required at first exposure, the service must ensure that the frontend receives the necessary metadata to surface that disclosure.
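
The propagation invariant can be sketched as follows: every hop validates the marking it received and re-attaches it to whatever it emits, failing loudly rather than silently dropping the signal. The payload keys are illustrative assumptions.

def propagate_marking(payload: dict) -> dict:
    marking = payload.get("marking")
    if marking is None:
        raise ValueError("content has no marking metadata; refusing to forward")
    if marking.get("ai_generated") and not marking.get("machine_readable"):
        raise ValueError("AI-generated content lacks a machine-readable marking")
    out = transform(payload)  # whatever business logic this service applies
    out["marking"] = marking  # the marking travels with the content
    return out

def transform(payload: dict) -> dict:
    # placeholder business logic; must not strip or rewrite metadata
    return {"content": payload["content"].upper()}

print(propagate_marking({
    "content": "synthetic caption",
    "marking": {"ai_generated": True, "machine_readable": True},
}))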

On the frontend, marking becomes human‑visible disclosure. The guidelines require that natural persons be informed “at the latest at the time of the first interaction or exposure,” which means the frontend must surface clear, perceivable, accessible signals that the content is AI‑generated or manipulated. This is where metadata becomes UI. Engineers must design controls that display labels, badges, overlays, or contextual notices without degrading usability. For interactive systems, the frontend must announce that the user is interacting with an AI system, whether through text, voice, or visual cues. For deep fakes, the frontend must display a disclosure that is visible at the moment the content appears, not buried in menus or footnotes. For AI‑generated text informing the public, the frontend must show a disclosure unless the content has undergone human editorial review. Accessibility requirements apply, so disclosures must work for screen readers, high‑contrast modes, and users with cognitive or perceptual differences. Frontend engineers must also ensure that disclosures persist across navigation, embedding, sharing, and re‑rendering, because the guidelines expect disclosures to survive distribution. If the frontend allows users to download or share content, the marking must travel with it.
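
Server-side code cannot render the UI itself, but it can guarantee that the frontend receives everything it needs to surface the disclosure at first exposure. A minimal sketch, assuming a JSON API whose field names are hypothetical:

def render_payload(content: str, marking: dict) -> dict:
    # attach explicit disclosure instructions so the UI only renders, never decides
    response = {"content": content}
    if marking.get("ai_generated"):
        response["disclosure"] = {
            "label": "AI-generated content",
            "show_at_first_exposure": True,
            "accessible_text": "This content was generated by an AI system.",
        }
    return response

print(render_payload("clip.mp4", {"ai_generated": True}))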

The entire stack must work together to ensure that marking and detection survive the full lifecycle of content. Backend systems must store markings immutably; middle‑tier services must propagate and validate them; frontends must expose them to users. If any layer fails, the system becomes non‑compliant. The guidelines’ insistence on robustness and interoperability means that engineers must design for hostile environments, cross‑platform distribution, and long‑term persistence. Markings must survive not just your own system’s transformations but also the unpredictable behavior of downstream systems, social platforms, and user devices. Detection must remain possible even when content is recompressed, clipped, or partially transformed.

In practice, this means that marking and detection are not features of a single component but properties of the entire architecture. They must be designed into storage schemas, service contracts, API payloads, UI components, and operational workflows. They must be tested end‑to‑end, monitored in production, and preserved during migrations and refactors. They must be resilient to adversarial attempts to remove them and flexible enough to evolve as standards mature. And because the guidelines apply to both providers and deployers, engineers must ensure that transparency signals remain intact even when content leaves their control.

As the system moves into the initial version or MVP stage, the engineering focus shifts from exploration to implementation. This is where the transparency obligations begin to crystallize into concrete engineering tasks. Interactive systems must be instrumented so that every user‑facing entry point can surface an AI disclosure at first interaction. Generative systems must begin to embed machine‑readable markings into outputs, and detection APIs must be designed so that downstream actors can verify authenticity. Engineers working on data pipelines must ensure that the system can distinguish between minor edits and semantic manipulations, because the guidelines draw a sharp line between the two. A grammar‑corrected sentence is exempt; a sentence whose meaning has been altered is not. This distinction must be encoded into the system’s logic, not left to human judgment at deployment time.
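
One way to encode that line into system logic is an explicit, fail-closed edit-type policy. The categories below are illustrative assumptions; the underlying question is simply whether the meaning of the content changed.

EXEMPT_EDITS = {"grammar_correction", "noise_reduction", "minor_color_adjustment"}
SEMANTIC_EDITS = {"object_added", "object_removed", "meaning_altered", "face_swapped"}

def requires_marking(edit_types: set) -> bool:
    unknown = edit_types - EXEMPT_EDITS - SEMANTIC_EDITS
    if unknown:
        # fail closed: unclassified edits are treated as substantial manipulations
        return True
    return bool(edit_types & SEMANTIC_EDITS)

assert requires_marking({"grammar_correction"}) is False
assert requires_marking({"grammar_correction", "meaning_altered"}) is True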

During the growth phase, the system expands in scale, features, and user base. This is the phase where transparency obligations become operational rather than theoretical. Engineers must ensure that disclosure mechanisms scale across modalities — text, audio, video, avatars, VR environments — because the guidelines treat all of these as potential interaction channels. As the system integrates with other services, engineers must ensure that transparency metadata survives transformations, API hops, and distribution through third‑party platforms. The guidelines emphasize that marking must be “effective, interoperable, robust and reliable,” which means engineers must design for adversarial environments where markings may be stripped, corrupted, or intentionally removed. This requires redundancy, cryptographic signatures, and provenance chains that can survive format conversions.
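
As a sketch of the redundancy idea, fragile metadata can be paired with a signature that remains verifiable after the metadata travels between systems. This uses only the standard library; the key handling and signed fields are assumptions, and a production system would more likely use asymmetric signatures and an open provenance standard.

import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-managed-key"  # assumption: fetched from a key management service

def sign_provenance(content: bytes, metadata: dict) -> str:
    # bind the signature to both the content bytes and the provenance claim
    message = hashlib.sha256(content).digest() + json.dumps(metadata, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def verify_provenance(content: bytes, metadata: dict, signature: str) -> bool:
    return hmac.compare_digest(sign_provenance(content, metadata), signature)

meta = {"ai_generated": True, "generator": "model-x"}
sig = sign_provenance(b"pixel data", meta)
assert verify_provenance(b"pixel data", meta, sig)
assert not verify_provenance(b"pixel data", {**meta, "ai_generated": False}, sig)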

For deployers, the growth phase is where operational workflows must incorporate transparency. Engineers responsible for integration must ensure that emotion‑recognition or biometric‑categorisation systems surface disclosures at the moment of exposure, whether in a mobile app, a kiosk, a classroom, or a workplace tool. Engineers working on content‑publishing pipelines must ensure that deep fakes or AI‑generated text published on matters of public interest are labelled clearly unless they undergo genuine editorial review. The guidelines state that text is exempt only if it has undergone “human review or editorial control and is subject to editorial responsibility,” which means engineers must build audit trails that prove such review occurred.
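
Such an audit trail can start as an append-only log that binds a hash of the reviewed text to a named reviewer and to the legal person holding editorial responsibility. A minimal sketch; the fields are assumptions:

import hashlib
import json
import time

def record_review(log_path: str, text: str, reviewer: str, publisher: str) -> dict:
    entry = {
        "text_sha256": hashlib.sha256(text.encode()).hexdigest(),
        "reviewer": reviewer,  # natural person who performed the review
        "editorial_responsibility": publisher,  # legal person holding responsibility
        "reviewed_at": time.time(),
    }
    with open(log_path, "a") as f:  # append-only by convention; never rewritten
        f.write(json.dumps(entry) + "\n")
    return entry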

As the system reaches maturity, the engineering challenge shifts to maintaining transparency across evolving features, new markets, and new regulatory expectations. Mature systems often accumulate technical debt, and transparency features must be refactored to remain reliable. Engineers must ensure that marking and detection systems remain state‑of‑the‑art, because the guidelines require providers to implement technically feasible solutions, not outdated ones. As models are retrained or replaced, engineers must ensure that transparency features are preserved across versions. When new interaction modes are added — such as voice, AR, or agent‑to‑human messaging — engineers must extend disclosure mechanisms accordingly. Mature systems also face increased scrutiny from regulators, meaning engineers must maintain logs, provenance records, and compliance evidence that can withstand audits.

In the maintenance phase, transparency becomes a matter of operational discipline. Engineers must monitor whether disclosures are being surfaced correctly, whether markings remain intact across distribution channels, and whether detection tools continue to function as intended. When content is syndicated, embedded, or transformed by downstream systems, engineers must ensure that transparency metadata is not lost. The guidelines emphasize that transparency must be provided “at the latest at the time of the first interaction or exposure,” which means engineers must design monitoring systems that detect when disclosures fail to appear. Maintenance also includes updating transparency mechanisms as adversarial techniques evolve, because robustness is an ongoing requirement, not a one‑time achievement.
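
Monitoring for missing disclosures can be phrased as a synthetic probe that fetches a known AI-generated item through the real serving path and alerts when the label is absent. A sketch under an assumed endpoint and response shape:

import json
import urllib.request

def probe_disclosure(url: str) -> bool:
    # fetch through the public path, exactly as a user's client would
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    disclosure = payload.get("disclosure", {})
    ok = disclosure.get("label", "").lower().startswith("ai-generated")
    if not ok:
        print(f"[ALERT] disclosure missing or malformed for {url}")
    return ok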

Finally, in the decommissioning phase, engineers must ensure that transparency obligations are respected even as the system winds down. If the system continues to generate or serve content during a sunset period, markings and disclosures must remain active. If the system is replaced, engineers must ensure that legacy content remains labelled, especially deep fakes and AI‑generated text that continue to circulate. If detection tools are retired, engineers must provide alternative means for users to verify authenticity. Decommissioning also requires preserving audit logs and provenance data for regulatory review, because the guidelines make clear that compliance is evaluated across the system’s operational lifetime, not just at a single point in time.

Across all phases, engineers at every level have distinct responsibilities. Model engineers must ensure that model architectures support watermarking, provenance, and disclosure triggers. Backend engineers must propagate transparency metadata through APIs and services. Frontend engineers must surface disclosures in ways that are perceivable, accessible, and adapted to vulnerable users. DevOps and SRE teams must ensure that transparency features remain reliable under load and across deployments. Security engineers must defend markings and detection systems against tampering. QA engineers must test transparency features as rigorously as functional features, because a missing disclosure is a compliance failure. Product engineers must ensure that editorial workflows, content pipelines, and agent behaviors align with the guidelines’ expectations. And engineering managers must ensure that transparency is treated as a first‑class requirement throughout the system’s lifetime.

The guidelines’ structure may appear legalistic, but their message to engineers is simple: transparency is not a feature; it is a lifecycle obligation. It must be designed early, implemented consistently, maintained continuously, and preserved even as the system is retired. Every phase of the AI system’s life introduces new transparency risks, and engineers must anticipate and mitigate those risks long before regulators come asking.


Sunday, May 10, 2026

 EU Guidelines for Article 50 of the AI Act

The draft EU Guidelines on Article 50 of the AI Act form a comprehensive attempt to translate the regulation’s transparency obligations into practical expectations for providers and deployers of AI systems. The guidelines open by situating Article 50 within the broader architecture of the AI Act, which entered into force on 1 August 2024 and adopts a risk‑based approach to regulating AI. Transparency risks form one of the four risk categories, and Article 50’s obligations will apply from 2 August 2026. The Commission stresses that these guidelines are non‑binding, but they are intended to help authorities, providers, and deployers implement the law consistently. As the document states, the purpose is to “serve as practical guidance to assist competent authorities, as well as providers and deployers of AI systems, in ensuring compliance with the transparency obligations” (quoted from the document).

The guidelines begin by mapping the four transparency obligations in Article 50. The first concerns AI systems that interact directly with natural persons; the second concerns AI systems that generate or manipulate synthetic content; the third concerns emotion recognition and biometric categorisation systems; and the fourth concerns deep fakes and AI‑generated or manipulated text published to inform the public on matters of public interest. Each obligation has its own scope, responsible actor, and exceptions, and the guidelines emphasize that these obligations can apply cumulatively to the same system or output. The rationale behind all four obligations is to reduce risks of deception, impersonation, manipulation, misinformation, and fraud, and to protect democratic processes and societal trust. The guidelines quote the Act’s recitals to explain that transparency helps individuals “take informed decisions” and calibrate their trust in AI‑mediated interactions.

The guidelines clarify who is responsible for compliance. Providers are those who develop or place AI systems on the market under their name, regardless of where they are located, and they must ensure compliance with Article 50(1), (2), and (5) before the system is placed on the market. Deployers are those who use AI systems under their authority, unless the use is purely personal and non‑professional. The guidelines give examples to illustrate the distinction: a media outlet using AI to support its reporting is a deployer; an online platform merely transmitting AI‑generated content is not. The guidelines also explain that purely personal, non‑professional use is excluded from deployer obligations, but this exclusion is narrow. A person generating a deep fake of a mayor and posting it publicly cannot claim the personal‑use exemption, because the content affects public discourse. The guidelines quote: “an AI-generated or manipulated deep fake that is made publicly available… should not be considered a purely personal non-professional activity” (quoted from the document).

Research and development activities are also excluded when the AI system is used solely for scientific research, but the moment the system or its outputs are used outside that context, Article 50 applies. Open‑source systems are not exempt unless they fall outside all Article 50 obligations, meaning open‑source providers and deployers must still comply when their systems fall within scope.

The guidelines emphasize that transparency obligations do not imply legality of the underlying system. A system may comply with Article 50 but still be prohibited under Article 5, such as emotion recognition in workplaces or schools. Similarly, systems subject to Article 50 may also be high‑risk and must meet additional requirements.

The guidelines call out the first major obligation: transparency for interactive AI systems. Providers must design systems so that natural persons are informed they are interacting with AI. The guidelines unpack what it means for a system to be “intended to interact directly with natural persons.” The system must be an AI system, must be designed for bidirectional exchange, must interact directly rather than through intermediaries, and must interact with natural persons rather than operating in closed industrial environments. Examples include chatbots, voice assistants, AI avatars, and social‑media bots. Systems like recommender engines, spam filters, or backend decision‑support tools do not qualify because they do not engage in direct interaction.

The obligation requires disclosure at or before the first interaction, and the guidelines emphasize that the disclosure must be clear, accessible, and adapted to vulnerable groups such as children or persons with disabilities. The guidelines give examples of acceptable disclosures, such as a chatbot stating “You are interacting with an AI system,” a voice assistant announcing its AI nature, or a visible AI label on an email generated by an AI agent. They also warn against disclosures buried in terms and conditions, ambiguous signals, or purely machine‑readable metadata. The guidelines stress that multimodal disclosure — combining text, audio, and visual cues — is often the most effective.

Two exceptions apply. The first is when the artificial nature of the interaction is obvious to a reasonably well‑informed, observant, and circumspect person, taking into account the target audience and context. The guidelines explain that this standard is borrowed from EU consumer law. For example, developers interacting with a code‑assistant chatbot can reasonably be expected to know it is AI, but a highly realistic robotic pet or a human‑like avatar in a virtual environment would not qualify as obvious. The second exception applies when the system is authorised by law for detecting, preventing, investigating, or prosecuting criminal offences, except when the system is available to the public to report crimes. Police chatbots for public reporting must still disclose their AI nature.

The second major obligation concerns marking and detection of AI‑generated or manipulated content. Providers of such systems must ensure that outputs are marked in a machine‑readable format and that the content is detectable as AI‑generated or manipulated. Both marking and detection must be implemented; one without the other is insufficient. The guidelines quote: “Fulfilling only one element… will not suffice” (quoted from the document). The obligation applies to synthetic audio, image, video, or text content, including multimodal content and virtual or augmented reality. It applies to both generation and manipulation, and includes GPAI systems and agentic systems when their outputs are perceptible by humans.

The guidelines clarify what falls outside the scope: content that merely reproduces existing material, machine‑to‑machine outputs, sensor data, or industrial outputs not intended for human interpretation. They also explain that marking solutions may include watermarks, metadata, cryptographic provenance, fingerprints, or combinations thereof. Providers may implement marking at the model or system level and may rely on upstream solutions, but they remain responsible for compliance.

Detection tools must be made available so that natural persons and relevant actors can verify whether content is AI‑generated or manipulated. The results must be human‑readable and available at first exposure. The technical solutions must be effective, reliable, robust, and interoperable. The guidelines explain each term: effectiveness means enabling humans to distinguish AI content; reliability means accurate identification; robustness means resilience to alterations and adversarial attacks; interoperability means compatibility across systems. Providers must implement technically feasible, state‑of‑the‑art solutions, and because no single technique currently satisfies all requirements, combinations of techniques are expected. The guidelines allow narrow exceptions for industrial applications where outputs are strictly technical and confined to professional users, or for ephemeral real‑time content in contexts like video games.

The guidelines then describe exceptions: systems performing only standard editing (such as grammar correction, noise reduction, or minor colour adjustments) are exempt; systems that do not substantially alter input data or its semantics are exempt; and systems authorised by law for criminal‑offence purposes are exempt. The guidelines provide examples of minor edits versus semantic changes, noting that adding or removing objects, altering body shape, or changing skin colour are substantial manipulations requiring marking.

The third obligation concerns emotion recognition and biometric categorisation systems. Deployers must inform natural persons exposed to such systems, whether in real time or ex post. Emotion recognition is defined as identifying or inferring emotions or intentions from biometric data, and biometric categorisation involves assigning persons to categories based on biometric data. The obligation applies broadly, regardless of whether the system is high‑risk, though many such systems are high‑risk by definition. Deployers must inform all exposed persons, including children, in a clear and accessible manner at first exposure. The guidelines give examples such as pop‑up notices in games or signage at exhibition entrances. The only exception is when the system is authorised by law for criminal‑offence purposes.

The fourth obligation concerns deep fakes and AI‑generated or manipulated text published to inform the public on matters of public interest. Deployers must clearly disclose that deep fake content has been artificially generated or manipulated. A deep fake is defined as AI‑generated or manipulated image, audio, or video content that resembles existing persons, objects, places, entities, or events and would falsely appear to a person to be authentic or truthful. The guidelines unpack each element: resemblance must be appreciable; the subject must be realistic; the content must depict persons, objects, places, entities, or events; and the content must be capable of misleading a person. The guidelines emphasize that the assessment must consider the actual audience, including vulnerable groups, not an abstract average person. Minor technical edits do not create deep fakes, but substantive manipulations do.

Deployers must label deep fakes in a clear and perceivable way. However, an attenuated regime applies to artistic, creative, satirical, fictional, or analogous works, where disclosure must be done in an appropriate manner that does not hamper enjoyment of the work. The guidelines explain each category and require that the artistic or fictional nature be evident. Even in these cases, deployers must safeguard the rights and freedoms of third parties, including image rights and intellectual property.

The guidelines then address AI‑generated or manipulated text published to inform the public on matters of public interest. The text must be published, meaning accessible to an indeterminate public; it must aim to inform; and it must concern matters of public interest such as public administration, health, environment, consumer safety, politics, or science. Deployers must disclose that the text is AI‑generated or manipulated unless two conditions are met: the text has undergone human review or editorial control, and a natural or legal person holds editorial responsibility. Human review must be substantive, not superficial, and editorial responsibility must be publicly identifiable. The guidelines give examples of qualifying and non‑qualifying cases.

The guidelines then explain the horizontal requirement in Article 50(5): all information must be provided clearly, distinguishably, and at the latest at first interaction or exposure, and must comply with accessibility requirements. Information must be noticeable, easy to understand, and not buried in manuals or menus. First exposure applies to each natural person encountering the content, not just the first person ever exposed. The guidelines give examples such as labelling deep fakes at the start of a video rather than in end credits.

The enforcement section explains that providers and deployers may demonstrate compliance by adhering to a code of practice assessed as adequate by the AI Office. Doing so simplifies supervision and may mitigate penalties. Those not adhering must demonstrate compliance through other means and may face more scrutiny. Market surveillance authorities, the AI Office, and the European Data Protection Supervisor enforce Article 50, with powers under the AI Act and Regulation 2019/1020. Penalties can reach €15 million or 3% of global turnover. Article 50 applies from 2 August 2026, and all systems in scope must comply regardless of when they were placed on the market, except for a proposed transitional rule for marking and detection under Article 50(2). Existing AI‑generated content does not need retroactive marking, but actors are encouraged to label it voluntarily.

The guidelines conclude by noting that they will be reviewed as technology and enforcement evolve, and the Commission invites ongoing contributions from stakeholders.