Monday, May 11, 2026

Continued from previous post...

 Note to software engineers:

An AI system’s lifetime begins long before the first line of code is written, and Article 50’s transparency obligations shape that lifetime from the earliest prototype to the final shutdown. Engineers must think of transparency not as a late‑stage compliance patch but as a design constraint that grows in importance as the system matures. The guidelines make this clear when they say that providers must “develop and design the AI system in such a way that the natural persons concerned are informed they are interacting with an AI system,” a line that signals that transparency is a design‑time responsibility, not a deployment‑time afterthought.

In the prototype phase, engineers are still exploring feasibility, but this is the moment when the system’s eventual interaction patterns, content‑generation capabilities, and biometric or emotional inference pathways are first conceived. Even though research‑only prototypes are exempt, the guidelines warn that the exemption disappears the moment the system or its outputs leave the research context. Engineers must therefore architect prototypes with the assumption that transparency features will eventually be required. This means choosing model architectures that can support watermarking or provenance metadata, designing interaction flows that can accommodate disclosure messages, and avoiding early design choices that make later transparency impossible or brittle. For agentic systems, the guidelines explicitly note that if the provider cannot reliably determine when the agent will interact with natural persons, the agent must disclose itself in all likely interactions. Engineers must therefore design agent frameworks with built‑in disclosure hooks from day one.
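
As a rough illustration, a disclosure hook in an agent framework might look like the following sketch. The Agent and OutboundMessage classes, the channel names, and the disclosure wording are hypothetical, not taken from the guidelines; the point is the fail-closed default, where disclosure happens on every channel unless one is proven out of scope.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class OutboundMessage:
    channel: str                 # e.g. "chat", "email", "voice" (illustrative)
    body: str
    ai_disclosure: bool = False  # set by the hook, never by business logic

class DisclosureHook:
    """Prepends an AI disclosure to any message that may reach a natural person.

    Because the provider cannot always know where an agent's output ends up,
    the hook defaults to disclosing on every channel (fail-closed)."""

    def __init__(self, text: str = "This message was generated by an AI system."):
        self.text = text

    def __call__(self, msg: OutboundMessage) -> OutboundMessage:
        if not msg.ai_disclosure:
            msg.body = f"{self.text}\n\n{msg.body}"
            msg.ai_disclosure = True
        return msg

class Agent:
    """Toy agent whose every outbound message passes through disclosure hooks."""

    def __init__(self, hooks: List[Callable[[OutboundMessage], OutboundMessage]]):
        self.hooks = hooks

    def send(self, channel: str, body: str) -> OutboundMessage:
        msg = OutboundMessage(channel=channel, body=body)
        for hook in self.hooks:
            msg = hook(msg)
        return msg

agent = Agent(hooks=[DisclosureHook()])
print(agent.send("email", "Your order has shipped.").body)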

The guidelines require that AI‑generated or manipulated content be “marked in a machine‑readable format and detectable as artificially generated or manipulated,” and that the technical solutions be “effective, interoperable, robust and reliable.” Those phrases are deceptively simple; in practice they mean that every layer of your system — storage, services, and user interfaces — must participate in preserving, propagating, and exposing these signals. If any layer drops the signal, the entire chain fails.

In the backend storage layer, marking begins as metadata, provenance, or embedded signatures. Engineers must treat marking as a first‑class property of the content object, not an afterthought. If the system stores images, videos, audio, or text, the marking must be embedded in a way that survives format conversions, compression, and distribution. For images and video, this may mean cryptographic watermarks, metadata fields, or fingerprint hashes stored alongside the asset. For text, it may mean structured provenance metadata or embedded markers that do not alter meaning. The storage system must support immutable provenance fields, versioning, and auditability, because the guidelines expect markings to be robust against tampering. A backend that strips metadata, rewrites files, or normalizes formats without preserving markings becomes a compliance liability. Engineers must therefore design storage schemas that treat marking as part of the content’s identity, ensuring that every read, write, transform, or replication operation preserves it. This includes object stores, relational databases, distributed file systems, and content delivery caches. Even internal transformations — transcoding, resizing, chunking — must be marking‑aware.
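
A minimal sketch of such a schema follows, assuming an illustrative ProvenanceRecord and a SHA-256 binding between payload and marking; neither the field names nor the hashing scheme is mandated by the guidelines, but the design makes a dropped marking change the content's identity and thus detectable on the next read.

import hashlib
import json
from dataclasses import dataclass, field, asdict
from typing import Dict

@dataclass(frozen=True)
class ProvenanceRecord:
    """Immutable marking metadata stored alongside the asset."""
    generator: str                       # model or system that produced the content
    ai_generated: bool
    created_at: str                      # ISO-8601 timestamp
    extra: Dict[str, str] = field(default_factory=dict)

@dataclass(frozen=True)
class ContentObject:
    """Content plus marking; the identity commits to both, so a storage or
    transform path that strips the provenance is detectable."""
    payload: bytes
    provenance: ProvenanceRecord

    @property
    def content_id(self) -> str:
        digest = hashlib.sha256()
        digest.update(self.payload)
        digest.update(json.dumps(asdict(self.provenance), sort_keys=True).encode())
        return digest.hexdigest()

def verify_on_read(obj: ContentObject, stored_id: str) -> bool:
    """Every read, replication, or transform re-checks that marking survived."""
    return obj.content_id == stored_id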

In the middle‑tier business services, marking becomes a routing and policy problem. These services orchestrate content flows, apply business logic, and integrate with external systems, and they must propagate marking metadata faithfully. A service that generates content must attach markings at creation time; a service that manipulates content must determine whether the manipulation is substantial enough to require marking, because the guidelines distinguish between minor edits and semantic changes. A service that aggregates or composes content must merge markings without losing fidelity. Business logic must enforce that any content leaving the system — through APIs, feeds, notifications, or exports — carries its marking intact. Detection services must be exposed as callable APIs so that downstream systems, partners, or users can verify authenticity. Middle‑tier engineers must also design for adversarial conditions: markings may be intentionally removed, corrupted, or spoofed, so services must validate markings, detect inconsistencies, and log anomalies. Because the guidelines require interoperability, services must support open standards for provenance and watermarking rather than proprietary formats that cannot be consumed by others. Middle‑tier systems must also enforce policy boundaries: if content is destined for a context where disclosure is required at first exposure, the service must ensure that the frontend receives the necessary metadata to surface that disclosure.
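
A sketch of such an egress policy boundary is shown below; the validation logic is a placeholder and the scheme names are hypothetical, but the shape of the check is the point: nothing leaves unmarked, and the frontend always receives the metadata it needs to render the disclosure.

from dataclasses import dataclass

@dataclass
class Marking:
    present: bool
    scheme: str = ""       # e.g. "c2pa" or "watermark-v1" (illustrative names)
    valid: bool = False

def validate_marking(payload: bytes, metadata: dict) -> Marking:
    # Placeholder check: a real service would verify a watermark or a
    # cryptographic provenance signature here.
    scheme = metadata.get("marking_scheme", "")
    return Marking(present=bool(scheme), scheme=scheme, valid=bool(scheme))

def egress_guard(payload: bytes, metadata: dict) -> dict:
    # Policy boundary: content may not leave the system unmarked, and the
    # frontend must receive what it needs to surface the disclosure.
    marking = validate_marking(payload, metadata)
    if not marking.valid:
        raise ValueError("refusing egress: marking missing or corrupted")
    return {**metadata, "disclosure_required": True, "marking_scheme": marking.scheme}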

On the frontend, marking becomes human‑visible disclosure. The guidelines require that natural persons be informed “at the latest at the time of the first interaction or exposure,” which means the frontend must surface clear, perceivable, accessible signals that the content is AI‑generated or manipulated. This is where metadata becomes UI. Engineers must design controls that display labels, badges, overlays, or contextual notices without degrading usability. For interactive systems, the frontend must announce that the user is interacting with an AI system, whether through text, voice, or visual cues. For deep fakes, the frontend must display a disclosure that is visible at the moment the content appears, not buried in menus or footnotes. For AI‑generated text informing the public, the frontend must show a disclosure unless the content has undergone human editorial review. Accessibility requirements apply, so disclosures must work for screen readers, high‑contrast modes, and users with cognitive or perceptual differences. Frontend engineers must also ensure that disclosures persist across navigation, embedding, sharing, and re‑rendering, because the guidelines expect disclosures to survive distribution. If the frontend allows users to download or share content, the marking must travel with it.

The entire stack must work together to ensure that marking and detection survive the full lifecycle of content. Backend systems must store markings immutably; middle‑tier services must propagate and validate them; frontends must expose them to users. If any layer fails, the system becomes non‑compliant. The guidelines’ insistence on robustness and interoperability means that engineers must design for hostile environments, cross‑platform distribution, and long‑term persistence. Markings must survive not just your own system’s transformations but also the unpredictable behavior of downstream systems, social platforms, and user devices. Detection must remain possible even when content is recompressed, clipped, or partially transformed.

In practice, this means that marking and detection are not features of a single component but properties of the entire architecture. They must be designed into storage schemas, service contracts, API payloads, UI components, and operational workflows. They must be tested end‑to‑end, monitored in production, and preserved during migrations and refactors. They must be resilient to adversarial attempts to remove them and flexible enough to evolve as standards mature. And because the guidelines apply to both providers and deployers, engineers must ensure that transparency signals remain intact even when content leaves their control.

As the system moves into the initial version or MVP stage, the engineering focus shifts from exploration to implementation. This is where the transparency obligations begin to crystallize into concrete engineering tasks. Interactive systems must be instrumented so that every user‑facing entry point can surface an AI disclosure at first interaction. Generative systems must begin to embed machine‑readable markings into outputs, and detection APIs must be designed so that downstream actors can verify authenticity. Engineers working on data pipelines must ensure that the system can distinguish between minor edits and semantic manipulations, because the guidelines draw a sharp line between the two. A grammar‑corrected sentence is exempt; a sentence whose meaning has been altered is not. This distinction must be encoded into the system’s logic, not left to human judgment at deployment time.
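
One way to encode that distinction as data rather than judgment is sketched below. The edit taxonomy is an illustrative assumption (the guidelines name grammar correction, noise reduction, and minor colour adjustment as standard edits, but the enum here is not their official list):

from enum import Enum
from typing import Set

class EditKind(Enum):
    GRAMMAR_FIX = "grammar_fix"
    NOISE_REDUCTION = "noise_reduction"
    COLOUR_ADJUSTMENT = "colour_adjustment"
    OBJECT_ADDED = "object_added"
    OBJECT_REMOVED = "object_removed"
    MEANING_CHANGED = "meaning_changed"

# Edits treated as standard assistance and therefore exempt from marking.
STANDARD_EDITS = {EditKind.GRAMMAR_FIX, EditKind.NOISE_REDUCTION,
                  EditKind.COLOUR_ADJUSTMENT}

def requires_marking(edits: Set[EditKind]) -> bool:
    # Marking is required as soon as any edit goes beyond standard assistance.
    return bool(edits - STANDARD_EDITS)

assert not requires_marking({EditKind.GRAMMAR_FIX})
assert requires_marking({EditKind.GRAMMAR_FIX, EditKind.OBJECT_REMOVED})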

During the growth phase, the system expands in scale, features, and user base. This is the phase where transparency obligations become operational rather than theoretical. Engineers must ensure that disclosure mechanisms scale across modalities — text, audio, video, avatars, VR environments — because the guidelines treat all of these as potential interaction channels. As the system integrates with other services, engineers must ensure that transparency metadata survives transformations, API hops, and distribution through third‑party platforms. The guidelines emphasize that marking must be “effective, interoperable, robust and reliable,” which means engineers must design for adversarial environments where markings may be stripped, corrupted, or intentionally removed. This requires redundancy, cryptographic signatures, and provenance chains that can survive format conversions.
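
A minimal sketch of a signed provenance chain illustrates the idea; it uses HMAC-SHA256 with a placeholder key purely for demonstration, where production would use a managed secret or asymmetric signatures:

import hashlib
import hmac
import json
from typing import Dict, List

SIGNING_KEY = b"demo-key"   # placeholder; production would use a managed secret

def sign_step(prev_signature: str, step: Dict) -> str:
    """Each transformation appends a signed link; a broken chain reveals
    exactly where marking was lost or tampered with."""
    message = (prev_signature + json.dumps(step, sort_keys=True)).encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()

def build_chain(steps: List[Dict]) -> List[Dict]:
    chain, sig = [], ""
    for step in steps:
        sig = sign_step(sig, step)
        chain.append({**step, "signature": sig})
    return chain

def verify_chain(chain: List[Dict]) -> bool:
    sig = ""
    for link in chain:
        step = {k: v for k, v in link.items() if k != "signature"}
        sig = sign_step(sig, step)
        if sig != link["signature"]:
            return False
    return True

chain = build_chain([
    {"op": "generate", "model": "img-gen-1"},
    {"op": "resize", "scale": 0.5},
])
assert verify_chain(chain)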

For deployers, the growth phase is where operational workflows must incorporate transparency. Engineers responsible for integration must ensure that emotion‑recognition or biometric‑categorisation systems surface disclosures at the moment of exposure, whether in a mobile app, a kiosk, a classroom, or a workplace tool. Engineers working on content‑publishing pipelines must ensure that deep fakes or AI‑generated text published on matters of public interest are labelled clearly unless they undergo genuine editorial review. The guidelines quote that text is exempt only if it has undergone “human review or editorial control and is subject to editorial responsibility,” which means engineers must build audit trails that prove such review occurred.
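
A sketch of such an audit record, with hypothetical field names, might look like this:

import datetime
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class EditorialReview:
    """Evidence that a natural or legal person exercised editorial control."""
    content_id: str
    reviewer: str          # publicly identifiable editorial responsibility
    reviewed_at: str       # ISO-8601 timestamp
    substantive: bool      # a superficial sign-off does not qualify

def record_review(content_id: str, reviewer: str, substantive: bool) -> str:
    review = EditorialReview(
        content_id=content_id,
        reviewer=reviewer,
        reviewed_at=datetime.datetime.now(datetime.timezone.utc).isoformat(),
        substantive=substantive,
    )
    return json.dumps(asdict(review))   # append to an immutable audit log

def disclosure_needed(review_json: Optional[str]) -> bool:
    # AI-generated text on matters of public interest must carry a disclosure
    # unless a substantive human review with editorial responsibility exists.
    if review_json is None:
        return True
    review = json.loads(review_json)
    return not (review["substantive"] and review["reviewer"])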

As the system reaches maturity, the engineering challenge shifts to maintaining transparency across evolving features, new markets, and new regulatory expectations. Mature systems often accumulate technical debt, and transparency features must be refactored to remain reliable. Engineers must ensure that marking and detection systems remain state‑of‑the‑art, because the guidelines require providers to implement technically feasible solutions, not outdated ones. As models are retrained or replaced, engineers must ensure that transparency features are preserved across versions. When new interaction modes are added — such as voice, AR, or agent‑to‑human messaging — engineers must extend disclosure mechanisms accordingly. Mature systems also face increased scrutiny from regulators, meaning engineers must maintain logs, provenance records, and compliance evidence that can withstand audits.

In the maintenance phase, transparency becomes a matter of operational discipline. Engineers must monitor whether disclosures are being surfaced correctly, whether markings remain intact across distribution channels, and whether detection tools continue to function as intended. When content is syndicated, embedded, or transformed by downstream systems, engineers must ensure that transparency metadata is not lost. The guidelines emphasize that transparency must be provided “at the latest at the time of the first interaction or exposure,” which means engineers must design monitoring systems that detect when disclosures fail to appear. Maintenance also includes updating transparency mechanisms as adversarial techniques evolve, because robustness is an ongoing requirement, not a one‑time achievement.
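
A monitoring probe along these lines, with hypothetical marker strings, is sketched below; a real deployment would run it as a synthetic check against production pages and alert when disclosures fail to render.

import logging

logger = logging.getLogger("disclosure-monitor")

REQUIRED_MARKERS = ("ai-disclosure-badge", "AI-generated")  # hypothetical markers

def probe_rendered_page(page_html: str, url: str) -> bool:
    # Alert when none of the expected disclosure markers survive rendering
    # and distribution to the user-facing page.
    ok = any(marker in page_html for marker in REQUIRED_MARKERS)
    if not ok:
        logger.error("disclosure missing at first exposure: %s", url)
    return ok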

Finally, in the decommissioning phase, engineers must ensure that transparency obligations are respected even as the system winds down. If the system continues to generate or serve content during a sunset period, markings and disclosures must remain active. If the system is replaced, engineers must ensure that legacy content remains labelled, especially deep fakes and AI‑generated text that continue to circulate. If detection tools are retired, engineers must provide alternative means for users to verify authenticity. Decommissioning also requires preserving audit logs and provenance data for regulatory review, because the guidelines make clear that compliance is evaluated across the system’s operational lifetime, not just at a single point in time.

Across all phases, engineers at every level have distinct responsibilities. Model engineers must ensure that model architectures support watermarking, provenance, and disclosure triggers. Backend engineers must propagate transparency metadata through APIs and services. Frontend engineers must surface disclosures in ways that are perceivable, accessible, and adapted to vulnerable users. DevOps and SRE teams must ensure that transparency features remain reliable under load and across deployments. Security engineers must defend markings and detection systems against tampering. QA engineers must test transparency features as rigorously as functional features, because a missing disclosure is a compliance failure. Product engineers must ensure that editorial workflows, content pipelines, and agent behaviors align with the guidelines’ expectations. And engineering managers must ensure that transparency is treated as a first‑class requirement throughout the system’s lifetime.

The guidelines’ structure may appear legalistic, but their message to engineers is simple: transparency is not a feature; it is a lifecycle obligation. It must be designed early, implemented consistently, maintained continuously, and preserved even as the system is retired. Every phase of the AI system’s life introduces new transparency risks, and engineers must anticipate and mitigate those risks long before regulators come asking.


Sunday, May 10, 2026

 EU Guidelines for Article 50 of the AI Act

The draft EU Guidelines on Article 50 of the AI Act form a comprehensive attempt to translate the regulation’s transparency obligations into practical expectations for providers and deployers of AI systems. The guidelines open by situating Article 50 within the broader architecture of the AI Act, which entered into force on 1 August 2024 and adopts a risk‑based approach to regulating AI. Transparency risks form one of the four risk categories, and Article 50’s obligations will apply from 2 August 2026. The Commission stresses that these guidelines are non‑binding, but they are intended to help authorities, providers, and deployers implement the law consistently. As the document states, the purpose is to “serve as practical guidance to assist competent authorities, as well as providers and deployers of AI systems, in ensuring compliance with the transparency obligations” (quoted from the document).

The guidelines begin by mapping the four transparency obligations in Article 50. The first concerns AI systems that interact directly with natural persons; the second concerns AI systems that generate or manipulate synthetic content; the third concerns emotion recognition and biometric categorisation systems; and the fourth concerns deep fakes and AI‑generated or manipulated text published to inform the public on matters of public interest. Each obligation has its own scope, responsible actor, and exceptions, and the guidelines emphasize that these obligations can apply cumulatively to the same system or output. The rationale behind all four obligations is to reduce risks of deception, impersonation, manipulation, misinformation, and fraud, and to protect democratic processes and societal trust. The guidelines quote the Act’s recitals to explain that transparency helps individuals “take informed decisions” and calibrate their trust in AI‑mediated interactions.

The guidelines clarify who is responsible for compliance. Providers are those who develop or place AI systems on the market under their name, regardless of where they are located, and they must ensure compliance with Article 50(1), (2), and (5) before the system is placed on the market. Deployers are those who use AI systems under their authority, unless the use is purely personal and non‑professional. The guidelines give examples to illustrate the distinction: a media outlet using AI to support its reporting is a deployer; an online platform merely transmitting AI‑generated content is not. The guidelines also explain that purely personal, non‑professional use is excluded from deployer obligations, but this exclusion is narrow. A person generating a deep fake of a mayor and posting it publicly cannot claim the personal‑use exemption, because the content affects public discourse. The guidelines quote: “an AI-generated or manipulated deep fake that is made publicly available… should not be considered a purely personal non-professional activity” (quoted from the document).

Research and development activities are also excluded when the AI system is used solely for scientific research, but the moment the system or its outputs are used outside that context, Article 50 applies. Open‑source systems are not exempt unless they fall outside all Article 50 obligations, meaning open‑source providers and deployers must still comply when their systems fall within scope.

The guidelines emphasize that transparency obligations do not imply legality of the underlying system. A system may comply with Article 50 but still be prohibited under Article 5, such as emotion recognition in workplaces or schools. Similarly, systems subject to Article 50 may also be high‑risk and must meet additional requirements.

The guidelines call out the first major obligation: transparency for interactive AI systems. Providers must design systems so that natural persons are informed they are interacting with AI. The guidelines unpack what it means for a system to be “intended to interact directly with natural persons.” The system must be an AI system, must be designed for bidirectional exchange, must interact directly rather than through intermediaries, and must interact with natural persons rather than operating in closed industrial environments. Examples include chatbots, voice assistants, AI avatars, and social‑media bots. Systems like recommender engines, spam filters, or backend decision‑support tools do not qualify because they do not engage in direct interaction.

The obligation requires disclosure at or before the first interaction, and the guidelines emphasize that the disclosure must be clear, accessible, and adapted to vulnerable groups such as children or persons with disabilities. The guidelines give examples of acceptable disclosures, such as a chatbot stating “You are interacting with an AI system,” a voice assistant announcing its AI nature, or a visible AI label on an email generated by an AI agent. They also warn against disclosures buried in terms and conditions, ambiguous signals, or purely machine‑readable metadata. The guidelines stress that multimodal disclosure — combining text, audio, and visual cues — is often the most effective.

Two exceptions apply. The first is when the artificial nature of the interaction is obvious to a reasonably well‑informed, observant, and circumspect person, taking into account the target audience and context. The guidelines explain that this standard is borrowed from EU consumer law. For example, developers interacting with a code‑assistant chatbot can reasonably be expected to know it is AI, but a highly realistic robotic pet or a human‑like avatar in a virtual environment would not qualify as obvious. The second exception applies when the system is authorised by law for detecting, preventing, investigating, or prosecuting criminal offences, except when the system is available to the public to report crimes. Police chatbots for public reporting must still disclose their AI nature.

The second major obligation concerns marking and detection of AI‑generated or manipulated content. Providers of such systems must ensure that outputs are marked in a machine‑readable format and that the content is detectable as AI‑generated or manipulated. Both marking and detection must be implemented; one without the other is insufficient. The guidelines quote: “Fulfilling only one element… will not suffice” (quoted from the document). The obligation applies to synthetic audio, image, video, or text content, including multimodal content and virtual or augmented reality. It applies to both generation and manipulation, and includes GPAI systems and agentic systems when their outputs are perceptible by humans.

The guidelines clarify what falls outside the scope: content that merely reproduces existing material, machine‑to‑machine outputs, sensor data, or industrial outputs not intended for human interpretation. They also explain that marking solutions may include watermarks, metadata, cryptographic provenance, fingerprints, or combinations thereof. Providers may implement marking at the model or system level and may rely on upstream solutions, but they remain responsible for compliance.

Detection tools must be made available so that natural persons and relevant actors can verify whether content is AI‑generated or manipulated. The results must be human‑readable and available at first exposure. The technical solutions must be effective, reliable, robust, and interoperable. The guidelines explain each term: effectiveness means enabling humans to distinguish AI content; reliability means accurate identification; robustness means resilience to alterations and adversarial attacks; interoperability means compatibility across systems. Providers must implement technically feasible, state‑of‑the‑art solutions, and because no single technique currently satisfies all requirements, combinations of techniques are expected. The guidelines allow narrow exceptions for industrial applications where outputs are strictly technical and confined to professional users, or for ephemeral real‑time content in contexts like video games.

The guidelines then describe exceptions: systems performing only standard editing (such as grammar correction, noise reduction, or minor colour adjustments) are exempt; systems that do not substantially alter input data or its semantics are exempt; and systems authorised by law for criminal‑offence purposes are exempt. The guidelines provide examples of minor edits versus semantic changes, noting that adding or removing objects, altering body shape, or changing skin colour are substantial manipulations requiring marking.

The third obligation concerns emotion recognition and biometric categorisation systems. Deployers must inform natural persons exposed to such systems, whether in real time or ex post. Emotion recognition is defined as identifying or inferring emotions or intentions from biometric data, and biometric categorisation involves assigning persons to categories based on biometric data. The obligation applies broadly, regardless of whether the system is high‑risk, though many such systems are high‑risk by definition. Deployers must inform all exposed persons, including children, in a clear and accessible manner at first exposure. The guidelines give examples such as pop‑up notices in games or signage at exhibition entrances. The only exception is when the system is authorised by law for criminal‑offence purposes.

The fourth obligation concerns deep fakes and AI‑generated or manipulated text published to inform the public on matters of public interest. Deployers must clearly disclose that deep fake content has been artificially generated or manipulated. A deep fake is defined as AI‑generated or manipulated image, audio, or video content that resembles existing persons, objects, places, entities, or events and would falsely appear to a person to be authentic or truthful. The guidelines unpack each element: resemblance must be appreciable; the subject must be realistic; the content must depict persons, objects, places, entities, or events; and the content must be capable of misleading a person. The guidelines emphasize that the assessment must consider the actual audience, including vulnerable groups, not an abstract average person. Minor technical edits do not create deep fakes, but substantive manipulations do.

Deployers must label deep fakes in a clear and perceivable way. However, an attenuated regime applies to artistic, creative, satirical, fictional, or analogous works, where disclosure must be done in an appropriate manner that does not hamper enjoyment of the work. The guidelines explain each category and require that the artistic or fictional nature be evident. Even in these cases, deployers must safeguard the rights and freedoms of third parties, including image rights and intellectual property.

The guidelines then address AI‑generated or manipulated text published to inform the public on matters of public interest. The text must be published, meaning accessible to an indeterminate public; it must aim to inform; and it must concern matters of public interest such as public administration, health, environment, consumer safety, politics, or science. Deployers must disclose that the text is AI‑generated or manipulated unless two conditions are met: the text has undergone human review or editorial control, and a natural or legal person holds editorial responsibility. Human review must be substantive, not superficial, and editorial responsibility must be publicly identifiable. The guidelines give examples of qualifying and non‑qualifying cases.

The guidelines then explain the horizontal requirement in Article 50(5): all information must be provided clearly, distinguishably, and at the latest at first interaction or exposure, and must comply with accessibility requirements. Information must be noticeable, easy to understand, and not buried in manuals or menus. First exposure applies to each natural person encountering the content, not just the first person ever exposed. The guidelines give examples such as labelling deep fakes at the start of a video rather than in end credits.

The enforcement section explains that providers and deployers may demonstrate compliance by adhering to a code of practice assessed as adequate by the AI Office. Doing so simplifies supervision and may mitigate penalties. Those not adhering must demonstrate compliance through other means and may face more scrutiny. Market surveillance authorities, the AI Office, and the European Data Protection Supervisor enforce Article 50, with powers under the AI Act and Regulation 2019/1020. Penalties can reach €15 million or 3% of global turnover. Article 50 applies from 2 August 2026, and all systems in scope must comply regardless of when they were placed on the market, except for a proposed transitional rule for marking and detection under Article 50(2). Existing AI‑generated content does not need retroactive marking, but actors are encouraged to label it voluntarily.

The guidelines conclude by noting that they will be reviewed as technology and enforcement evolve, and the Commission invites ongoing contributions from stakeholders.


Friday, May 8, 2026

 Tiling of aerial drone survey area

Problem statement:

Most UAVs with a top-down camera provide a video of their tour regardless of the flight path. This video can be split into many contiguous aerial images, often numbering in the hundreds even for a short flight. We need a software implementation that takes these images as input in timeline order and selects those that complete a tiling of the area surveyed by the drone tour. The output of the implementation must be a selection of the original input. For simplicity, the flight path can be assumed to be rectangular with the lower left as origin, but none of the images have GPS information available. Additionally, the selection may be output sorted to start from the lower left of the surveyed area and proceed clockwise along the perimeter, though this step can be skipped in the implementation.

Solution:

The following implementation uses computer vision (feature matching + homography estimation) to determine spatial relationships between images and then solves a set-cover problem to select a minimal tiling subset. The pipeline consists of five stages:

Pipeline:

1. Feature matching: ORB keypoints are detected in each image; nearby frames (within search_radius) are matched using brute-force Hamming + Lowe's ratio test, then a homography is estimated with RANSAC. Key algorithms: ORB [Rublee 2011], RANSAC [Fischler 1981].

2. Global registration: A BFS traversal from the most-connected image propagates homographies so every frame is placed in a single mosaic coordinate system. No GPS needed. Key reference: Brown & Lowe (2007) panoramic stitching.

3. Coverage gridding: The mosaic bounding box is discretised into a grid, and each image's warped footprint is rasterised to determine which cells it covers. Key tool: OpenCV fillConvexPoly.

4. Greedy set cover: Iteratively selects the image covering the most uncovered cells until 100% coverage is reached. This is the classic greedy approximation (ln n + 1 factor). Key reference: Chvatal (1979).

5. Spatial sort: Selected images are sorted clockwise starting from the lower-left corner using angular ordering around the centroid.

Usage:

# Install dependencies

pip install opencv-python-headless numpy

# Run (CLI)

python drone_tiling.py --input_dir ./my_drone_frames --output tiles.json

# Run (programmatic)

from drone_tiling import select_tiling_from_directory

selected = select_tiling_from_directory("./my_drone_frames")

print(selected)

Implementation:

"""

Drone Image Tiling Selector

============================

Selects a minimal subset of contiguous aerial drone images that completely

tiles (covers) the area surveyed during a drone flight.

Algorithm & Citations

---------------------

1. **Feature Detection & Matching**: ORB (Oriented FAST and Rotated BRIEF)

   detector with brute-force Hamming-distance matching.

   - Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011).

     "ORB: An efficient alternative to SIFT or SURF."

     IEEE International Conference on Computer Vision (ICCV), pp. 2564-2571.

     DOI: 10.1109/ICCV.2011.6126544

   - OpenCV documentation: https://docs.opencv.org/4.x/d1/d89/tutorial_py_orb.html

2. **Homography Estimation (RANSAC)**: Used to compute the projective

   transformation between overlapping image pairs, which gives us the

   relative spatial position of each image.

   - Fischler, M. A., & Bolles, R. C. (1981).

     "Random Sample Consensus: A Paradigm for Model Fitting with

     Applications to Image Analysis and Automated Cartography."

     Communications of the ACM, 24(6), 381-395.

   - OpenCV documentation: https://docs.opencv.org/4.x/d9/dab/tutorial_homography.html

3. **Image Stitching / Registration Pipeline**: The pairwise registration

   approach follows the methodology in:

   - Brown, M., & Lowe, D. G. (2007).

     "Automatic Panoramic Image Stitching using Invariant Features."

     International Journal of Computer Vision, 74(1), 59-73.

     DOI: 10.1007/s11263-006-0002-3

4. **Greedy Weighted Set Cover** for selecting the minimal tiling subset:

   - Chvatal, V. (1979).

     "A Greedy Heuristic for the Set-Covering Problem."

     Mathematics of Operations Research, 4(3), 233-235.

   - Vazirani, V. V. (2001). "Approximation Algorithms", Chapter 2.

     Springer-Verlag. ISBN: 3-540-65367-8.

Dependencies

------------

    pip install opencv-python-headless numpy

Usage

-----

    python drone_tiling.py --input_dir ./drone_images --output tiles.json

"""

import os

import glob

import json

import argparse

import logging

from dataclasses import dataclass

from typing import List, Tuple, Dict, Optional, Set

import cv2

import numpy as np

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------

# Configuration

# ---------------------------------------------------------------------------

@dataclass

class TilingConfig:

    """Tuneable parameters for the tiling pipeline."""

    # ORB feature detector

    orb_n_features: int = 3000

    orb_scale_factor: float = 1.2

    orb_n_levels: int = 8

    # Feature matching

    match_ratio_thresh: float = 0.75 # Lowe's ratio test threshold

    min_good_matches: int = 30 # minimum inlier matches to

                                              # consider a pair overlapping

    # RANSAC homography

    ransac_reproj_thresh: float = 5.0 # reprojection error in pixels

    # Coverage grid resolution (number of cells along the longer axis)

    grid_resolution: int = 100

    # Overlap: minimum fraction of an image that must overlap with the

    # already-covered area for the image to be considered redundant

    # (reserved for future pruning; not used by the current pipeline).

    redundancy_overlap: float = 0.95

    # Image scaling for speed (process at this fraction of original size)

    work_scale: float = 0.5

# ---------------------------------------------------------------------------

# Data structures

# ---------------------------------------------------------------------------

@dataclass

class ImageRecord:

    """Metadata for a single drone image."""

    index: int

    filepath: str

    filename: str

    width: int = 0

    height: int = 0

    # Position of the image centre in the *mosaic* coordinate frame.

    cx: float = 0.0

    cy: float = 0.0

    # 3x3 homography that maps this image into mosaic coordinates.

    H: Optional[np.ndarray] = None

@dataclass

class PairwiseMatch:

    """Result of matching two images."""

    idx_a: int

    idx_b: int

    n_inliers: int

    H_ab: np.ndarray # homography from image A to image B

    confidence: float = 0.0

# ---------------------------------------------------------------------------

# 1. Feature detection & pair-wise matching

# ---------------------------------------------------------------------------

class FeatureMatcher:

    """

    Detects ORB features in every image and matches consecutive /

    nearby frames to estimate pairwise homographies.

    Reference

    ---------

    Rublee et al., "ORB: An efficient alternative to SIFT or SURF", ICCV 2011.

    OpenCV ORB docs: https://docs.opencv.org/4.x/d1/d89/tutorial_py_orb.html

    """

    def __init__(self, config: TilingConfig):

        self.cfg = config

        self.orb = cv2.ORB_create(

            nfeatures=config.orb_n_features,

            scaleFactor=config.orb_scale_factor,

            nlevels=config.orb_n_levels,

        )

        self.bf = cv2.BFMatcher(cv2.NORM_HAMMING)

    @staticmethod

    def _load_grey(path: str, scale: float) -> np.ndarray:

        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

        if img is None:

            raise FileNotFoundError(f"Cannot read image: {path}")

        if scale != 1.0:

            img = cv2.resize(img, None, fx=scale, fy=scale,

                             interpolation=cv2.INTER_AREA)

        return img

    def _detect(self, grey: np.ndarray):

        kp, des = self.orb.detectAndCompute(grey, None)

        return kp, des

    def _match_pair(self, des_a, des_b, kp_a, kp_b) -> Optional[PairwiseMatch]:

        """

        Match descriptors with Lowe's ratio test, then estimate a homography

        with RANSAC.

        Reference

        ---------

        Lowe, D. G. (2004). "Distinctive Image Features from Scale-Invariant

        Keypoints." IJCV, 60(2), 91-110.

        Fischler & Bolles (1981). "Random Sample Consensus." CACM.

        """

        if des_a is None or des_b is None:

            return None

        if len(des_a) < 2 or len(des_b) < 2:

            return None

        raw_matches = self.bf.knnMatch(des_a, des_b, k=2)

        good = []

        for m_pair in raw_matches:

            if len(m_pair) < 2:

                continue

            m, n = m_pair

            if m.distance < self.cfg.match_ratio_thresh * n.distance:

                good.append(m)

        if len(good) < self.cfg.min_good_matches:

            return None

        pts_a = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)

        pts_b = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

        H, mask = cv2.findHomography(

            pts_a, pts_b, cv2.RANSAC, self.cfg.ransac_reproj_thresh

        )

        if H is None:

            return None

        n_inliers = int(mask.sum())

        if n_inliers < self.cfg.min_good_matches:

            return None

        confidence = n_inliers / (8.0 + 0.3 * len(good)) # Brown & Lowe 2007

        return PairwiseMatch(

            idx_a=-1, idx_b=-1,

            n_inliers=n_inliers,

            H_ab=H,

            confidence=confidence,

        )

    def match_sequence(

        self,

        records: List[ImageRecord],

        search_radius: int = 5,

    ) -> List[PairwiseMatch]:

        """

        For every image, match against up to *search_radius* neighbours in

        both directions (handles forward overlap AND lateral overlap from

        adjacent flight strips).

        """

        n = len(records)

        scale = self.cfg.work_scale

        features = []

        for rec in records:

            grey = self._load_grey(rec.filepath, scale)

            rec.height, rec.width = grey.shape[:2]

            kp, des = self._detect(grey)

            features.append((kp, des))

            logger.debug(" %s - %d keypoints", rec.filename, len(kp))

        matches: List[PairwiseMatch] = []

        tested: Set[Tuple[int, int]] = set()

        for i in range(n):

            for j in range(i + 1, min(i + search_radius + 1, n)):

                if (i, j) in tested:

                    continue

                tested.add((i, j))

                kp_a, des_a = features[i]

                kp_b, des_b = features[j]

                pm = self._match_pair(des_a, des_b, kp_a, kp_b)

                if pm is not None:

                    pm.idx_a = i

                    pm.idx_b = j

                    matches.append(pm)

                    logger.debug(

                        " match %s <-> %s inliers=%d conf=%.3f",

                        records[i].filename, records[j].filename,

                        pm.n_inliers, pm.confidence,

                    )

        logger.info("Pairwise matching complete: %d valid pairs from %d images.",

                     len(matches), n)

        return matches

# ---------------------------------------------------------------------------

# 2. Global registration - place every image in a common mosaic frame

# ---------------------------------------------------------------------------

def register_images(

    records: List[ImageRecord],

    matches: List[PairwiseMatch],

) -> List[ImageRecord]:

    """

    Build a connected graph of images and propagate homographies via BFS

    so that every image is mapped into the coordinate frame of a single

    reference image (the one with the most connections).

    Reference

    ---------

    Brown, M. & Lowe, D. G. (2007). "Automatic Panoramic Image Stitching

    using Invariant Features." IJCV 74(1), 59-73.

    OpenCV Stitching: https://docs.opencv.org/4.x/d9/dab/tutorial_homography.html

    """

    n = len(records)

    adj: Dict[int, List[Tuple[int, np.ndarray, float]]] = {i: [] for i in range(n)}

    for pm in matches:

        a, b = pm.idx_a, pm.idx_b

        adj[a].append((b, pm.H_ab, pm.confidence))

        H_inv = np.linalg.inv(pm.H_ab)

        adj[b].append((a, H_inv, pm.confidence))

    ref = max(range(n), key=lambda i: len(adj[i]))

    logger.info("Reference image: %s (index %d, %d neighbours)",

                records[ref].filename, ref, len(adj[ref]))

    records[ref].H = np.eye(3, dtype=np.float64)

    visited = {ref}

    queue = [ref]

    while queue:

        cur = queue.pop(0)

        for nb, H_cur_to_nb, _conf in adj[cur]:

            if nb in visited:

                continue

            H_nb_to_mosaic = records[cur].H @ np.linalg.inv(H_cur_to_nb)

            records[nb].H = H_nb_to_mosaic

            visited.add(nb)

            queue.append(nb)

    disconnected = [i for i in range(n) if records[i].H is None]

    if disconnected:

        logger.warning(

            "%d images could not be registered (disconnected): %s",

            len(disconnected),

            [records[i].filename for i in disconnected],

        )

    for rec in records:

        if rec.H is None:

            continue

        centre_local = np.array([[rec.width / 2, rec.height / 2]], dtype=np.float64)

        centre_mosaic = cv2.perspectiveTransform(

            centre_local.reshape(-1, 1, 2), rec.H

        )

        rec.cx, rec.cy = centre_mosaic[0, 0]

    return records

# ---------------------------------------------------------------------------

# 3. Coverage grid & greedy set-cover selection

# ---------------------------------------------------------------------------

def _image_footprint(rec: ImageRecord) -> Optional[np.ndarray]:

    """Return the four mosaic-frame corners of *rec* as an (4,2) array."""

    if rec.H is None:

        return None

    corners = np.float64([

        [0, 0],

        [rec.width, 0],

        [rec.width, rec.height],

        [0, rec.height],

    ]).reshape(-1, 1, 2)

    warped = cv2.perspectiveTransform(corners, rec.H)

    return warped.reshape(-1, 2)

def _build_coverage_grid(

    records: List[ImageRecord],

    resolution: int,

) -> Tuple[np.ndarray, float, float, float]:

    """Create a boolean grid representing the total survey area."""

    all_corners = []

    for rec in records:

        fp = _image_footprint(rec)

        if fp is not None:

            all_corners.append(fp)

    if not all_corners:

        raise RuntimeError("No images could be registered - cannot build grid.")

    all_pts = np.vstack(all_corners)

    x_min, y_min = all_pts.min(axis=0)

    x_max, y_max = all_pts.max(axis=0)

    span = max(x_max - x_min, y_max - y_min)

    cell_size = span / resolution

    cols = int(np.ceil((x_max - x_min) / cell_size)) + 1

    rows = int(np.ceil((y_max - y_min) / cell_size)) + 1

    grid = np.zeros((rows, cols), dtype=bool)

    return grid, cell_size, float(x_min), float(y_min)

def _rasterise_footprint(

    corners: np.ndarray,

    cell_size: float,

    x_min: float,

    y_min: float,

    grid_shape: Tuple[int, int],

) -> Set[Tuple[int, int]]:

    """Return the set of grid cells covered by the quadrilateral *corners*."""

    rows, cols = grid_shape

    gc = ((corners - [x_min, y_min]) / cell_size).astype(np.int32)

    mask = np.zeros((rows, cols), dtype=np.uint8)

    cv2.fillConvexPoly(mask, gc, 1)

    cells = set(zip(*np.where(mask > 0)))

    return cells

def select_tiling_images(

    records: List[ImageRecord],

    config: TilingConfig,

) -> List[ImageRecord]:

    """

    Greedy weighted set-cover: iteratively pick the image that covers the

    most *uncovered* grid cells until the entire survey area is covered.

    Reference

    ---------

    Chvatal, V. (1979). "A Greedy Heuristic for the Set-Covering Problem."

    Mathematics of Operations Research, 4(3), 233-235.

    """

    grid, cell_size, x_min, y_min = _build_coverage_grid(

        records, config.grid_resolution

    )

    grid_shape = grid.shape

    universe: Set[Tuple[int, int]] = set()

    image_cells: Dict[int, Set[Tuple[int, int]]] = {}

    for rec in records:

        fp = _image_footprint(rec)

        if fp is None:

            continue

        cells = _rasterise_footprint(fp, cell_size, x_min, y_min, grid_shape)

        image_cells[rec.index] = cells

        universe |= cells

    logger.info("Survey area: %d grid cells to cover.", len(universe))

    uncovered = set(universe)

    selected_indices: List[int] = []

    used: Set[int] = set()

    while uncovered:

        best_idx = -1

        best_gain = 0

        for idx, cells in image_cells.items():

            if idx in used:

                continue

            gain = len(cells & uncovered)

            if gain > best_gain:

                best_gain = gain

                best_idx = idx

        if best_idx == -1 or best_gain == 0:

            logger.warning(

                "Cannot cover %d remaining cells - possible registration gaps.",

                len(uncovered),

            )

            break

        selected_indices.append(best_idx)

        used.add(best_idx)

        uncovered -= image_cells[best_idx]

        logger.debug(

            " selected %s - covers %d new cells, %d remaining",

            records[best_idx].filename, best_gain, len(uncovered),

        )

    logger.info("Selected %d / %d images for full tiling.",

                len(selected_indices), len(records))

    selected = [records[i] for i in selected_indices]

    return selected

# ---------------------------------------------------------------------------

# 4. (Optional) Sort selected images - lower-left origin, clockwise

# ---------------------------------------------------------------------------

def sort_clockwise_from_lower_left(

    selected: List[ImageRecord],

) -> List[ImageRecord]:

    """

    Sort images starting from the lower-left corner and proceeding clockwise

    along the convex-hull perimeter.

    """

    if len(selected) <= 1:

        return selected

    centres = np.array([[r.cx, r.cy] for r in selected])

    centroid = centres.mean(axis=0)

    dx = centres[:, 0] - centroid[0]

    dy = centres[:, 1] - centroid[1]

    # Mosaic coordinates inherit the image convention (y grows downwards),
    # so a visually clockwise sweep corresponds to ascending atan2(dx, -dy).

    angles = np.arctan2(dx, -dy)

    order = np.argsort(angles)

    # Lower-left in a y-down frame means smallest x and largest y,
    # i.e. the minimum of (x - y).

    lower_left_idx = int(

        np.argmin(centres[:, 0] - centres[:, 1])

    )

    start = int(np.where(order == lower_left_idx)[0][0])

    order = np.roll(order, -start)

    return [selected[i] for i in order]

# ---------------------------------------------------------------------------

# 5. Main pipeline

# ---------------------------------------------------------------------------

def load_image_records(input_dir: str) -> List[ImageRecord]:

    """Load image file paths from a directory, sorted by name."""

    extensions = ("*.jpg", "*.jpeg", "*.png", "*.tif", "*.tiff", "*.bmp")

    paths: List[str] = []

    for ext in extensions:

        paths.extend(glob.glob(os.path.join(input_dir, ext)))

        paths.extend(glob.glob(os.path.join(input_dir, ext.upper())))

    paths = sorted(set(paths))

    if not paths:

        raise FileNotFoundError(f"No image files found in {input_dir}")

    records = [

        ImageRecord(index=i, filepath=p, filename=os.path.basename(p))

        for i, p in enumerate(paths)

    ]

    logger.info("Loaded %d image paths from %s", len(records), input_dir)

    return records

def select_tiling_from_directory(

    input_dir: str,

    config: Optional[TilingConfig] = None,

    sort_result: bool = True,

    search_radius: int = 5,

) -> List[str]:

    """

    End-to-end pipeline.

    Parameters

    ----------

    input_dir : str

        Directory containing contiguous drone images.

    config : TilingConfig, optional

        Pipeline configuration; uses defaults when None.

    sort_result : bool

        If True, sort selected images clockwise from lower-left.

    search_radius : int

        Number of neighbouring frames each image is matched against.

    Returns

    -------

    list[str]

        File paths of the selected tiling images.

    """

    if config is None:

        config = TilingConfig()

    records = load_image_records(input_dir)

    matcher = FeatureMatcher(config)

    matches = matcher.match_sequence(records, search_radius=search_radius)

    if not matches:

        logger.warning("No pairwise matches found. Returning all images.")

        return [r.filepath for r in records]

    records = register_images(records, matches)

    selected = select_tiling_images(records, config)

    if sort_result:

        selected = sort_clockwise_from_lower_left(selected)

    return [rec.filepath for rec in selected]

# ---------------------------------------------------------------------------

# CLI

# ---------------------------------------------------------------------------

def main():

    parser = argparse.ArgumentParser(

        description="Select a minimal set of drone images that tile the surveyed area."

    )

    parser.add_argument(

        "--input_dir", required=True,

        help="Directory containing contiguous aerial drone images.",

    )

    parser.add_argument(

        "--output", default=None,

        help="Optional JSON file to write the selected file list to.",

    )

    parser.add_argument(

        "--grid_resolution", type=int, default=100,

        help="Grid resolution for coverage computation (default: 100).",

    )

    parser.add_argument(

        "--work_scale", type=float, default=0.5,

        help="Down-scale factor for images during processing (default: 0.5).",

    )

    parser.add_argument(

        "--search_radius", type=int, default=5,

        help="Match each image against this many neighbours (default: 5).",

    )

    parser.add_argument(

        "--no_sort", action="store_true",

        help="Skip clockwise spatial sorting of the result.",

    )

    args = parser.parse_args()

    config = TilingConfig(

        grid_resolution=args.grid_resolution,

        work_scale=args.work_scale,

    )

    selected = select_tiling_from_directory(

        args.input_dir, config, sort_result=not args.no_sort,

        search_radius=args.search_radius,

    )

    print(f"\n{'='*60}")

    print(f"Selected {len(selected)} images for tiling:")

    print(f"{'='*60}")

    for i, path in enumerate(selected, 1):

        print(f" {i:>3d}. {os.path.basename(path)}")

    if args.output:

        with open(args.output, "w") as f:

            json.dump({"selected_images": selected}, f, indent=2)

        print(f"\nResults written to {args.output}")

if __name__ == "__main__":

    main()

Tiling of aerial drone survey area

Problem statement:

Most UAV with top-down camera provide a video of their tour regardless of their flight path. This video can be split into several contiguous aerial drone images which may often number in hundreds even for a short duration. We need a software implementation that can take these images as input in the sorted order of the timeline and select those images that can complete tiling of the area surveyed by drone tour. The output of the implementation must be a selection of the original input. Flight path could be assumed to be rectangular with the lower left as origin for simplicity but none of the images have gps information available. Additionally, the selection could be output in the sorted order to start from lower left of the area surveyed and move clockwise along the perimeter but this can be skipped from the implementation.

Solution:

The following implementation uses computer vision (feature matching + homography estimation) to determine spatial relationships between images and then solves a set-cover problem to select a minimal tiling subset. A set of five stages completes the pipeline towards the goal:

Pipeline:

Stage What happens Key algorithm

1. Feature matching ORB keypoints are detected in each image; nearby frames (within search_radius) are matched using brute-force Hamming + Lowe's ratio test, then a homography is estimated with RANSAC. ORB [Rublee 2011], RANSAC [Fischler 1981]

2. Global registration A BFS traversal from the most-connected image propagates homographies so every frame is placed in a single mosaic coordinate system. No GPS needed. Brown & Lowe (2007) panoramic stitching

3. Coverage gridding The mosaic bounding box is discretised into a grid. Each image's warped footprint is rasterised to determine which cells it covers. OpenCV fillConvexPoly

4. Greedy set cover Iteratively selects the image covering the most uncovered cells until 100% coverage is reached. This is the classic greedy approximation (ln n + 1 factor). Chvatal (1979)

5. Spatial sort Selected images are sorted clockwise starting from the lower-left corner using angular ordering around the centroid. Convex-hull sweep

Usage:

# Install dependencies

pip install opencv-python-headless numpy

# Run (CLI)

python drone_tiling.py --input_dir ./my_drone_frames --output tiles.json

# Run (programmatic)

from drone_tiling import select_tiling_from_directory

selected = select_tiling_from_directory("./my_drone_frames")

print(selected)

Implementation:

"""

Drone Image Tiling Selector

============================

Selects a minimal subset of contiguous aerial drone images that completely

tiles (covers) the area surveyed during a drone flight.

Algorithm & Citations

---------------------

1. **Feature Detection & Matching**: ORB (Oriented FAST and Rotated BRIEF)

   detector with brute-force Hamming-distance matching.

   - Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011).

     "ORB: An efficient alternative to SIFT or SURF."

     IEEE International Conference on Computer Vision (ICCV), pp. 2564-2571.

     DOI: 10.1109/ICCV.2011.6126544

   - OpenCV documentation: https://docs.opencv.org/4.x/d1/d89/tutorial_py_orb.html

2. **Homography Estimation (RANSAC)**: Used to compute the projective

   transformation between overlapping image pairs, which gives us the

   relative spatial position of each image.

   - Fischler, M. A., & Bolles, R. C. (1981).

     "Random Sample Consensus: A Paradigm for Model Fitting with

     Applications to Image Analysis and Automated Cartography."

     Communications of the ACM, 24(6), 381-395.

   - OpenCV documentation: https://docs.opencv.org/4.x/d9/dab/tutorial_homography.html

3. **Image Stitching / Registration Pipeline**: The pairwise registration

   approach follows the methodology in:

   - Brown, M., & Lowe, D. G. (2007).

     "Automatic Panoramic Image Stitching using Invariant Features."

     International Journal of Computer Vision, 74(1), 59-73.

     DOI: 10.1007/s11263-006-0002-3

4. **Greedy Weighted Set Cover** for selecting the minimal tiling subset:

   - Chvatal, V. (1979).

     "A Greedy Heuristic for the Set-Covering Problem."

     Mathematics of Operations Research, 4(3), 233-235.

   - Vazirani, V. V. (2001). "Approximation Algorithms", Chapter 2.

     Springer-Verlag. ISBN: 3-540-65367-8.

Dependencies

------------

    pip install opencv-python-headless numpy

Usage

-----

    python drone_tiling.py --input_dir ./drone_images --output tiles.json

"""

import os

import glob

import json

import argparse

import logging

from dataclasses import dataclass, field

from typing import List, Tuple, Dict, Optional, Set

import cv2

import numpy as np

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------

# Configuration

# ---------------------------------------------------------------------------

@dataclass

class TilingConfig:

    """Tuneable parameters for the tiling pipeline."""

    # ORB feature detector

    orb_n_features: int = 3000

    orb_scale_factor: float = 1.2

    orb_n_levels: int = 8

    # Feature matching

    match_ratio_thresh: float = 0.75 # Lowe's ratio test threshold

    min_good_matches: int = 30 # minimum inlier matches to

                                              # consider a pair overlapping

    # RANSAC homography

    ransac_reproj_thresh: float = 5.0 # reprojection error in pixels

    # Coverage grid resolution (number of cells along the longer axis)

    grid_resolution: int = 100

    # Overlap: minimum fraction of an image that must overlap with the

    # already-covered area for the image to be considered redundant.

    redundancy_overlap: float = 0.95

    # Image scaling for speed (process at this fraction of original size)

    work_scale: float = 0.5

# ---------------------------------------------------------------------------

# Data structures

# ---------------------------------------------------------------------------

@dataclass

class ImageRecord:

    """Metadata for a single drone image."""

    index: int

    filepath: str

    filename: str

    width: int = 0

    height: int = 0

    # Position of the image centre in the *mosaic* coordinate frame.

    cx: float = 0.0

    cy: float = 0.0

    # 3x3 homography that maps this image into mosaic coordinates.

    H: Optional[np.ndarray] = None

@dataclass

class PairwiseMatch:

    """Result of matching two images."""

    idx_a: int

    idx_b: int

    n_inliers: int

    H_ab: np.ndarray # homography from image A to image B

    confidence: float = 0.0

# ---------------------------------------------------------------------------

# 1. Feature detection & pair-wise matching

# ---------------------------------------------------------------------------

class FeatureMatcher:

    """

    Detects ORB features in every image and matches consecutive /

    nearby frames to estimate pairwise homographies.

    Reference

    ---------

    Rublee et al., "ORB: An efficient alternative to SIFT or SURF", ICCV 2011.

    OpenCV ORB docs: https://docs.opencv.org/4.x/d1/d89/tutorial_py_orb.html

    """

    def __init__(self, config: TilingConfig):

        self.cfg = config

        self.orb = cv2.ORB_create(

            nfeatures=config.orb_n_features,

            scaleFactor=config.orb_scale_factor,

            nlevels=config.orb_n_levels,

        )

        # crossCheck is left False so that knnMatch can return the two
        # nearest neighbours required by Lowe's ratio test.
        self.bf = cv2.BFMatcher(cv2.NORM_HAMMING)

    @staticmethod

    def _load_grey(path: str, scale: float) -> np.ndarray:

        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

        if img is None:

            raise FileNotFoundError(f"Cannot read image: {path}")

        if scale != 1.0:

            img = cv2.resize(img, None, fx=scale, fy=scale,

                             interpolation=cv2.INTER_AREA)

        return img

    def _detect(self, grey: np.ndarray):

        kp, des = self.orb.detectAndCompute(grey, None)

        return kp, des

    def _match_pair(self, des_a, des_b, kp_a, kp_b) -> Optional[PairwiseMatch]:

        """

        Match descriptors with Lowe's ratio test, then estimate a homography

        with RANSAC.

        Reference

        ---------

        Lowe, D. G. (2004). "Distinctive Image Features from Scale-Invariant

        Keypoints." IJCV, 60(2), 91-110.

        Fischler & Bolles (1981). "Random Sample Consensus." CACM.

        """

        if des_a is None or des_b is None:

            return None

        if len(des_a) < 2 or len(des_b) < 2:

            return None

        raw_matches = self.bf.knnMatch(des_a, des_b, k=2)

        good = []

        for m_pair in raw_matches:

            if len(m_pair) < 2:

                continue

            m, n = m_pair

            if m.distance < self.cfg.match_ratio_thresh * n.distance:

                good.append(m)

        if len(good) < self.cfg.min_good_matches:

            return None

        pts_a = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)

        pts_b = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

        H, mask = cv2.findHomography(

            pts_a, pts_b, cv2.RANSAC, self.cfg.ransac_reproj_thresh

        )

        if H is None:

            return None

        n_inliers = int(mask.sum())

        if n_inliers < self.cfg.min_good_matches:

            return None

        # Confidence heuristic used by OpenCV's stitching module, after
        # Brown & Lowe (2007); stored here for logging and inspection only.
        confidence = n_inliers / (8.0 + 0.3 * len(good))

        return PairwiseMatch(

            idx_a=-1, idx_b=-1,

            n_inliers=n_inliers,

            H_ab=H,

            confidence=confidence,

        )

    def match_sequence(

        self,

        records: List[ImageRecord],

        search_radius: int = 5,

    ) -> List[PairwiseMatch]:

        """

        For every image, match against up to *search_radius* neighbours in

        both directions (handles forward overlap AND lateral overlap from

        adjacent flight strips).

        """

        n = len(records)

        scale = self.cfg.work_scale

        features = []

        for rec in records:

            grey = self._load_grey(rec.filepath, scale)

            # Recorded at work scale; all homographies and footprints below
            # live consistently in this down-scaled frame.
            rec.height, rec.width = grey.shape[:2]

            kp, des = self._detect(grey)

            features.append((kp, des))

            logger.debug(" %s - %d keypoints", rec.filename, len(kp))

        matches: List[PairwiseMatch] = []

        for i in range(n):

            # (i, j) pairs are generated uniquely by construction, so no
            # deduplication bookkeeping is needed.
            for j in range(i + 1, min(i + search_radius + 1, n)):

                kp_a, des_a = features[i]

                kp_b, des_b = features[j]

                pm = self._match_pair(des_a, des_b, kp_a, kp_b)

                if pm is not None:

                    pm.idx_a = i

                    pm.idx_b = j

                    matches.append(pm)

                    logger.debug(

                        " match %s <-> %s inliers=%d conf=%.3f",

                        records[i].filename, records[j].filename,

                        pm.n_inliers, pm.confidence,

                    )

        logger.info("Pairwise matching complete: %d valid pairs from %d images.",

                     len(matches), n)

        return matches

# ---------------------------------------------------------------------------

# 2. Global registration - place every image in a common mosaic frame

# ---------------------------------------------------------------------------

def register_images(

    records: List[ImageRecord],

    matches: List[PairwiseMatch],

) -> List[ImageRecord]:

    """

    Build a connected graph of images and propagate homographies via BFS

    so that every image is mapped into the coordinate frame of a single

    reference image (the one with the most connections).

    Reference

    ---------

    Brown, M. & Lowe, D. G. (2007). "Automatic Panoramic Image Stitching

    using Invariant Features." IJCV 74(1), 59-73.

    OpenCV Stitching: https://docs.opencv.org/4.x/d9/dab/tutorial_homography.html

    """

    n = len(records)

    adj: Dict[int, List[Tuple[int, np.ndarray, float]]] = {i: [] for i in range(n)}

    for pm in matches:

        a, b = pm.idx_a, pm.idx_b

        adj[a].append((b, pm.H_ab, pm.confidence))

        H_inv = np.linalg.inv(pm.H_ab)

        adj[b].append((a, H_inv, pm.confidence))

    ref = max(range(n), key=lambda i: len(adj[i]))

    logger.info("Reference image: %s (index %d, %d neighbours)",

                records[ref].filename, ref, len(adj[ref]))

    records[ref].H = np.eye(3, dtype=np.float64)

    visited = {ref}

    queue = deque([ref])

    while queue:

        cur = queue.popleft() # O(1), unlike list.pop(0)

        for nb, H_cur_to_nb, _conf in adj[cur]:

            if nb in visited:

                continue

            # H_cur_to_nb maps cur-pixels into nb; its inverse maps nb into
            # cur, and composing with cur's mosaic homography places nb.
            H_nb_to_mosaic = records[cur].H @ np.linalg.inv(H_cur_to_nb)

            records[nb].H = H_nb_to_mosaic

            visited.add(nb)

            queue.append(nb)

    disconnected = [i for i in range(n) if records[i].H is None]

    if disconnected:

        logger.warning(

            "%d images could not be registered (disconnected): %s",

            len(disconnected),

            [records[i].filename for i in disconnected],

        )

    for rec in records:

        if rec.H is None:

            continue

        centre_local = np.array([[rec.width / 2, rec.height / 2]], dtype=np.float64)

        centre_mosaic = cv2.perspectiveTransform(

            centre_local.reshape(-1, 1, 2), rec.H

        )

        rec.cx, rec.cy = centre_mosaic[0, 0]

    return records

# ---------------------------------------------------------------------------

# 3. Coverage grid & greedy set-cover selection

# ---------------------------------------------------------------------------

def _image_footprint(rec: ImageRecord) -> Optional[np.ndarray]:

    """Return the four mosaic-frame corners of *rec* as an (4,2) array."""

    if rec.H is None:

        return None

    corners = np.float64([

        [0, 0],

        [rec.width, 0],

        [rec.width, rec.height],

        [0, rec.height],

    ]).reshape(-1, 1, 2)

    warped = cv2.perspectiveTransform(corners, rec.H)

    return warped.reshape(-1, 2)

def _build_coverage_grid(

    records: List[ImageRecord],

    resolution: int,

) -> Tuple[np.ndarray, float, float, float]:

    """Create a boolean grid representing the total survey area."""

    all_corners = []

    for rec in records:

        fp = _image_footprint(rec)

        if fp is not None:

            all_corners.append(fp)

    if not all_corners:

        raise RuntimeError("No images could be registered - cannot build grid.")

    all_pts = np.vstack(all_corners)

    x_min, y_min = all_pts.min(axis=0)

    x_max, y_max = all_pts.max(axis=0)

    span = max(x_max - x_min, y_max - y_min)

    cell_size = span / resolution
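    # e.g. a 2000 x 1000 mosaic at resolution 100 -> cell_size 20 and,
    # from the two lines below, a 51-row x 101-column grid.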

    cols = int(np.ceil((x_max - x_min) / cell_size)) + 1

    rows = int(np.ceil((y_max - y_min) / cell_size)) + 1

    grid = np.zeros((rows, cols), dtype=bool)

    return grid, cell_size, float(x_min), float(y_min)

def _rasterise_footprint(

    corners: np.ndarray,

    cell_size: float,

    x_min: float,

    y_min: float,

    grid_shape: Tuple[int, int],

) -> Set[Tuple[int, int]]:

    """Return the set of grid cells covered by the quadrilateral *corners*."""

    rows, cols = grid_shape

    gc = ((corners - [x_min, y_min]) / cell_size).astype(np.int32)

    mask = np.zeros((rows, cols), dtype=np.uint8)

    # Footprints are assumed convex, which holds for modest perspective
    # change; switch to cv2.fillPoly if footprints can become non-convex.
    cv2.fillConvexPoly(mask, gc, 1)

    cells = set(zip(*np.where(mask > 0)))

    return cells

def select_tiling_images(

    records: List[ImageRecord],

    config: TilingConfig,

) -> List[ImageRecord]:

    """

    Greedy weighted set-cover: iteratively pick the image that covers the

    most *uncovered* grid cells until the entire survey area is covered.

    Reference

    ---------

    Chvatal, V. (1979). "A Greedy Heuristic for the Set-Covering Problem."

    Mathematics of Operations Research, 4(3), 233-235.

    """

    grid, cell_size, x_min, y_min = _build_coverage_grid(

        records, config.grid_resolution

    )

    grid_shape = grid.shape

    universe: Set[Tuple[int, int]] = set()

    image_cells: Dict[int, Set[Tuple[int, int]]] = {}

    for rec in records:

        fp = _image_footprint(rec)

        if fp is None:

            continue

        cells = _rasterise_footprint(fp, cell_size, x_min, y_min, grid_shape)

        image_cells[rec.index] = cells

        universe |= cells

    logger.info("Survey area: %d grid cells to cover.", len(universe))

    uncovered = set(universe)

    selected_indices: List[int] = []

    used: Set[int] = set()

    while uncovered:

        best_idx = -1

        best_gain = 0

        for idx, cells in image_cells.items():

            if idx in used:

                continue

            gain = len(cells & uncovered)

            if gain > best_gain:

                best_gain = gain

                best_idx = idx

        if best_idx == -1 or best_gain == 0:

            logger.warning(

                "Cannot cover %d remaining cells - possible registration gaps.",

                len(uncovered),

            )

            break

        selected_indices.append(best_idx)

        used.add(best_idx)

        uncovered -= image_cells[best_idx]

        logger.debug(

            " selected %s - covers %d new cells, %d remaining",

            records[best_idx].filename, best_gain, len(uncovered),

        )

    logger.info("Selected %d / %d images for full tiling.",

                len(selected_indices), len(records))

    # rec.index equals the record's position in *records* (assigned that way
    # in load_image_records), so direct indexing is safe.
    selected = [records[i] for i in selected_indices]

    return selected

# ---------------------------------------------------------------------------

# 4. (Optional) Sort selected images - lower-left origin, clockwise

# ---------------------------------------------------------------------------

def sort_clockwise_from_lower_left(

    selected: List[ImageRecord],

) -> List[ImageRecord]:

    """

    Sort images starting from the lower-left corner and proceeding clockwise

    along the convex-hull perimeter.

    """

    if len(selected) <= 1:

        return selected

    centres = np.array([[r.cx, r.cy] for r in selected])

    centroid = centres.mean(axis=0)

    dx = centres[:, 0] - centroid[0]

    dy = centres[:, 1] - centroid[1]

    # Angle measured from the +y axis; the sweep direction on screen depends
    # on whether the mosaic frame is y-up (survey convention) or y-down
    # (image-pixel convention).
    angles = np.arctan2(dx, dy)

    order = np.argsort(angles)

    # argmin of (x - y) favours small x and large y, i.e. the lower-left
    # corner in a y-down mosaic frame.
    lower_left_idx = int(np.argmin(centres[:, 0] - centres[:, 1]))

    start = int(np.where(order == lower_left_idx)[0][0])

    order = np.roll(order, -start)

    return [selected[i] for i in order]

# ---------------------------------------------------------------------------

# 5. Main pipeline

# ---------------------------------------------------------------------------

def load_image_records(input_dir: str) -> List[ImageRecord]:

    """Load image file paths from a directory, sorted by name."""

    extensions = ("*.jpg", "*.jpeg", "*.png", "*.tif", "*.tiff", "*.bmp")

    paths: List[str] = []

    for ext in extensions:

        paths.extend(glob.glob(os.path.join(input_dir, ext)))

        paths.extend(glob.glob(os.path.join(input_dir, ext.upper())))

    paths = sorted(set(paths))

    if not paths:

        raise FileNotFoundError(f"No image files found in {input_dir}")

    records = [

        ImageRecord(index=i, filepath=p, filename=os.path.basename(p))

        for i, p in enumerate(paths)

    ]

    logger.info("Loaded %d image paths from %s", len(records), input_dir)

    return records

def select_tiling_from_directory(
    input_dir: str,
    config: Optional[TilingConfig] = None,
    sort_result: bool = True,
    search_radius: int = 5,
) -> List[str]:

    """

    End-to-end pipeline.

    Parameters

    ----------

    input_dir : str

        Directory containing contiguous drone images.

    config : TilingConfig, optional

        Pipeline configuration; uses defaults when None.

    sort_result : bool

        If True, sort selected images clockwise from lower-left.

    search_radius : int

        Number of neighbouring frames each image is matched against
        (forwarded to FeatureMatcher.match_sequence).

    Returns

    -------

    list[str]

        File paths of the selected tiling images.

    """

    if config is None:

        config = TilingConfig()

    records = load_image_records(input_dir)

    matcher = FeatureMatcher(config)

    matches = matcher.match_sequence(records, search_radius=search_radius)

    if not matches:

        logger.warning("No pairwise matches found. Returning all images.")

        return [r.filepath for r in records]

    records = register_images(records, matches)

    selected = select_tiling_images(records, config)

    if sort_result:

        selected = sort_clockwise_from_lower_left(selected)

    return [rec.filepath for rec in selected]

# ---------------------------------------------------------------------------

# CLI

# ---------------------------------------------------------------------------

def main():

    parser = argparse.ArgumentParser(

        description="Select a minimal set of drone images that tile the surveyed area."

    )

    parser.add_argument(

        "--input_dir", required=True,

        help="Directory containing contiguous aerial drone images.",

    )

    parser.add_argument(

        "--output", default=None,

        help="Optional JSON file to write the selected file list to.",

    )

    parser.add_argument(

        "--grid_resolution", type=int, default=100,

        help="Grid resolution for coverage computation (default: 100).",

    )

    parser.add_argument(

        "--work_scale", type=float, default=0.5,

        help="Down-scale factor for images during processing (default: 0.5).",

    )

    parser.add_argument(

        "--search_radius", type=int, default=5,

        help="Match each image against this many neighbours (default: 5).",

    )

    parser.add_argument(

        "--no_sort", action="store_true",

        help="Skip clockwise spatial sorting of the result.",

    )

    args = parser.parse_args()

    config = TilingConfig(

        grid_resolution=args.grid_resolution,

        work_scale=args.work_scale,

    )

    selected = select_tiling_from_directory(
        args.input_dir, config, sort_result=not args.no_sort,
        search_radius=args.search_radius,
    )

    print(f"\n{'='*60}")

    print(f"Selected {len(selected)} images for tiling:")

    print(f"{'='*60}")

    for i, path in enumerate(selected, 1):

        print(f" {i:>3d}. {os.path.basename(path)}")

    if args.output:

        with open(args.output, "w") as f:

            json.dump({"selected_images": selected}, f, indent=2)

        print(f"\nResults written to {args.output}")

if __name__ == "__main__":

    main()
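# For reference, the JSON file written via --output has the following shape
# (the paths are illustrative, not real output):
#
#     {
#       "selected_images": [
#         "./drone_images/frame_0001.jpg",
#         "./drone_images/frame_0023.jpg"
#       ]
#     }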

Tiling of aerial drone survey area

Problem statement:

Most UAVs with a top-down camera record a video of the tour regardless of the flight path. This video can be split into contiguous aerial images, which often number in the hundreds even for a short flight. We need a software implementation that takes these images as input, sorted in timeline order, and selects the subset that completes a tiling of the area surveyed by the tour. The output must be a selection from the original input. For simplicity, the flight path can be assumed to be rectangular with the lower-left corner as origin, but none of the images carry GPS information. Optionally, the selection may be output sorted so that it starts at the lower left of the surveyed area and proceeds clockwise along the perimeter; this step may be omitted from the implementation.

Solution:

The implementation above uses computer vision (feature matching + homography estimation) to determine the spatial relationships between images, then solves a set-cover problem to select a minimal tiling subset. Five stages complete the pipeline:

Pipeline:

Stage 1 - Feature matching: ORB keypoints are detected in each image; nearby frames (within search_radius) are matched using brute-force Hamming matching plus Lowe's ratio test, and a homography is estimated with RANSAC. (ORB [Rublee 2011]; RANSAC [Fischler & Bolles 1981])

Stage 2 - Global registration: a BFS traversal from the most-connected image propagates homographies so that every frame is placed in a single mosaic coordinate system; no GPS is needed. (Brown & Lowe 2007 panoramic stitching)

Stage 3 - Coverage gridding: the mosaic bounding box is discretised into a grid, and each image's warped footprint is rasterised to determine which cells it covers. (OpenCV fillConvexPoly)

Stage 4 - Greedy set cover: iteratively selects the image covering the most uncovered cells until 100% coverage is reached; this is the classic greedy approximation with a ln n + 1 factor (see the toy sketch below). (Chvatal 1979)

Stage 5 - Spatial sort: selected images are sorted clockwise starting from the lower-left corner, using angular ordering around the centroid.
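A minimal sketch of the stage-4 greedy loop on hand-made cell sets (the sets here are invented for illustration; the real pipeline derives them by rasterising each image's warped footprint):

    # Toy greedy set cover: repeatedly pick the image with the largest
    # marginal gain in newly covered cells.
    image_cells = {
        0: {(0, 0), (0, 1), (1, 0), (1, 1)},
        1: {(0, 1), (0, 2), (1, 1), (1, 2)},
        2: {(1, 2), (1, 3)},
    }
    uncovered = set().union(*image_cells.values())
    chosen = []
    while uncovered:
        candidates = [i for i in image_cells if i not in chosen]
        if not candidates:
            break
        idx = max(candidates, key=lambda i: len(image_cells[i] & uncovered))
        if not image_cells[idx] & uncovered:
            break  # remaining cells are not covered by any image
        chosen.append(idx)
        uncovered -= image_cells[idx]
    print(chosen)  # [0, 1, 2]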

Usage:

# Install dependencies

pip install opencv-python-headless numpy

# Run (CLI)

python drone_tiling.py --input_dir ./my_drone_frames --output tiles.json

# Run (programmatic)

from drone_tiling import select_tiling_from_directory

selected = select_tiling_from_directory("./my_drone_frames")

print(selected)
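The defaults can be tuned through TilingConfig; for example (the parameter values below are illustrative, not recommendations):

    from drone_tiling import TilingConfig, select_tiling_from_directory

    cfg = TilingConfig(
        orb_n_features=5000,   # more keypoints for low-texture terrain
        min_good_matches=40,   # stricter overlap acceptance
        grid_resolution=150,   # finer coverage raster
        work_scale=0.4,        # heavier down-scaling for large frames
    )
    selected = select_tiling_from_directory("./my_drone_frames", cfg, sort_result=False)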

Implementation: the full, self-contained module is listed above.

"""

Drone Image Tiling Selector

============================

Selects a minimal subset of contiguous aerial drone images that completely

tiles (covers) the area surveyed during a drone flight.

Algorithm & Citations

---------------------

1. **Feature Detection & Matching**: ORB (Oriented FAST and Rotated BRIEF)

   detector with brute-force Hamming-distance matching.

   - Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011).

     "ORB: An efficient alternative to SIFT or SURF."

     IEEE International Conference on Computer Vision (ICCV), pp. 2564-2571.

     DOI: 10.1109/ICCV.2011.6126544

   - OpenCV documentation: https://docs.opencv.org/4.x/d1/d89/tutorial_py_orb.html

2. **Homography Estimation (RANSAC)**: Used to compute the projective

   transformation between overlapping image pairs, which gives us the

   relative spatial position of each image.

   - Fischler, M. A., & Bolles, R. C. (1981).

     "Random Sample Consensus: A Paradigm for Model Fitting with

     Applications to Image Analysis and Automated Cartography."

     Communications of the ACM, 24(6), 381-395.

   - OpenCV documentation: https://docs.opencv.org/4.x/d9/dab/tutorial_homography.html

3. **Image Stitching / Registration Pipeline**: The pairwise registration

   approach follows the methodology in:

   - Brown, M., & Lowe, D. G. (2007).

     "Automatic Panoramic Image Stitching using Invariant Features."

     International Journal of Computer Vision, 74(1), 59-73.

     DOI: 10.1007/s11263-006-0002-3

4. **Greedy Weighted Set Cover** for selecting the minimal tiling subset:

   - Chvatal, V. (1979).

     "A Greedy Heuristic for the Set-Covering Problem."

     Mathematics of Operations Research, 4(3), 233-235.

   - Vazirani, V. V. (2001). "Approximation Algorithms", Chapter 2.

     Springer-Verlag. ISBN: 3-540-65367-8.

Dependencies

------------

    pip install opencv-python-headless numpy

Usage

-----

    python drone_tiling.py --input_dir ./drone_images --output tiles.json

"""

import os

import glob

import json

import argparse

import logging

from dataclasses import dataclass, field

from typing import List, Tuple, Dict, Optional, Set

import cv2

import numpy as np

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------

# Configuration

# ---------------------------------------------------------------------------

@dataclass

class TilingConfig:

    """Tuneable parameters for the tiling pipeline."""

    # ORB feature detector

    orb_n_features: int = 3000

    orb_scale_factor: float = 1.2

    orb_n_levels: int = 8

    # Feature matching

    match_ratio_thresh: float = 0.75 # Lowe's ratio test threshold

    min_good_matches: int = 30 # minimum inlier matches to

                                              # consider a pair overlapping

    # RANSAC homography

    ransac_reproj_thresh: float = 5.0 # reprojection error in pixels

    # Coverage grid resolution (number of cells along the longer axis)

    grid_resolution: int = 100

    # Overlap: minimum fraction of an image that must overlap with the

    # already-covered area for the image to be considered redundant.

    redundancy_overlap: float = 0.95

    # Image scaling for speed (process at this fraction of original size)

    work_scale: float = 0.5

# ---------------------------------------------------------------------------

# Data structures

# ---------------------------------------------------------------------------

@dataclass

class ImageRecord:

    """Metadata for a single drone image."""

    index: int

    filepath: str

    filename: str

    width: int = 0

    height: int = 0

    # Position of the image centre in the *mosaic* coordinate frame.

    cx: float = 0.0

    cy: float = 0.0

    # 3x3 homography that maps this image into mosaic coordinates.

    H: Optional[np.ndarray] = None

@dataclass

class PairwiseMatch:

    """Result of matching two images."""

    idx_a: int

    idx_b: int

    n_inliers: int

    H_ab: np.ndarray # homography from image A to image B

    confidence: float = 0.0

# ---------------------------------------------------------------------------

# 1. Feature detection & pair-wise matching

# ---------------------------------------------------------------------------

class FeatureMatcher:

    """

    Detects ORB features in every image and matches consecutive /

    nearby frames to estimate pairwise homographies.

    Reference

    ---------

    Rublee et al., "ORB: An efficient alternative to SIFT or SURF", ICCV 2011.

    OpenCV ORB docs: https://docs.opencv.org/4.x/d1/d89/tutorial_py_orb.html

    """

    def __init__(self, config: TilingConfig):

        self.cfg = config

        self.orb = cv2.ORB_create(

            nfeatures=config.orb_n_features,

            scaleFactor=config.orb_scale_factor,

            nlevels=config.orb_n_levels,

        )

        self.bf = cv2.BFMatcher(cv2.NORM_HAMMING)

    @staticmethod

    def _load_grey(path: str, scale: float) -> np.ndarray:

        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

        if img is None:

            raise FileNotFoundError(f"Cannot read image: {path}")

        if scale != 1.0:

            img = cv2.resize(img, None, fx=scale, fy=scale,

                             interpolation=cv2.INTER_AREA)

        return img

    def _detect(self, grey: np.ndarray):

        kp, des = self.orb.detectAndCompute(grey, None)

        return kp, des

    def _match_pair(self, des_a, des_b, kp_a, kp_b) -> Optional[PairwiseMatch]:

        """

        Match descriptors with Lowe's ratio test, then estimate a homography

        with RANSAC.

        Reference

        ---------

        Lowe, D. G. (2004). "Distinctive Image Features from Scale-Invariant

        Keypoints." IJCV, 60(2), 91-110.

        Fischler & Bolles (1981). "Random Sample Consensus." CACM.

        """

        if des_a is None or des_b is None:

            return None

        if len(des_a) < 2 or len(des_b) < 2:

            return None

        raw_matches = self.bf.knnMatch(des_a, des_b, k=2)

        good = []

        for m_pair in raw_matches:

            if len(m_pair) < 2:

                continue

            m, n = m_pair

            if m.distance < self.cfg.match_ratio_thresh * n.distance:

                good.append(m)

        if len(good) < self.cfg.min_good_matches:

            return None

        pts_a = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)

        pts_b = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

        H, mask = cv2.findHomography(

            pts_a, pts_b, cv2.RANSAC, self.cfg.ransac_reproj_thresh

        )

        if H is None:

            return None

        n_inliers = int(mask.sum())

        if n_inliers < self.cfg.min_good_matches:

            return None

        confidence = n_inliers / (8.0 + 0.3 * len(good)) # Brown & Lowe 2007

        return PairwiseMatch(

            idx_a=-1, idx_b=-1,

            n_inliers=n_inliers,

            H_ab=H,

            confidence=confidence,

        )

    def match_sequence(

        self,

        records: List[ImageRecord],

        search_radius: int = 5,

    ) -> List[PairwiseMatch]:

        """

        For every image, match against up to *search_radius* neighbours in

        both directions (handles forward overlap AND lateral overlap from

        adjacent flight strips).

        """

        n = len(records)

        scale = self.cfg.work_scale

        features = []

        for rec in records:

            grey = self._load_grey(rec.filepath, scale)

            rec.height, rec.width = grey.shape[:2]

            kp, des = self._detect(grey)

            features.append((kp, des))

            logger.debug(" %s - %d keypoints", rec.filename, len(kp))

        matches: List[PairwiseMatch] = []

        tested: Set[Tuple[int, int]] = set()

        for i in range(n):

            for j in range(i + 1, min(i + search_radius + 1, n)):

                if (i, j) in tested:

                    continue

                tested.add((i, j))

                kp_a, des_a = features[i]

                kp_b, des_b = features[j]

                pm = self._match_pair(des_a, des_b, kp_a, kp_b)

                if pm is not None:

                    pm.idx_a = i

                    pm.idx_b = j

                    matches.append(pm)

                    logger.debug(

                        " match %s <-> %s inliers=%d conf=%.3f",

                        records[i].filename, records[j].filename,

                        pm.n_inliers, pm.confidence,

                    )

        logger.info("Pairwise matching complete: %d valid pairs from %d images.",

                     len(matches), n)

        return matches

# ---------------------------------------------------------------------------

# 2. Global registration - place every image in a common mosaic frame

# ---------------------------------------------------------------------------

def register_images(

    records: List[ImageRecord],

    matches: List[PairwiseMatch],

) -> List[ImageRecord]:

    """

    Build a connected graph of images and propagate homographies via BFS

    so that every image is mapped into the coordinate frame of a single

    reference image (the one with the most connections).

    Reference

    ---------

    Brown, M. & Lowe, D. G. (2007). "Automatic Panoramic Image Stitching

    using Invariant Features." IJCV 74(1), 59-73.

    OpenCV Stitching: https://docs.opencv.org/4.x/d9/dab/tutorial_homography.html

    """

    n = len(records)

    adj: Dict[int, List[Tuple[int, np.ndarray, float]]] = {i: [] for i in range(n)}

    for pm in matches:

        a, b = pm.idx_a, pm.idx_b

        adj[a].append((b, pm.H_ab, pm.confidence))

        H_inv = np.linalg.inv(pm.H_ab)

        adj[b].append((a, H_inv, pm.confidence))

    ref = max(range(n), key=lambda i: len(adj[i]))

    logger.info("Reference image: %s (index %d, %d neighbours)",

                records[ref].filename, ref, len(adj[ref]))

    records[ref].H = np.eye(3, dtype=np.float64)

    visited = {ref}

    queue = [ref]

    while queue:

        cur = queue.pop(0)

        for nb, H_cur_to_nb, _conf in adj[cur]:

            if nb in visited:

                continue

            H_nb_to_mosaic = records[cur].H @ np.linalg.inv(H_cur_to_nb)

            records[nb].H = H_nb_to_mosaic

            visited.add(nb)

            queue.append(nb)

    disconnected = [i for i in range(n) if records[i].H is None]

    if disconnected:

        logger.warning(

            "%d images could not be registered (disconnected): %s",

            len(disconnected),

            [records[i].filename for i in disconnected],

        )

    for rec in records:

        if rec.H is None:

            continue

        centre_local = np.array([[rec.width / 2, rec.height / 2]], dtype=np.float64)

        centre_mosaic = cv2.perspectiveTransform(

            centre_local.reshape(-1, 1, 2), rec.H

        )

        rec.cx, rec.cy = centre_mosaic[0, 0]

    return records

# ---------------------------------------------------------------------------

# 3. Coverage grid & greedy set-cover selection

# ---------------------------------------------------------------------------

def _image_footprint(rec: ImageRecord) -> Optional[np.ndarray]:

    """Return the four mosaic-frame corners of *rec* as an (4,2) array."""

    if rec.H is None:

        return None

    corners = np.float64([

        [0, 0],

        [rec.width, 0],

        [rec.width, rec.height],

        [0, rec.height],

    ]).reshape(-1, 1, 2)

    warped = cv2.perspectiveTransform(corners, rec.H)

    return warped.reshape(-1, 2)

def _build_coverage_grid(

    records: List[ImageRecord],

    resolution: int,

) -> Tuple[np.ndarray, float, float, float]:

    """Create a boolean grid representing the total survey area."""

    all_corners = []

    for rec in records:

        fp = _image_footprint(rec)

        if fp is not None:

            all_corners.append(fp)

    if not all_corners:

        raise RuntimeError("No images could be registered - cannot build grid.")

    all_pts = np.vstack(all_corners)

    x_min, y_min = all_pts.min(axis=0)

    x_max, y_max = all_pts.max(axis=0)

    span = max(x_max - x_min, y_max - y_min)

    cell_size = span / resolution

    cols = int(np.ceil((x_max - x_min) / cell_size)) + 1

    rows = int(np.ceil((y_max - y_min) / cell_size)) + 1

    grid = np.zeros((rows, cols), dtype=bool)

    return grid, cell_size, float(x_min), float(y_min)

def _rasterise_footprint(

    corners: np.ndarray,

    cell_size: float,

    x_min: float,

    y_min: float,

    grid_shape: Tuple[int, int],

) -> Set[Tuple[int, int]]:

    """Return the set of grid cells covered by the quadrilateral *corners*."""

    rows, cols = grid_shape

    gc = ((corners - [x_min, y_min]) / cell_size).astype(np.int32)

    mask = np.zeros((rows, cols), dtype=np.uint8)

    cv2.fillConvexPoly(mask, gc, 1)

    cells = set(zip(*np.where(mask > 0)))

    return cells

def select_tiling_images(

    records: List[ImageRecord],

    config: TilingConfig,

) -> List[ImageRecord]:

    """

    Greedy weighted set-cover: iteratively pick the image that covers the

    most *uncovered* grid cells until the entire survey area is covered.

    Reference

    ---------

    Chvatal, V. (1979). "A Greedy Heuristic for the Set-Covering Problem."

    Mathematics of Operations Research, 4(3), 233-235.

    """

    grid, cell_size, x_min, y_min = _build_coverage_grid(

        records, config.grid_resolution

    )

    grid_shape = grid.shape

    universe: Set[Tuple[int, int]] = set()

    image_cells: Dict[int, Set[Tuple[int, int]]] = {}

    for rec in records:

        fp = _image_footprint(rec)

        if fp is None:

            continue

        cells = _rasterise_footprint(fp, cell_size, x_min, y_min, grid_shape)

        image_cells[rec.index] = cells

        universe |= cells

    logger.info("Survey area: %d grid cells to cover.", len(universe))

    uncovered = set(universe)

    selected_indices: List[int] = []

    used: Set[int] = set()

    while uncovered:

        best_idx = -1

        best_gain = 0

        for idx, cells in image_cells.items():

            if idx in used:

                continue

            gain = len(cells & uncovered)

            if gain > best_gain:

                best_gain = gain

                best_idx = idx

        if best_idx == -1 or best_gain == 0:

            logger.warning(

                "Cannot cover %d remaining cells - possible registration gaps.",

                len(uncovered),

            )

            break

        selected_indices.append(best_idx)

        used.add(best_idx)

        uncovered -= image_cells[best_idx]

        logger.debug(

            " selected %s - covers %d new cells, %d remaining",

            records[best_idx].filename, best_gain, len(uncovered),

        )

    logger.info("Selected %d / %d images for full tiling.",

                len(selected_indices), len(records))

    selected = [records[i] for i in selected_indices]

    return selected

# ---------------------------------------------------------------------------

# 4. (Optional) Sort selected images - lower-left origin, clockwise

# ---------------------------------------------------------------------------

def sort_clockwise_from_lower_left(

    selected: List[ImageRecord],

) -> List[ImageRecord]:

    """

    Sort images starting from the lower-left corner and proceeding clockwise

    along the convex-hull perimeter.

    """

    if len(selected) <= 1:

        return selected

    centres = np.array([[r.cx, r.cy] for r in selected])

    centroid = centres.mean(axis=0)

    dx = centres[:, 0] - centroid[0]

    dy = centres[:, 1] - centroid[1]

    angles = np.arctan2(dx, dy)

    order = np.argsort(angles)

    lower_left_idx = int(

        np.argmin(centres[:, 0] - centres[:, 1])

    )

    start = int(np.where(order == lower_left_idx)[0][0])

    order = np.roll(order, -start)

    return [selected[i] for i in order]

# ---------------------------------------------------------------------------

# 5. Main pipeline

# ---------------------------------------------------------------------------

def load_image_records(input_dir: str) -> List[ImageRecord]:

    """Load image file paths from a directory, sorted by name."""

    extensions = ("*.jpg", "*.jpeg", "*.png", "*.tif", "*.tiff", "*.bmp")

    paths: List[str] = []

    for ext in extensions:

        paths.extend(glob.glob(os.path.join(input_dir, ext)))

        paths.extend(glob.glob(os.path.join(input_dir, ext.upper())))

    paths = sorted(set(paths))

    if not paths:

        raise FileNotFoundError(f"No image files found in {input_dir}")

    records = [

        ImageRecord(index=i, filepath=p, filename=os.path.basename(p))

        for i, p in enumerate(paths)

    ]

    logger.info("Loaded %d image paths from %s", len(records), input_dir)

    return records

def select_tiling_from_directory(

    input_dir: str,

    config: Optional[TilingConfig] = None,

    sort_result: bool = True,

) -> List[str]:

    """

    End-to-end pipeline.

    Parameters

    ----------

    input_dir : str

        Directory containing contiguous drone images.

    config : TilingConfig, optional

        Pipeline configuration; uses defaults when None.

    sort_result : bool

        If True, sort selected images clockwise from lower-left.

    Returns

    -------

    list[str]

        File paths of the selected tiling images.

    """

    if config is None:

        config = TilingConfig()

    records = load_image_records(input_dir)

    matcher = FeatureMatcher(config)

    matches = matcher.match_sequence(records)

    if not matches:

        logger.warning("No pairwise matches found. Returning all images.")

        return [r.filepath for r in records]

    records = register_images(records, matches)

    selected = select_tiling_images(records, config)

    if sort_result:

        selected = sort_clockwise_from_lower_left(selected)

    return [rec.filepath for rec in selected]

# ---------------------------------------------------------------------------

# CLI

# ---------------------------------------------------------------------------

def main():

    parser = argparse.ArgumentParser(

        description="Select a minimal set of drone images that tile the surveyed area."

    )

    parser.add_argument(

        "--input_dir", required=True,

        help="Directory containing contiguous aerial drone images.",

    )

    parser.add_argument(

        "--output", default=None,

        help="Optional JSON file to write the selected file list to.",

    )

    parser.add_argument(

        "--grid_resolution", type=int, default=100,

        help="Grid resolution for coverage computation (default: 100).",

    )

    parser.add_argument(

        "--work_scale", type=float, default=0.5,

        help="Down-scale factor for images during processing (default: 0.5).",

    )

    parser.add_argument(

        "--search_radius", type=int, default=5,

        help="Match each image against this many neighbours (default: 5).",

    )

    parser.add_argument(

        "--no_sort", action="store_true",

        help="Skip clockwise spatial sorting of the result.",

    )

    args = parser.parse_args()

    config = TilingConfig(

        grid_resolution=args.grid_resolution,

        work_scale=args.work_scale,

    )

    selected = select_tiling_from_directory(

        args.input_dir, config, sort_result=not args.no_sort

    )

    print(f"\n{'='*60}")

    print(f"Selected {len(selected)} images for tiling:")

    print(f"{'='*60}")

    for i, path in enumerate(selected, 1):

        print(f" {i:>3d}. {os.path.basename(path)}")

    if args.output:

        with open(args.output, "w") as f:

            json.dump({"selected_images": selected}, f, indent=2)

        print(f"\nResults written to {args.output}")

if __name__ == "__main__":

    main()


Alternatives: IaCResolutionsPart565.docx