Monday, November 3, 2025

Transient and transit object detection in aerial drone images

Introduction: 

The increasing availability of high-resolution aerial imagery from unmanned aerial vehicles (UAVs) presents a unique opportunity for time-domain object detection. Unlike traditional satellite imagery, UAVs offer flexible sampling rates, dynamic perspectives, and real-time responsiveness. However, the irregular cadence and noise inherent in aerial sequences pose challenges for conventional object detection pipelines, especially when attempting to identify transient objects such as pedestrians, vehicles, or small mobile assets.

Machine learning techniques have become indispensable in aerial image analysis, particularly for large datasets where manual annotation is infeasible. Convolutional neural networks (CNNs) have been widely adopted for static object detection, but their performance degrades when applied to temporally sparse or noisy sequences. Prior work has explored phase-folding and frame-by-frame tracking, but these methods are computationally expensive and sensitive to sampling irregularities.

This paper introduces DroneWorldNet, a frequency-domain model that bypasses the limitations of traditional tracking by transforming image clip vectors into frequency-domain tensors. DroneWorldNet applies a discrete wavelet transform (DWT) to suppress noise and highlight localized changes, followed by a finite-embedding Fourier transform (FEFT) to extract periodic and harmonic features across time. These tensors are then classified into one of four object states: null (no object), transient (brief appearance), stable (persistent presence), or transit (periodic occlusion or movement).

We apply DroneWorldNet to the DOTA dataset, which contains annotated aerial scenes from diverse environments. Each image clip is treated as a temporal stack, and the model is trained on both real and synthetic sequences to ensure robustness across lighting, altitude, and occlusion conditions. The pipeline includes spatial clustering, data normalization, and tensor construction, followed by classification using convolutional and fully connected layers.

DroneWorldNet achieves subsecond inference latency and high classification accuracy, demonstrating its suitability for real-time deployment in edge-cloud UAV systems. This work lays the foundation for a full-scale variability survey of aerial scenes and opens new avenues for time-domain analysis in geospatial workflows. 

Data Preprocessing: 

Each image clip sequence is preprocessed to construct a high-quality, neural-network-friendly representation. For each frame, we extract three features: normalized brightness, estimated uncertainty (e.g., motion blur or sensor noise), and timestamp. Brightness values are converted from log scale to linear flux using calibration constants derived from the UAV sensor specifications. We then subtract the median and standardize using the interquartile range (IQR), followed by compression with the arcsinh function, which damps outliers so that typical values fall within [-1, 1].
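As a minimal sketch of this normalization step (assuming NumPy; the zero_point and scale parameters stand in for the sensor calibration constants, which are not specified here):

import numpy as np

def normalize_brightness(log_brightness, zero_point=0.0, scale=1.0):
    """Convert log-scale brightness to linear flux, robust-standardize, and compress."""
    # Log-to-linear conversion; zero_point and scale are placeholder calibration constants.
    flux = 10.0 ** (scale * (log_brightness - zero_point))
    # Robust standardization: subtract the median, divide by the interquartile range.
    q1, q3 = np.percentile(flux, [25, 75])
    standardized = (flux - np.median(flux)) / max(q3 - q1, 1e-12)
    # arcsinh compression damps outliers; typical standardized values land near [-1, 1].
    return np.arcsinh(standardized)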

Time values are normalized to [0, 1] based on the total observation window, typically spanning 10–30 seconds. Uncertainty values are similarly rescaled and compressed to match the flux scale. The final input tensor for each sequence is a matrix of shape T × 3, where T is the number of frames, and each row contains brightness, uncertainty, and timestamp. 
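A sketch of the full T x 3 tensor assembly, reusing normalize_brightness from above; the exact rescaling of time and uncertainty is an assumption consistent with the description, and all inputs are 1-D NumPy arrays of equal length:

def build_sequence_tensor(log_brightness, uncertainty, timestamps):
    """Stack brightness, uncertainty, and time into a T x 3 input matrix."""
    flux = normalize_brightness(log_brightness)
    # Rescale uncertainties with the same robust statistics and compress to the flux scale.
    q1, q3 = np.percentile(uncertainty, [25, 75])
    sigma = np.arcsinh((uncertainty - np.median(uncertainty)) / max(q3 - q1, 1e-12))
    # Normalize time to [0, 1] over the observation window (typically 10-30 seconds).
    span = max(timestamps.max() - timestamps.min(), 1e-12)
    t = (timestamps - timestamps.min()) / span
    return np.stack([flux, sigma, t], axis=1)   # shape (T, 3)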

This representation ensures that DroneWorldNet can handle sequences of varying length and sampling rate, a critical requirement for aerial deployments where cadence may fluctuate due to flight path, altitude, or environmental conditions. 

DroneWorldNet model: 

DroneWorldNet is a hybrid signal-processing and deep learning model designed to classify aerial image sequences into four object states: null, transient, stable, and transit. The model architecture integrates three core components: 

Wavelet Decomposition: A one-dimensional discrete wavelet transform (DWT) is applied to the brightness vector to suppress noise and highlight localized changes. This is particularly effective in identifying transient objects that appear briefly and then vanish. 
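A minimal sketch of this step using the PyWavelets library; the wavelet family (db4), decomposition level, and soft-threshold rule are illustrative choices, not details given in this post:

import numpy as np
import pywt

def wavelet_denoise(brightness, wavelet="db4", level=3):
    """Multi-level 1-D DWT denoising; detail bands localize brief transient events."""
    coeffs = pywt.wavedec(brightness, wavelet, level=level)
    # Universal threshold from the noise estimate in the finest detail band (MAD / 0.6745).
    noise_sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    threshold = noise_sigma * np.sqrt(2 * np.log(len(brightness)))
    # Soft-threshold the detail bands only; keep the coarse approximation intact.
    denoised = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(brightness)]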

Finite-Embedding Fourier Transform (FEFT): A modified discrete Fourier transform is applied to the time series to extract periodic and harmonic features. FEFT enables detection of transit-like behavior, such as vehicles passing through occluded regions or pedestrians crossing paths. 
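The exact form of FEFT is not spelled out here, so the following stand-in simply zero-pads each sequence to a fixed embedding length and takes the magnitude of NumPy's real FFT; treat it as a placeholder illustrating the periodic-feature extraction, not the actual transform:

def feft_features(signal, embed_size=256):
    """Stand-in for FEFT: zero-pad to a fixed embedding, take the real-FFT magnitude."""
    padded = np.zeros(embed_size)
    padded[: min(len(signal), embed_size)] = signal[:embed_size]
    spectrum = np.abs(np.fft.rfft(padded))       # periodic/harmonic energy per frequency bin
    return spectrum / (spectrum.max() + 1e-12)   # scale-invariant feature vector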

Convolutional Neural Network (CNN): The frequency-domain tensor is passed through a series of convolutional and fully connected layers, which learn to discriminate between the four object states. The model is trained using a categorical cross-entropy loss function and optimized with Adam. 
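A minimal PyTorch sketch of such a classifier head; the channel widths, kernel sizes, and layer counts are assumptions, since the post does not specify the architecture (the input length of 129 matches the rfft of the 256-sample embedding in the FEFT stand-in above):

import torch
import torch.nn as nn

class DroneWorldNetHead(nn.Module):
    """Minimal 1-D CNN classifier over a frequency-domain feature vector."""
    def __init__(self, in_len=129, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),
        )
        self.fc = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 8, 64), nn.ReLU(), nn.Linear(64, n_classes)
        )

    def forward(self, x):                # x: (batch, in_len)
        return self.fc(self.conv(x.unsqueeze(1)))

model = DroneWorldNetHead()
criterion = nn.CrossEntropyLoss()        # categorical cross-entropy over 4 object states
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)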

 

Training and Evaluation: 

To train DroneWorldNet, we generate synthetic aerial sequences using motion simulation models that replicate pedestrian and vehicle dynamics under varying conditions. These include changes in lighting, altitude, occlusion, and background texture. Synthetic sequences are blended with real samples from the DOTA dataset to ensure generalization across diverse environments. 
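The simulation models themselves are not described here; the toy generator below merely illustrates the idea by injecting simple transient, stable, or transit signatures into background noise:

import numpy as np

def synthesize_sequence(T=200, state="transit", rng=None):
    """Toy generator: inject a transient bump or periodic transit dips into noise."""
    if rng is None:
        rng = np.random.default_rng()
    signal = rng.normal(0.0, 0.05, T)                 # background sensor noise
    if state == "transient":
        start = rng.integers(0, T - 10)
        signal[start:start + 10] += 1.0               # brief appearance, then gone
    elif state == "stable":
        signal += 1.0                                 # persistent presence
    elif state == "transit":
        period, width = int(rng.integers(20, 40)), 5
        for t0 in range(period, T, period):
            signal[t0:t0 + width] -= 0.8              # periodic occlusion dips
    return signal                                     # "null" falls through unchanged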

The model is trained on a four-class scheme: null (no object), transient (brief appearance), stable (persistent presence), and transit (periodic occlusion or movement). On a held-out validation set, DroneWorldNet achieves an F1 score of 0.89, with precision and recall exceeding 0.90 for stable and transit classes. Transient detection remains challenging due to low signal-to-noise ratio, but wavelet decomposition significantly improves sensitivity. 
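The per-class metrics can be reproduced with scikit-learn along these lines; y_true and y_pred below are randomly generated placeholders standing in for held-out labels and model predictions:

import numpy as np
from sklearn.metrics import classification_report, f1_score

labels = ["null", "transient", "stable", "transit"]
rng = np.random.default_rng(0)
# Placeholder arrays; in practice these come from the held-out set and the model.
y_true = rng.integers(0, 4, 500)
y_pred = np.where(rng.random(500) < 0.9, y_true, rng.integers(0, 4, 500))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))
print(classification_report(y_true, y_pred, target_names=labels))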


#codingexercise: CodingExercise-11-03-2025.docx
