Transient and Transit objects in aerial drone scene sequences
Time, location, and frequency are the dimensions we would ideally capture for each object detected in aerial drone images and their sequences. However, objects do not have a generalized signature and often require training and deep-learning supervision to detect, especially transient and transit objects such as pedestrians and vehicles. Given several pedestrians and vehicles in a scene sequence, we apply Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which works robustly on clusters of arbitrary shape, does not need the number of clusters in advance, and labels low-density points as noise so they are easy to filter out.
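A minimal, pure-Python sketch of DBSCAN applied to detection centroids follows. The (x, y) pixel-coordinate input and the eps and min_pts values are assumptions for illustration; in practice a library implementation such as scikit-learn's DBSCAN would normally be used instead.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
UNVISITED, NOISE = -2, -1

def region_query(points: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]

def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) per point, or NOISE (-1) for outliers."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may be relabeled later as a border point
            continue
        labels[i] = cluster                # i is a core point: start a new cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point: reachable but not core
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels

# Usage: two small groups of hypothetical detections plus one isolated outlier.
detections = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(detections, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is never passed in; it emerges from eps and min_pts, and the outlier is labeled -1 so it can simply be dropped.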
Each image clip sequence is preprocessed into a high-quality, neural-network-friendly representation. For each frame, we extract three features: the normalized spectral signature, its estimated uncertainty (e.g., from motion blur or sensor noise), and the timestamp. Spectral signature values are converted from log scale to linear flux using calibration constants derived from the UAV sensor specifications. We then subtract the median, divide by the interquartile range (IQR), and compress the result with the arcsinh function, which keeps typical values near the [-1, 1] range while compressing outliers logarithmically.
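The signature normalization can be sketched as below. It assumes each raw reading is log10(flux / gain), where gain is a hypothetical stand-in for the unspecified sensor calibration constants; the median/IQR standardization and arcsinh compression follow the text directly.

```python
import math
import statistics
from typing import List

def normalize_signature(log_values: List[float], gain: float = 1.0) -> List[float]:
    """Log-scale readings -> linear flux -> median/IQR standardization -> arcsinh."""
    # Assumption: each reading is log10(flux / gain); `gain` is a hypothetical
    # placeholder for the UAV sensor calibration constants.
    flux = [gain * 10.0 ** v for v in log_values]
    med = statistics.median(flux)
    q1, _, q3 = statistics.quantiles(flux, n=4)  # quartile cut points (Python 3.8+)
    iqr = (q3 - q1) or 1.0                       # guard against flat signals
    return [math.asinh((f - med) / iqr) for f in flux]

# Usage: a short, mostly steady signature with one bright frame.
print(normalize_signature([0.0, 0.1, 0.2, 1.5]))
```

Using the median and IQR rather than the mean and standard deviation keeps a single bright or blurred frame from dominating the scale.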
Time values are normalized to [0, 1] based on the total observation window, typically spanning 10–30 seconds. Uncertainty values are similarly rescaled and compressed to match the flux scale. The final input tensor for each sequence is a matrix of shape T × 3, where T is the number of frames, and each row contains spectral signature, uncertainty, and timestamp.
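Assembling the T × 3 input can be sketched as follows. Two assumptions are made here: the observation window is taken as the span between the first and last timestamps, and uncertainties are divided by the same IQR as the signature before arcsinh compression (the iqr parameter is hypothetical).

```python
import math
from typing import List, Sequence

def normalize_times(timestamps: Sequence[float]) -> List[float]:
    """Map raw timestamps (e.g. seconds) onto [0, 1] over the observed window."""
    t0, t1 = min(timestamps), max(timestamps)
    span = (t1 - t0) or 1.0                    # single-frame guard
    return [(t - t0) / span for t in timestamps]

def build_sequence_tensor(signature: Sequence[float],
                          uncertainty: Sequence[float],
                          timestamps: Sequence[float],
                          iqr: float) -> List[List[float]]:
    """Stack per-frame features into T rows of [signature, uncertainty, time]."""
    if not (len(signature) == len(uncertainty) == len(timestamps)):
        raise ValueError("every feature column needs exactly T entries")
    # Assumption: uncertainties share the signature's IQR scale, then arcsinh.
    unc = [math.asinh(u / iqr) for u in uncertainty]
    return [[s, u, t] for s, u, t in zip(signature, unc, normalize_times(timestamps))]

# Usage: three frames captured 10 s apart.
tensor = build_sequence_tensor([0.1, -0.2, 0.3], [0.05, 0.04, 0.06],
                               [100.0, 110.0, 120.0], iqr=2.0)
```

Because T is simply the number of rows, clips with different frame counts produce matrices of different heights but the same three columns.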
Because each row carries its own timestamp, this representation accommodates sequences of varying length and sampling rate, a critical requirement for aerial deployments where the frame cadence may fluctuate with flight path, altitude, or environmental conditions.
The model we use to exploit the frequency domain has three core components:
Wavelet Decomposition: A one-dimensional discrete wavelet transform (DWT) is applied to the spectral signature vector to suppress noise and highlight localized changes. This is particularly effective in identifying transient objects that appear briefly and then vanish.
Finite-Embedding Fourier Transform (FEFT): A modified discrete Fourier transform is applied to the time series to extract periodic and harmonic features. FEFT enables detection of transit-like behavior, such as vehicles passing through occluded regions or pedestrians crossing paths.
Convolutional Neural Network (CNN): The frequency-domain tensor is passed through a series of convolutional and fully connected layers, which learn to discriminate between the four object states. The model is trained using a categorical cross-entropy loss function and optimized with Adam.
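The text names Adam as the optimizer without further detail. As a hedged illustration of what one Adam update does, here is the standard update rule in pure Python, using the commonly quoted default hyperparameters (an assumption, since the document does not list its settings), applied to a one-parameter toy problem rather than the CNN itself.

```python
import math
from typing import List

class Adam:
    """Minimal Adam optimizer over a flat list of scalar parameters."""
    def __init__(self, n_params: int, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params   # first-moment (mean) estimates
        self.v = [0.0] * n_params   # second-moment (uncentered variance) estimates
        self.t = 0                  # step counter for bias correction

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)   # bias-corrected moments
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated

# Usage: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
opt = Adam(n_params=1, lr=0.1)
x = [0.0]
for _ in range(1000):
    x = opt.step(x, [2.0 * (x[0] - 3.0)])
```

In a real training loop the gradients would come from backpropagating the cross-entropy loss through the convolutional and fully connected layers.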
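Each entry of the output vector v is called a logit and represents the model's unnormalized score for one class of object (we use $v_0$ = null, $v_1$ = transient, $v_2$ = stable, $v_3$ = transit). The softmax function turns the logits into predicted class probabilities:
$$p_i = \frac{e^{v_i}}{\sum_{j=0}^{3} e^{v_j}}$$
and we compare these probabilities to one-hot target vectors t with the categorical cross-entropy:
$$\mathcal{L}(v, t) = -\sum_{i=0}^{3} t_i \log p_i$$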
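The softmax and categorical cross-entropy computation can be sketched in pure Python as below, using the four classes of the training scheme (null, transient, stable, transit); the CLASSES ordering is an assumption matching the indices $v_0$ to $v_3$.

```python
import math
from typing import List, Sequence

CLASSES = ["null", "transient", "stable", "transit"]

def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits into probabilities that sum to 1."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(label: str) -> List[float]:
    """One-hot target vector t for a class name."""
    return [1.0 if c == label else 0.0 for c in CLASSES]

def categorical_cross_entropy(logits: Sequence[float], target: Sequence[float]) -> float:
    """-sum_i t_i * log(p_i); only the true class contributes for a one-hot target."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# Usage: uniform logits give loss log(4) whatever the true class is.
loss = categorical_cross_entropy([0.0, 0.0, 0.0, 0.0], one_hot("transit"))
```

Skipping zero targets avoids evaluating log terms that would be multiplied by zero anyway.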
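We apply the DFT to an N-long signal x(n) as follows:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-2\pi i k n / N}, \qquad k = 0, \dots, N-1$$
To extract features from variable-length sequences, we write the exponent as the outer product of two vectors, $kn/N = (u \otimes v)_{kn}$, where v holds the normalized timestamps ($v_n = n/N$ for evenly spaced frames) and u is a parameter of the model whose dimension is a hyperparameter named samples. When $u = (0, 1, \dots, N-1)$, that is, when u counts from zero up to the length of the input vector, this reproduces the standard DFT.
The finite-embedding Fourier transform is then
$$F = \exp\left(-2\pi i\,(u \otimes v)\right), \qquad \mathrm{FEFT}(x) = F\,x,$$
where the exponential is taken element-wise, so the output has length samples regardless of the number of frames.
Both the outer product and the element-wise exponentiation of the matrix are fast operations.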
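The FEFT can be sketched with the standard library's cmath module. Here u is a plain list rather than a trainable parameter, and initializing it to the integers 0 to samples - 1 is a hypothetical choice, not necessarily the author's. The sketch checks that the special case u = (0, ..., N-1) with v_n = n/N matches a textbook DFT.

```python
import cmath
import math
from typing import List, Sequence

def feft_matrix(u: Sequence[float], v: Sequence[float]) -> List[List[complex]]:
    """Element-wise exp(-2*pi*i * (u outer v)); shape len(u) x len(v)."""
    return [[cmath.exp(-2j * math.pi * uk * vn) for vn in v] for uk in u]

def feft(x: Sequence[float], v: Sequence[float], u: Sequence[float]) -> List[complex]:
    """Project a sequence x sampled at normalized times v onto len(u) frequencies."""
    if len(x) != len(v):
        raise ValueError("x and v must both have one entry per frame")
    return [sum(w * xn for w, xn in zip(row, x)) for row in feft_matrix(u, v)]

def dft(x: Sequence[float]) -> List[complex]:
    """Textbook O(N^2) DFT, used only to check the special case."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

# With v_n = n/N and u = (0, 1, ..., N-1) the FEFT reproduces the DFT.
x = [1.0, 2.0, 0.0, -1.0]
N = len(x)
same = all(abs(a - b) < 1e-9
           for a, b in zip(feft(x, [n / N for n in range(N)], range(N)), dft(x)))

# A 7-frame clip still yields a fixed-size embedding when samples = 5.
u = [float(k) for k in range(5)]            # hypothetical initialization of u
emb = [abs(c) for c in feft([0.3] * 7, [n / 7 for n in range(7)], u)]
```

Because the number of rows of F is fixed by samples, clips with 7 or 700 frames both map to an embedding of the same size, which is what lets the downstream CNN accept variable-length input.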
The DWT consists of two operations: high-pass filtering and low-pass filtering, each followed by downsampling. One level of decomposition yields two sets of coefficients: (1) the approximation, a downscaled and smoothed version of the signal from the low-pass branch, and (2) the details, which capture the variations, from the high-pass branch. The filters are taken to be biorthogonal.
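The document does not name the wavelet, so the sketch below assumes the Haar wavelet, whose orthogonal filter pair is the simplest special case of a biorthogonal pair. It performs one decomposition level, reconstructs exactly, and shows noise suppression by hard-thresholding the detail coefficients (the threshold value is a hypothetical parameter). Libraries such as PyWavelets offer true biorthogonal families, e.g. `pywt.dwt(signal, 'bior2.2')`.

```python
import math
from typing import List, Tuple

S = 1.0 / math.sqrt(2.0)   # Haar filter coefficient

def haar_dwt(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level: low-pass -> approximation, high-pass -> detail, each downsampled by 2."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]        # pad odd lengths by repeating the last sample
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [S * (a + b) for a, b in pairs]  # smoothed, half-resolution signal
    detail = [S * (a - b) for a, b in pairs]  # local variations
    return approx, detail

def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Invert one Haar level (perfect reconstruction)."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([S * (a + d), S * (a - d)])
    return out

def denoise(signal: List[float], threshold: float) -> List[float]:
    """Zero detail coefficients below the threshold, reconstruct, drop any padding."""
    approx, detail = haar_dwt(signal)
    kept = [d if abs(d) >= threshold else 0.0 for d in detail]
    return haar_idwt(approx, kept)[:len(signal)]

# Usage: the small jitter between the first two samples is smoothed away.
print(denoise([4.0, 2.0, 5.0, 5.0], threshold=1.5))
```

Large detail coefficients survive thresholding, so an abrupt, short-lived change (the signature of a transient object) is preserved while small frame-to-frame noise is removed.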
Samples are drawn from the DOTA (Dataset for Object deTection in Aerial images) dataset to cover diverse environments and improve generalization. The model is trained on a four-class scheme: null (no object), transient (brief appearance), stable (persistent presence), and transit (periodic occlusion or movement).
#codingexercise: CodingExercise-11-04-2025.docx