Friday, October 17, 2025

 Cloud-Edge Synergy in Autonomous and Manual Driving

As vehicles evolve into intelligent, sensor-rich platforms, the computational demands of autonomous and assisted driving have surged, and these vehicular platforms are proving to be precursors to drone video sensing platforms such as the one we have been discussing1. Processing high-resolution video streams, lidar data, and real-time decision-making tasks onboard is increasingly impractical, especially under latency and energy constraints. This has catalyzed a paradigm shift toward hybrid architectures that distribute workloads across edge devices and public cloud infrastructure.

The Rise of Vehicular Edge-Cloud Computing

Recent research highlights the emergence of Vehicular Edge Computing (VEC) as a bridge between onboard systems and cloud platforms. Vehicles now operate within dynamic networks that include roadside units, mobile edge nodes, and cloud data centers. A 2024 study from the International Conference on Wireless Communication and Sensor Networks proposes an integrated framework combining mobile edge computing, cloud computing, and vehicular ad-hoc networks. This framework uses non-cooperative game theory and knapsack-based scheduling to optimize task offloading, achieving reduced system overhead and improved service quality.
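
To make the scheduling idea concrete, the sketch below casts offloading as a 0/1 knapsack: each candidate task has a compute cost and an estimated latency saving, and the edge node has a fixed budget. The task list, costs, and budget are illustrative assumptions, not values or code from the cited framework, and the game-theoretic pricing step is omitted.

# Minimal knapsack-style task selection for offloading (illustrative values only).
def select_tasks_to_offload(tasks, capacity):
    """tasks: list of (name, compute_cost, latency_saved); capacity: edge compute budget."""
    n = len(tasks)
    # dp[i][c] = best total latency saved using the first i tasks within budget c
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, cost, benefit) in enumerate(tasks, start=1):
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]
            if cost <= c:
                dp[i][c] = max(dp[i][c], dp[i - 1][c - cost] + benefit)
    # Backtrack to recover which tasks were selected
    selected, c = [], capacity
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            name, cost, _ = tasks[i - 1]
            selected.append(name)
            c -= cost
    return dp[n][capacity], list(reversed(selected))

if __name__ == "__main__":
    tasks = [  # (task, compute units needed, latency saved in ms) -- hypothetical
        ("video_analytics", 5, 40),
        ("object_detection", 4, 35),
        ("route_optimization", 3, 20),
        ("map_update", 2, 10),
    ]
    saved, chosen = select_tasks_to_offload(tasks, capacity=8)
    print(f"Offload {chosen} to save ~{saved} ms of latency")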

AI-Driven Offloading Strategies

Artificial Intelligence, particularly Deep Reinforcement Learning (DRL), is transforming how vehicles decide what to process locally versus remotely. In Mobile Edge Computing (MEC) environments, DRL algorithms dynamically learn optimal offloading policies based on latency, energy consumption, and network conditions. These strategies are especially potent in Open Radio Access Networks (ORAN), where intelligent xApps manage network slicing and resource allocation in real time.
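
As a minimal illustration of the idea, the toy tabular Q-learning agent below learns whether to run a task locally, at the edge, or in the cloud from a single "network quality" state. The state space, cost model, and numbers are invented for this sketch; real MEC/ORAN deployments use deep networks (DQN, PPO, and similar) over far richer state such as queue lengths, SINR, and battery level.

# Toy Q-learning offloading policy (all costs and states are hypothetical).
import random
from collections import defaultdict

ACTIONS = ["local", "edge", "cloud"]

def step_cost(network_quality, action):
    """Toy latency + energy cost for one task; lower is better."""
    latency = {"local": 80, "edge": 30, "cloud": 15}[action]
    energy = {"local": 60, "edge": 20, "cloud": 10}[action]
    if action != "local":                      # remote choices pay a channel penalty
        latency += (3 - network_quality) * 40  # worse channel -> larger penalty
    return latency + energy

def train(episodes=5000, alpha=0.1, gamma=0.9, eps=0.1):
    q = defaultdict(float)                     # Q[(state, action)]
    state = random.randint(0, 3)               # network quality: 0 (bad) .. 3 (good)
    for _ in range(episodes):
        if random.random() < eps:              # epsilon-greedy exploration
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward = -step_cost(state, action)
        next_state = random.randint(0, 3)      # channel evolves randomly in this toy
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q

if __name__ == "__main__":
    q = train()
    for s in range(4):
        best = max(ACTIONS, key=lambda a: q[(s, a)])
        print(f"network quality {s}: offload decision -> {best}")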

Optimization Algorithms for Real-Time Decisions

To address the complexity of real-world driving scenarios, researchers have developed multivariate particle swarm optimization (MPSO) algorithms tailored for cloud-edge aggregated computing. These algorithms abstract latency-impacting factors into quantifiable attributes and prioritize tasks for offloading. Experiments using simulation platforms like CETO-Sim show that MPSO outperforms traditional methods in both stability and latency reduction, making it a viable solution for high-concurrency environments such as urban traffic.
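
The sketch below shows plain particle swarm optimization over a vector of per-task offloading fractions against a made-up latency cost. It is generic PSO, not the multivariate MPSO variant or the CETO-Sim environment from the cited work, but it conveys how swarm search converges on an offloading split.

# Generic PSO over offloading fractions in [0, 1]; the cost model is illustrative only.
import random

def latency_cost(x):
    """Toy cost: local compute shrinks as more is offloaded, network delay grows."""
    local = sum(50.0 * (1.0 - xi) for xi in x)             # local processing time
    network = sum(20.0 * xi + 5.0 * xi * xi for xi in x)   # transfer + queuing delay
    return local + network

def pso(dim=5, particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.random() for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [latency_cost(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))  # keep fraction in [0, 1]
            val = latency_cost(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

if __name__ == "__main__":
    best, cost = pso()
    print("offloading fractions:", [round(x, 2) for x in best], "cost:", round(cost, 1))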

Commercial Implications and Future Directions

Public cloud providers—such as Azure, AWS, and Google Cloud—are increasingly offering edge-compatible services (e.g., Azure IoT Edge, AWS Wavelength) that support real-time analytics, federated learning, and secure data exchange. These platforms enable automotive OEMs and fleet operators to:

• Offload compute-intensive tasks like video analytics, object detection, and route optimization.

• Aggregate and analyze driving data across geographies for model refinement.

• Enable over-the-air updates and collaborative learning across vehicle fleets.

As 5G and 6G networks mature, the latency barrier between edge and cloud will continue to shrink, unlocking new possibilities for cooperative perception, swarm intelligence, and cloud-assisted manual driving.


Thursday, October 16, 2025

 Evidence in support of selective sampling of aerial drone imagery

One of the tenets1 of the platform2 we propose for analyzing drone imagery favors selective sampling over repeated processing of every aerial image frame captured by the drone or UAV swarm. Because the platform decouples on-board computing and sensor capabilities from cloud analytics while still feeding results back into the control loop for the drone/swarm, this selective sampling must be shown to have theoretical underpinnings. Related work in this field has indeed demonstrated that; we cite only a few examples.

1. In “Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling,” three large-scale experimental case studies demonstrate that cost-effective selective sampling can reduce average modeling costs by up to 85%, while retaining about 92% of model accuracy.

For two-parameter models: Using less than half the measurements (11/25), they achieved 82% of models within ±5% accuracy; using all measurements gave 93% within ±5%. Sampling saves up to 87% of the cost.

For more complex (four-parameter) models: Using just 17 points instead of 625, they achieved ~60% accuracy within ±20% error at less than 3% cost. More samples improved accuracy, but the return diminished as cost increased.

2. The experiment “Repeated Random Sampling for Minimizing the Time-to-Accuracy” found that a method called RS2 (Repeated Sampling of Random Subsets) reached 66% test accuracy on ImageNet with only 10% of the data per epoch, compared to 69% for the full dataset, thus reducing compute cost and training time by more than fourfold with only a 3% accuracy loss in the reported benchmark. Competing data pruning/design methods suffered notably greater accuracy drops at similar cost reductions.

3. The paper “Deep Learning with Importance Sampling” shows, with experiments on CIFAR10 and CIFAR100, that focused sampling can lower computational cost by an order of magnitude while only decreasing accuracy between 5% and 17% relative to uniform sampling. Importance sampling helps maintain high accuracy versus naive selection/removal methods, especially for deep learning.

4. The thesis “Autonomous UAV Swarms: Distributed Microservices, Heterogeneous Swarms, and Zoom Maneuvers” shows that selective sampling reduced energy consumption by 50% and increased the accuracy of field-metrics extrapolation even when only 40% of image data was processed, showing that cloud microservices can efficiently handle limited, targeted workloads. This method dramatically lessens computational demand and cloud service costs, as unneeded data (e.g., images similar to prior frames or background clutter) can be filtered out before cloud upload and analysis; a minimal filtering sketch appears after this list.

5. The paper “Drone swarm strategy for the detection and tracking of occluded targets” finds that selective analysis of images (rather than contiguous sequential frame ingestion) can be performed in centralized cloud systems by selectively offloading images and telemetry, taking advantage of the resulting bandwidth and compute savings.

6. The paper “Network optimization by regional computing for UAVs' big data” showed that for a UAV swarm processing data from 200–2,000 drones, cloud computing costs ranged from $0.52 to $5.36 per task batch when intermediary regional processing was included, while on-board computing costs were negligible but did not scale. Onboard processing time increased from 7.34 ms to 73.4 ms as load grew, limiting big-data utility. Cloud processing time stayed consistently low (0.05–0.07 ms), but at the expense of higher network delay and cost.
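
The filtering idea in item 4 can be sketched as follows: skip uploading frames that are nearly identical to the last uploaded frame. The mean-absolute-difference test and threshold below are placeholder choices; a deployment might instead use perceptual hashing, SSIM, or detector-driven triggers.

# Selective sampling sketch: upload a frame only if it differs enough from the last upload.
import numpy as np

def should_upload(frame, last_uploaded, threshold=12.0):
    """frame, last_uploaded: HxW grayscale uint8 arrays; True if frame is 'new enough'."""
    if last_uploaded is None:
        return True
    diff = np.mean(np.abs(frame.astype(np.int16) - last_uploaded.astype(np.int16)))
    return diff > threshold

def filter_stream(frames, threshold=12.0):
    """Yield only the frames worth sending to the cloud."""
    last = None
    for frame in frames:
        if should_upload(frame, last, threshold):
            last = frame
            yield frame

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.integers(0, 255, size=(120, 160), dtype=np.uint8)
    # 10 synthetic frames: near-duplicates of `base`, then a scene change at frame 7
    frames = [np.clip(base + rng.integers(-3, 3, base.shape), 0, 255).astype(np.uint8)
              for _ in range(6)]
    frames += [rng.integers(0, 255, size=(120, 160), dtype=np.uint8) for _ in range(4)]
    kept = list(filter_stream(frames))
    print(f"kept {len(kept)} of {len(frames)} frames for cloud upload")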

These studies indicate the following cost-vs-accuracy trade-offs.

Method | Dataset & Parameters | Cost Reduction | Accuracy Retained
Sparse sampling | Synthetic, 2 params | 87% | 82% (±5%)
Sparse sampling | Synthetic, 4 params | >95% | 60% (±20%)
RS2 random sampling | ImageNet, ResNet-18 | 90% | 96% of baseline
Importance sampling | CIFAR10, CIFAR100 | 83–90% | 83–95%

For scientific experiments that analyze contiguous aerial drone imagery samples, the typical accuracy-vs-cost trends are:

• Moderate sampling (10–40% of data): small accuracy drop (1–4%) with large compute savings.

• Aggressive sampling (<10% of data): accuracy may drop 10–40%, but cost plummets; useful for rapid prototyping.

• Sophisticated sampling (importance/randomized methods): delivers the best accuracy/cost trade-off, especially for high-dimensional models.

The paper “DeepBrain: Experimental Evaluation of Cloud-Based Computation Offloading and Edge Computing in the Internet of Drones” (PMC, 2020) studied the change in accuracy when deep-learning-based detection is offloaded to the cloud and evaluated the trade-offs among energy consumption, bandwidth use, latency, and throughput. The findings show much higher throughput (frames/sec) for cloud versus onboard computing, even as communication delays and bandwidth bottlenecks grew with the number of drones streaming video to the cloud and as image compression or resolution reduction was introduced.
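
A back-of-envelope model makes the trade-off tangible: cloud offloading wins on throughput when inference is pipelined behind the uplink, while the round trip adds latency. All numbers below are illustrative assumptions, not DeepBrain's measurements.

# Illustrative throughput/latency comparison for one drone streaming video.
def cloud_fps(frame_kb, compression_ratio, uplink_mbps, cloud_infer_ms, rtt_ms):
    """Frames/sec the cloud path sustains when transfer and inference are pipelined."""
    frame_bits = frame_kb * 8 * 1024 / compression_ratio
    transfer_ms = frame_bits / (uplink_mbps * 1e6) * 1000
    # Pipelined stages: throughput is set by the slowest stage; RTT only adds latency
    bottleneck_ms = max(transfer_ms, cloud_infer_ms)
    return 1000.0 / bottleneck_ms, transfer_ms + cloud_infer_ms + rtt_ms

def onboard_fps(onboard_infer_ms):
    return 1000.0 / onboard_infer_ms, onboard_infer_ms

if __name__ == "__main__":
    fps_c, lat_c = cloud_fps(frame_kb=200, compression_ratio=4, uplink_mbps=20,
                             cloud_infer_ms=15, rtt_ms=40)
    fps_o, lat_o = onboard_fps(onboard_infer_ms=120)
    print(f"cloud:   {fps_c:.1f} fps, {lat_c:.0f} ms per frame")
    print(f"onboard: {fps_o:.1f} fps, {lat_o:.0f} ms per frame")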

Building a knowledge graph or catalog of drone-world objects also eases detection and tracking of objects across frames. As the paper “Cataloging public objects using aerial and street-level images” has shown, a CNN-based model can accurately detect and classify trees and, combined with geolocation data, build a dataset that can be queried. This approach supports comprehensive analytics by organizing detected objects spatially and semantically. Layering a knowledge graph over the catalog of detected objects takes this a step further, adding semantic understanding and global context that conventional image-only models cannot provide, which improves small-object detection and reduces false positives.
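
A minimal sketch of such a catalog is shown below: detected objects are stored with class labels and geolocations, "near" edges give a knowledge-graph-style view, and simple queries ("trees near vehicles") become possible. The class names, coordinates, and 50 m adjacency rule are illustrative assumptions, not the cited paper's pipeline.

# Tiny in-memory catalog of detected objects with "near" edges for contextual queries.
import math
from dataclasses import dataclass, field

@dataclass
class DetectedObject:
    obj_id: int
    label: str          # e.g., "tree", "vehicle", "building"
    lat: float
    lon: float

def _distance_m(a, b):
    # Equirectangular approximation, adequate for objects within one drone scene
    dx = math.radians(b.lon - a.lon) * math.cos(math.radians((a.lat + b.lat) / 2))
    dy = math.radians(b.lat - a.lat)
    return 6371000.0 * math.hypot(dx, dy)

@dataclass
class ObjectCatalog:
    objects: list = field(default_factory=list)
    edges: dict = field(default_factory=dict)   # obj_id -> set of nearby obj_ids

    def add(self, obj, near_meters=50.0):
        for other in self.objects:
            if _distance_m(obj, other) <= near_meters:
                self.edges.setdefault(obj.obj_id, set()).add(other.obj_id)
                self.edges.setdefault(other.obj_id, set()).add(obj.obj_id)
        self.objects.append(obj)

    def query(self, label=None, near_label=None):
        """Return objects of `label` that have a neighbor of `near_label` (if given)."""
        result = []
        for obj in self.objects:
            if label and obj.label != label:
                continue
            if near_label:
                neighbors = self.edges.get(obj.obj_id, set())
                labels = {o.label for o in self.objects if o.obj_id in neighbors}
                if near_label not in labels:
                    continue
            result.append(obj)
        return result

if __name__ == "__main__":
    catalog = ObjectCatalog()
    catalog.add(DetectedObject(1, "tree", 47.6205, -122.3493))
    catalog.add(DetectedObject(2, "vehicle", 47.6206, -122.3492))
    catalog.add(DetectedObject(3, "tree", 47.6300, -122.3400))
    print([o.obj_id for o in catalog.query(label="tree", near_label="vehicle")])  # -> [1]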


Wednesday, October 15, 2025

 Deep Learning in Drone Video Sensing

Deep learning (DL) has emerged as a transformative technology in remote sensing, revolutionizing the way remote-sensing images are analyzed and interpreted. This document provides a comprehensive review and meta-analysis of DL applications in remote sensing, covering various subfields, challenges, and future directions.

First, we review the evolution of remote-sensing image analysis, from traditional methods like support vector machines (SVM) and random forests (RF) to the resurgence of neural networks with the advent of DL. Since 2014, DL has gained prominence due to its superior performance in tasks such as land use and land cover (LULC) classification, scene classification, and object detection.

Next, the foundational DL models used in remote sensing deserve mention. Convolutional neural networks (CNNs) are the most widely used due to their ability to process multiband remote-sensing data. Recurrent neural networks (RNNs) are employed for sequential data analysis, while autoencoders (AEs) and deep belief networks (DBNs) are used for feature extraction and dimensionality reduction. Generative adversarial networks (GANs) have also gained traction for their ability to generate realistic data, making them useful for tasks like data augmentation.

Let's now review the state of DL in remote sensing with attributes such as study targets, DL models used, and accuracy levels. The usual focus is on LULC classification, object detection, and scene classification, with CNNs being the most frequently used model. High-resolution images (<10 m) and urban areas were the most commonly analyzed. The median accuracy for scene classification was the highest (~95%), followed by object detection (~92%) and LULC classification (~91%).

Next, we delve into specific applications of DL in remote sensing. Image fusion, a fundamental task, benefits from DL's ability to characterize complex relationships between input and target images. Techniques like pan-sharpening and hyperspectral-multispectral fusion have been enhanced using CNNs and AEs. Image registration, essential for aligning images from different sensors or times, has seen advancements through Siamese networks and GANs. Scene classification and object detection, though similar, are distinguished by their focus on categorizing entire images versus identifying specific objects within images. DL has been instrumental in improving accuracy in both areas, though challenges like limited training data and object rotation variations persist.

LULC classification, a critical application, has seen significant improvements with DL. CNNs dominate this field, but other models like DBNs and GANs have also been explored. Challenges include the high cost of acquiring labeled training data and the need for methods that can handle medium- and low-resolution images. Semantic segmentation, which assigns labels to each pixel in an image, has benefited from fully convolutional networks (FCNs) but still faces challenges in balancing global context and local detail.
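
As a concrete (if simplified) example of per-pixel labeling, the snippet below runs an off-the-shelf FCN from torchvision on a single aerial frame. Its COCO/VOC-style classes are not remote-sensing LULC classes, and the image path is a placeholder; in practice the network would be fine-tuned on labeled remote-sensing data.

# Per-pixel semantic segmentation with a pretrained FCN (placeholder image path).
import torch
from PIL import Image
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights

weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights).eval()
preprocess = weights.transforms()                  # resize/normalize as the weights expect

image = Image.open("drone_images/scene1.jpg").convert("RGB")   # placeholder path
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    logits = model(batch)["out"]                   # shape: (1, num_classes, H, W)
class_map = logits.argmax(dim=1)[0]                # per-pixel class indices
labels = weights.meta["categories"]
present = {labels[i] for i in class_map.unique().tolist()}
print("classes present in this frame:", present)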

Object-based image analysis (OBIA) integrates DL with segmentation techniques to classify objects in remote-sensing images. While effective, the choice of parameters like patch size significantly impacts accuracy. Other emerging applications include time-series analysis, where RNNs are used to analyze sequential data, and the use of DL for tasks like accuracy assessment and data prediction.

Finally, we bring up the need for benchmark datasets, especially for medium- and low-resolution images, to standardize comparisons between DL algorithms. While DL has shown superior performance in many areas, challenges like training data limitations, network optimization, and real-world applicability remain. Future research should focus on addressing these challenges and expanding DL applications to underexplored areas like time-series analysis and image preprocessing.

Deep Learning has demonstrated immense potential in remote sensing, offering innovative solutions to longstanding challenges. This must be followed by the development of benchmark datasets and the exploration of novel applications.


Monday, October 13, 2025

 While the previous article1 explained two techniques for scene and object correlation in aerial drone image analysis, we must take a step back to put things in perspective. Object vectors represent localized features—like buildings, vehicles, or trees—extracted from specific regions of aerial images. Scene vectors capture the broader context—land use, terrain type, weather conditions, or urban layout—across the entire image or large segments. The challenge is to correlate these two levels of representation so that object detection and classification are more accurate, robust, and context aware.

Among the several techniques to do so, we list some of the salient ones from a survey of the research:

1. Transformer-Based Attention Fusion: Region of Interest (RoI) proposals (object vectors) are treated as tokens and passed through a Transformer encoder alongside scene-level tokens derived from CLIP or other pretrained models. Attention weights are modulated based on spatial and geometric relationships, allowing the model to learn how objects relate to their surroundings (e.g., ships shouldn’t appear on runways). In the DOTA benchmark, this method reduced false positives by modeling inter-object and object-background dependencies.

2. Confounder-Free Fusion Networks (CFF-NET): Three branches extract global scene features, local object features, and confounder-free object-level attention. These are fused to eliminate spurious correlations (e.g., associating cars with rooftops due to dataset bias). It disentangles true object-scene relationships from misleading ones caused by long-tailed distributions or biased training data. CFF-NET improved aerial image captioning and retrieval by aligning object vectors with meaningful scene context.

3. Contrastive Learning with CLIP Tokens: Object and scene vectors are encoded using CLIP, and contrastive loss is applied to ensure that semantically similar regions (e.g., industrial zones) have aligned embeddings. This enforces consistency across different image scales and lighting conditions, which is especially useful in cloud-based pipelines where data is heterogeneous. Generalization is improved across datasets like DIOR-R and DOTA-v2.0; a simplified sketch of this CLIP-based alignment appears after this list.

4. Gated Recurrent Units for Regional Weighting: GRUs scan image regions and assign weights to object vectors based on their contextual importance within the scene. This helps prioritize objects that are contextually relevant (e.g., emergency vehicles in disaster zones) while suppressing noise. Used in CFF-NET to refine local feature extraction and improve classification accuracy.

5. Cloud-Based Vector Aggregation: Object and scene vectors are streamed to cloud platforms (e.g., Azure, GEE) where they’re aggregated, indexed, and queried using vector search or clustering. Enables scalable, real-time analytics across massive aerial datasets—ideal for smart city monitoring or disaster response. GitHub repositories like satellite-image-deep-learning2 offer pipelines for embedding and retrieval.
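
A simplified, inference-only illustration of item 3 is sketched below: the full scene and a few object crops are embedded with CLIP and compared by cosine similarity, so regions consistent with the scene context score higher. The crop coordinates and image path are placeholders, and the contrastive training stage of the cited method is omitted.

# Score object crops against the scene embedding with CLIP (placeholder paths/crops).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

scene = Image.open("drone_images/scene1.jpg").convert("RGB")     # placeholder path
crops = [scene.crop((0, 0, 224, 224)), scene.crop((224, 224, 448, 448))]  # placeholder RoIs

with torch.no_grad():
    inputs = processor(images=[scene] + crops, return_tensors="pt")
    emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)                   # unit-normalize embeddings

scene_vec, object_vecs = emb[0], emb[1:]
similarity = object_vecs @ scene_vec                             # cosine similarity per crop
for i, s in enumerate(similarity.tolist()):
    print(f"object crop {i}: scene-consistency score {s:.3f}")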

Summary:

Method | Object-Scene Correlation Strategy | Benefit
Transformer Attention Fusion | Spatial/geometric-aware attention weights | Reduces false positives
CFF-NET | Confounder-free multi-branch fusion | Improves discriminative power
CLIP Contrastive Learning | Semantic alignment across scales | Enhances generalization
GRU Regional Weighting | Contextual importance scoring | Prioritizes relevant objects
Cloud Vector Aggregation | Scalable indexing and retrieval | Enables real-time analytics


#Codingexercise: CodingExercise-10-13-2025.docx 


Sunday, October 12, 2025

 This is a continuation of a previous article1 on BBAVectors and Transformer-based context-aware detection:

1. Sample for BBAVectors:

import torch
from PIL import Image
from torchvision import transforms
from models.detector import build_detector  # from the BBAVectors repo
from utils.visualize import visualize_detections  # optional visualization helper

# Load a pretrained BBAVectors model from its config and checkpoint
def load_bbavectors_model(config_path, checkpoint_path):
    model = build_detector(config_path)
    model.load_state_dict(torch.load(checkpoint_path, map_location='cpu'))
    model.eval()
    return model

# Preprocess an image from a URI into a normalized tensor batch
def load_image_from_uri(uri):
    image = Image.open(uri).convert("RGB")
    transform = transforms.Compose([
        transforms.Resize((1024, 1024)),
        transforms.ToTensor(),
    ])
    return transform(image).unsqueeze(0)  # add batch dimension

# Run detection; BBAVectors returns oriented bounding boxes
def detect_landmarks(model, image_tensor):
    with torch.no_grad():
        outputs = model(image_tensor)
    return outputs

# Main workflow
def main():
    # Paths to config and weights
    config_path = 'configs/dota_bbavectors.yaml'
    checkpoint_path = 'checkpoints/bbavectors_dota.pth'
    # URIs to drone images
    image_uris = [
        'drone_images/scene1.jpg',
        'drone_images/scene2.jpg'
    ]
    model = load_bbavectors_model(config_path, checkpoint_path)
    for uri in image_uris:
        image_tensor = load_image_from_uri(uri)
        detections = detect_landmarks(model, image_tensor)
        print(f"\nDetections for {uri}:")
        # The exact output schema depends on the repo; here each detection is assumed
        # to be a dict with 'label', 'score', and 'bbox' keys
        for det in detections:
            print(f"Class: {det['label']}, Score: {det['score']:.2f}, BBox: {det['bbox']}")
        # Optional: visualize results
        # visualize_detections(uri, detections)

if __name__ == "__main__":
    main()

2. Sample for semantics-based detection:

from PIL import Image
import requests
import torch
from transformers import DetrImageProcessor, DetrForObjectDetection

# Load pretrained DETR model and processor
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
model.eval()

# Function to load an image from a URI
def load_image(uri):
    return Image.open(requests.get(uri, stream=True).raw).convert("RGB")

# Function to detect objects and return the set of predicted labels
def detect_objects(image):
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Filter predictions by confidence threshold
    target_sizes = torch.tensor([image.size[::-1]])
    results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]
    labels = [model.config.id2label[label.item()] for label in results["labels"]]
    return set(labels)

# URIs for two drone-captured scenes
scene1_uri = "https://example.com/drone_scene_1.jpg"
scene2_uri = "https://example.com/drone_scene_2.jpg"

# Load and process both scenes
scene1 = load_image(scene1_uri)
scene2 = load_image(scene2_uri)
labels1 = detect_objects(scene1)
labels2 = detect_objects(scene2)

# Compare object presence
shared_objects = labels1.intersection(labels2)
unique_to_scene1 = labels1 - labels2
unique_to_scene2 = labels2 - labels1

# Print results
print("Shared objects between scenes:", shared_objects)
print("Unique to Scene 1:", unique_to_scene1)
print("Unique to Scene 2:", unique_to_scene2)


Saturday, October 11, 2025

 Scene/Object correlation in Aerial Drone Image Analysis

Given aerial drone images and their vector representations for scenes and objects, correlating the scene level with the object level is an evolving research area in drone sensing applications. The ability to predict object presence in unseen urban scenes using vector representations makes several drone sensing use cases easy to implement on the analytics side without requiring custom models1. Two promising approaches—Box Boundary-Aware Vectors (BBAVectors) and Context-Aware Detection via Transformer and CLIP tokens—offer distinct yet complementary pathways toward this goal. Both methods seek to bridge the semantic gap between scene-level embeddings and object-level features, enabling predictive inference across spatial domains. These are described in the following sections.

Box Boundary-Aware Vectors: Geometry as a Signature

BBAVectors reimagine object detection by encoding geometric relationships rather than relying solely on bounding box regression. Traditional object detectors predict the coordinates of bounding boxes directly, which can be brittle in aerial imagery where objects are rotated, occluded, or densely packed. BBAVectors instead regress directional vectors—top, right, bottom, and left—from the object center to its boundaries. This vectorized representation captures the shape, orientation, and spatial extent of objects in a way that is more robust to rotation and scale variance.

In the context of scene-object correlation, BBAVectors serve as a geometric signature. For example, consider a building with a circular roof in an aerial image2. Its BBAVector profile—equal-length vectors radiating symmetrically from the center—would differ markedly from that of a rectangular warehouse or a triangular-roofed church. When applied to a new scene, the presence of similar BBAVector patterns can suggest the existence of a circular-roofed structure, even if the building is partially occluded or viewed from a different angle.

This approach has been validated in datasets like DOTA (Dataset for Object Detection in Aerial Images)3, where BBAVector-based models outperform traditional detectors in identifying rotated and irregularly shaped objects. By embedding these vectors into a shared latent space, one can correlate object-level geometry with scene-level context, enabling predictive modeling across scenes.
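
A minimal sketch of this geometric-signature idea, not the BBAVectors model itself, is shown below: the four center-to-boundary vectors of an oriented box are concatenated into a signature and compared across scenes with cosine similarity, so a near-square "dome" footprint matches another dome-like footprint more closely than an elongated warehouse. All box sizes and angles are invented for illustration.

# Center-to-boundary vectors of an oriented box as a geometric signature (illustrative).
import math

def boundary_vectors(w, h, theta_deg):
    """Center-to-edge-midpoint vectors of a w x h box rotated by theta (degrees)."""
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    local = {"top": (0, -h / 2), "right": (w / 2, 0), "bottom": (0, h / 2), "left": (-w / 2, 0)}
    return {k: (x * cos_t - y * sin_t, x * sin_t + y * cos_t) for k, (x, y) in local.items()}

def signature(w, h, theta_deg):
    vecs = boundary_vectors(w, h, theta_deg)
    return [c for k in ("top", "right", "bottom", "left") for c in vecs[k]]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

if __name__ == "__main__":
    circular_roof = signature(20, 20, 0)     # near-square footprint (symmetric vectors)
    warehouse = signature(60, 20, 30)        # elongated, rotated footprint
    another_dome = signature(22, 21, 15)     # similar symmetric footprint in a new scene
    print("dome vs warehouse:", round(cosine(circular_roof, warehouse), 3))
    print("dome vs dome-like:", round(cosine(circular_roof, another_dome), 3))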

Context-Aware Detection via Transformer and CLIP Tokens: Semantics and Attention

While BBAVectors excel at capturing geometry, context-aware detection leverages semantic relationships. This method treats object proposals and image segments as tokens in a Transformer architecture, allowing the model to learn inter-object and object-background dependencies through attention mechanisms. By integrating CLIP (Contrastive Language–Image Pretraining) features, the model embeds both visual and textual semantics into a unified space.

CLIP tokens encode high-level concepts—such as “circular building,” “parking lot,” or “green space”—based on large-scale image-text training. When combined with Transformer attention, the model can infer the likelihood of object presence based on surrounding context. For instance, if a circular-roofed building is typically adjacent to a park and a road intersection, the model can learn this spatial-semantic pattern. In a new scene with similar context vectors, it can predict the probable presence of the landmark even if it’s not directly visible.

This approach has been explored in works like “DETR” (DEtection TRansformer)4 and “GLIP” (Grounded Language-Image Pretraining)5, which demonstrate how attention-based models can generalize object detection across domains. In aerial imagery, this means that scene-level embeddings—augmented with CLIP tokens—can serve as priors for object-level inference.
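
A small sketch of using CLIP's joint image-text space as a scene-level prior is shown below: text descriptions of the expected context are scored against a new aerial scene. The prompts and image path are illustrative, and unlike GLIP-style grounding this snippet does not localize anything; it only ranks context hypotheses.

# Rank textual context hypotheses for a new scene with CLIP (placeholder prompts/path).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "an aerial view of a circular-roofed building beside a park and a road intersection",
    "an aerial view of a parking lot next to a warehouse",
    "an aerial view of open farmland with no buildings",
]
scene = Image.open("drone_images/new_scene.jpg").convert("RGB")   # placeholder path

with torch.no_grad():
    inputs = processor(text=prompts, images=scene, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]   # one score per prompt

for prompt, p in zip(prompts, probs.tolist()):
    print(f"{p:.2f}  {prompt}")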

Bridging the Two: Predictive Correlation Across Scenes

Together, BBAVectors and context-aware detection offer a dual lens: one geometric, the other semantic. By embedding both object-level vectors and scene-level features into a shared space—whether through contrastive learning, metric learning, or attention-weighted fusion—researchers can build models that predict object presence in new scenes with remarkable accuracy.

Imagine a workflow where a drone captures a new urban scene. The scene is encoded using CLIP-based features and Transformer attention maps. Simultaneously, known object signatures from previous scenes—represented as BBAVectors—are matched against the new scene’s embeddings. If the context and geometry align, the model flags the likely presence of a circular-roofed building, even before it’s explicitly detected.

This paradigm has implications for smart city planning, disaster response, and autonomous navigation. By correlating scene and object vectors, systems can anticipate infrastructure layouts, identify critical assets, and adapt to dynamic environments—all from the air.

#Codingexercise: CodingExercise-10-11-2025.docx