Tuesday, June 10, 2025

UAV swarms are best applied to surveying, remote sensing, disaster preparedness and response (wildfires, for example), and workflows that make use of LiDAR data. Companies that monitor power lines and wind turbines are especially well suited to operating a fleet of drones. There are more than ten publicly traded LiDAR companies in the US, and many more across Europe and Asia, that work with drone fleets, photogrammetry, and LiDAR data. Those using simultaneous localization and mapping (SLAM), structure-from-motion (SfM), and semantic segmentation with CNNs are likely building their own knowledge bases, so it would not hurt to show them one built in the cloud that is incremental, observable, and near real-time. With GPS and satellite imagery, most terrain is navigable, but vision-based navigation enables true autonomous flight, one day hopefully at all altitudes.

Sample SLAM feature matching to compare images (using OpenCV's ORB detector and brute-force matcher):

import cv2

# Load the aerial images
image1 = cv2.imread("hoover_tower_forward.jpg", cv2.IMREAD_GRAYSCALE)
image2 = cv2.imread("hoover_tower_reverse.jpg", cv2.IMREAD_GRAYSCALE)
if image1 is None or image2 is None:
    raise FileNotFoundError("Could not load one or both input images")

# Initialize the ORB feature detector
orb = cv2.ORB_create()

# Detect keypoints and compute binary descriptors
keypoints1, descriptors1 = orb.detectAndCompute(image1, None)
keypoints2, descriptors2 = orb.detectAndCompute(image2, None)

# Brute-force matcher with Hamming distance (the right metric for ORB's
# binary descriptors); cross-checking keeps only mutual best matches
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(descriptors1, descriptors2),
                 key=lambda m: m.distance)

# Draw the 50 strongest matches
output_image = cv2.drawMatches(
    image1, keypoints1, image2, keypoints2, matches[:50], None,
    flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

# Display the results
cv2.imshow("Matched Features", output_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
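
A natural next step after matching, and the core of the pose estimation discussed under the emerging trends below, is recovering the relative camera motion between the two views. This is a minimal sketch assuming the same image pair as above and a placeholder intrinsics matrix K; a real pipeline would use calibrated values or EXIF metadata.

import cv2
import numpy as np

img1 = cv2.imread("hoover_tower_forward.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("hoover_tower_reverse.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Pixel coordinates of the matched keypoints
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Placeholder intrinsics: the focal-length guess (0.9 * width) and the
# image-center principal point are assumptions, not calibrated values
h, w = img1.shape
K = np.array([[0.9 * w, 0, w / 2],
              [0, 0.9 * w, h / 2],
              [0, 0, 1]], dtype=np.float64)

# Estimate the essential matrix with RANSAC, then decompose it into a
# rotation and a translation direction (scale is unrecoverable from
# two views alone)
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
print("Rotation:\n", R)
print("Translation direction:\n", t)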

Emerging trends:

Constructing an incremental “knowledge base” of a landscape from drone imagery merges ideas from simultaneous localization and mapping (SLAM), structure-from-motion (SfM), and semantic segmentation.

Incremental SLAM and 3D reconstruction is demonstrated in the ORB-SLAM2 paper by Mur-Artal and Tardós (2017), where a 3D map is built by estimating camera poses and reconstructing scene geometry from monocular, stereo, or RGB-D inputs. Such a SLAM framework can also be extended by fusing in semantic cues to enrich the resulting map with object and scene labels. The idea of including semantic information in 3D reconstruction is demonstrated by SemanticFusion (McCormac et al., ICRA 2017), where a convolutional neural network (CNN) performs semantic segmentation and the system fuses the predicted labels into a surfel-based 3D map, thereby transforming a purely geometric reconstruction into a semantically rich representation of the scene. SemanticFusion labels parts of the scene, turning a raw point cloud or mesh into a knowledge base where objects, surfaces, and even relationships can be recognized and queried. A minimal sketch of this label-fusion idea follows below.

SfM, on the other hand, stitches multi-view data into a consistent 3D model, and its techniques are particularly relevant for drone applications. Incremental SfM pipelines can populate a 3D model as data arrives in the pipeline, and the drones can “walk the grid” around an area of interest to ensure sufficient data is captured to build the model from 0 to 100%, with progress tracked along the way (a waypoint sketch also appears below). SfM itself adds no semantic layer, but semantic segmentation or object detection can be layered on independently over the purely geometric data. Layering on additional modules for, say, object detection, region classification, or even reasoning over scene changes lets a system start with a basic geometric layout and optionally grow into a comprehensive knowledge base.

Algorithms that crunch these sensor data, whether images or LiDAR, must operate in real time rather than as periodic batch analysis. They can, however, be dedicated to specific domains such as urban monitoring, agricultural surveying, or environmental monitoring for additional context-specific knowledge.
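
To make the label-fusion idea concrete, here is a minimal, hypothetical sketch: per-point class predictions from a segmentation CNN are voted into a voxel grid as frames arrive, and each voxel's label is the running argmax of its votes. The class list, voxel size, and the fuse_frame/current_label helpers are illustrative inventions, not SemanticFusion's actual surfel machinery.

import numpy as np
from collections import defaultdict

NUM_CLASSES = 5  # illustrative classes: ground, vegetation, building, water, other
VOXEL = 0.25     # voxel edge length in meters (tunable assumption)

# Voxel index (ix, iy, iz) -> running histogram of class votes
label_votes = defaultdict(lambda: np.zeros(NUM_CLASSES, dtype=np.int64))

def fuse_frame(points_world, class_ids):
    """Fuse one frame's labeled 3D points into the global voxel map.

    points_world: (N, 3) points already transformed into the world frame
                  using the camera pose from SLAM/SfM.
    class_ids:    (N,) per-point class predictions from a segmentation CNN.
    """
    keys = np.floor(points_world / VOXEL).astype(np.int64)
    for key, cls in zip(map(tuple, keys), class_ids):
        label_votes[key][cls] += 1

def current_label(key):
    """Most-voted class for a voxel so far; it refines as evidence accumulates."""
    return int(np.argmax(label_votes[key]))

# Each incoming frame updates the map incrementally:
frame_pts = np.random.rand(1000, 3) * 10             # stand-in for back-projected points
frame_cls = np.random.randint(0, NUM_CLASSES, 1000)  # stand-in for CNN output
fuse_frame(frame_pts, frame_cls)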
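
For the “walk the grid” capture pattern, here is a sketch of a lawnmower waypoint generator with coverage progress. The rectangle dimensions, row spacing, and altitude are arbitrary placeholders; in practice the spacing would be derived from the camera footprint and the desired image overlap.

def lawnmower_waypoints(x0, y0, width, height, spacing, altitude):
    """Yield (x, y, z) waypoints sweeping a rectangle in alternating rows."""
    n_rows = int(height / spacing) + 1
    for row in range(n_rows):
        y = y0 + row * spacing
        xs = (x0, x0 + width) if row % 2 == 0 else (x0 + width, x0)
        yield (xs[0], y, altitude)
        yield (xs[1], y, altitude)

waypoints = list(lawnmower_waypoints(0, 0, width=200, height=120,
                                     spacing=15, altitude=60))
for i, wp in enumerate(waypoints, start=1):
    # ...fly to wp, capture an image, feed it to the incremental SfM pipeline...
    progress = 100.0 * i / len(waypoints)
    print(f"waypoint {i}/{len(waypoints)} reached, coverage ~{progress:.0f}%")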

Addendum:

• SIFT is best for high-accuracy applications like object recognition.

• ORB is ideal for real-time applications like SLAM (Simultaneous Localization and Mapping).

• SURF balances speed and accuracy, making it useful for tracking and image stitching (a quick timing comparison is sketched below).
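
One way to see the speed side of these tradeoffs is to time the detectors on the same image. The sketch below reuses the aerial image from earlier; SIFT has shipped with core OpenCV since 4.4 (its patent expired), while SURF still requires an opencv-contrib build with the non-free flag enabled, so it is left commented out.

import time
import cv2

img = cv2.imread("hoover_tower_forward.jpg", cv2.IMREAD_GRAYSCALE)
if img is None:
    raise FileNotFoundError("Could not load the input image")

detectors = {
    "ORB": cv2.ORB_create(nfeatures=2000),
    "SIFT": cv2.SIFT_create(),
}
# SURF lives in opencv-contrib and needs the non-free build flag:
# detectors["SURF"] = cv2.xfeatures2d.SURF_create(hessianThreshold=400)

for name, det in detectors.items():
    start = time.perf_counter()
    keypoints = det.detect(img, None)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{name}: {len(keypoints)} keypoints in {elapsed_ms:.1f} ms")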

