The following are distance measurement studies from the DVSA benchmarks:
1. Determining the scale of each frame.
2. Identifying the largest built-up structure encountered during the drone tour.
3. Estimating the size of that largest built-up structure.
4. Identifying the largest free space encountered during the drone tour.
5. Estimating the size of that largest free space.
6. Counting similarly sized structures within a scene.
7. Counting the objects that can occupy a given free space.
8. Measuring the distance between two points of interest across disparate frames, such as the length traversed by the drone in a specific direction prior to a turning point.
9. Measuring the total distance covered prior to revisits to a point of interest.
Methodology:
1. Scale (e.g., 1:100, meaning 1 unit in the image = 100 units on the ground). This needs to be determined only once.
a. Each frame has a location and timestamp before it is vectorized and stored in the vector store along with its insights on objects and bounding boxes. Therefore, scale resolution can be achieved in a few ways:
i. Using Ground Sample Distance (GSD), e.g., 2 cm/pixel: the GSD expresses how much real-world distance a single pixel covers, with a smaller GSD capturing finer detail.
1. With GSD either known in advance or computed as (Flight Altitude x Sensor Dimension) / (Focal Length x Image Dimension), return the scale as the inverse of the GSD, as in the sketch below.
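For example, a minimal sketch of the GSD-based scale; the altitude, sensor width, focal length, and image width values below are illustrative assumptions, not measured values:
flight_altitude_m = 100.0   # drone altitude above ground, in meters (assumed)
sensor_width_mm = 13.2      # physical sensor width (assumed)
focal_length_mm = 8.8       # lens focal length (assumed)
image_width_px = 5472       # image width in pixels (assumed)
# GSD in meters/pixel = (Flight Altitude x Sensor Dimension) / (Focal Length x Image Dimension)
gsd_m_per_px = (flight_altitude_m * sensor_width_mm / 1000.0) / (focal_length_mm / 1000.0 * image_width_px)
# Scale as the inverse of the GSD: pixels per meter on the ground
scale_px_per_m = 1.0 / gsd_m_per_px
print(f"GSD: {gsd_m_per_px * 100:.2f} cm/pixel, scale: {scale_px_per_m:.1f} px/m")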
ii. Using well-known objects or landmarks:
1. Given the bounding box of a well-known object in the frame, say an intermediate sedan, a semi-trailer, or a known landmark, compute the scale as the representative fraction of its pixel length over its actual length on the ground (see the sketch below).
2. Width of road: given the width of the road in pixels and its ground width from a city record or Google Maps, the scale follows directly.
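For example, a minimal sketch using a known vehicle length; the bounding box extent and the 4.8 m sedan length are assumptions for illustration:
bbox_x_min, bbox_x_max = 1210, 1390     # pixel extent of the detected sedan along the road (assumed)
known_length_m = 4.8                    # typical length of an intermediate sedan (assumed)
pixel_length = bbox_x_max - bbox_x_min              # object length in pixels
scale_px_per_m = pixel_length / known_length_m      # representative fraction: pixels per meter
print(f"Scale: {scale_px_per_m:.1f} px/m ({100 / scale_px_per_m:.1f} cm per pixel)")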
iii. Using GPS co-ordinates:
1. Using the overall tour:
a. Get the overall tour bounding box in terms of latitude and longitude by computing (min Latitude, min Longitude, max Latitude, max Longitude)
b. Calculate the fraction of the tour area covered by the current frame
c. Distribute that fraction proportionately across the frame width and height, or take the square root of (fw x fh) / (tw x th), where fw, fh are the frame width and height and tw, th are the tour width and height
d. Emit the scale (see the sketch below)
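For example, a minimal sketch of this approach under the assumption that the frame's fraction of the tour area is already known; the tour dimensions and frame fraction below are illustrative, and the tour dimensions would come from a haversine computation as in the fuller example further below:
tour_width_m, tour_height_m = 356.0, 91.0   # tour bounding box on the ground, in meters (assumed)
frame_area_fraction = 0.02                  # fraction of the tour area covered by the current frame (assumed)
frame_width_px = 3840                       # frame width in pixels (assumed)
# Linear fraction is the square root of the area fraction
linear_fraction = frame_area_fraction ** 0.5
frame_ground_width_m = tour_width_m * linear_fraction
scale_px_per_m = frame_width_px / frame_ground_width_m
print(f"Frame covers ~{frame_ground_width_m:.1f} m across; scale ~ {scale_px_per_m:.1f} px/m")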
2. Using GPS co-ordinates of two points in the same frame:
a. Take two points in the frame, such as the frame centre reported by the drone and another point found in Google Maps, and compute the actual distance between them using the Haversine formula.
Note: Since every frame has a GPS co-ordinate to begin with, to find another GPS co-ordinate in the same frame, detect, clip, and vectorize an object in that frame, locate it in Google Maps at the scene's latitude and longitude, and read off its GPS co-ordinates. Haversine can then be used to compute the actual distance, while the pixel distance between the two points gives the image-based distance.
b. Emit the scale
For example:
from math import radians, cos, sin, asin, sqrt

# Step 1: Haversine function to compute distances in meters
def haversine(lat1, lon1, lat2, lon2):
    R = 6371000  # Earth's radius in meters
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat/2)**2 + cos(radians(lat1))*cos(radians(lat2))*sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    return R * c

# Bounding rectangle corners (south-west and north-east)
lat_min, lon_min = 42.37043, -71.12165
lat_max, lon_max = 42.37125, -71.11733

# Compute north-south (height) and east-west (width) ground distances, in meters
height_m = haversine(lat_min, lon_min, lat_max, lon_min)
width_m = haversine(lat_min, lon_min, lat_min, lon_max)

# Step 2: Area in square meters
area_m2 = width_m * height_m

# Step 3: Convert to square feet (1 m = 3.28084 ft)
area_ft2 = area_m2 * (3.28084 ** 2)

# Step 4: Convert to square miles (1 sq mile = 27,878,400 sq ft)
area_miles2 = area_ft2 / 27878400
print(f"Ground area covered: {area_miles2:.6f} square miles")
2. Identifying the largest built-up structure:
a. The bounding boxes of all detected objects in a scene give the area of each
b. Filter these to include only the buildings and sort them by area in descending order
c. Return the topmost entry (see the sketch below)
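For example, a minimal sketch over a hypothetical list of detections; the class names and box format are assumptions about the detector's output:
detections = [
    {"label": "building", "bbox": (120, 80, 640, 520)},    # (x_min, y_min, x_max, y_max) in pixels
    {"label": "car",      "bbox": (700, 300, 760, 430)},
    {"label": "building", "bbox": (900, 100, 1500, 900)},
]

def bbox_area_px(b):
    x_min, y_min, x_max, y_max = b
    return (x_max - x_min) * (y_max - y_min)

# Keep only buildings and pick the one with the largest pixel area
buildings = [d for d in detections if d["label"] == "building"]
largest = max(buildings, key=lambda d: bbox_area_px(d["bbox"]))
print("Largest building bbox:", largest["bbox"])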
3. Estimating the size of the largest built-up structure:
a. Using step 2, find the bounding box of the corresponding object in the scene and calculate its width and height in pixels
b. With the scale computed in step 1 and the width and height from the previous step, calculate the area as width x scale x height x scale (see the sketch below)
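For example, continuing the sketch above, and assuming the scale is expressed as ground meters per pixel (i.e., the GSD):
m_per_px = 0.027                                   # scale in ground meters per pixel (assumed)
x_min, y_min, x_max, y_max = largest["bbox"]
width_px, height_px = x_max - x_min, y_max - y_min
# Area = width x scale x height x scale
area_m2 = (width_px * m_per_px) * (height_px * m_per_px)
print(f"Largest built-up area: {area_m2:.1f} m^2")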
4. Identifying the largest free space:
a. If the detected objects are tagged as one of park, street intersection, courtyard, parking lot, transit center, grass, pavement, lake, river, etc., pick the largest one as in step 2 above
b. Otherwise, use color-histogram-based analysis to classify the land cover (see the sketch below)
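For example, a minimal color-histogram sketch using OpenCV to decide which land-cover class dominates a frame; the frame path is a placeholder and the HSV ranges per class are rough assumptions that would need tuning per camera and season:
import cv2
import numpy as np

frame = cv2.imread("frame_0042.jpg")   # hypothetical frame path; any BGR image works
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Rough HSV ranges per land-cover class (assumed)
ranges = {
    "grass":    ((35, 40, 40),  (85, 255, 255)),
    "water":    ((90, 40, 40),  (130, 255, 255)),
    "pavement": ((0, 0, 60),    (180, 40, 200)),
}

# Fraction of frame pixels falling into each class's color range
fractions = {}
for label, (lo, hi) in ranges.items():
    mask = cv2.inRange(hsv, np.array(lo, np.uint8), np.array(hi, np.uint8))
    fractions[label] = cv2.countNonZero(mask) / mask.size

dominant = max(fractions, key=fractions.get)
print("Dominant land cover:", dominant, fractions)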
5. Estimating the size of the largest free space:
a. If the free space is one of the detected objects, then its bounding box and the scale give the largest free-space area
b. Otherwise, compute the color histogram and proportionately divide the scene area by the fraction of pixels belonging to the chosen color (see the sketch below)
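For example, continuing the color-histogram sketch above for the case where the free space is not a detected object, and again assuming a meters-per-pixel scale:
m_per_px = 0.027                                  # scale in ground meters per pixel (assumed)
frame_h, frame_w = frame.shape[:2]
frame_area_m2 = (frame_w * m_per_px) * (frame_h * m_per_px)
# Proportionately divide the scene area by the fraction of pixels matching the chosen color
free_space_m2 = fractions[dominant] * frame_area_m2
print(f"Estimated largest free-space area: {free_space_m2:.1f} m^2")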
6. Counting objects in a scene can be done with trained detection models, or by clustering the detected object locations or embeddings, e.g., with HDBSCAN (see the sketch below)
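For example, a minimal sketch using the hdbscan package over detection centroids, where each cluster of near-duplicate detections is counted as one object; the centroids are illustrative, and similarly sized structures would first be filtered by footprint:
import numpy as np
import hdbscan

# Centroids (x, y) of detections of similarly sized structures across a scene (illustrative)
centroids = np.array([
    [110, 95], [112, 98], [405, 220], [407, 222], [780, 640], [785, 636], [781, 642],
])

clusterer = hdbscan.HDBSCAN(min_cluster_size=2)
labels = clusterer.fit_predict(centroids)

# Count clusters; label -1 marks noise points and is excluded
count = len(set(labels)) - (1 if -1 in labels else 0)
print("Estimated object count:", count)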
7. Given that an object's footprint is found from its bounding box and the scale, and the free-space area from its bounding box and the scale, the count is just the simple ratio of free-space area to object footprint (see the sketch below)
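For example, a minimal sketch that ignores packing geometry, which would lower the count in practice; both areas below are assumed values:
import math

free_space_m2 = 1200.0      # free-space area from its bounding box and scale (assumed)
object_m2 = 12.5            # footprint of one object, e.g., a parked car with clearance (assumed)
count = math.floor(free_space_m2 / object_m2)
print(f"Approximately {count} such objects fit in the free space")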
8. Distance calculation across disparate frames is straightforward because each frame already carries a GPS co-ordinate, so a Haversine computation between the two frames suffices. The trick is to find the nearest and farthest frames from the scene catalog; either a ground truth such as Google Maps or Geodnet can be relied upon, or, preferably, turning-point frames can be identified from the video and correlated with timestamps and the drone's velocity to find the displacement in that direction (see the sketch below).
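For example, a minimal sketch reusing the haversine helper defined in the example above; the two frame co-ordinates are illustrative:
# GPS co-ordinates attached to two frames: the start of a straight leg and its turning point (assumed)
start_lat, start_lon = 42.37043, -71.12165
turn_lat, turn_lon = 42.37125, -71.12165
leg_distance_m = haversine(start_lat, start_lon, turn_lat, turn_lon)
print(f"Distance travelled before the turn: {leg_distance_m:.1f} m")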
9. Accumulating the above over all directions traversed by the drone provides the total distance covered, or it can be approximated as the speed of the drone x (flight time – hover time); see the sketch below.
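For example, a minimal sketch summing consecutive leg distances from the per-frame GPS track, again reusing the haversine helper; the track values are illustrative:
# Ordered (lat, lon) samples from the frames of the tour (illustrative)
track = [
    (42.37043, -71.12165),
    (42.37125, -71.12165),
    (42.37125, -71.11733),
    (42.37043, -71.11733),
]
total_m = sum(haversine(lat1, lon1, lat2, lon2)
              for (lat1, lon1), (lat2, lon2) in zip(track, track[1:]))
print(f"Total distance covered: {total_m:.1f} m")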
Operators for the logic above become re-usable and should be curated into a library of the DVSA application or framework. Improvements to object detection and counting in a scene can be accomplished by better training and by fine-tuning the corresponding model.