Saturday, October 25, 2025

The previous article described the organization of the software architecture and its shift toward deep learning, cataloguing, and agentic-retrieval-based analytics over a selected image set, rather than image processing of every frame extracted from an aerial drone video. The model used to interpret objects in a scene determines the depth and breadth of the drone world catalog. In the example below, we use a yolov5m model pretrained on DOTA (an aerial dataset) to improve detections for our purpose.

The pretrained model weights can be downloaded from:

and the Python code to detect objects in a scene is as follows:

import subprocess


def run_yolov5_obb_inference(image_path, weights_path="weights/yolov5m.pt", output_dir="runs/detect"):
    """Invoke the yolov5_obb detect.py script on a single image."""
    cmd = [
        "python", "detect.py",
        "--weights", weights_path,
        "--source", image_path,
        "--img", "1024",
        "--conf", "0.25",
        "--iou", "0.4",
        "--save-txt",
        "--save-conf",
        "--project", output_dir,
        "--name", "drone_inference",
    ]
    subprocess.run(cmd, check=True)


# Example usage
run_yolov5_obb_inference("inference/images/urban_drone_image.jpg")
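With --save-txt and --save-conf, stock YOLOv5 writes one line per detection in the form class x_center y_center width height confidence (all normalized); the OBB fork writes rotated-box fields instead, so verify its label format before relying on this. A minimal parser sketch, assuming the axis-aligned format and a hypothetical label-file path under runs/detect/drone_inference/labels/:

```python
from pathlib import Path


def parse_yolo_labels(label_file, class_names):
    """Parse a YOLOv5 --save-txt/--save-conf label file into catalog records.

    Each line is expected to be:
        class_id x_center y_center width height confidence  (normalized coords)
    """
    detections = []
    for line in Path(label_file).read_text().splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip malformed or rotated-box lines
        cls_id, cx, cy, w, h, conf = parts
        detections.append({
            "label": class_names[int(cls_id)],
            "confidence": float(conf),
            "box": (float(cx), float(cy), float(w), float(h)),
        })
    return detections
```

These records can then be fed directly into the catalog rather than re-reading the annotated images.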

 

or, using the ultralytics package with a COCO-pretrained model:

from ultralytics import YOLO

# Load a COCO-pretrained YOLOv8n model
model = YOLO("yolov8n.pt")

# Display model information (optional)
model.info()

results = model("parking2.jpg") 
print(results) 

Result: 

YOLOv8n summary: 129 layers, 3,157,200 parameters, 0 gradients, 8.9 GFLOPs 
 
image 1/1 C:\Users\ravib\vision\ezvision\analytics\track\parking2.jpg: 384x640 1 truck, 1 snowboard, 175.9ms 
Speed: 21.1ms preprocess, 175.9ms inference, 11.1ms postprocess per image at shape (1, 3, 384, 640) 
[ultralytics.engine.results.Results object with attributes: 
 
boxes: ultralytics.engine.results.Boxes object 
keypoints: None 
masks: None 
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'} 
obb: None 
orig_img: array([...], shape=(720, 1280, 3), dtype=uint8)
orig_shape: (720, 1280) 
path: 'C:\\Users\\ravib\\vision\\ezvision\\analytics\\track\\parking2.jpg' 
probs: None 
save_dir: 'runs\\detect\\predict' 
speed: {'preprocess': 21.11890004016459, 'inference': 175.85100000724196, 'postprocess': 11.089799925684929}] 
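The raw Results dump above is unwieldy for cataloguing. The detections can instead be pulled out as structured records; a minimal sketch, assuming the standard ultralytics Boxes attributes (cls, conf, xyxy) and a hypothetical helper name:

```python
def detections_to_records(result):
    """Convert one ultralytics Results object into a list of dicts
    suitable for cataloguing (class label, confidence, pixel box)."""
    records = []
    for box in result.boxes:
        cls_id = int(box.cls[0])
        records.append({
            "label": result.names[cls_id],
            "confidence": round(float(box.conf[0]), 3),
            "xyxy": [round(float(v), 1) for v in box.xyxy[0].tolist()],
        })
    return records
```

Called as detections_to_records(results[0]) after the model call above, this would yield entries such as {"label": "truck", ...} for the parking-lot image, ready to be indexed into the drone world catalog.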

 

on the following input: 

Aerial view of a parking lot


 
