The previous article described the organization of the software architecture, with a shift toward deep learning, cataloguing, and agentic-retrieval-based analytics over a selected set of images rather than image processing of every frame extracted from an aerial drone video. The model used to interpret objects in a scene determines the depth and breadth of the drone world catalog. In the example below, we use a yolov5m model pretrained on DOTA (an aerial imagery dataset) to improve detections for our purpose.
The model can be downloaded as:
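(The original download reference is not reproduced here. As a minimal sketch, assuming the DOTA-pretrained checkpoint is published as a downloadable .pt file, the fetch could look like the following; the URL is a placeholder, not an actual release link.)

import urllib.request
from pathlib import Path

def download_weights(url, dest="weights/yolov5m.pt"):
    # Create the weights directory if it does not exist yet
    Path(dest).parent.mkdir(parents=True, exist_ok=True)
    # Fetch the pretrained checkpoint; the URL below is a placeholder to be
    # replaced with the actual link to the DOTA-pretrained yolov5m weights
    urllib.request.urlretrieve(url, dest)
    return dest

# Placeholder URL -- substitute the real release asset
download_weights("https://example.com/yolov5m_dota.pt")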
and the Python code to detect objects in a scene can be as follows:
import subprocess

def run_yolov5_obb_inference(image_path, weights_path="weights/yolov5m.pt", output_dir="runs/detect"):
    # Build the detect.py command line for the oriented-bounding-box YOLOv5 fork
    cmd = [
        "python", "detect.py",
        "--weights", weights_path,    # DOTA-pretrained yolov5m checkpoint
        "--source", image_path,       # single image, directory, or glob
        "--img", "1024",              # inference resolution
        "--conf", "0.25",             # confidence threshold
        "--iou", "0.4",               # NMS IoU threshold
        "--save-txt",                 # save detections as text label files
        "--save-conf",                # include confidence scores in the label files
        "--project", output_dir,      # parent directory for the run
        "--name", "drone_inference"   # results go under runs/detect/drone_inference
    ]
    subprocess.run(cmd)

# Example usage
run_yolov5_obb_inference("inference/images/urban_drone_image.jpg")
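The --save-txt and --save-conf flags write one label file per image under the run directory. As a minimal sketch for feeding these detections into the catalog, assuming the standard YOLOv5 label layout (class id followed by normalized box values and a confidence; the OBB fork may instead write polygon corner coordinates, in which case the parsing differs):

from pathlib import Path

def load_detections(labels_dir="runs/detect/drone_inference/labels"):
    # Each .txt file holds one detection per line for the matching image
    detections = {}
    for label_file in Path(labels_dir).glob("*.txt"):
        rows = []
        for line in label_file.read_text().splitlines():
            parts = line.split()
            # Assumed layout: class id, then normalized box values, then confidence
            rows.append({
                "class_id": int(parts[0]),
                "values": [float(v) for v in parts[1:]],
            })
        detections[label_file.stem] = rows
    return detections

print(load_detections())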
or, using the ultralytics package with a COCO-pretrained model, as shown below:
from ultralytics import YOLO

# Load a COCO-pretrained YOLOv8n model
model = YOLO("yolov8n.pt")

# Display model information (optional)
model.info()

# Run inference on a single image
results = model("parking2.jpg")
print(results)
Result:
YOLOv8n summary: 129 layers, 3,157,200 parameters, 0 gradients, 8.9 GFLOPs
image 1/1 C:\Users\ravib\vision\ezvision\analytics\track\parking2.jpg: 384x640 1 truck, 1 snowboard, 175.9ms
Speed: 21.1ms preprocess, 175.9ms inference, 11.1ms postprocess per image at shape (1, 3, 384, 640)
[ultralytics.engine.results.Results object with attributes:
boxes: ultralytics.engine.results.Boxes object
keypoints: None
masks: None
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}
obb: None
orig_img: array([[[ 81, 79, 85],
[ 54, 52, 58],
[ 46, 44, 50],
...,
[ 18, 32, 30],
[ 26, 39, 37],
[ 29, 42, 40]],
[[124, 122, 128],
[ 97, 95, 101],
[ 67, 65, 71],
...,
[ 20, 34, 32],
[ 29, 42, 40],
[ 33, 46, 44]],
[[164, 162, 168],
[159, 157, 163],
[131, 129, 135],
...,
[ 28, 42, 40],
[ 36, 50, 48],
[ 41, 55, 53]],
...,
[[ 16, 10, 3],
[ 16, 10, 3],
[ 16, 10, 3],
...,
[103, 84, 69],
[103, 84, 69],
[103, 84, 69]],
[[ 16, 10, 5],
[ 16, 10, 5],
[ 16, 10, 5],
...,
[103, 84, 69],
[103, 84, 71],
[103, 84, 71]],
[[ 16, 10, 5],
[ 16, 10, 5],
[ 16, 10, 5],
...,
[103, 84, 69],
[103, 84, 71],
[103, 84, 71]]], shape=(720, 1280, 3), dtype=uint8)
orig_shape: (720, 1280)
path: 'C:\\Users\\ravib\\vision\\ezvision\\analytics\\track\\parking2.jpg'
probs: None
save_dir: 'runs\\detect\\predict'
speed: {'preprocess': 21.11890004016459, 'inference': 175.85100000724196, 'postprocess': 11.089799925684929}]
The result above was produced on the input image parking2.jpg.
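The raw Results print is verbose; for the cataloguing step, the useful fields can be pulled out programmatically. A minimal sketch using the boxes attributes of the returned Results objects (the record layout here is an assumption for illustration):

def summarize(results):
    records = []
    for result in results:
        names = result.names  # class-id -> label mapping
        for box in result.boxes:
            records.append({
                "label": names[int(box.cls)],             # e.g. "truck"
                "confidence": float(box.conf),            # detection score
                "xyxy": [float(v) for v in box.xyxy[0]],  # pixel-space box corners
            })
    return records

print(summarize(results))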