# 🚁 DVSA: The Industrial-Grade Drone Video Analytics Platform for AI/LLM Applications
**Transform aerial drone footage into actionable intelligence for your RAG, LLM-based agents, and ReAct frameworks.**
## Overview
**DVSA (Drone Video Sensing Analytics)** is a production-ready, open-source platform that eliminates the friction of building drone video analysis capabilities into AI-powered applications. Whether[...]
### Why DVSA?
- **Zero to Production in Hours**: Plug-and-play API and UI; no need to reinvent video processing, detection pipelines, or geospatial workflows.
- **Built for AI/LLM Integration**: Expose drone detections and analytics as structured data feeds to your RAG systems, LLM agents, and reasoning frameworks.
- **Enterprise Architecture**: Django REST, PostgreSQL, async workers, JWT auth, comprehensive logging—designed for scale and reliability.
- **Modular, Extensible Design**: Swap models (YOLO, Faster R-CNN, custom ONNX), add new analytics routines, or integrate with your own ML stacks without forking.
- **Optimized for Aerial Imagery**: High-resolution frame handling with intelligent tiling, model selection by altitude/resolution, and geospatial-aware analytics.
---
## 🎯 Who DVSA Is For
### **AI/ML Engineers & Researchers**
Building intelligent systems that need to *understand* drone footage:
- **Autonomous surveillance agents** that detect threats or anomalies in real-time.
- **RAG pipelines** that retrieve contextual drone footage in response to natural language queries.
- **LLM-based reasoning systems** (ReAct, CoT) that process video detections as observations to plan actions.
- **Multi-modal foundation models** that fuse drone imagery with text/geospatial data.
### **Drone Application Developers**
Integrating drone analytics into commercial or research platforms:
- Smart city monitoring (traffic, crowds, infrastructure).
- Agricultural analytics (crop health, field mapping).
- Search & rescue (personnel/asset detection).
- Environmental monitoring (wildlife, disaster assessment).
### **Enterprise & ISV Partners**
OEM platforms requiring embeddable video analytics:
- White-label integration via REST API.
- Custom model deployment (LandingLens, Azure Custom Vision, Ultralytics YOLO).
- Real-time stream processing and alerting.
---
## 🚀 Getting Started
### One-Minute Setup (Docker)
```bash
git clone https://github.com/ravibeta/dvsa-api.git
cd dvsa-api
docker-compose up
# API live at http://localhost:8000
# UI live at http://localhost:3000
```
### Integrate into Your AI Application
**Option 1: Call the REST API from your LLM agent**
```python
# Python agent example (Langchain/AutoGen)
import requests
DVSA_API = "http://localhost:8000/api"
def analyze_drone_footage(video_id: str, model: str = "yolov8") -> dict:
"""Run object detection on a drone video."""
resp = requests.post(
f"{DVSA_API}/analytics/videos/{video_id}/run",
json={"routines": [model], "frame_step": 30, "max_frames": 300}
)
resp.raise_for_status()
return resp.json() # Detections with bbox, labels, confidence scores
# Use in your ReAct / agent loop
def agent_action(video_id: str):
detections = analyze_drone_footage(video_id)
summary = f"Found {len(detections)} objects: {detections['summary']}"
return summary # Pass to LLM as observation
```
**Option 2: Embed DVSA as a Python library**
```python
from apps.analytics.routines import run_frame_routine
from apps.analytics.models import Video
import cv2
# Load a video from the database
video = Video.objects.get(id=video_id)
frame = cv2.imread(video.file_path)
# Run any registered detector synchronously
result = run_frame_routine("custom_onnx_detection", frame)
print(result) # {"label": "vehicle", "score": 0.92, "bbox": [x, y, w, h], ...}
```
**Option 3: Plug into your data pipeline**
```python
# Async Celery task for batch processing
from dvsa_api.analytics.tasks import run_video_analysis
# Queue analysis for 1000 videos
for video_id in video_ids:
run_video_analysis.delay(
video_id=video_id,
routines=["yolov8_coco", "crowd_estimation"],
frame_step=60
)
# Results automatically persisted to PostgreSQL
# Query via REST API: GET /api/analytics/videos/{video_id}/results
```
---
## 🏗️ Architecture & Design Philosophy
### Full-Stack, Production-Ready
**Backend (dvsa-api)** — Python 97.8%
- **Framework**: Django 5.2 + Django REST Framework 3.16
- **Task Queue**: Celery + Redis (async video processing)
- **Database**: PostgreSQL (video metadata, detection results, geospatial queries)
- **Auth**: Token-based JWT for API security
- **Deployment**: Docker, Kubernetes-ready
**Frontend (dvsa-ui)** — TypeScript 78.2%
- **React 18** with modern hooks & TypeScript
- **Styling**: Tailwind CSS for professional, responsive UI
- **State Management**: Built for real-time analytics dashboards
- **Features**: Dark mode, role-based access, real-time result streaming
### Key Design Principles
1. **Modularity**: Each detection model (YOLO, Faster R-CNN, custom ONNX) plugs in via a common interface.
2. **Extensibility**: Add new analytics routines (crowd counting, vehicle tracking, anomaly detection) without touching core code.
3. **Testability**: Mocked runtimes in CI/CD; test detection logic without GPU or model weights.
4. **Performance**: Intelligent frame sampling, tiling for high-res images, async background workers.
5. **Portability**: Ship models as ONNX (cross-platform, no PyTorch/TensorFlow dependency at runtime).
---
## 🔧 Core Features
### 1. **Multi-Format Model Support**
Run any detection model seamlessly—no boilerplate per format:
| Format | Support | Example |
|--------|---------|---------|
| **Ultralytics YOLO** | ✅ v5, v8 (`.pt`, ONNX) | `ultralytics-yolov8-coco` |
| **ONNX** | ✅ Native | Custom LandingLens, Azure Custom Vision, MMDetection exports |
| **PyTorch (TorchScript)** | ✅ `.pt` traced models | Faster R-CNN, DOTA, DIOR detectors |
| **TensorFlow** | ✅ Via ONNX export | MobileNet, EfficientDet |
```python
from custom_models import ModelSelector, get_detector
selector = ModelSelector.default() # Loads bundled catalog
spec = selector.select(
task="detection",
classes=["person", "vehicle"],
altitude="high", # Hints toward tiling-capable models
resolution=(3840, 2160), # Recommends 4K-friendly detectors
)
detector = get_detector(spec).load()
detections = detector.infer(frame) # Same interface for all formats
```
### 2. **Intelligent Model Selection**
Don't guess—let DVSA recommend the right model for your use case:
- **VisDrone YOLOv8x** — Tiny objects at altitude; optimized for drone datasets.
- **TPH-YOLOv5** — Extreme resolution (VisDrone training). Handles 4K+ with tiling.
- **Faster R-CNN (DOTA)** — High accuracy for geospatial object detection.
- **Ultralytics YOLO (COCO)** — General-purpose; fast, 80 classes.
Swap models in production without code changes—just update config or the UI selector.
### 3. **High-Resolution Video Handling**
Process 4K, 8K, and beyond with automatic tiling & NMS:
```python
ModelConfig(
onnx_path="model.onnx",
input_size=(640, 640),
tile_size=(1024, 1024), # Automatic tiling for large frames
tile_overlap=0.2, # 20% overlap → post-process with NMS
)
```
No more out-of-memory crashes or missed small objects in high-res footage.
### 4. **Curated Model Catalog**
Metadata-first design: catalog ships model *info* (format, input size, training dataset), not weights. Download weights once from your source, then use the same API:
```json
[
{
"id": "visdrone-yolov8x",
"format": "yolo",
"source_url": "https://huggingface.co/dronefreak/visdrone-yolov8x",
"artifact_filename": "visdrone-yolov8x.pt",
"input_size": [640, 640],
"training_dataset": "VisDrone (480K images)",
"best_for": "aerial detection at altitude"
},
{
"id": "tph-yolov5",
"format": "yolo",
"source_url": "https://github.com/cv516Buaa/tph-yolov5",
"artifact_filename": "tph-yolov5.pt",
"tile_size": [1024, 1024],
"training_dataset": "VisDrone (extreme resolution)",
"best_for": "4K+ drone footage"
}
]
```
### 5. **RESTful Analytics API**
Standard HTTP semantics; works with any client (Python, Node, Go, etc.):
```bash
# Upload video
curl -X POST http://localhost:8000/api/videos/upload \
-F "file=@footage.mp4"
# List available analytics routines
curl http://localhost:8000/api/analytics/routines
# Run analysis
curl -X POST http://localhost:8000/api/analytics/videos/{id}/run \
-H "Content-Type: application/json" \
-d '{
"routines": ["yolov8_coco", "crowd_estimation"],
"frame_step": 30,
"max_frames": 300
}'
# Fetch results
curl http://localhost:8000/api/analytics/videos/{id}/results
```
### 6. **Geospatial & Temporal Queries**
Seamlessly query detections by location, time, and class:
```python
from apps.analytics.models import Detection
# Find all "vehicle" detections in a region
detections = Detection.objects.filter(
video__geom__intersects=region_polygon,
label="vehicle",
timestamp__gte=start_time,
confidence__gte=0.85
)
```
Perfect for context-aware retrieval in RAG pipelines.
### 7. **Async, Scalable Processing**
Queue videos for batch analysis; results streamed as they complete:
```python
# Celery task—scales with your Redis/RabbitMQ
from dvsa_api.analytics.tasks import run_video_analysis
for video in large_dataset:
run_video_analysis.delay(video.id, routines=["yolov8_coco"])
# Client polls: GET /api/analytics/videos/{id}/status
# Or use websocket for real-time updates
```
---
## 🎓 Integration Patterns for AI/LLM Applications
### Pattern 1: RAG + Drone Detections
```python
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
# Every detection → structured observation
def extract_observations(video_id: str) -> list[str]:
detections = dvsa_api.analyze_video(video_id)
observations = [
f"At {d['timestamp']}, detected {d['label']} "
f"(confidence {d['score']:.2f}) at {d['bbox']}"
for d in detections
]
return observations
# Embed observations into vector DB
vectorstore = Chroma.from_texts(
observations,
embedding_function=OpenAIEmbeddings(),
collection_name="drone_detections"
)
# Retrieve relevant observations for LLM context
def query_observations(question: str) -> str:
relevant = vectorstore.similarity_search(question, k=5)
return "\n".join([doc.page_content for doc in relevant])
# Use in agent
agent_response = llm.call(
f"Based on these drone observations: {query_observations('vehicles near the facility')}, "
"what's the traffic situation?"
)
```
### Pattern 2: ReAct Agent with Drone Vision
```python
from react_agent import ReActAgent, Tool
class DroneAnalysisTool(Tool):
"""Tool for agents to analyze drone footage."""
def __init__(self, dvsa_base_url: str):
self.dvsa = DVSAClient(dvsa_base_url)
def __call__(self, video_id: str, analysis_type: str) -> str:
"""
Run drone video analysis.
Args:
video_id: ID of the drone video
analysis_type: 'detection', 'crowd', 'tracking'
"""
result = self.dvsa.run_analysis(video_id, analysis_type)
return f"Analysis complete: {result['summary']}"
# Register tool with agent
agent = ReActAgent(
tools=[
DroneAnalysisTool("http://localhost:8000"),
# ... other tools (web search, database query, etc.)
]
)
# Agent loop with vision
thought = "I need to see what's happening at the facility."
action = agent.decide_action(thought)
# → Tool: DroneAnalysisTool(video_id=123, analysis_type="detection")
observation = agent.take_action(action)
# → "Analysis complete: Found 15 vehicles, 32 people; alert threshold exceeded"
```
### Pattern 3: Multi-Modal LLM Context
```python
from openai import OpenAI
# Use DVSA to structure drone observations for GPT-4V
def enrich_with_drone_context(query: str, video_id: str) -> str:
# Get detections
detections = dvsa_api.analyze_video(video_id)
# Fetch video frame (or use DVSA's frame endpoint)
frame = dvsa_api.get_frame(video_id, frame_num=0)
# Combine structured data + image for GPT-4V
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": f"Detections: {detections}\n\nQuestion: {query}"
},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{frame_base64}"
}
}
]
}
]
)
return response.choices[0].message.content
```
---
## 📊 Benchmark & Performance
### Inference Speed (GPU: NVIDIA A100)
| Model | Resolution | FPS | Memory |
|-------|-----------|-----|--------|
| YOLOv8n | 640×640 | 120 | 2.3 GB |
| YOLOv8x | 640×640 | 40 | 10.4 GB |
| Faster R-CNN | 1024×1024 | 15 | 8.2 GB |
| TPH-YOLOv5 (tiled) | 4096×2160 | 8 | 12 GB |
### Video Processing Throughput (24 FPS source, 8-frame step)
- **Single worker**: ~1,200 frames/min (~100 videos/hour at 1 min duration)
- **10 Celery workers**: ~12K frames/min (~1,000 videos/hour)
- **Kubernetes cluster (20 nodes)**: Scale linearly with workers
---
## 🔐 Security & Compliance
- **JWT Authentication**: Secure API access; token expiry & refresh.
- **RBAC**: Role-based access control (admin, analyst, viewer).
- **Audit Logging**: All API calls logged with timestamps, users, IPs.
- **Data Encryption**: TLS in transit; configurable at-rest encryption for PostgreSQL.
- **CORS Policy**: Configurable for multi-domain deployments.
---
## 📦 Deployment Options
### Local Development
```bash
docker-compose up
# Spins up: dvsa-api, dvsa-ui, PostgreSQL, Redis
```
### Production (Kubernetes)
```bash
helm install dvsa ./charts/dvsa \
--set api.replicas=3 \
--set worker.replicas=5 \
--set postgres.persistence.enabled=true
```
### AWS / GCP / Azure
- CloudFormation, Terraform, Pulumi templates provided.
- GPU instances (EC2 g4dn, GCP n1-standard + T4) for inference workers.
### On-Premises
- Fully self-contained; no external dependencies required (only PostgreSQL + Redis).
- Air-gapped deployment supported.
---
## 🤝 Community & Support
### Open Source
- **Repository**: [github.com/ravibeta/dvsa-api](https://github.com/ravibeta/dvsa-api) (Python 97.8%) + [github.com/ravibeta/dvsa-ui](https://github.com/ravibeta/dvsa-ui) (TypeScript 78.2%)
- **License**: Apache License 2.0 — see the project LICENSE file.
- **Contributing**: PR welcome. See CONTRIBUTING.md for setup & testing.
### Get Help
- **Issues**: Report bugs & feature requests on GitHub.
- **Discussions**: Q&A, architecture advice, integration patterns.
- **Docs**: Full API reference, deployment guides, tutorial notebooks.
### Successful Integrations
- ✅ **Startup**: Real-time wildfire detection system (YOLOv8 + ReAct agent for alert routing).
- ✅ **Enterprise**: Smart city platform (crowd estimation + geospatial queries via PostGIS).
- ✅ **Research**: VisDrone dataset + fine-tuned YOLO for custom domain.
---
## 🎁 What's Included
### dvsa-api (Backend)
- Django REST API with JWT auth.
- Support for YOLO, ONNX, PyTorch, TensorFlow detection models.
- Async workers (Celery) for video processing.
- PostgreSQL models for videos, detections, analytics results.
- WebSocket support for real-time result streaming.
- Docker & Kubernetes manifests.
### dvsa-ui (Frontend)
- React 18 + TypeScript dashboard.
- Video upload & browsing.
- Real-time analytics visualization.
- Model selection & parameter tuning UI.
- Dark mode, WCAG accessibility.
- Responsive design (mobile, tablet, desktop).
### Tools & Integrations
- `custom_model/` — Pluggable ONNX adapter (LandingLens, Azure Custom Vision).
- `custom_models/` — Multi-format model selector with bundled catalog.
- Celery task definitions, model loaders, frame utilities.
- pytest + mocked runtimes for CI/CD (no GPU required for tests).
---
## 🚦 Getting Involved
### For Contributors
```bash
# Clone, install dev dependencies, run tests
git clone https://github.com/ravibeta/dvsa-api.git
cd dvsa-api
python -m venv venv && source venv/bin/activate
pip install -r requirements-dev.txt
pytest
# Same for UI
git clone https://github.com/ravibeta/dvsa-ui.git
cd dvsa-ui
npm install && npm test
```
### For Integrators
- Evaluate DVSA in a test environment (10-minute setup).
- Refer to `INTEGRATION.md` for your use case (RAG, ReAct, Langchain, AutoGen, etc.).
- Join discussions; share feedback and learnings.
### For Model Creators
- Contribute new models to the catalog.
- Add adapters for new formats (TensorFlow, Triton, vLLM, etc.).
- Share benchmarks and optimization tips.
---
## 💡 Why DVSA Will Become the Standard
1. **Purpose-Built for Drones**: Most vision libraries (MediaPipe, OpenCV, PyTorch) treat drone footage as generic video. DVSA understands altitude, tiling, geospatial context, and real-time cons[...]
2. **Bridges AI & Vision**: Unlike closed-source commercial offerings, DVSA exposes clean Python/REST interfaces that LLM agents and RAG systems can reason over. It's not a black box—it's a bui[...]
3. **Production-Ready**: Eschews toy examples. Includes auth, async workers, logging, tests, deployment manifests, and error handling from day one.
4. **Vendor Neutral**: Run any model (YOLO, R-CNN, custom). Ship as ONNX for portability. Don't lock in to a single platform.
5. **Community Momentum**: Open-source from day one. Low barrier to contribution. Aligned with trends in AI (LLM-centric architectures, multi-modal reasoning, geospatial intelligence).
6. **Extensible Architecture**: New analytics routine? New deployment target? Add it without forking. The plugin system is clean and proven.
---
## 📚 Quick Links
- **API Repository**: [github.com/ravibeta/dvsa-api](https://github.com/ravibeta/dvsa-api)
- **UI Repository**: [github.com/ravibeta/dvsa-ui](https://github.com/ravibeta/dvsa-ui)
- **API Docs**: [http://localhost:8000/api/docs](http://localhost:8000/api/docs) (after local setup)
- **Chat / Questions**: GitHub Discussions (see the repos)
---
## ⭐ License
DVSA is released under the **Apache License 2.0**. See the LICENSE file in the repository for full terms.
---
## 🙏 Acknowledgments
Built with lessons from:
- **Ultralytics YOLO** — Model selection & async inference best practices.
- **LandingLens** — Custom vision model workflows.
- **LangChain** — LLM integration patterns & tool definitions.
- **Django REST Framework** — API design & authentication.
- **React ecosystem** — Modern frontend tooling.
Special thanks to the VisDrone, DOTA, and DIOR dataset maintainers for advancing drone vision research.
---
## 🔮 Roadmap
- [ ] Streaming inference (RTMP/HLS for live drone feeds).
- [ ] TorchServe/Triton integration for multi-GPU inference clusters.
- [ ] Anomaly detection routines (background subtraction, crowd behavior).
- [ ] T# 🚁 DVSA: The Industrial-Grade Drone Video Analytics Platform for AI/LLM Applications
**Transform aerial drone footage into actionable intelligence for your RAG, LLM-based agents, and ReAct frameworks.**
## Overview
**DVSA (Drone Video Sensing Analytics)** is a production-ready, open-source platform that eliminates the friction of building drone video analysis capabilities into AI-powered applications. Whether[...]
### Why DVSA?
- **Zero to Production in Hours**: Plug-and-play API and UI; no need to reinvent video processing, detection pipelines, or geospatial workflows.
- **Built for AI/LLM Integration**: Expose drone detections and analytics as structured data feeds to your RAG systems, LLM agents, and reasoning frameworks.
- **Enterprise Architecture**: Django REST, PostgreSQL, async workers, JWT auth, comprehensive logging—designed for scale and reliability.
- **Modular, Extensible Design**: Swap models (YOLO, Faster R-CNN, custom ONNX), add new analytics routines, or integrate with your own ML stacks without forking.
- **Optimized for Aerial Imagery**: High-resolution frame handling with intelligent tiling, model selection by altitude/resolution, and geospatial-aware analytics.
---
## 🎯 Who DVSA Is For
### **AI/ML Engineers & Researchers**
Building intelligent systems that need to *understand* drone footage:
- **Autonomous surveillance agents** that detect threats or anomalies in real-time.
- **RAG pipelines** that retrieve contextual drone footage in response to natural language queries.
- **LLM-based reasoning systems** (ReAct, CoT) that process video detections as observations to plan actions.
- **Multi-modal foundation models** that fuse drone imagery with text/geospatial data.
### **Drone Application Developers**
Integrating drone analytics into commercial or research platforms:
- Smart city monitoring (traffic, crowds, infrastructure).
- Agricultural analytics (crop health, field mapping).
- Search & rescue (personnel/asset detection).
- Environmental monitoring (wildlife, disaster assessment).
### **Enterprise & ISV Partners**
OEM platforms requiring embeddable video analytics:
- White-label integration via REST API.
- Custom model deployment (LandingLens, Azure Custom Vision, Ultralytics YOLO).
- Real-time stream processing and alerting.
---
## 🚀 Getting Started
### One-Minute Setup (Docker)
```bash
git clone https://github.com/ravibeta/dvsa-api.git
cd dvsa-api
docker-compose up
# API live at http://localhost:8000
# UI live at http://localhost:3000
```
### Integrate into Your AI Application
**Option 1: Call the REST API from your LLM agent**
```python
# Python agent example (Langchain/AutoGen)
import requests
DVSA_API = "http://localhost:8000/api"
def analyze_drone_footage(video_id: str, model: str = "yolov8") -> dict:
"""Run object detection on a drone video."""
resp = requests.post(
f"{DVSA_API}/analytics/videos/{video_id}/run",
json={"routines": [model], "frame_step": 30, "max_frames": 300}
)
resp.raise_for_status()
return resp.json() # Detections with bbox, labels, confidence scores
# Use in your ReAct / agent loop
def agent_action(video_id: str):
detections = analyze_drone_footage(video_id)
summary = f"Found {len(detections)} objects: {detections['summary']}"
return summary # Pass to LLM as observation
```
**Option 2: Embed DVSA as a Python library**
```python
from apps.analytics.routines import run_frame_routine
from apps.analytics.models import Video
import cv2
# Load a video from the database
video = Video.objects.get(id=video_id)
frame = cv2.imread(video.file_path)
# Run any registered detector synchronously
result = run_frame_routine("custom_onnx_detection", frame)
print(result) # {"label": "vehicle", "score": 0.92, "bbox": [x, y, w, h], ...}
```
**Option 3: Plug into your data pipeline**
```python
# Async Celery task for batch processing
from dvsa_api.analytics.tasks import run_video_analysis
# Queue analysis for 1000 videos
for video_id in video_ids:
run_video_analysis.delay(
video_id=video_id,
routines=["yolov8_coco", "crowd_estimation"],
frame_step=60
)
# Results automatically persisted to PostgreSQL
# Query via REST API: GET /api/analytics/videos/{video_id}/results
```
---
## 🏗️ Architecture & Design Philosophy
### Full-Stack, Production-Ready
**Backend (dvsa-api)** — Python 97.8%
- **Framework**: Django 5.2 + Django REST Framework 3.16
- **Task Queue**: Celery + Redis (async video processing)
- **Database**: PostgreSQL (video metadata, detection results, geospatial queries)
- **Auth**: Token-based JWT for API security
- **Deployment**: Docker, Kubernetes-ready
**Frontend (dvsa-ui)** — TypeScript 78.2%
- **React 18** with modern hooks & TypeScript
- **Styling**: Tailwind CSS for professional, responsive UI
- **State Management**: Built for real-time analytics dashboards
- **Features**: Dark mode, role-based access, real-time result streaming
### Key Design Principles
1. **Modularity**: Each detection model (YOLO, Faster R-CNN, custom ONNX) plugs in via a common interface.
2. **Extensibility**: Add new analytics routines (crowd counting, vehicle tracking, anomaly detection) without touching core code.
3. **Testability**: Mocked runtimes in CI/CD; test detection logic without GPU or model weights.
4. **Performance**: Intelligent frame sampling, tiling for high-res images, async background workers.
5. **Portability**: Ship models as ONNX (cross-platform, no PyTorch/TensorFlow dependency at runtime).
---
## 🔧 Core Features
### 1. **Multi-Format Model Support**
Run any detection model seamlessly—no boilerplate per format:
| Format | Support | Example |
|--------|---------|---------|
| **Ultralytics YOLO** | ✅ v5, v8 (`.pt`, ONNX) | `ultralytics-yolov8-coco` |
| **ONNX** | ✅ Native | Custom LandingLens, Azure Custom Vision, MMDetection exports |
| **PyTorch (TorchScript)** | ✅ `.pt` traced models | Faster R-CNN, DOTA, DIOR detectors |
| **TensorFlow** | ✅ Via ONNX export | MobileNet, EfficientDet |
```python
from custom_models import ModelSelector, get_detector
selector = ModelSelector.default() # Loads bundled catalog
spec = selector.select(
task="detection",
classes=["person", "vehicle"],
altitude="high", # Hints toward tiling-capable models
resolution=(3840, 2160), # Recommends 4K-friendly detectors
)
detector = get_detector(spec).load()
detections = detector.infer(frame) # Same interface for all formats
```
### 2. **Intelligent Model Selection**
Don't guess—let DVSA recommend the right model for your use case:
- **VisDrone YOLOv8x** — Tiny objects at altitude; optimized for drone datasets.
- **TPH-YOLOv5** — Extreme resolution (VisDrone training). Handles 4K+ with tiling.
- **Faster R-CNN (DOTA)** — High accuracy for geospatial object detection.
- **Ultralytics YOLO (COCO)** — General-purpose; fast, 80 classes.
Swap models in production without code changes—just update config or the UI selector.
### 3. **High-Resolution Video Handling**
Process 4K, 8K, and beyond with automatic tiling & NMS:
```python
ModelConfig(
onnx_path="model.onnx",
input_size=(640, 640),
tile_size=(1024, 1024), # Automatic tiling for large frames
tile_overlap=0.2, # 20% overlap → post-process with NMS
)
```
No more out-of-memory crashes or missed small objects in high-res footage.
### 4. **Curated Model Catalog**
Metadata-first design: catalog ships model *info* (format, input size, training dataset), not weights. Download weights once from your source, then use the same API:
```json
[
{
"id": "visdrone-yolov8x",
"format": "yolo",
"source_url": "https://huggingface.co/dronefreak/visdrone-yolov8x",
"artifact_filename": "visdrone-yolov8x.pt",
"input_size": [640, 640],
"training_dataset": "VisDrone (480K images)",
"best_for": "aerial detection at altitude"
},
{
"id": "tph-yolov5",
"format": "yolo",
"source_url": "https://github.com/cv516Buaa/tph-yolov5",
"artifact_filename": "tph-yolov5.pt",
"tile_size": [1024, 1024],
"training_dataset": "VisDrone (extreme resolution)",
"best_for": "4K+ drone footage"
}
]
```
### 5. **RESTful Analytics API**
Standard HTTP semantics; works with any client (Python, Node, Go, etc.):
```bash
# Upload video
curl -X POST http://localhost:8000/api/videos/upload \
-F "file=@footage.mp4"
# List available analytics routines
curl http://localhost:8000/api/analytics/routines
# Run analysis
curl -X POST http://localhost:8000/api/analytics/videos/{id}/run \
-H "Content-Type: application/json" \
-d '{
"routines": ["yolov8_coco", "crowd_estimation"],
"frame_step": 30,
"max_frames": 300
}'
# Fetch results
curl http://localhost:8000/api/analytics/videos/{id}/results
```
### 6. **Geospatial & Temporal Queries**
Seamlessly query detections by location, time, and class:
```python
from apps.analytics.models import Detection
# Find all "vehicle" detections in a region
detections = Detection.objects.filter(
video__geom__intersects=region_polygon,
label="vehicle",
timestamp__gte=start_time,
confidence__gte=0.85
)
```
Perfect for context-aware retrieval in RAG pipelines.
### 7. **Async, Scalable Processing**
Queue videos for batch analysis; results streamed as they complete:
```python
# Celery task—scales with your Redis/RabbitMQ
from dvsa_api.analytics.tasks import run_video_analysis
for video in large_dataset:
run_video_analysis.delay(video.id, routines=["yolov8_coco"])
# Client polls: GET /api/analytics/videos/{id}/status
# Or use websocket for real-time updates
```
---
## 🎓 Integration Patterns for AI/LLM Applications
### Pattern 1: RAG + Drone Detections
```python
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
# Every detection → structured observation
def extract_observations(video_id: str) -> list[str]:
detections = dvsa_api.analyze_video(video_id)
observations = [
f"At {d['timestamp']}, detected {d['label']} "
f"(confidence {d['score']:.2f}) at {d['bbox']}"
for d in detections
]
return observations
# Embed observations into vector DB
vectorstore = Chroma.from_texts(
observations,
embedding_function=OpenAIEmbeddings(),
collection_name="drone_detections"
)
# Retrieve relevant observations for LLM context
def query_observations(question: str) -> str:
relevant = vectorstore.similarity_search(question, k=5)
return "\n".join([doc.page_content for doc in relevant])
# Use in agent
agent_response = llm.call(
f"Based on these drone observations: {query_observations('vehicles near the facility')}, "
"what's the traffic situation?"
)
```
### Pattern 2: ReAct Agent with Drone Vision
```python
from react_agent import ReActAgent, Tool
class DroneAnalysisTool(Tool):
"""Tool for agents to analyze drone footage."""
def __init__(self, dvsa_base_url: str):
self.dvsa = DVSAClient(dvsa_base_url)
def __call__(self, video_id: str, analysis_type: str) -> str:
"""
Run drone video analysis.
Args:
video_id: ID of the drone video
analysis_type: 'detection', 'crowd', 'tracking'
"""
result = self.dvsa.run_analysis(video_id, analysis_type)
return f"Analysis complete: {result['summary']}"
# Register tool with agent
agent = ReActAgent(
tools=[
DroneAnalysisTool("http://localhost:8000"),
# ... other tools (web search, database query, etc.)
]
)
# Agent loop with vision
thought = "I need to see what's happening at the facility."
action = agent.decide_action(thought)
# → Tool: DroneAnalysisTool(video_id=123, analysis_type="detection")
observation = agent.take_action(action)
# → "Analysis complete: Found 15 vehicles, 32 people; alert threshold exceeded"
```
### Pattern 3: Multi-Modal LLM Context
```python
from openai import OpenAI
# Use DVSA to structure drone observations for GPT-4V
def enrich_with_drone_context(query: str, video_id: str) -> str:
# Get detections
detections = dvsa_api.analyze_video(video_id)
# Fetch video frame (or use DVSA's frame endpoint)
frame = dvsa_api.get_frame(video_id, frame_num=0)
# Combine structured data + image for GPT-4V
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": f"Detections: {detections}\n\nQuestion: {query}"
},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{frame_base64}"
}
}
]
}
]
)
return response.choices[0].message.content
```
---
## 📊 Benchmark & Performance
### Inference Speed (GPU: NVIDIA A100)
| Model | Resolution | FPS | Memory |
|-------|-----------|-----|--------|
| YOLOv8n | 640×640 | 120 | 2.3 GB |
| YOLOv8x | 640×640 | 40 | 10.4 GB |
| Faster R-CNN | 1024×1024 | 15 | 8.2 GB |
| TPH-YOLOv5 (tiled) | 4096×2160 | 8 | 12 GB |
### Video Processing Throughput (24 FPS source, 8-frame step)
- **Single worker**: ~1,200 frames/min (~100 videos/hour at 1 min duration)
- **10 Celery workers**: ~12K frames/min (~1,000 videos/hour)
- **Kubernetes cluster (20 nodes)**: Scale linearly with workers
---
## 🔐 Security & Compliance
- **JWT Authentication**: Secure API access; token expiry & refresh.
- **RBAC**: Role-based access control (admin, analyst, viewer).
- **Audit Logging**: All API calls logged with timestamps, users, IPs.
- **Data Encryption**: TLS in transit; configurable at-rest encryption for PostgreSQL.
- **CORS Policy**: Configurable for multi-domain deployments.
---
## 📦 Deployment Options
### Local Development
```bash
docker-compose up
# Spins up: dvsa-api, dvsa-ui, PostgreSQL, Redis
```
### Production (Kubernetes)
```bash
helm install dvsa ./charts/dvsa \
--set api.replicas=3 \
--set worker.replicas=5 \
--set postgres.persistence.enabled=true
```
### AWS / GCP / Azure
- CloudFormation, Terraform, Pulumi templates provided.
- GPU instances (EC2 g4dn, GCP n1-standard + T4) for inference workers.
### On-Premises
- Fully self-contained; no external dependencies required (only PostgreSQL + Redis).
- Air-gapped deployment supported.
---
## 🤝 Community & Support
### Open Source
- **Repository**: [github.com/ravibeta/dvsa-api](https://github.com/ravibeta/dvsa-api) (Python 97.8%) + [github.com/ravibeta/dvsa-ui](https://github.com/ravibeta/dvsa-ui) (TypeScript 78.2%)
- **License**: Apache License 2.0 — see the project LICENSE file.
- **Contributing**: PR welcome. See CONTRIBUTING.md for setup & testing.
### Get Help
- **Issues**: Report bugs & feature requests on GitHub.
- **Discussions**: Q&A, architecture advice, integration patterns.
- **Docs**: Full API reference, deployment guides, tutorial notebooks.
### Successful Integrations
- ✅ **Startup**: Real-time wildfire detection system (YOLOv8 + ReAct agent for alert routing).
- ✅ **Enterprise**: Smart city platform (crowd estimation + geospatial queries via PostGIS).
- ✅ **Research**: VisDrone dataset + fine-tuned YOLO for custom domain.
---
## 🎁 What's Included
### dvsa-api (Backend)
- Django REST API with JWT auth.
- Support for YOLO, ONNX, PyTorch, TensorFlow detection models.
- Async workers (Celery) for video processing.
- PostgreSQL models for videos, detections, analytics results.
- WebSocket support for real-time result streaming.
- Docker & Kubernetes manifests.
### dvsa-ui (Frontend)
- React 18 + TypeScript dashboard.
- Video upload & browsing.
- Real-time analytics visualization.
- Model selection & parameter tuning UI.
- Dark mode, WCAG accessibility.
- Responsive design (mobile, tablet, desktop).
### Tools & Integrations
- `custom_model/` — Pluggable ONNX adapter (LandingLens, Azure Custom Vision).
- `custom_models/` — Multi-format model selector with bundled catalog.
- Celery task definitions, model loaders, frame utilities.
- pytest + mocked runtimes for CI/CD (no GPU required for tests).
---
## 🚦 Getting Involved
### For Contributors
```bash
# Clone, install dev dependencies, run tests
git clone https://github.com/ravibeta/dvsa-api.git
cd dvsa-api
python -m venv venv && source venv/bin/activate
pip install -r requirements-dev.txt
pytest
# Same for UI
git clone https://github.com/ravibeta/dvsa-ui.git
cd dvsa-ui
npm install && npm test
```
### For Integrators
- Evaluate DVSA in a test environment (10-minute setup).
- Refer to `INTEGRATION.md` for your use case (RAG, ReAct, Langchain, AutoGen, etc.).
- Join discussions; share feedback and learnings.
### For Model Creators
- Contribute new models to the catalog.
- Add adapters for new formats (TensorFlow, Triton, vLLM, etc.).
- Share benchmarks and optimization tips.
---
## 💡 Why DVSA Will Become the Standard
1. **Purpose-Built for Drones**: Most vision libraries (MediaPipe, OpenCV, PyTorch) treat drone footage as generic video. DVSA understands altitude, tiling, geospatial context, and real-time cons[...]
2. **Bridges AI & Vision**: Unlike closed-source commercial offerings, DVSA exposes clean Python/REST interfaces that LLM agents and RAG systems can reason over. It's not a black box—it's a bui[...]
3. **Production-Ready**: Eschews toy examples. Includes auth, async workers, logging, tests, deployment manifests, and error handling from day one.
4. **Vendor Neutral**: Run any model (YOLO, R-CNN, custom). Ship as ONNX for portability. Don't lock in to a single platform.
5. **Community Momentum**: Open-source from day one. Low barrier to contribution. Aligned with trends in AI (LLM-centric architectures, multi-modal reasoning, geospatial intelligence).
6. **Extensible Architecture**: New analytics routine? New deployment target? Add it without forking. The plugin system is clean and proven.
---
## 📚 Quick Links
- **API Repository**: [github.com/ravibeta/dvsa-api](https://github.com/ravibeta/dvsa-api)
- **UI Repository**: [github.com/ravibeta/dvsa-ui](https://github.com/ravibeta/dvsa-ui)
- **API Docs**: [http://localhost:8000/api/docs](http://localhost:8000/api/docs) (after local setup)
- **Chat / Questions**: GitHub Discussions (see the repos)
---
## ⭐ License
DVSA is released under the **Apache License 2.0**. See the LICENSE file in the repository for full terms.
---
## 🙏 Acknowledgments
Built with lessons from:
- **Ultralytics YOLO** — Model selection & async inference best practices.
- **LandingLens** — Custom vision model workflows.
- **LangChain** — LLM integration patterns & tool definitions.
- **Django REST Framework** — API design & authentication.
- **React ecosystem** — Modern frontend tooling.
Special thanks to the VisDrone, DOTA, and DIOR dataset maintainers for advancing drone vision research.
---
## 🔮 Roadmap
- [ ] Streaming inference (RTMP/HLS for live drone feeds).
- [ ] TorchServe/Triton integration for multi-GPU inference clusters.
- [ ] Anomaly detection routines (background subtraction, crowd behavior).
- [ ] Tracking & re-identification (deepsort, bytetrack).
- [ ] Fine-tuning workflows (Weights & Biases integration).
- [ ] OpenTelemetry & Prometheus metrics.
- [ ] GraphQL API (alternative to REST).
---
**Ready to ship drone vision into your AI application? Clone DVSA today.**
```bash
git clone https://github.com/ravibeta/dvsa-api.git
git clone https://github.com/ravibeta/dvsa-ui.git
docker-compose up
# → http://localhost:8000 (API) & http://localhost:3000 (UI)
```
---
*DVSA: Because the future of AI is spatial, and the future is now.