Monday, April 28, 2025

 Image processing is made easy with platforms like landing.ai

As an example, the following is an application that counts cars in drone images. The dataset is based on 512x512-resolution images of highways and is annotated in the Pascal VOC format. The model is hosted and can be invoked with a sample web request as follows:

from PIL import Image
from landingai.predict import Predictor

# Enter your API Key
endpoint_id = "11cb6c44-3b6a-4b47-bac9-031826bc80ea"
api_key = "YOUR_API_KEY"

# Load your image
image = Image.open("image.png")

# Run inference
predictor = Predictor(endpoint_id, api_key=api_key)
predictions = predictor.predict(image)


It can also be invoked through the agentic object-detection API as follows:

import requests

url = "https://api.va.landing.ai/v1/tools/agentic-object-detection"

files = {
    "image": open("{{path_to_image}}", "rb")
}
data = {
    "prompts": "{{prompt}}",
    "model": "agentic"
}
headers = {
    "Authorization": "Basic {{your_api_key}}"
}

response = requests.post(url, files=files, data=data, headers=headers)
print(response.json())


For context on the DFCS drone video sensing platform, please check the references.


Sunday, April 27, 2025

 Some more illustrations of drone imagery processing:

def stable_groups(keypoints, groups, threshold):
    # Assign each keypoint to an existing stable group, or start a new group.
    for kp in keypoints:
        matched = False
        for group in groups:
            mean_feature = get_mean_feature(group)
            recent_pixel = get_recent_pixel(group)
            # Match on descriptor similarity and on the optical-flow-predicted
            # pixel position; both distances are Euclidean.
            if (norm(kp.feature - mean_feature) < threshold and
                    norm(lucas_kanade_optical_flow(recent_pixel) - kp.pixel) < threshold):
                group.add(kp)
                matched = True
                break
        if not matched:
            groups.add(create_group(kp))


def global_groups(stable_groups, global_groups, threshold):
    # Merge stable groups into global groups that describe the same world location.
    for stable_group in stable_groups:
        matched = False
        for global_group in global_groups:
            mean_feature = get_mean_feature(global_group)
            # The optical-flow constraint is replaced by a position-estimate
            # similarity constraint (least-squares error below a threshold).
            if (norm(get_mean_feature(stable_group) - mean_feature) < threshold and
                    delta_least_squares(stable_group, global_group)):
                global_group.add(stable_group)
                matched = True
                break
        if not matched:
            global_groups.add(create_global_group(stable_group))
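
The pseudocode above leans on a few undefined helpers. A minimal sketch of three of them is below, assuming each keypoint carries its descriptor in kp.feature and its pixel position in kp.pixel as NumPy arrays, and that each group keeps its members in insertion order; the names mirror the pseudocode but the implementations are illustrative assumptions.

import numpy as np

def norm(v):
    # Euclidean distance, used for both descriptor and pixel comparisons.
    return float(np.linalg.norm(v))

def get_mean_feature(group):
    # Mean SIFT descriptor across all keypoints collected in the group (assumed members list).
    return np.mean([kp.feature for kp in group.members], axis=0)

def get_recent_pixel(group):
    # Pixel position of the most recently added keypoint in the group.
    return group.members[-1].pixel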


def spherical_gps_to_position_n_orientation(gps, frame):
    # Convert the spherical GPS and compass readings for this frame into the
    # drone's planar position (d.x, d.y), altitude d.h and orientation.
    return (d.x, d.y, d.h)


def camera_angle(keypoint, resolutionW, resolutionH, field_of_view):
    # Angle of the ray through the keypoint's pixel relative to the camera axis,
    # using a pinhole model along the horizontal axis (keypoint.x is the pixel
    # offset from the image centre).
    return arctan((keypoint.x * tan(field_of_view / 2)) / (resolutionW / 2))


def world_coordinates(keypoint, drone_frame):
    # Solve these equations for the world point s observed from drone frame i:
    # 1. (d_i.h - s.h) * tan(theta_i_x) = s.x - d_i.x
    # 2. (d_i.h - s.h) * tan(theta_i_y) = s.y - d_i.y
    # return (s.x, s.y, s.h)
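
A minimal sketch of such a solve is below, under the assumption that the world point lies at a known ground elevation (ground_h), which reduces the two equations to direct substitutions; theta_x and theta_y would come from camera_angle combined with the drone's orientation, and all names are illustrative.

from math import tan

def world_coordinates_sketch(d_x, d_y, d_h, theta_x, theta_y, ground_h=0.0):
    # Rearranging equation 1: s.x = d.x + (d.h - s.h) * tan(theta_x); likewise for y.
    s_x = d_x + (d_h - ground_h) * tan(theta_x)
    s_y = d_y + (d_h - ground_h) * tan(theta_y)
    return (s_x, s_y, ground_h)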


Saturday, April 26, 2025

 This is an illustration of SIFT feature extraction:

import cv2

# SIFT lives in the xfeatures2d contrib module in older OpenCV builds;
# in OpenCV >= 4.4 it is available as cv2.SIFT_create().
sift = cv2.xfeatures2d.SIFT_create()

def compute_one(im):
    return sift.detectAndCompute(im, None)

def compute_sift(frames):
    print('get sift features')
    sift_features = [(None, None) for _ in frames]
    for frame_idx, im in enumerate(frames):
        # Only every third frame is processed to keep the pipeline fast.
        if im is None or frame_idx % 3 != 0:
            continue
        print('... sift {}/{}'.format(frame_idx, len(frames)))
        keypoints, descs = compute_one(im)
        sift_features[frame_idx] = (keypoints, descs)
    return sift_features


Friday, April 25, 2025

 Drone Imagery Processing

We described the drone video sensing platform DFCS as comprising an image processor, an analytical engine, and a drone router. The image processor creates vectors for keypoints, each a tuple of a pixel position and a feature descriptor of the patch around that pixel, which is then translated into world coordinates and time-lapse information for that location. This article explains some of the tenets of the image processor.

One of the main requirements of the image processor is fast frame alignment. Given that the images could come from any unit of the UAV swarm and from any position, aligning the video frames is essential for subsequent tasks such as object detection and change-tracking. These three tasks are completed with the help of operators in an image pipeline fed with images from the drones' sensors. The first flight around the user-specified region already provides most of the survey of the landscape and brings in images from various vantage points; most of the images from this first video are top-down imagery.

Frame alignment computes a mapping from each pixel to world coordinates (longitude, latitude, height), while object detection and change-tracking encode the structured information obtained from the images; machine learning models extract this information from the video. Frame alignment efficiently combines GPS and compass readings with image features, and there is no need to compute or stash intermediary or output images along the way. SIFT feature extraction derives keypoints in each video frame. Keypoints that describe the same world location, such as a road divider or a chimney, are then grouped together in two phases: first, stable groups are created from keypoints in multiple top-down images within a segment of video from an aerial flight over the world location; second, global groups are created by merging stable groups that describe the same world location. This consolidates all keypoints pertaining to a world location. A video frame is then aligned by matching the SIFT keypoints computed in that single frame against the global groups, and this matching is used to estimate the drone's position and orientation when it captured the frame. In short, SIFT yields keypoints, grouping yields the keypoints corresponding to the same world location, and frame alignment yields position and orientation.

Grouping is iterative and initially starts with an empty set. For each frame, a keypoint is matched against existing groups using two conditions: 1. the distance between the keypoint's descriptor and the mean of the descriptors in the group must lie below a threshold, and 2. the pixel position of the most recent keypoint in the group, when propagated via optical flow, must fall within a small threshold of the keypoint's pixel position. Closeness is measured by Euclidean distance and the optical-flow propagation uses the Lucas-Kanade method. If there is no match, the keypoint becomes a new group with a single member. Both existing and new groups then feed into the global grouping phase.
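
As a rough illustration of the optical-flow step, the sketch below uses OpenCV's pyramidal Lucas-Kanade tracker (cv2.calcOpticalFlowPyrLK) to propagate a group's most recent pixel position from the previous frame into the current frame; the function and variable names are illustrative, not part of the platform.

import cv2
import numpy as np

def propagate_pixel(prev_gray, curr_gray, pixel):
    # pixel is (x, y) in the previous frame; returns the predicted (x, y)
    # in the current frame, or None if the tracker loses the point.
    prev_pts = np.array([[pixel]], dtype=np.float32)  # shape (1, 1, 2)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)
    if status[0][0] == 1:
        return tuple(next_pts[0][0])
    return None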

After this aggregation into groups, GPS and compass readings are used to determine the world coordinates of stable groups. When stable groups are merged into global groups, the coordinates of a global group are computed as the average of its stable groups' coordinates, and the optical-flow constraint is replaced by a position-estimate similarity constraint: the least-squares error between the position estimates must fall below a threshold.
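
A minimal sketch of that merge criterion is below, assuming each group's position estimates are available as arrays of (x, y) coordinates; the helper names are illustrative rather than the platform's actual delta_least_squares implementation.

import numpy as np

def least_squares_error(stable_positions, global_position):
    # Sum of squared distances between the stable group's per-frame position
    # estimates and the global group's averaged position.
    diffs = np.asarray(stable_positions) - np.asarray(global_position)
    return float(np.sum(diffs ** 2))

def should_merge(stable_positions, global_positions, threshold):
    # Average the global group's coordinates, then apply the least-squares test.
    global_position = np.mean(np.asarray(global_positions), axis=0)
    return least_squares_error(stable_positions, global_position) < threshold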


Thursday, April 24, 2025

 Leveraging a database of detected objects with standard query operators to build rich drone video sensing applications.

We described the drone video sensing platform DFCS as comprising a vision processor, an analytical engine, and a drone router. The vision processor creates vectors for keypoints, each a tuple of a pixel position and a feature descriptor of the patch around that pixel, which is translated into world coordinates and time-lapse information for that location. While many questions can be answered directly with a search on this vector database, or with multimodal search over the selected frames, we also leverage RAG by creating a database of detected objects; this is very useful when combining detections with public reviews of those objects from the internet, such as reviews of parking spaces. The aim of this database, a regular structured data source of all detected objects, is that we can now leverage standard query operators to build rich UAV swarm sensing applications.

For example,

-- My position
declare @myposition geography = geography::STGeomFromText('POINT(-0.2173896258649289 51.484376146936256)', 4326);

-- Get embeddings from OpenAI
declare @e varbinary(8000);
exec dbo.get_embeddings
    @model = 'text-embedding-3-small',
    @text = 'a place to park a car on Thursday 1-3 pm GMT',
    @embedding = @e output;

with cte as
(
    select
        e.review_id,
        vector_distance('cosine', embedding, @e) as distance
    from
        dbo.review_embeddings e
)
select top(10)
    b.id as business_id,
    b.name,
    r.id as review_id,
    r.stars,
    @myposition.STDistance(geo_location) as geo_distance,
    1 - e.distance as similarity
from
    cte e
inner join
    dbo.reviews r on e.review_id = r.id
inner join
    dbo.business b on r.business_id = b.id
where
    b.city = 'London'
    and @myposition.STDistance(geo_location) < 5000 -- 5 km
    and regexp_like(cast(b.categories as varchar(1000)), 'Parking|Street')
    and r.stars >= 4
    and b.reviews > 30
    and json_value(b.custom_attributes, '$."metered"') = 'yes'
order by
    distance
go

The direct SQL query above, combined with built-in vector search, allows a traditional web application to be created. Alternatively, the application can query a chatbot with a system message such as “You are an AI assistant that helps people find parking. Give as many details as possible about each parking space such as price. Whenever you respond, please format your answer to make it readable including bullet points.” to define the AI's personality, tone, and capabilities, and leverage the detected-objects database for Retrieval Augmented Generation.
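
As a hedged sketch of that chatbot path, the snippet below passes the rows retrieved by the SQL query to a chat-completion call together with the system message; the model name, the shape of retrieved_rows, and the OpenAI client usage are illustrative assumptions.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_parking_question(question, retrieved_rows):
    # retrieved_rows: results of the vector + geo SQL query above, serialized to text.
    context = "\n".join(str(row) for row in retrieved_rows)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": (
                "You are an AI assistant that helps people find parking. "
                "Give as many details as possible about each parking space such as price. "
                "Whenever you respond, please format your answer to make it readable "
                "including bullet points.")},
            {"role": "user", "content": question + "\n\nRetrieved parking data:\n" + context},
        ],
    )
    return response.choices[0].message.content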


Wednesday, April 23, 2025

 Waypoint selection strategies

The design, development, and testing of the waypoint selection and trajectory-forming algorithm were discussed under the assumption that users provide a geographic region they are interested in observing. The region is divided into a grid of cells with a user-configurable cell size. Then, using information about the reachability of cells from one another, we create a graph with cells as nodes and adjacencies as edges. This lets us determine waypoints as the set of nodes selected in a topological sort between source and destination. One of the helper libraries for the implementation therefore involves the following graph objects; a sketch of a container that ties them together follows the classes.

class Vertex(object):
    def __init__(self, id, point):
        self.id = id
        self.point = point
        self.in_edges = []
        self.out_edges = []

    def _neighbors(self):
        # Map each neighboring vertex to the edge that connects it to this vertex.
        n = {}
        for edge in self.in_edges:
            n[edge.src] = edge
        for edge in self.out_edges:
            n[edge.dst] = edge
        return n

    def neighbors(self):
        return self._neighbors().keys()

    def __repr__(self):
        return 'Vertex({}, {}, {} in {} out)'.format(self.id, self.point, len(self.in_edges), len(self.out_edges))


class Edge(object):
    def __init__(self, id, src, dst):
        self.id = id
        self.src = src
        self.dst = dst

    def bounds(self):
        return self.src.point.bounds().extend(self.dst.point)

    def segment(self):
        # geom and EdgePos are provided by a companion geometry module (not shown).
        return geom.Segment(self.src.point, self.dst.point)

    def closest_pos(self, point):
        # Project a point onto this edge and return its position along the edge.
        p = self.segment().project(point)
        return EdgePos(self, p.distance(self.src.point))

    def is_opposite(self, edge):
        return edge.src == self.dst and edge.dst == self.src

    def get_opposite_edge(self):
        for edge in self.dst.out_edges:
            if self.is_opposite(edge):
                return edge
        return None

    def is_adjacent(self, edge):
        return edge.src == self.src or edge.src == self.dst or edge.dst == self.src or edge.dst == self.dst

    def orig_id(self):
        # Return the original edge id if this edge carries one, else its own id.
        if hasattr(self, 'orig_edge_id'):
            return self.orig_edge_id
        else:
            return self.id
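
A minimal sketch of a Graph container that ties the Vertex and Edge classes together is below, assuming adjacency comes from the cell-reachability information described above; it is illustrative only, not the platform's actual helper library.

class Graph(object):
    def __init__(self):
        self.vertices = []
        self.edges = []

    def add_vertex(self, point):
        # Create a vertex for a grid cell centered at `point`.
        vertex = Vertex(len(self.vertices), point)
        self.vertices.append(vertex)
        return vertex

    def add_edge(self, src, dst):
        # Record that cell `dst` is reachable from cell `src`.
        edge = Edge(len(self.edges), src, dst)
        self.edges.append(edge)
        src.out_edges.append(edge)
        dst.in_edges.append(edge)
        return edge

    def add_bidirectional_edge(self, a, b):
        # Adjacent cells are typically mutually reachable.
        return self.add_edge(a, b), self.add_edge(b, a)

Waypoints between a source and destination cell can then be selected by walking this graph, as outlined in the paragraph above.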


Tuesday, April 22, 2025

 SIFT feature extraction for drone imagery

SIFT, or Scale-Invariant Feature Transform, is a powerful algorithm used in computer vision for detecting, describing, and matching local features in images. SIFT is designed to identify features that remain consistent across changes in scale, rotation, and illumination. It is applied to drone imagery to compute keypoints in each video frame. A keypoint is a tuple of a pixel position and a feature descriptor that describes the image in a patch around that pixel - a vector representation of the local image region. SIFT matches features between images by comparing their descriptors using metrics like Euclidean distance. For every video frame, SIFT yields a set of keypoints.

The implementation to get SIFT features is as follows:

import cv2

# SIFT lives in the xfeatures2d contrib module in older OpenCV builds;
# in OpenCV >= 4.4 it is available as cv2.SIFT_create().
sift = cv2.xfeatures2d.SIFT_create()

def compute_one(im):
    # Detect keypoints and compute their descriptors for a single frame.
    return sift.detectAndCompute(im, None)

def compute_sift(frames):
    print('get sift features')
    sift_features = [(None, None) for _ in frames]
    for frame_idx, im in enumerate(frames):
        # Only every third frame is processed to keep the pipeline fast.
        if im is None or frame_idx % 3 != 0:
            continue
        print('... sift {}/{}'.format(frame_idx, len(frames)))
        keypoints, descs = compute_one(im)
        sift_features[frame_idx] = (keypoints, descs)
    return sift_features
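
Since descriptors are matched by Euclidean distance, a minimal matching sketch is below using OpenCV's brute-force matcher with the L2 norm and Lowe's ratio test; it assumes the (keypoints, descs) tuples produced by compute_sift above and is illustrative rather than part of the platform.

def match_frames(descs_a, descs_b, ratio=0.75):
    # Brute-force matching of SIFT descriptors with the L2 (Euclidean) norm,
    # filtered with Lowe's ratio test to drop ambiguous matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(descs_a, descs_b, k=2)
    return [m for m, n in matches if m.distance < ratio * n.distance]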