Saturday, December 20, 2025

 Many of the drone vision analytics queries are about objects located in a scene. For example, a search for a “parking garage” in a scene should yield a result with a clipped image showing the garage.  

As a multimodal search, this does not always yield the correct answer, but a few techniques can help. This article lists them. 

  1. When the scenes are vectorized frame by frame, they can also be analyzed to detect as many objects as possible, along with their bounding boxes, and saved with the scenes as documents with id, vector, captions, title, location, bounding box, and tags. 
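As a concrete illustration, one such per-object document might be assembled as below. This is a minimal sketch: the field names (`boundingbox`, `captions`, etc.) and the helper function are illustrative assumptions, not a prescribed schema.

```python
def make_object_document(doc_id, vector, caption, title, location, bbox, tags):
    """Package one detected object plus its frame metadata as a search document."""
    return {
        "id": doc_id,
        "vector": vector,                               # embedding of the frame or crop
        "captions": caption,                            # model-generated caption
        "title": title,
        "location": location,                           # e.g. "lat,lon" of the frame
        "boundingbox": ",".join(str(v) for v in bbox),  # "x,y,w,h" of the detection
        "tags": tags,                                   # detected class labels
    }

doc = make_object_document(
    "frame0042-obj3",
    [0.12, -0.05, 0.33],
    "a multi-storey parking garage",
    "frame 42",
    "47.61,-122.33",
    (118, 64, 230, 145),
    ["parking garage", "building"],
)
print(doc["boundingbox"])  # → 118,64,230,145
```

Documents shaped like this can then be uploaded in batches to the index so that both the vector and the text fields are searchable.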

  1. The search over these accumulated scenes and objects can make use of various search options to narrow down the results. For example: 

  1. Create a vector from the text: 

from azure.search.documents.models import (
    QueryAnswerType,
    QueryCaptionType,
    QueryType,
    VectorizableTextQuery,
)

search_text = "parking garage"

vector_query = VectorizableTextQuery(
    text=search_text,
    exhaustive=True,
    k_nearest_neighbors=50,
    fields="vector",
    weight=0.5,
)

results = dest_search_client.search(
    search_text=search_text,
    vector_queries=[vector_query],
    query_type=QueryType.SEMANTIC,
    select=["id", "description", "vector"],
    filter=f"description ne null and search.ismatch('{search_text}', 'description')",
    semantic_configuration_name="mysemantic",
    query_caption=QueryCaptionType.EXTRACTIVE,
    query_answer=QueryAnswerType.EXTRACTIVE,
    top=10,
)

  1. Use a semantic configuration. 

  1. A semantic configuration leverages the text-based content in fields such as title, description, and tags for keyword and semantic search. 

  1. Specify Hierarchical Navigable Small World (HNSW) search or exhaustive KNN search as appropriate. HNSW offers high recall at low latency but can miss some true neighbors, while exhaustive KNN scores every candidate at higher cost. On large datasets, HNSW usually performs better. 
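For intuition, exhaustive KNN is exact precisely because it scores every stored vector against the query; HNSW avoids that full scan by walking a proximity graph, which is why it can occasionally miss a true neighbor. A minimal pure-Python sketch of the exhaustive baseline, assuming cosine similarity as the metric:

```python
import math

def exhaustive_knn(query, vectors, k):
    """Exact k-NN: score every candidate vector, then keep the top k.

    This is the exhaustive 'count all neighbours' strategy; HNSW trades
    this full scan for an approximate graph traversal.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    return ranked[:k]

scenes = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(exhaustive_knn([1.0, 0.0], scenes, 2))  # → [0, 2]
```

An HNSW index would answer the same query by traversing a layered neighbor graph instead of scoring all of `scenes`, which is what makes it sublinear but approximate.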

  1. Filter the results. 

  1. You can always leverage the text associated with the images to narrow down your results. 

  1. Even if the best match is not at the top of the list, retrieving the top ten results as tensors still allows a subsequent clustering step to find the centroid. 
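A sketch of that clustering step, assuming the result vectors have already been retrieved: compute their centroid and re-rank by distance to it, so the consensus match can surface even when the top-1 hit is off.

```python
def centroid(vectors):
    """Element-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def rerank_by_centroid(vectors):
    """Order result indices by squared distance to the centroid of all results."""
    c = centroid(vectors)
    def sq_dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, c))
    return sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i]))

results = [[0.0, 0.0], [2.0, 2.0], [1.0, 1.5], [1.0, 0.5]]
print(centroid(results))            # → [1.0, 1.0]
print(rerank_by_centroid(results))  # closest-to-consensus indices first
```

Results nearest the centroid are the ones most embeddings "agree" on, which is a cheap way to dampen a single noisy top hit.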

These are some of the tips that make the results of a multimodal search more deterministic and of higher quality. 

Friday, December 19, 2025

 Test queries for DVSA1 agentic-retrieval pipeline:

1. Bounding box: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)

Prompt: You are a vision-language assistant. Given an image and a question about locating or describing an object, give its bounding box. Return only the bounding box coordinates in the format: <bbox>[[x, y, w, h],[x, y, w, h]...]</bbox> with the point of reference as the bottom left corner of the image. Do not include extra text or reasoning or ask the user for more information.

Queries:

a. Give the bounding box for the green street crossing sign for bicycles at a street intersection.

b. Give the bounding box for the only red car in the image.

c. Give the bounding box for a building with circular roof structure.

d. Give the bounding box for a parking lot with available space.

e. Give the bounding box for a red car in this sequence of images.

f. Give the bounding box for a roof with solar panels in this image.
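Since the bounding-box prompt pins the reply to a <bbox>[[x, y, w, h],...]</bbox> format, a small parser makes the replies machine-checkable when scoring the AI-quality metric. A sketch (the reply strings here are made up):

```python
import ast
import re

def parse_bbox_reply(reply):
    """Extract the list of [x, y, w, h] boxes from a <bbox>...</bbox> reply.

    Returns [] when the reply does not follow the prompted format.
    """
    match = re.search(r"<bbox>\s*(\[\[.*?\]\])\s*</bbox>", reply, re.DOTALL)
    if not match:
        return []
    try:
        boxes = ast.literal_eval(match.group(1))
    except (ValueError, SyntaxError):
        return []
    return [b for b in boxes if len(b) == 4]

print(parse_bbox_reply("<bbox>[[10, 20, 120, 80], [300, 40, 60, 60]]</bbox>"))
# → [[10, 20, 120, 80], [300, 40, 60, 60]]
print(parse_bbox_reply("I think it is near the center."))  # → []
```

A non-empty parse confirms the model obeyed the output contract; the coordinates themselves still have to be scored against ground truth for the quality metric.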

2. Color: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)

Prompt: You are a vision-language assistant. Given a scene as an image and a multiple-choice question about an object, select the best answer. Do not include extra text or reasoning or ask the user for more information.

Queries:

a. What color is the largest paved motor road in the given image? A. dark brown, B. tan. C. dark gray, D. black

b. What color is the car in the center of the image? A. Red B. White C. Black D. Green

c. What color is the building dividing the street? A. Blue. B. Teal. C. Patina. D. Green.

d. Which color is most common among the cars on the top storey of this parking lot? A. Red B. White. C. Black D. Green

e. What color is the dedicated lane for bicycles in this image? A. Blue. B. Black. C. White. D. Green

f. What color are the windows of this multi-storeyed building in the bottom left of this image? A. Black B. Blue C. Brown D. Green

3. Counting: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)

Prompt: You are a vision-language assistant. Given a scene as an image and an object, count the number of such objects in the scene. Return only the count in this format: {number}. Do not include extra text or reasoning or ask the user for more information.

Queries:

a. How many cars are there in this image?

b. How many buildings with circular roof structure?

c. How many available parking spaces are there in the parking lot on the right side of the image?

d. How many trees are there in this image?

e. How many cars are crossing the street intersections in this image?

f. How many pedestrians are in this image?

4. Distance: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)

Prompt: You are a vision-language assistant. Given a scene as an image and a list of objects, determine which object best answers the distance comparison in the question. Return only the name of that object. Do not include extra text or reasoning or ask the user for more information.

Queries:

a. Which is farthest from me: tree, building, car, sedan, parking lot, street crossing?

b. Which is closest to me: tree, building, car, sedan, parking lot, street crossing?

c. Which is closer to the building with a circular roof structure: the parking lot or the street split?

d. Which is closer to me: river, street intersection, parking lot, trees?

e. Which is closer to me: building with red roof or building with green roof?

f. Which is bigger: the park with trees or the building next to it?

5. Free space: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)

Prompt: You are a vision-language assistant. Given a scene as an image and the location near an object in the scene, indicate a free space region as a set of (x,y) pixel co-ordinates with the bottom left of the scene as the point of reference. Return this list of co-ordinates. Do not include extra text or reasoning or ask the user for more information.

Queries:

a. Find the free space on the roofs of buildings that do not have any structures.

b. Find the free space for parking a car in a parking lot.

c. Find the free space for parking a sedan along the street curb.

d. Find the free space for parking a large semitrailer.

e. Find the free space in the lot occupied by a building with a hollow circular structure protruding from the roof.

f. Find the free space along the direction of traffic at the street split by a building with circular dome.

6. Function:(Metrics captured: # tokens used, AI quality - scale of 1 to 5)

Prompt: You are a vision-language assistant. Given a scene as an image and the location near an object in the scene or a set of (x,y) pixel co-ordinates with the bottom left of the scene as the point of reference, indicate its function or purpose. Do not include reasoning or ask the user for more information.

Queries:

a. What is the ground between buildings on the left side of the road useful for?

b. What is the purpose of the green lane between the road and curb?

c. What is the purpose of the large white objects parked by the side of the road?

d. What is the nearest shelter for a pedestrian crossing the street when it rains?

e. What is the object in the scene that indicates the proximity of a social and commercial place such as a market or mall?

f. What are some of the co-ordinates in the scene where traffic can arrive at a body of water?

7. Height:(Metrics captured: # tokens used, AI quality - scale of 1 to 5)

Prompt: You are a vision-language assistant. Given a scene as an image and the description of an object in the scene or a set of (x,y) pixel co-ordinates with the bottom left of the scene as the point of reference, determine the object with the relative elevation that matches the query. Do not include reasoning or ask the user for more information.

Queries:

a. Which is higher: the river or the car on the road adjacent to the river?

b. Which is higher: the building with the solar panels on the roof or the building to the leftmost of it?

c. Which is higher: the building roof or the tree tops next to it?

d. Which is lower: the object at the bottom right of the scene or the park next to it?

e. Which is lowest among the following object categories: river, street, vehicles, buildings, and trees?

f. Which is higher: the buildings on the left of the scene or the buildings on the right of the scene?

8. Landing: You are a UAV (drone) landing safety advisor analyzing a low-altitude aerial image. Provide a comprehensive landing safety assessment in the following JSON format: a key landing_feasibility that is one of SAFE, CAUTION, or UNSAFE; a numerical confidence score between 1 and 100; and a list of hazards, each with a level (low, medium, or high), a location (one of the four quadrants of the scene), and a reason for the hazard. You may also include recommendations in the JSON. Do not ask the user for more information.

Queries:

a. Is it safe to land in the park between the buildings?

b. Is it safe to land on the roof top of the largest building in the image?

c. Is it safe to land on the street next to river?

d. Is it safe to land in the parking lot?

e. Is it safe to land in the park by the side of the river?

f. Is it safe to land on the street intersection?
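To make the landing-safety replies scoreable, the JSON contract in the prompt above can be checked mechanically. A sketch, assuming the score key is named confidence and the hazard-list key hazards (the prompt describes these fields but does not fix their exact key names):

```python
import json

ALLOWED_FEASIBILITY = {"SAFE", "CAUTION", "UNSAFE"}
ALLOWED_LEVELS = {"low", "medium", "high"}

def validate_landing_reply(reply):
    """Parse a landing-safety JSON reply and check it against the prompted schema.

    Returns (True, parsed_dict) on success, (False, error_message) otherwise.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        return False, str(exc)
    if data.get("landing_feasibility") not in ALLOWED_FEASIBILITY:
        return False, "bad landing_feasibility"
    if not 1 <= data.get("confidence", 0) <= 100:
        return False, "confidence out of range"
    for hazard in data.get("hazards", []):
        if hazard.get("level") not in ALLOWED_LEVELS:
            return False, "bad hazard level"
        if "location" not in hazard or "reason" not in hazard:
            return False, "hazard missing location/reason"
    return True, data

sample = json.dumps({
    "landing_feasibility": "CAUTION",
    "confidence": 72,
    "hazards": [{"level": "medium", "location": "top-left", "reason": "trees"}],
    "recommendations": ["approach from the south"],
})
print(validate_landing_reply(sample)[0])  # → True
```

A reply that fails validation can be penalized on the AI-quality scale before any judgment about the assessment itself.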

9. Captions: You are an aerial drone image analyst. Describe the scene provided and elaborate on the objects detected and their spatial relationships. If there are multiple images of the same scene, describe the temporal changes to the scene.

Queries:

a. Image showing a building with a circular roof structure and a split in the road

b. Image showing an empty road beside a river

c. Image showing a large parking lot between an enclave of buildings

d. Images following the traffic around a bend of the city streets

e. Images from a flyover of a park with trees between buildings

f. Images showing buildings of different elevations in a small block.

10. Pointing: You are an aerial drone image analyst. Given the scene and some objects, point to areas of interest with a list of bounding box co-ordinates pertaining to the query using the bottom-left of the image as reference and in the format (x,y,w,h). Do not ask the user for more information.

Queries:

a. Locate all the roofs of buildings that are occupied by stationary structures.

b. Locate all the multi-storey buildings that are greater than three storeys high.

c. Locate a parking garage and the available spaces on it.

d. Locate the empty spots for cars to park between buildings but not on streets.

e. Locate safe play areas for children where there is little or no traffic.

f. Locate the highest point in the scene that is safe to land.

11. Uncommon: You are an aerial drone image analyst. Given the scene and some objects, identify the object at the location given in the query, using the bottom-left of the image as reference and the format (x,y), even if the object is uncommon. Do not ask the user for more information.

Queries:

a. Identify the object at location (132,235)

b. Identify the object at location (0,15)

c. Identify the object at location (450,80)

d. Identify the object at location (750,1025)

e. Identify the object at location (20,545)

f. Identify the object at location (225,235)

12. Spatial: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)

Prompt: You are an aerial drone image analyst. Given a scene as an image and a multiple-choice question about spatial relationships, select the best answer. Do not include extra text or reasoning or ask the user for more information.

Queries:

a. In what direction is the parking garage from the buildings on the right side of the road, assuming the top of the image is North? A. North, B. East. C. South, D. West

b. Where is the park from the street? A. Left B. Right C. Above D. Below

c. Where is the multi-storey building in the image? A. top-left B. top-right C. bottom-left D. bottom-right

d. Which direction would rainfall flow towards when falling on the dome of the building splitting the street? A. Left B. Right C. Above D. Below

e. Where is the sun given the shadows of the trees in the park? A. Left B. Right C. Above D. Below

f. Where is the object whose shadow is seen in the top-left of the scene? A. above or below the scene B. inside the scene C. Left or right of the scene.


Thursday, December 18, 2025

#Codingexercise

Absolute Difference Between Maximum and Minimum K Elements

You are given an integer array nums and an integer k.

Find the absolute difference between:

the sum of the k largest elements in the array; and

the sum of the k smallest elements in the array.

Return an integer denoting this difference.

Example 1:

Input: nums = [5,2,2,4], k = 2

Output: 5

Explanation:

The k = 2 largest elements are 4 and 5. Their sum is 4 + 5 = 9.

The k = 2 smallest elements are 2 and 2. Their sum is 2 + 2 = 4.

The absolute difference is abs(9 - 4) = 5.

Example 2:

Input: nums = [100], k = 1

Output: 0

Explanation:

The largest element is 100.

The smallest element is 100.

The absolute difference is abs(100 - 100) = 0.

Constraints:

1 <= n == nums.length <= 100

1 <= nums[i] <= 100

1 <= k <= n

import java.util.Comparator;
import java.util.stream.IntStream;

class Solution {
    public int absDifference(int[] nums, int k) {
        // Sort a boxed copy of the input in descending order.
        int[] sortedNums = IntStream.of(nums)
                                    .boxed()
                                    .sorted(Comparator.reverseOrder())
                                    .mapToInt(Integer::intValue)
                                    .toArray();
        long max = 0;
        long min = 0;
        // The k largest elements sit at the front of the descending order.
        for (int i = 0; i < k; i++) {
            max += sortedNums[i];
        }
        // The k smallest elements sit at the back.
        for (int i = nums.length - 1; i >= nums.length - k; i--) {
            min += sortedNums[i];
        }
        return (int) Math.abs(max - min);
    }
}

994 / 994 testcases passed


Wednesday, December 17, 2025

 AirSentinel.ai has emerged as a specialized player in the increasingly critical domain of drone detection, building a cloud‑based architecture designed to safeguard airspace from unauthorized or potentially dangerous unmanned aerial vehicles. Their system is conceived as a distributed network of sensors and analytics pipelines that feed into a centralized cloud platform. The goal is to provide continuous monitoring of skies over sensitive facilities, urban centers, and critical infrastructure, ensuring that drones are identified, classified, and tracked in real time. In an era when drones are proliferating rapidly, both for legitimate commercial use and for illicit activities, AirSentinel.ai’s architecture represents a proactive approach to airspace security. 

The way their system works is by deploying detection nodes that capture signals and signatures associated with drone activity. These nodes can include radio frequency sensors, acoustic detectors, and optical systems, all of which contribute data streams to the cloud. Once ingested, the data is processed through machine learning models that distinguish drones from other airborne objects, filter out noise, and assess potential threats. The cloud architecture is designed to scale horizontally, meaning that as more sensors are added across a city or region, the system can aggregate and analyze vast amounts of data without bottlenecks. This scalability is crucial because drone detection is not a localized problem; it requires a networked solution that can cover wide areas and adapt to evolving flight patterns. 

AirSentinel.ai emphasizes the importance of integration with existing security and operational systems. Their cloud platform is not just a detection tool but a hub that can trigger alerts, feed information into command centers, and coordinate responses. For example, if a drone is detected near an airport, the system can immediately notify air traffic control and law enforcement, providing details about the drone’s trajectory, speed, and likely operator location. This integration makes the architecture more than a passive monitoring system; it becomes an active participant in airspace management and security. 

The strength of AirSentinel.ai lies in its ability to unify disparate detection technologies into a coherent cloud‑based framework. However, its focus is primarily on identifying drones as objects in the sky and classifying their presence. This leaves an opportunity for enhancement when it comes to interpreting the broader context of drone activity, particularly at the ground level. This is where our drone video sensing analytics software could provide a powerful complement. While AirSentinel.ai ensures that drones are detected and tracked, our system can analyze the video feeds captured by drones themselves, adding semantic understanding of what those drones are doing and what environments they are traversing. 

By integrating our analytics pipeline, AirSentinel.ai could move beyond detection into contextual intelligence. For instance, if a drone is identified near a power plant, our software could process its video feed to determine whether it is simply passing overhead or actively surveying sensitive equipment. In urban environments, our system could classify whether a drone is monitoring traffic, filming a public event, or engaging in suspicious reconnaissance. The ability to fuse aerial video interpretation with detection data would give AirSentinel.ai’s platform a richer, more actionable layer of insight. It would transform alerts from simple notifications of presence into detailed assessments of intent and impact. 

The synergy between AirSentinel.ai’s detection cloud and our video sensing analytics would create a comprehensive airspace security solution. AirSentinel.ai provides the infrastructure to know when and where drones are flying, while our system explains what those drones are seeing and potentially why they are there. Together, they would enable authorities not only to respond to drone incursions but to understand them in context, making interventions more precise and effective. In this way, AirSentinel.ai’s architecture could evolve from a detection network into a full intelligence platform, safeguarding skies with both awareness and understanding.