Sample queries to test the Drone Video Sensing Pipeline:
1. Bounding box: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)
Prompt: You are a vision-language assistant. Given an image and a question about locating or describing an object, give its bounding box. Return only the bounding box coordinates in the format: <bbox>[[x, y, w, h],[x, y, w, h]...]</bbox> with the point of reference as the bottom left corner of the image. Do not include extra text or reasoning or ask the user for more information.
Queries:
a. Give the bounding box for the green street crossing sign for bicycles at a street intersection.
b. Give the bounding box for the only red car in the image.
c. Give the bounding box for a building with a circular roof structure.
d. Give the bounding box for a parking lot with available space.
e. Give the bounding box for a red car in this sequence of images.
f. Give the bounding box for a roof with solar panels in this image.
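A reply in the format requested by the bounding box prompt can be checked mechanically before it is scored. The following is a minimal sketch, not part of the pipeline itself: the function names are hypothetical, and it assumes that (x, y) is the bottom-left corner of each box measured from the bottom-left of the image, which the prompt does not state explicitly.

```python
import json
import re


def parse_bboxes(response):
    """Return the [x, y, w, h] boxes found inside <bbox>...</bbox> as tuples."""
    match = re.search(r"<bbox>(.*?)</bbox>", response, re.DOTALL)
    if match is None:
        raise ValueError("reply has no <bbox>...</bbox> block")
    boxes = json.loads(match.group(1))
    if not isinstance(boxes, list):
        raise ValueError("bbox payload is not a list of boxes")
    parsed = []
    for box in boxes:
        is_valid = (
            isinstance(box, list)
            and len(box) == 4
            and all(
                isinstance(v, (int, float)) and not isinstance(v, bool)
                for v in box
            )
        )
        if not is_valid:
            raise ValueError(f"malformed box: {box!r}")
        parsed.append(tuple(box))
    return parsed


def to_top_left_origin(box, image_height):
    """Convert a bottom-left-origin box to the top-left origin used by most
    image libraries. Assumes (x, y) is the bottom-left corner of the box."""
    x, y, w, h = box
    return (x, image_height - (y + h), w, h)


if __name__ == "__main__":
    reply = "<bbox>[[10, 20, 30, 40],[1, 2, 3, 4]]</bbox>"
    for box in parse_bboxes(reply):
        print(box, "->", to_top_left_origin(box, image_height=100))
```

A reply that fails to parse can be given the lowest quality score without manual review.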
2. Color: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)
Prompt: You are a vision-language assistant. Given a scene as an image and a multiple-choice question about an object, select the best answer. Do not include extra text or reasoning or ask the user for more information.
Queries:
a. What color is the largest paved motor road in the given image? A. Dark brown B. Tan C. Dark gray D. Black
b. What color is the car in the center of the image? A. Red B. White C. Black D. Green
c. What color is the building dividing the street? A. Blue B. Teal C. Patina D. Green
d. What color is the most common among the cars in the top storey of this parking lot? A. Red B. White C. Black D. Green
e. What color is the dedicated lane for bicycles in this image? A. Blue B. Black C. White D. Green
f. What color are the windows of this multi-storeyed building in the bottom left of this image? A. Black B. Blue C. Brown D. Green
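Multiple-choice replies are easiest to score when reduced to a single option letter. The sketch below shows one way to do that; the function name and the leniency rules (accepting "B", "b.", "(B)", or "B. White") are assumptions, not something the prompt guarantees.

```python
import re
from typing import Optional


def parse_choice(response: str, options: str = "ABCD") -> Optional[str]:
    """Return the option letter a reply starts with, or None if the reply
    does not begin with a standalone letter from `options`."""
    match = re.match(r"\s*\(?([A-Za-z])\b", response)
    if match is None:
        return None
    letter = match.group(1).upper()
    return letter if letter in options else None


if __name__ == "__main__":
    for reply in ["B", "b.", "(C) Black", "D. Green", "Red"]:
        print(repr(reply), "->", parse_choice(reply))
```

A reply such as "Red" returns None because it names a color without selecting an option, which the prompt does not allow.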
3. Counting: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)
Prompt: You are a vision-language assistant. Given a scene as an image and an object category, count the instances of that object in the scene. Return only the count in this format: {number}. Do not include extra text or reasoning or ask the user for more information.
Queries:
a. How many cars are there in this image?
b. How many buildings with a circular roof structure are there in this image?
c. How many available parking spaces are there in the parking lot on the right side of the image?
d. How many trees are there in this image?
e. How many cars are crossing the street intersections in this image?
f. How many pedestrians are in this image?
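The counting prompt asks for the reply as {number}, which a model may read either literally (for example "{7}") or as a placeholder (plain "7"). The sketch below, with a hypothetical function name, accepts both forms and rejects anything with extra text.

```python
import re
from typing import Optional


def parse_count(response: str) -> Optional[int]:
    """Return the count from a reply such as '{7}' or '7', else None."""
    match = re.fullmatch(r"\s*(?:\{\s*(\d+)\s*\}|(\d+))\s*", response)
    if match is None:
        return None
    return int(match.group(1) or match.group(2))


if __name__ == "__main__":
    for reply in ["{7}", "12", "about 7 cars"]:
        print(repr(reply), "->", parse_count(reply))
```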
4. Distance: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)
Prompt: You are a vision-language assistant. Given a scene as an image and a question comparing the distances or sizes of objects in the scene, name the object that best answers the question. Return only the name of that object. Do not include extra text or reasoning or ask the user for more information.
Queries:
a. Which is farthest from me: tree, building, car, sedan, parking lot, street crossing?
b. Which is closest to me: tree, building, car, sedan, parking lot, street crossing?
c. Which is closer to the building with a circular roof structure: the parking lot or the street split?
d. Which is closer to me: river, street intersection, parking lot, trees?
e. Which is closer to me: building with red roof or building with green roof?
f. Which is bigger: the park with trees or the building next to it?
5. Free space: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)
Prompt: You are a vision-language assistant. Given a scene as an image and the location near an object in the scene, indicate a free space region as a set of (x,y) pixel co-ordinates with the bottom left of the scene as the point of reference. Return this list of co-ordinates. Do not include extra text or reasoning or ask the user for more information.
Queries:
a. Find the free space on the roofs of buildings that do not have any structures.
b. Find the free space for parking a car in a parking lot.
c. Find the free space for parking a sedan along the street curb.
d. Find the free space for parking a large semitrailer.
e. Find the free space in the lot occupied by a building with a hollow circular structure protruding from the roof.
f. Find the free space along the direction of traffic at the street split by a building with a circular dome.
6. Function: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)
Prompt: You are a vision-language assistant. Given a scene as an image and the location near an object in the scene or a set of (x,y) pixel co-ordinates with the bottom left of the scene as the point of reference, indicate its function or purpose. Do not include reasoning or ask the user for more information.
Queries:
a. What is the ground between buildings on the left side of the road useful for?
b. What is the purpose of the green lane between the road and curb?
c. What is the purpose of the large white objects parked by the side of the road?
d. What is the nearest shelter for a pedestrian crossing the street when it rains?
e. What is the object in the scene that indicates the proximity of a social and commercial place such as a market or mall?
f. What are some of the co-ordinates in the scene where traffic can arrive at a body of water?
7. Height: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)
Prompt: You are a vision-language assistant. Given a scene as an image and the description of an object in the scene or a set of (x,y) pixel co-ordinates with the bottom left of the scene as the point of reference, determine the object with the relative elevation that matches the query. Do not include reasoning or ask the user for more information.
Queries:
a. Which is higher: the river or the car on the road adjacent to the river?
b. Which is higher: the building with the solar panels on the roof or the building to the left of it?
c. Which is higher: the building roof or the tree tops next to it?
d. Which is lower: the object at the bottom right of the scene or the park next to it?
e. Which is lowest among the following object categories: river, street, vehicles, buildings, and trees?
f. Which are higher: the buildings on the left of the scene or the buildings on the right of the scene?
8. Landing:
Prompt: You are a UAV (drone) landing safety advisor analyzing a low-altitude aerial image. Provide a comprehensive landing safety assessment in JSON format with the key landing_feasibility set to one of SAFE, CAUTION, or UNSAFE; a numerical confidence score between 1 and 100; and a list of hazards, each with a level (low, medium, or high), a location (one of the four quadrants of the scene), and a reason for the hazard. You may also include recommendations in the JSON. Do not ask the user for more information.
Queries:
a. Is it safe to land in the park between the buildings?
b. Is it safe to land on the roof top of the largest building in the image?
c. Is it safe to land on the street next to river?
d. Is it safe to land in the parking lot?
e. Is it safe to land in the park by the side of the river?
f. Is it safe to land on the street intersection?
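Because the landing prompt asks for structured JSON, a reply can be validated before its quality is judged. The sketch below is one possible check under stated assumptions: only landing_feasibility is named in the prompt, so the key names confidence, hazards, level, location, and reason are guesses at what a model would emit, and the function name is hypothetical.

```python
import json

ALLOWED_FEASIBILITY = ("SAFE", "CAUTION", "UNSAFE")
ALLOWED_LEVELS = ("low", "medium", "high")


def validate_landing(response):
    """Return a list of problems found in a landing assessment reply.
    An empty list means the reply matches the assumed schema."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    if data.get("landing_feasibility") not in ALLOWED_FEASIBILITY:
        problems.append("landing_feasibility must be SAFE, CAUTION, or UNSAFE")

    # Key name 'confidence' is an assumption; the prompt only asks for a score.
    confidence = data.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 1 <= confidence <= 100:
        problems.append("confidence must be a number between 1 and 100")

    hazards = data.get("hazards")
    if not isinstance(hazards, list):
        problems.append("hazards must be a list")
        return problems
    for i, hazard in enumerate(hazards):
        if not isinstance(hazard, dict):
            problems.append(f"hazard {i} is not an object")
            continue
        if hazard.get("level") not in ALLOWED_LEVELS:
            problems.append(f"hazard {i}: level must be low, medium, or high")
        for key in ("location", "reason"):
            value = hazard.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"hazard {i}: missing {key}")
    return problems


if __name__ == "__main__":
    reply = json.dumps({
        "landing_feasibility": "CAUTION",
        "confidence": 72,
        "hazards": [{"level": "medium", "location": "top-left",
                     "reason": "overhanging tree canopy"}],
    })
    print(validate_landing(reply))
```

The sample reply in the usage block is made up for illustration and is not output from the pipeline.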
9. Captions:
Prompt: You are an aerial drone image analyst. Describe the scene provided and elaborate on the objects detected and their spatial relationships. If there are multiple images of the same scene, describe the temporal changes to the scene.
Queries:
a. Image showing a building with a circular roof structure and a split in the road
b. Image showing an empty road beside a river
c. Image showing a large parking lot between an enclave of buildings
d. Images following the traffic around a bend of the city streets
e. Images from a flyover of a park with trees between buildings
f. Images showing buildings of different elevations in a small block.
10. Pointing:
Prompt: You are an aerial drone image analyst. Given the scene and some objects, point to areas of interest with a list of bounding box co-ordinates pertaining to the query using the bottom-left of the image as reference and in the format (x,y,w,h). Do not ask the user for more information.
Queries:
a. Locate all the roofs of buildings that are occupied by stationary structures.
b. Locate all the multi-storey buildings that are greater than three storeys high.
c. Locate a parking garage and the available spaces on it.
d. Locate the empty spots for cars to park between buildings but not on streets.
e. Locate safe play areas for children where there is little or no traffic.
f. Locate the highest point in the scene that is safe to land.
11. Uncommon:
Prompt: You are an aerial drone image analyst. Given the scene and some objects, identify the object accurately at the location given in the query using the bottom-left of the image as reference and in the format (x,y) even if the object is uncommon. Do not ask the user for more information.
Queries:
a. Identify the object at location (132,235)
b. Identify the object at location (0,15)
c. Identify the object at location (450,80)
d. Identify the object at location (750,1025)
e. Identify the object at location (20,545)
f. Identify the object at location (225,235)
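The locations above use the bottom-left of the image as the origin, while most image arrays index rows from the top. The sketch below converts a query location to a (row, column) index so the referenced pixel can be inspected when grading a reply. It assumes integer pixel co-ordinates with (0,0) as the bottom-left pixel; the function name and the image size in the usage block are hypothetical.

```python
def to_row_col(x, y, image_width, image_height):
    """Convert a bottom-left-origin (x, y) location to a top-left-origin
    (row, col) array index, rejecting points outside the image."""
    if not (0 <= x < image_width and 0 <= y < image_height):
        raise ValueError(
            f"({x},{y}) lies outside a {image_width}x{image_height} image"
        )
    return image_height - 1 - y, x


if __name__ == "__main__":
    # Image size is a placeholder; substitute the size of the frame under test.
    width, height = 1920, 1080
    for x, y in [(132, 235), (0, 15), (750, 1025)]:
        print((x, y), "->", to_row_col(x, y, width, height))
```

Note that query d, location (750,1025), is only valid for frames at least 751 pixels wide and 1026 pixels tall, so the bounds check is worth running before a query is sent.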
12. Spatial: (Metrics captured: # tokens used, AI quality - scale of 1 to 5)
Prompt: You are an aerial drone image analyst. Given a scene as an image and a multiple-choice question about spatial relationships, select the best answer. Do not include extra text or reasoning or ask the user for more information.
Queries:
a. In which direction is the parking garage from the buildings on the right side of the road, assuming the top of the image is North? A. North B. East C. South D. West
b. Where is the park from the street? A. Left B. Right C. Above D. Below
c. Where is the multi-storey building in the image? A. Top-left B. Top-right C. Bottom-left D. Bottom-right
d. Which direction would rainfall flow towards when falling on the dome of the building splitting the street? A. Left B. Right C. Above D. Below
e. Where is the sun given the shadows of the trees in the park? A. Left B. Right C. Above D. Below
f. Where is the object whose shadow is seen in the top-left of the scene? A. Above or below the scene B. Inside the scene C. Left or right of the scene
#codingexercise: CodingExercise-12-14-2025.docx