Saturday, December 20, 2025

 Many of the drone vision analytics queries are about objects located in a scene. For example, a search for a “parking garage” in a scene should yield a result with a clipped image showing the garage.  

Because this is a multimodal search, it does not always return the correct answer, but a few techniques can help. This article lists them. 

  1. When the scenes are vectorized frame by frame, they can also be analyzed to detect as many objects as possible, along with their bounding boxes. The scenes and objects are then saved as documents with id, vector, captions, title, location, bounding box and tags. 

  1. The search over these accumulated scenes and objects can make use of various search options to narrow down the search. For example: 

  1. Create a vector from the text: 

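from azure.search.documents.models import (
    QueryAnswerType,
    QueryCaptionType,
    QueryType,
    VectorizableTextQuery,
)

# dest_search_client is assumed to be an existing azure.search.documents.SearchClient.
search_text = "parking garage"

# Double any single quotes so the text is a valid OData string literal.
filter_text = search_text.replace("'", "''")

# Vectorize the query text on the service side and compare it with the "vector" field.
vector_query = VectorizableTextQuery(text=search_text, exhaustive=True, k_nearest_neighbors=50, fields="vector", weight=0.5)

results = dest_search_client.search(
    search_text=search_text,
    vector_queries=[vector_query],
    query_type=QueryType.SEMANTIC,
    select=["id", "description", "vector"],
    filter=f"description ne null and search.ismatch('{filter_text}', 'description')",
    semantic_configuration_name="mysemantic",
    query_caption=QueryCaptionType.EXTRACTIVE,
    query_answer=QueryAnswerType.EXTRACTIVE,
    top=10,
)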
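The query above sets exhaustive=True, which asks the service to compare the query vector against every stored vector instead of walking an approximate index. As a rough illustration of what that brute-force comparison means, here is a minimal sketch in plain Python. It runs locally on toy data and is not how the search service is implemented; the function names, the (doc_id, vector) layout and the use of cosine similarity as the metric are all assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def exhaustive_knn(query, documents, k):
    """Score every document against the query and keep the k best.

    `documents` is a list of (doc_id, vector) pairs (hypothetical layout).
    """
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-D vectors standing in for real scene embeddings.
docs = [
    ("garage", [1.0, 0.0]),
    ("street", [0.0, 1.0]),
    ("ramp", [1.0, 1.0]),
]
top = exhaustive_knn([1.0, 0.1], docs, k=2)
print([doc_id for doc_id, _ in top])
```

Because every vector is scored, no neighbor can be missed, but the cost grows linearly with the number of documents. That cost is what an approximate index such as HNSW avoids.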

  1. Use semantic configuration: 

  1. Semantic configuration leverages the text-based content in fields such as title, description and tags for keyword and semantic search. 

  1. Specify Hierarchical Navigable Small World (HNSW) search or exhaustive KNN search as appropriate. HNSW is an approximate method: it offers low latency and usually high recall, but it can miss some true nearest neighbors. Exhaustive KNN compares the query against every vector, so it finds all of the neighbors at a higher cost. With large datasets, HNSW is usually the better choice. 

  1. Filter the results: 

  1. You can always leverage the text associated with the images to narrow down your results. 

  1. Even if the best match is not at the top of the list, the ten retrieved results can still be used as tensors in a subsequent clustering step that finds the centroid. 
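The last tip can be sketched in a few lines of plain Python. This is a simplified illustration, not a full clustering algorithm: it treats the retrieved results as a single cluster, computes their centroid, and picks the result nearest to it. The dict layout with "id" and "vector" keys mirrors the fields selected in the query, but the function names and the toy vectors are assumptions made for this example.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]

def closest_to_centroid(results):
    """Return the id of the result whose vector lies nearest the centroid.

    `results` is a list of dicts with "id" and "vector" keys
    (an assumed shape for the retrieved search results).
    """
    center = centroid([r["vector"] for r in results])
    best = min(results, key=lambda r: math.dist(r["vector"], center))
    return best["id"]

# Toy 2-D vectors standing in for the vectors of the retrieved results.
results = [
    {"id": "a", "vector": [0.9, 0.1]},
    {"id": "b", "vector": [1.0, 0.0]},
    {"id": "c", "vector": [0.8, 0.2]},
    {"id": "d", "vector": [0.0, 1.0]},  # an outlier among the results
]
winner = closest_to_centroid(results)
print(winner)
```

With real embeddings, a proper clustering step such as k-means could first separate outliers like "d" before the centroid is computed, so that a stray result does not pull the center away from the dominant group.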

These are some tips for making the results of a multimodal search more deterministic and higher in quality when rated on a scale of 1 to 5. 
