Saturday, June 7, 2025

 The previous article discussed ways to enhance the index in an Azure AI Search vector store by promoting text that can be used in queries, along with a semantic configuration. The following, for instance, is an example of a semantic search over the drone images that leverages the extracted metadata as text fields.

# Runs a semantic query (runs a BM25-ranked query and promotes the most relevant matches to the top)

results = search_client.search(query_type='semantic', semantic_configuration_name='my-semantic-config',

    search_text="Are there images that show red cars as parked?",

    select='id,description,title,tags,bounding_box', query_caption='extractive')

for result in results:

    print(result["@search.reranker_score"])

    print(result["id"])

    print(f"Description: {result['description']}")

    tags = result["tags"]

    if "red car" in tags:

        print(f"Title: {result['title']}\n")

And this doesn’t just stop at query responses. Instead of relying on the embeddings model alone, we can now leverage gpt-4o chat LLMs to generate appropriate answers to the queries, given that everything is text; a minimal sketch follows.
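As a rough sketch (assuming the openai package, an Azure OpenAI chat deployment named gpt-4o, and the endpoint and key variables used in the posts below), the top matches can be passed as grounding text to the chat model:

from openai import AzureOpenAI

openai_client = AzureOpenAI(azure_endpoint=azure_openai_endpoint, api_key=azure_openai_api_key, api_version="2024-06-01")

question = "Are there images that show red cars as parked?"

docs = list(search_client.search(query_type='semantic', semantic_configuration_name='my-semantic-config',

    search_text=question, select='id,title,tags', top=5))

context = "\n".join(f"{doc['id']}: {doc['title']} (tags: {doc['tags']})" for doc in docs)

response = openai_client.chat.completions.create(

    model="gpt-4o",  # deployment name is an assumption

    messages=[

        {"role": "system", "content": "Answer using only the provided drone image metadata and cite image ids."},

        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}

    ]

)

print(response.choices[0].message.content)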

Similarly, queries are text, so besides the semantic search above we can also decompose the queries to suit the ontology we derive from the metadata, including labels and tags. The way we compose lower-level queries into reusable higher-level queries helps build intelligent drone sensing applications; a minimal sketch of such composition follows.
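For illustration only, assuming the search_client from the posts below and an index that exposes tags and title as searchable fields, the helper names here are hypothetical:

def tag_query(phrase):

    # lower-level, reusable query: keyword match restricted to the tags field

    return search_client.search(search_text=phrase, search_fields=["tags"], select="id,title,tags", top=50)

def parked_red_car_ids():

    # higher-level query composed from two lower-level ones

    red_cars = {doc["id"] for doc in tag_query("red car")}

    parked = {doc["id"] for doc in tag_query("parking lot")}

    return red_cars & parked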


Friday, June 6, 2025

 Shredding images into objects for vector search:

 The previous article discussed a technique to enhance image retrieval for drone images following the vectorize-and-analyze method described in the references, and it is helpful regardless of zero-agent, one-agent or multiple-agent retrieval. Because objects have a high probability of being mentioned in a query, it is even better to query against an index of the objects themselves, along with their BM25-searchable descriptions and semantic similarity vectors. One tip here is that the same object might be detected in multiple aerial images, while different objects spread out along the temporal dimension across images may be more meaningful to group. This can be achieved by the following steps (a sketch of the grouping pass follows the steps):

Retrieve all objects (Azure Search documents) with their IDs and vectors

For each object, if not already grouped:

A. Perform a vector query with its vector, excluding itself

B. Collect objects with a score above a threshold

C. Use a sliding window to find the same object repeated over consecutive images; discard duplicates

D. Use a reranker to find temporally distributed different objects (IDs are wide apart)

E. Add these objects to a group

F. Mark all objects in the group and the duplicates as processed
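For illustration, here is a minimal sketch of that grouping pass, assuming numeric, monotonically increasing document ids as produced by the population code below; the threshold and window values are illustrative, and step D is approximated with an id-distance test rather than a separate reranker call:

from azure.search.documents.models import VectorizedQuery

def group_objects(objects, search_client, threshold=0.85, window=5):

    groups = []

    processed = set()

    for obj in objects:

        if obj["id"] in processed:

            continue

        vector_query = VectorizedQuery(vector=obj["vector"], k_nearest_neighbors=50, fields="vector")

        matches = [m for m in search_client.search(search_text=None, vector_queries=[vector_query], select=["id"])

                   if m["id"] != obj["id"] and m["@search.score"] >= threshold]

        # sliding window: near-consecutive ids are the same object seen across consecutive images, so discard as duplicates

        duplicates = {m["id"] for m in matches if abs(int(m["id"]) - int(obj["id"])) <= window}

        # ids that are wide apart are temporally distributed objects worth grouping

        group = [obj["id"]] + [m["id"] for m in matches if m["id"] not in duplicates]

        groups.append(group)

        processed.update(group)

        processed.update(duplicates)

    return groups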

The following code is an illustration of populating an index with all the objects (id, description, citation, boundingBox, and vector) from all the images so that the above steps can be run.

import json

from azure.search.documents import SearchClient

from azure.core.credentials import AzureKeyCredential

import os

import re

search_endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]

api_version = os.getenv("AZURE_SEARCH_API_VERSION")

search_api_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")

index_name = os.getenv("AZURE_SEARCH_INDEX_NAME", "index00")

dest_index_name = os.getenv("AZURE_SEARCH_1024_INDEX_NAME", "index1024")

credential = AzureKeyCredential(search_api_key)

# Initialize SearchClient

search_client = SearchClient(

    endpoint=search_endpoint,

    index_name=index_name,

    credential=AzureKeyCredential(search_api_key)

)

destination_client = SearchClient(

    endpoint=search_endpoint,

    index_name=dest_index_name,

    credential=AzureKeyCredential(search_api_key)

)

# Heuristic cleanup: the stored description is single-quoted pseudo-JSON, so restore double quotes around keys and values and strip newlines before json.loads

def prepare_json_string_for_load(text):

  text = text.replace("\"", "'")

  text = text.replace("{'", "{\"")

  text = text.replace("'}", "\"}")

  text = text.replace(" '", " \"")

  text = text.replace("' ", "\" ")

  text = text.replace(":'", ":\"")

  text = text.replace("':", "\":")

  text = text.replace(",'", ",\"")

  text = text.replace("',", "\",")

  return re.sub(r'\n\s*', '', text)

def to_string(bounding_box):

    return f"{bounding_box['x']},{bounding_box['y']},{bounding_box['w']},{bounding_box['h']}"

page_size = 10

skip = 0

total = 17833

index = 0

while skip < total:

    # Retrieve the next page of entries from the index

    search_results = search_client.search("*", select=["id", "description", "vector"], top=page_size, skip=skip, include_total_count=True)

    # Process entries and shred descriptions

    flat_list = []

    page = list(search_results)

    if len(page) == 0:

        break

    for entry in page:

        entry_id = index

        index += 1

        width = 0

        height = 0

        tags = ""

        title = ""

        description_text = prepare_json_string_for_load(entry["description"]).replace('""','')

        description_json = json.loads(description_text)

        if description_json and description_json["description"]:

            title = description_json["description"]

        if description_json and description_json["_data"] and description_json["_data"]["tagsResult"] and description_json["_data"]["tagsResult"]["values"]:

            tags = ','.join([tag["name"] for tag in description_json["_data"]["tagsResult"]["values"]]).strip(",")

        # add entries at object level instead of image level

        if description_json and description_json["_data"] and description_json["_data"]["denseCaptionsResult"] and description_json["_data"]["denseCaptionsResult"]["values"]:

            for item in description_json["_data"]["denseCaptionsResult"]["values"]:

                index += 1  # give each detected object its own document id

                text = item.get("text", "")

                bounding_box = item.get("boundingBox", {"x": 0, "y": 0, "w": 0, "h": 0})

                flat_list.append({

                    "id": str(index),

                    "image_id": str(entry_id),

                    "text": text,

                    "bounding_box": to_string(bounding_box),

                    "tags": tags,

                    "title": title

                })

        else:

            print(f"Nothing found in entry with id:{entry_id}")

        # keep an image-level entry alongside the object-level ones

        flat_list.append({

            "id": str(entry_id),

            "tags": tags,

            "title": title

        })

    if len(flat_list) != 0:

        upload_results = destination_client.upload_documents(flat_list)

        error = ','.join([upload_result.error_message for upload_result in upload_results if upload_result.error_message]).strip(",")

        if error:

            print(error)

        # flat_list can exceed page_size once object-level entries are added, so check every result instead of counting

        if all(upload_result.succeeded for upload_result in upload_results):

            print(f"success in processing entries with id: {skip} to {skip + page_size}")

    skip += page_size


Vectorize and Analyze: https://1drv.ms/w/c/d609fb70e39b65c8/Eb6vxQeXGE9MsVwwdsvLSskBLgFNNuClDqAepem73pMcbQ?e=LtQasJ


Thursday, June 5, 2025

 Image retrieval enhancement:

 The following is a technique to enhance image retrieval for drone images following the vectorize-and-analyze method described in the references; it is helpful regardless of zero-agent, one-agent or multiple-agent retrieval:

import json

from azure.search.documents import SearchClient

from azure.core.credentials import AzureKeyCredential

import os

import re

search_endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]

api_version = os.getenv("AZURE_SEARCH_API_VERSION")

search_api_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")

index_name = os.getenv("AZURE_SEARCH_INDEX_NAME", "index00")

dest_index_name = os.getenv("AZURE_SEARCH_1024_INDEX_NAME", "index1024")

credential = AzureKeyCredential(search_api_key)

# Initialize SearchClient

search_client = SearchClient(

    endpoint=search_endpoint,

    index_name=index_name,

    credential=AzureKeyCredential(search_api_key)

)

destination_client = SearchClient(

    endpoint=search_endpoint,

    index_name=dest_index_name,

    credential=AzureKeyCredential(search_api_key)

)

def prepare_json_string_for_load(text):

  text = text.replace("\"", "'")

  text = text.replace("{'", "{\"")

  text = text.replace("'}", "\"}")

  text = text.replace(" '", " \"")

  text = text.replace("' ", "\" ")

  text = text.replace(":'", ":\"")

  text = text.replace("':", "\":")

  text = text.replace(",'", ",\"")

  text = text.replace("',", "\",")

  return re.sub(r'\n\s*', '', text)

def to_string(bounding_box):

    return f"{bounding_box['x']},{bounding_box['y']},{bounding_box['w']},{bounding_box['h']}"

page_size = 10

skip = 0

total = 17833

while skip < total:

    # Retrieve the next page of entries from the index

    search_results = search_client.search("*", select=["id", "description", "vector"], top=page_size, skip=skip, include_total_count=True)

    # Process entries and shred descriptions

    flat_list = []

    page = list(search_results)

    if len(page) == 0:

        break

    for entry in page:

        entry_id = entry["id"]

        width = 0

        height = 0

        tags = ""

        title = ""

        description_text = prepare_json_string_for_load(entry["description"]).replace('""','')

        description_json = json.loads(description_text)

        if description_json and description_json["description"]:

            title = description_json["description"]

        if description_json and description_json["_data"] and description_json["_data"]["tagsResult"] and description_json["_data"]["tagsResult"]["values"]:

            tags = ','.join([tag["name"] for tag in description_json["_data"]["tagsResult"]["values"]]).strip(",")

        # add entries at object level instead of image level

        # if description_json and description_json["_data"] and description_json["_data"]["denseCaptionsResult"] and description_json["_data"]["denseCaptionsResult"]["values"]:

            # for item in description_json["_data"]["denseCaptionsResult"]["values"]:

                # text = item.get("text", "")

                # bounding_box = item.get("boundingBox", {

                    # "x": 0,

                    # "y": 0,

                    # "w": 0,

                    # "h": 0

                # })

                # flat_list.append({

                    # "id": entry_id,

                    # "text": text,

                    # "bounding_box": to_string(bounding_box),

                    # "tags" : tags,

                    # "title": title

                # })

        # else:

            # print(f"Nothing found in entry with id:{id}")

        flat_list.append({

        "id": entry_id,

        "tags" : tags,

        "title": title

        })

    if len(flat_list) != 0:

        merge_results = destination_client.merge_documents(flat_list)

        error = ','.join([merge_result.error_message for merge_result in merge_results if merge_result.error_message]).strip(",")

        if error:

            print(error)

        if all(merge_result.succeeded for merge_result in merge_results):

            print(f"success in merging entries with id: {skip} to {skip + page_size}")

    skip += page_size

References:

Vectorize and Analyze: https://1drv.ms/w/c/d609fb70e39b65c8/Eb6vxQeXGE9MsVwwdsvLSskBLgFNNuClDqAepem73pMcbQ?e=LtQasJ


Wednesday, June 4, 2025

 Many software businesses starting out today benefit from being cloud-native, API-first, and composable. These are core qualities of modern tech stacks: developer friendliness is good for business, and these qualities drive it. Let us take a closer look at these aspects of web architectures.

The rise of the cloud manifested in microservices development, and while the initial step was about squeezing web applications into containers, most took the next step of embracing composable web architectures. This evolution emphasizes modularization, flexibility, and reusability of components. The way people went about it was to take an axe to the front-end, back-end, and data sources, splitting them into components that could be developed, scaled, and tested independently, especially with a view toward combining or replacing them as needed. This significantly reduced time to release and tremendously helped with the integration of new services. The flexibility of the technology stack freed us from vendor lock-in. It also put pressure on vendors, especially the established ones, to offer more out of the box that could be weighed in trade-offs against “rolling your own”. One often-overlooked component of this split, yet one that held significant promise for observability and telemetry, was the runtime component, sitting alongside the components for front-end, back-end, and data. This component alone focused on finding the right environment for the technology choices made by the other teams, to ensure the availability, stability, and security of the system. Its main stakeholders are the operations teams.

As with any trend in process or practice, roles start to emerge to take care of the different components. Choices made for the programming language, speed of development, ease of deployment, or ease of use tend to become local and less interfering with others, something that monoliths lacked. Business choices that were impeded by technology could now avoid the conditionals, branches, and partials that proved frustrating, and could instead be made with an intent to reach the market with value propositions faster. A single stakeholder, such as online marketing, and their choice of data management system was now bounded in scope, spread, and timeline. Campaigns, including creating, editing, and publishing, could be realized faster, tracked, and closed with a set of disposable or reusable artifacts. Processes and workflows, including those pertaining to campaigns, could now leverage the same versioned releases of data or content.

In a headless mode, APIs did the communication with no need for administration or presentation layers, while allowing those to be added on independently where appropriate. This way of organizing into a headless, decentralized system provides the main point of composition. The application layer could then be distributed into single-purpose services, which bring modularity, scalability, resilience, lightweight design, and programmability. These architectural properties let you balance Consistency, Availability, and Partition tolerance, subject to the CAP theorem, which says only two of these three guarantees can be fully provided for distributed data access; for example, a partition-tolerant system must choose between serving possibly stale data (availability) and refusing requests (consistency).


Tuesday, June 3, 2025

 Using Metadata in RAG

 The previous article discussed agentic retrieval of drone images for drone sensing applications. While that leverages the text-embedding-ada-002 and gpt-4o-mini LLMs, there is inherent value in text-based keyword and semantic search over the metadata associated with the results of analyzing the drone images, since this metadata can participate in Azure AI Search as structured data in addition to its vector store. This makes a user interface to query the drone world for drone sensing applications even more robust, with higher precision and recall. Promoting the metadata as text allows the use of standard query operators for drone sensing applications and comes as a low-cost option.

For example:

import json

from azure.search.documents import SearchClient

from azure.core.credentials import AzureKeyCredential

import os

import re

search_endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]

api_version = os.getenv("AZURE_SEARCH_API_VERSION")

search_api_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")

index_name = os.getenv("AZURE_SEARCH_INDEX_NAME", "index00")

credential = AzureKeyCredential(search_api_key)

# Initialize SearchClient

search_client = SearchClient(

    endpoint=search_endpoint,

    index_name=index_name,

    credential=AzureKeyCredential(search_api_key)

)

# Retrieve the first 10 entries from the index

search_results = search_client.search("*", select=["id", "description", "vector"], top=10)

# Process entries and shred descriptions

flat_list = []

def prepare_json_string_for_load(text):

  text = text.replace("\"", "'")

  text = text.replace("{'", "{\"")

  text = text.replace("'}", "\"}")

  text = text.replace(" '", " \"")

  text = text.replace("' ", "\" ")

  text = text.replace(":'", ":\"")

  text = text.replace("':", "\":")

  text = text.replace(",'", ",\"")

  text = text.replace("',", "\",")

  return re.sub(r'\n\s*', '', text)

def to_string(bounding_box):

    return f"{bounding_box['x']},{bounding_box['y']},{bounding_box['w']},{bounding_box['h']}"

for entry in search_results:

    entry_id = entry["id"]

    width = 0

    height = 0

    tags = ""

    title = ""

    description_text = prepare_json_string_for_load(entry["description"]).replace('""','')

    description_json = json.loads(description_text)

    if description_json and description_json["description"]:

        title = description_json["description"]

    if description_json and description_json["_data"] and description_json["_data"]["tagsResult"] and description_json["_data"]["tagsResult"]["values"]:

        tags = ','.join([tag["name"] for tag in description_json["_data"]["tagsResult"]["values"]]).strip(",")

    if description_json and description_json["_data"] and description_json["_data"]["denseCaptionsResult"] and description_json["_data"]["denseCaptionsResult"]["values"]:

        for item in description_json["_data"]["denseCaptionsResult"]["values"]:

            text = item.get("text", "")

            bounding_box = item.get("boundingBox", {

                "x": 0,

                "y": 0,

                "w": 0,

                "h": 0

            })

            flat_list.append({

                "id": entry_id,

                "text": text,

                "bounding_box": to_string(bounding_box),

                "tags" : tags,

                "title": title

            })

    else:

        print(f"Nothing found in entry with id:{entry_id}")

# Print the flattened list

print(len(flat_list))

Result: 100
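Assuming the shredded entries above are uploaded to an index whose text, tags, and title fields are searchable (field names as in the code above), standard query operators can then be applied directly; the full Lucene syntax here is one illustration:

# full Lucene syntax enables boolean operators over the promoted metadata fields

for result in search_client.search(search_text='"red car" AND parking', query_type="full",

        search_fields=["text", "tags", "title"], select="id,title,bounding_box", top=10):

    print(result["id"], result["title"])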


Monday, June 2, 2025

 These are some performance improvement considerations for drone sensing applications when querying aerial flyover images from drones, as studied from the case study explained in previous posts. The comparisons below are made in terms of precision and recall.

For example, with the query “red car”, a query response limit of 50 images, vector dimensions of 1536 to enable the embedding models, and a baseline precision and recall from multimodal search, the relative improvements for various features were compared.

From the comparisons, it seems the ranking of the images plays a significant role in precision, but the variety of images recalled significantly improves with query rewrites.
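Query rewriting is exposed as a preview option on semantic queries; the parameter name and values below follow the preview API and should be treated as an assumption that may vary by API version:

results = search_client.search(

    search_text="red car",

    query_type="semantic",

    semantic_configuration_name="my-semantic-config",

    query_rewrites="generative|count-5",  # preview feature: generate up to 5 rewrites

    query_language="en",  # required when query rewrites are enabled

    top=50

)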

When the query is vectorized and the descriptions of the objects detected in the images are part of the semantic configuration, the recall is healthy enough to suit many drone sensing applications, and providing a chat-like interface that retrieves images only from the drone world proves sufficient. But the real gain in improvement happens with agentic retrieval, when the responses to the queries from the drone sensing applications are merged and re-ranked. Many of the images retrieved across the various approaches had red cars in them, and some approaches displayed the images with the greatest number of red cars from aerial shots as the first few results, even when the size of the object was less than 5% of the overall aerial image in terms of pixels.

Caching of responses, so that the store does not get hit on query re-use, certainly improves performance as well as cost. Re-indexing operations are not counted in the comparisons above because they were completed prior to the comparisons. Re-indexing can be avoided if we set up a vectorizer with the OpenAI embedding models alongside the algorithms used for vector search, and the dimensions of the vectors during upsert agree with those required by the embeddings model.
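As a rough sketch, a vectorizer can be attached to the index's vector search configuration so the service embeds query text itself; the names hnsw-1, profile-1 and openai-1 are illustrative, and the exact model classes and parameter names vary with the azure-search-documents SDK version:

from azure.search.documents.indexes.models import (

    VectorSearch, VectorSearchProfile, HnswAlgorithmConfiguration,

    AzureOpenAIVectorizer, AzureOpenAIVectorizerParameters

)

vector_search = VectorSearch(

    algorithms=[HnswAlgorithmConfiguration(name="hnsw-1")],

    profiles=[VectorSearchProfile(name="profile-1", algorithm_configuration_name="hnsw-1", vectorizer_name="openai-1")],

    vectorizers=[

        AzureOpenAIVectorizer(

            vectorizer_name="openai-1",

            parameters=AzureOpenAIVectorizerParameters(

                resource_url=azure_openai_endpoint,  # endpoint and deployment variables as in the agentic retrieval post below

                deployment_name=azure_openai_embedding_deployment,

                model_name=azure_openai_embedding_model

            )

        )

    ]

)

# attach vector_search to the SearchIndex definition and create or update the index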

Token usage increases linearly with agentic retrieval, as each agent leverages an LLM for its task and toolset. Token usage can be limited to keep costs low, for example by reducing the response size, as the KnowledgeAgentRequestLimits(max_output_size=10000) setting in the agentic retrieval example below does.

This case study clearly shows the suitability of agentic retrieval for drone sensing applications.


Sunday, June 1, 2025

 Another form of the example cited in the previous article, performing automatic query decomposition, parallel execution, and reranking of merged results, is this form of agentic retrieval:

#!/usr/bin/python

from dotenv import load_dotenv

from azure.identity import DefaultAzureCredential, get_bearer_token_provider

import os


load_dotenv(override=True)


project_endpoint = os.environ["PROJECT_ENDPOINT"]

agent_model = os.getenv("AGENT_MODEL", "gpt-4o-mini")

search_endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]

api_version = os.getenv("AZURE_SEARCH_API_VERSION")

search_api_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")

credential = DefaultAzureCredential()

token_provider = get_bearer_token_provider(credential, "https://search.azure.com/.default")

index_name = os.getenv("AZURE_SEARCH_NEW_INDEX_NAME", "index01")

azure_openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]

azure_openai_api_key = os.getenv("AZURE_OPENAI_API_KEY")

azure_openai_gpt_deployment = os.getenv("AZURE_OPENAI_GPT_DEPLOYMENT", "gpt-4o-mini")

azure_openai_gpt_model = os.getenv("AZURE_OPENAI_GPT_MODEL", "gpt-4o-mini")

azure_openai_embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-3-large")

azure_openai_embedding_model = os.getenv("AZURE_OPENAI_EMBEDDING_MODEL", "text-embedding-3-large")

agent_name = os.getenv("AZURE_SEARCH_AGENT_NAME", "agentic-retrieval-drone-images")


from azure.search.documents.indexes.models import KnowledgeAgent, KnowledgeAgentAzureOpenAIModel, KnowledgeAgentTargetIndex, KnowledgeAgentRequestLimits, AzureOpenAIVectorizerParameters

from azure.search.documents.indexes import SearchIndexClient


agent = KnowledgeAgent(

    name=agent_name,

    models=[

        KnowledgeAgentAzureOpenAIModel(

            azure_open_ai_parameters=AzureOpenAIVectorizerParameters(

                resource_url=azure_openai_endpoint,

                deployment_name=azure_openai_gpt_deployment,

                model_name=azure_openai_gpt_model

            )

        )

    ],

    target_indexes=[

        KnowledgeAgentTargetIndex(

            index_name=index_name,

            default_reranker_threshold=2.5

        )

    ],

    request_limits=KnowledgeAgentRequestLimits(

        max_output_size=10000

    )

)


index_client = SearchIndexClient(endpoint=search_endpoint, credential=credential)

index_client.create_or_update_agent(agent)

print(f"Knowledge agent '{agent_name}' created or updated successfully")


from azure.ai.projects import AIProjectClient


project_client = AIProjectClient(endpoint=project_endpoint, credential=credential)


list(project_client.agents.list_agents())  # enumerate existing agents in the project (result not used here)

instructions = """

A Q&A agent that can answer questions about the drone images stored in Azure AI Search.

Sources have a JSON description and vector format with a ref_id that must be cited in the answer.

If you do not have the answer, respond with "I don't know".

"""

agent = project_client.agents.create_agent(

    model=agent_model,

    name=agent_name,

    instructions=instructions

)


print(f"AI agent '{agent_name}' created or updated successfully")

from azure.ai.agents.models import FunctionTool, ToolSet, ListSortOrder


from azure.search.documents.agent import KnowledgeAgentRetrievalClient

from azure.search.documents.agent.models import KnowledgeAgentRetrievalRequest, KnowledgeAgentMessage, KnowledgeAgentMessageTextContent, KnowledgeAgentIndexParams


agent_client = KnowledgeAgentRetrievalClient(endpoint=search_endpoint, agent_name=agent_name, credential=credential)


thread = project_client.agents.threads.create()

retrieval_results = {}


def agentic_retrieval() -> str:

    """

        Searches drone images about objects detected and their facts.

        The returned string is in a JSON format that contains the reference id.

        Be sure to use the same format in your agent's response

        You must refer to references by id number

    """

    # Take the last 5 messages in the conversation

    messages = project_client.agents.messages.list(thread.id, limit=5, order=ListSortOrder.DESCENDING)

    # Reverse the order so the most recent message is last

    messages = list(messages)

    messages.reverse()

    retrieval_result = agent_client.retrieve(

        retrieval_request=KnowledgeAgentRetrievalRequest(

            messages=[KnowledgeAgentMessage(role=msg["role"], content=[KnowledgeAgentMessageTextContent(text=msg.content[0].text)]) for msg in messages if msg["role"] != "system"],

            target_index_params=[KnowledgeAgentIndexParams(index_name=index_name, reranker_threshold=2.5)]

        )

    )


    # Associate the retrieval results with the last message in the conversation

    last_message = messages[-1]

    retrieval_results[last_message.id] = retrieval_result


    # Return the grounding response to the agent

    return retrieval_result.response[0].content[0].text


# https://learn.microsoft.com/en-us/azure/ai-services/agents/how-to/tools/function-calling

functions = FunctionTool({ agentic_retrieval })

toolset = ToolSet()

toolset.add(functions)

project_client.agents.enable_auto_function_calls(toolset)


from azure.ai.agents.models import AgentsNamedToolChoice, AgentsNamedToolChoiceType, FunctionName


message = project_client.agents.messages.create(

    thread_id=thread.id,

    role="user",

    content="""

        How many parking lots are empty when compared to all the parking lots?

        How many red cars could be found as parked?

    """

)


run = project_client.agents.runs.create_and_process(

    thread_id=thread.id,

    agent_id=agent.id,

    tool_choice=AgentsNamedToolChoice(type=AgentsNamedToolChoiceType.FUNCTION, function=FunctionName(name="agentic_retrieval")),

    toolset=toolset)

if run.status == "failed":

    raise RuntimeError(f"Run failed: {run.last_error}")

output = project_client.agents.messages.get_last_message_text_by_role(thread_id=thread.id, role="assistant").text.value


print("Agent response:", output.replace(".", "\n"))


import json


retrieval_result = retrieval_results.get(message.id)

if retrieval_result is None:

    raise RuntimeError(f"No retrieval results found for message {message.id}")


print("Retrieval activity")

print(json.dumps([activity.as_dict() for activity in retrieval_result.activity], indent=2))

print("Retrieval results")

print(json.dumps([reference.as_dict() for reference in retrieval_result.references], indent=2))