Thursday, March 27, 2025

 RAG with vector search involves retrieving information from a vector database, augmenting the user’s prompt with that information, and using an LLM to generate a response based on the user’s prompt and the retrieved information. Each of these steps can be implemented in a variety of ways, but we will go over the mainstream approaches.

1. Data Preparation: This is all about ingesting data into a vector database, usually with the help of connectors. It isn’t a one-time task: the vector database should be updated regularly so it keeps supplying high-quality, current information; otherwise, responses can sound outdated. A great property of the RAG process is that the LLM weights do not need to be adjusted as new data is ingested. Common stages in this step include parsing the input documents, splitting the documents into chunks (which incidentally can affect the output quality), using an embedding model to convert each chunk into a high-dimensional numerical vector, storing and indexing the embeddings in a vector index to boost search efficiency, and recording metadata that can participate in filtering. The value of embeddings for RAG is that they allow similarity scores to be computed between the meanings of the original texts; a minimal end-to-end sketch of all four steps appears after this list.

2. Retrieval is all about getting the relevant context. After preprocessing the original documents, we have a vector database storing chunks, embeddings and metadata. In this step, the user provides a prompt, the application uses it to query the vector database, and the relevant results are used to augment the original prompt in the next step. Querying the vector database relies on similarity scores between the vector representing the query and the vectors in the database. There are many ways to improve search results, including hybrid search, reranking, summarized text comparison, contextual chunk retrieval, prompt refinement, and domain-specific tuning.

3. Augmenting the prompt with the retrieved context equips the model with both the user’s question and the context needed to address it. The structure of the new prompt that combines the retrieved texts and the user’s prompt can impact the quality of the result.

4. Generation of the response is done with the help of an LLM and follows the retrieval and augmentation steps. Some LLMs are quite good at following instructions, but many require post-processing. Another important consideration is whether the RAG system should have memory of previous prompts and responses. One way to enhance generation is to add multi-turn conversation ability, which allows the user to ask follow-up questions.
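
Below is a minimal sketch of all four steps, assuming the sentence-transformers library and the all-MiniLM-L6-v2 model for embeddings, an in-memory NumPy array in place of a real vector database, and a placeholder generate() function standing in for whichever LLM service is actually used.

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding model

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Data preparation: embed pre-chunked documents and keep them in a simple index.
chunks = [
    "RAG augments an LLM prompt with retrieved context.",
    "A vector database stores chunk embeddings for similarity search.",
]
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

# 2. Retrieval: embed the user's prompt and rank chunks by cosine similarity.
def retrieve(query, k=2):
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vectors @ q          # cosine similarity, since vectors are normalized
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]

# 3. Augmentation: combine the retrieved context and the user's prompt.
def build_prompt(query, context):
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}"

# 4. Generation: placeholder for a call to an instruction-following LLM.
def generate(prompt):
    raise NotImplementedError("call your LLM service here")

user_prompt = "What does a vector database store?"
augmented = build_prompt(user_prompt, retrieve(user_prompt))
# response = generate(augmented)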

Last but not least, RAG performance must be measured. Sometimes other LLMs are used to judge response quality with simple scores, such as a rating on a 1-5 scale. Prompt engineering plays a significant role in such cases, guiding the model toward the desired result. Fine-tuning can further enhance the model’s expressiveness and accuracy.
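
A minimal LLM-as-judge sketch along these lines is shown below; the rubric, the 1-5 scale and the score parsing are illustrative only, and generate is again a placeholder for an LLM call.

# Minimal LLM-as-judge sketch: ask a second model to grade a RAG answer on a 1-5 scale.
JUDGE_TEMPLATE = (
    "You are grading a RAG answer.\n"
    "Question: {question}\n"
    "Retrieved context: {context}\n"
    "Answer: {answer}\n"
    "Reply with a single integer from 1 (poor) to 5 (excellent) for faithfulness to the context."
)

def judge(question, context, answer, generate):
    prompt = JUDGE_TEMPLATE.format(question=question, context=context, answer=answer)
    reply = generate(prompt)                       # placeholder LLM call
    digits = [int(ch) for ch in reply if ch.isdigit()]
    return digits[0] if digits else None           # crude parse of the 1-5 score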


Tuesday, March 25, 2025

 The vectors generated by embedding models are often stored in a specialized vector database. Vector databases are optimized for storing and retrieving vector data efficiently. Like traditional databases, vector databases can be used to manage permissions, metadata and data integrity, ensuring secure and organized access to information. They also tend to include update mechanisms so newly added texts are indexed and ready to use quickly.
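
To make the metadata and update mechanisms concrete, here is a toy in-memory stand-in for a vector store; real vector databases expose similar upsert and filtered-search operations, but the class, method names and filter format below are illustrative only.

import numpy as np

class ToyVectorStore:
    """Illustrative in-memory stand-in for a vector database."""
    def __init__(self):
        self.vectors, self.texts, self.metadata = [], [], []

    def upsert(self, vector, text, meta):
        # Newly added texts are indexed immediately and ready to search.
        self.vectors.append(np.asarray(vector, dtype=float))
        self.texts.append(text)
        self.metadata.append(meta)

    def search(self, query_vector, k=3, where=None):
        q = np.asarray(query_vector, dtype=float)
        results = []
        for vec, text, meta in zip(self.vectors, self.texts, self.metadata):
            if where and any(meta.get(key) != val for key, val in where.items()):
                continue  # metadata filtering, e.g. restricting results by team or source
            score = float(vec @ q / (np.linalg.norm(vec) * np.linalg.norm(q)))
            results.append((score, text, meta))
        return sorted(results, key=lambda r: r[0], reverse=True)[:k]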

The difference that a vector database and Retrieval-Augmented Generation make might be easier to explain with an example. When a chatbot powered by the Llama 2 LLM is asked about an acronym that was not part of its training text, it tends to guess, responding with an incorrect expansion and elaborating on what that might be. It does not even hint that it might be making things up. This is often referred to as hallucination. But if a RAG system is set up with access to documentation that explains what the acronym stands for, the relevant information is indexed and becomes part of the vector database, and the same prompt will now yield more pertinent information. With RAG, the LLM provides correct answers.

If the prompt is augmented with relevant documents that contain an answer, the LLM can leverage that context and provide more compelling, coherent and knowledgeable answers, as opposed to the hallucination referred to above. Automating this process makes chat responses more consistently satisfactory, although it requires the additional step of building a retrieval system backed by a vector database, along with extra data processing and management of the generated vectors. RAG also has the added benefit of letting the LLM consolidate multiple sources of data into a readable output tailored to the user's prompt. RAG applications can incorporate proprietary data, which distinguishes them from the public data most LLMs are trained on, and the data can be kept up to date so that the LLM is not restricted to the point in time at which it was trained. RAG reduces hallucinations and allows the LLM to provide citations and query statistics, making the processing more transparent to users. As with all retrieval systems, fine-grained data access control brings its own advantages.

There are four steps for building Retrieval-Augmented Generation (RAG):

1. Data Augmentation

a. Objective: Prepare data for a real-time knowledge base and contextualization in LLM queries by populating a vector database.

b. Process: Integrate disparate data using connectors, transform and refine raw data streams, and create vector embeddings from unstructured data. This step ensures that the latest version of proprietary data is instantly accessible for GenAI applications.

2. Inference

a. Objective: Connect relevant information with each prompt, contextualizing user queries and ensuring GenAI applications handle responses accurately.

b. Process: Continuously update the vector store with fresh data. When a user prompt comes in, enrich and contextualize it in real-time with private data and data retrieved from the vector store. Stream this information to an LLM service and pass the generated response back to the web application.

3. Workflows

a. Objective: Parse natural language, synthesize necessary information, and use reasoning agents to determine the next steps to optimize performance.

b. Process: Break down complex queries into simpler steps using reasoning agents, which interact with external tools and resources. This involves multiple calls to different systems and APIs, processed by the LLM to give a coherent response. Stream Governance ensures data quality and compliance throughout the workflow.

4. Post-Processing

a. Objective: Validate LLM outputs and enforce business logic and compliance requirements to detect hallucinations and ensure trustworthy answers.

b. Process: Use frameworks like BPML or Morphir to perform sanity checks and other safeguards on data and queries associated with domain data. Decouple post-processing from the main application to allow different teams to develop independently. Apply complex business rules in real-time to ensure accuracy and compliance and use querying for deeper logic checks (a minimal sketch of such checks follows this list).

These steps collectively ensure that RAG systems provide accurate, relevant, and trustworthy responses by leveraging real-time data and domain-specific context.
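
As a rough illustration of the Post-Processing step above, the sketch below applies a simple grounding check and a business rule to an LLM response before it is returned; the rules, the overlap threshold and the function names are assumptions for illustration, not drawn from BPML or Morphir.

# Illustrative post-processing guardrails applied to an LLM response.
def grounded(response, retrieved_chunks, min_overlap=0.3):
    """Crude hallucination check: how much of the response's vocabulary
    also appears in the retrieved context."""
    resp_words = set(response.lower().split())
    ctx_words = set(" ".join(retrieved_chunks).lower().split())
    if not resp_words:
        return False
    return len(resp_words & ctx_words) / len(resp_words) >= min_overlap

def passes_business_rules(response):
    # Example compliance rule: never expose internal ticket identifiers.
    return "INTERNAL-" not in response

def post_process(response, retrieved_chunks):
    if not grounded(response, retrieved_chunks):
        return "I could not find support for an answer in the documents provided."
    if not passes_business_rules(response):
        return "The answer was withheld by a compliance rule."
    return response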


Monday, March 24, 2025

 RAG:

The role of RAG is to combine a user’s prompt with relevant external information to form a new, expanded prompt for a large language model aka LLM. The expanded prompt enables the LLM to provide more relevant, timely and accurate responses than direct querying based only on embeddings of domain-specific data. Its importance lies in providing real-time, contextualized, and trustworthy data for UAV swarm applications. If LLMs can be considered as encapsulating business logic in AI applications, then RAG and knowledge bases can be considered data platform services.

LLMs are machine learning algorithms that can interpret, manipulate, and generate text-based content. They are trained on massive text datasets from diverse sources, including books, internet scraped text, and code repositories. During the training process, the model learns statistical relationships between words and phrases, enabling it to generate new text using the context of text it has already seen or generated. LLMs are typically used via "prompting," which is text that a user provides to an LLM and that the LLM responds to. Prompts can take various forms, such as incomplete statements, questions, or instructions, and such textual prompts are even scarcer when dealing with multimodal data from UAV swarm sensors. RAG applications that enable users to ask questions about text generally use instruction-following and question-answering LLMs. In RAG, the user's question or instruction is combined with information retrieved from an external data source, forming the new, augmented prompt. This helps to overcome issues like hallucinations, out-of-date information, and gaps in domain-specific knowledge.

The four steps for building Retrieval-Augmented Generation (RAG) are:

1. Data Augmentation

a. Objective: Prepare data for a real-time knowledge base and contextualization in LLM queries by populating a vector database.

b. Process: Integrate disparate data using connectors, transform and refine raw data streams, and create vector embeddings from unstructured data. This step ensures that the latest version of UAV swarm proprietary data is instantly accessible for GenAI applications.

2. Inference

a. Objective: Connect relevant information with each prompt, contextualizing user queries and ensuring GenAI applications handle responses accurately.

b. Process: Continuously update the vector store with fresh sensor data. When a user prompt comes in, enrich and contextualize it in real-time with private data and data retrieved from the vector store. Stream this information to an LLM service and pass the generated response back to the web application.

3. Workflows

a. Objective: Parse natural language, synthesize necessary information, and use reasoning agents to determine the next steps to optimize performance.

b. Process: Break down complex queries into simpler steps using reasoning agents, which interact with external tools and resources. This involves multiple calls to different systems and APIs, processed by the LLM to give a coherent response. Stream Governance ensures data quality and compliance throughout the workflow (a minimal sketch of this step follows the list).

4. Post-Processing

a. Objective: Validate LLM outputs and enforce business logic and compliance requirements to detect hallucinations and ensure trustworthy answers.

b. Process: Use frameworks like BPML or Morphir to perform sanity checks and other safeguards on data and queries associated with UAV swarms. Decouple post-processing from the main application to allow different teams to develop independently. Apply complex business rules in real-time to ensure accuracy and compliance and use querying for deeper logic checks.

These steps collectively ensure that RAG systems provide accurate, relevant, and trustworthy responses by leveraging real-time data and domain-specific context.
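
As a rough sketch of the Workflows step above, the snippet below breaks a complex query into simpler sub-steps, routes each one to a tool, and asks the LLM to synthesize a final answer; the tool registry, the plan format and the helper names are assumptions rather than part of any particular agent framework, and generate again stands in for an LLM call.

# Illustrative reasoning-agent loop for the Workflows step.
def plan(query, generate):
    """Ask the LLM to break the query into short, ordered sub-steps, one per line."""
    prompt = f"Break this request into short, ordered sub-steps, one per line:\n{query}"
    return [line.strip() for line in generate(prompt).splitlines() if line.strip()]

TOOLS = {
    "retrieve_docs": lambda step: f"docs for: {step}",         # stand-in for vector search
    "query_telemetry": lambda step: f"telemetry for: {step}",  # stand-in for a UAV data API
}

def run_workflow(query, generate, choose_tool):
    results = []
    for step in plan(query, generate):
        tool = TOOLS[choose_tool(step)]        # route each sub-step to a tool
        results.append(tool(step))
    synthesis_prompt = "Combine these intermediate results into one answer:\n" + "\n".join(results)
    return generate(synthesis_prompt)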


Saturday, March 22, 2025

 This is a review of Retrieval Augmented Generation aka RAG, but in a more general sense than the specific application to UAV swarms discussed earlier. RAG is a process of combining a user’s prompt with relevant external information to form a new, expanded prompt for a large language model aka LLM. The expanded prompt enables the LLM to provide more relevant, timely and accurate responses.

LLMs are machine learning algorithms that can interpret, manipulate, and generate text-based content. They are trained on massive text datasets from diverse sources, including books, internet scraped text, and code repositories. During the training process, the model learns statistical relationships between words and phrases, enabling it to generate new text using the context of text it has already seen or generated. LLMs are typically used via "prompting," which is text that a user provides to an LLM and that the LLM responds to. Prompts can take various forms, such as incomplete statements or questions or instructions. RAG applications that enable users to ask questions about text generally use instruction-following and question-answering LLMs. In RAG, the user's question or instruction is combined with some information retrieved from an external data source, forming the new, augmented prompt. 

An effective RAG application uses Vector Search to identify relevant text for a user's prompt. An embedding model translates each text into a numeric vector, encapsulating its meaning. The same process converts the user's query to a comparable vector, allowing for mathematical comparison and identification of the most similar and relevant texts. These vectors represent the meanings of the texts from which they are generated, enabling retrieval of the text most relevant to the user's query. However, embedding models may not capture the exact meaning desired, so it's essential to test and evaluate every component of a RAG application. Vector databases are optimized for storing and retrieving vector data efficiently, managing permissions, metadata, and data integrity. LLMs, however, are not reliable as knowledge sources and may respond with made-up answers or hallucinations. To mitigate these issues, explicit information can be provided to the LLM, such as copying and pasting reference documents into ChatGPT or another LLM. Implementing RAG with Vector Search can address the limitations of LLM-only approaches by providing additional context for the LLM to use when formulating an answer.

RAG applications enable the integration of proprietary data, up-to-date information, and enhanced accuracy of LLM responses. They provide access to internal documents and communications, reducing the occurrence of hallucinations and allowing for human verification. RAG also enables fine-grained data access control, allowing LLMs to securely reference confidential or personal data based on user access credentials. It equips LLMs with context-specific information, enabling applications that LLMs alone may not generate reliably. RAG is particularly useful in question-answering systems, customer service, content generation, and code assistance. For instance, a large e-commerce company uses Databricks for an internal RAG application, allowing HR to query hundreds of employee policy documents. RAG systems can also streamline the customer service process by providing personalized responses to customer queries, enhancing customer experience and reducing response times. Additionally, RAG can enhance code completion and Q&A systems by intelligently searching and retrieving information from code bases, documentation, and external libraries. 

RAG with Vector Search is a process that involves retrieving information from an external source, augmenting the user's prompt with that information, and generating a response based on the user's prompt and information retrieved using an LLM. Data preparation is a continuous process, involving parsing raw input documents into text format, splitting documents into chunks, and embedding the text chunks. The choice of chunk size depends on the source documents, LLM, and the RAG application's goals. 
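
As a small illustration of the chunking decision, one simple approach is to split a document into overlapping, roughly fixed-size chunks; the word-based splitting and the sizes below are arbitrary choices, and real applications often split on sentences, headings or tokens instead.

def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into chunks of roughly chunk_size words with some overlap,
    so that context is not lost at chunk boundaries."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks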

Embeddings are the numeric vectors, or series of numbers, that an embedding model (a type of language model) generates from a text, encoding its nuanced and context-specific meaning. They can be mathematically compared to each other, allowing the similarity of the original texts' meanings to be measured.
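
For example, a common way to compare two embeddings mathematically is cosine similarity; the three-dimensional vectors below are toy values chosen to keep the arithmetic visible, since real embeddings have hundreds or thousands of dimensions.

import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

v_cat = [0.9, 0.1, 0.0]     # toy embedding for "cat"
v_kitten = [0.8, 0.2, 0.1]  # toy embedding for "kitten"
v_car = [0.0, 0.1, 0.9]     # toy embedding for "car"

print(cosine_similarity(v_cat, v_kitten))  # close to 1: similar meanings
print(cosine_similarity(v_cat, v_car))     # much smaller: unrelated meanings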

Embeddings are stored in a specialized vector database, which efficiently stores and searches for vector data like embeddings. Vector databases often incorporate update mechanisms to allow for easy searching of newly added chunks. Overall, RAG with Vector Search is a valuable tool for generating effective and relevant responses. 

Friday, March 21, 2025

 Emerging trends:

Constructing an incremental “knowledge base” of a landscape from drone imagery merges ideas from simultaneous localization and mapping (SLAM), structure-from-motion (SfM), and semantic segmentation. Incremental SLAM and 3D reconstruction is demonstrated in the ORB-SLAM2 paper by Mur-Artal and Tardós in 2017, where a 3D map is built by estimating camera poses and reconstructing scene geometry from monocular, stereo, or RGB-D inputs. Such a SLAM framework can also be extended by fusing in semantic cues to enrich the resulting map with object and scene labels. The idea of including semantic information in 3D reconstruction is demonstrated by SemanticFusion, written by McCormac et al. for ICRA 2017, where a Convolutional Neural Network aka CNN performs semantic segmentation and the system fuses the resulting labels into a surfel-based 3D map, thereby transforming a purely geometric reconstruction into a semantically rich representation of a scene. SemanticFusion helps to label parts of the scene, turning a raw point cloud or mesh into a knowledge base where objects, surfaces and even relationships can be recognized and queried. SfM, on the other hand, helps to stitch multi-view data into a consistent 3D model, and these techniques are particularly relevant for drone applications. Incremental SfM pipelines can populate information about a 3D space as data arrives in the pipeline, and the drones can “walk the grid” around an area of interest to make sure sufficient data is captured to build the 3D model from 0 to 100%, with the progress tracked along the way. A semantic layer is not part of SfM processing itself, but semantic segmentation or object detection can be layered independently over the purely geometric data. Layering on additional modules for, say, object detection, region classification, or even reasoning over scene changes makes it possible to start with basic geometric layouts and optionally build up a comprehensive knowledge base. Algorithms that crunch this sensor data, whether images or LiDAR, must operate in real time rather than as periodic batch analysis. They can, however, be dedicated to specific domains such as urban monitoring, agricultural surveying, or environmental monitoring for additional context-specific knowledge.
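
As a rough sketch of the label-fusion idea behind SemanticFusion (not the paper's actual implementation), the snippet below keeps a per-class probability distribution on each map element and updates it recursively with every new segmentation observation.

import numpy as np

NUM_CLASSES = 5  # e.g. ground, vegetation, building, road, water (illustrative labels)

def init_element():
    """Start each map element (surfel or voxel) with a uniform class distribution."""
    return np.full(NUM_CLASSES, 1.0 / NUM_CLASSES)

def fuse_observation(element_probs, observed_probs):
    """Recursive Bayesian update: multiply the stored distribution by the
    per-pixel class probabilities from the CNN and renormalize."""
    fused = element_probs * np.asarray(observed_probs, dtype=float)
    total = fused.sum()
    return fused / total if total > 0 else element_probs

# Example: an element observed twice, with the CNN increasingly confident in class 2.
probs = init_element()
probs = fuse_observation(probs, [0.1, 0.1, 0.5, 0.2, 0.1])
probs = fuse_observation(probs, [0.05, 0.05, 0.7, 0.1, 0.1])
print(probs.argmax())  # most likely semantic label for this element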