This is a review of Retrieval Augmented Generation (RAG) in a more general sense than the specific application to UAV swarms discussed earlier. RAG is the process of combining a user's prompt with relevant external information to form a new, expanded prompt for a large language model (LLM). The expanded prompt enables the LLM to provide more relevant, timely, and accurate responses.
LLMs are machine learning models that can interpret, manipulate, and generate text-based content. They are trained on massive text datasets from diverse sources, including books, text scraped from the internet, and code repositories. During training, the model learns statistical relationships between words and phrases, enabling it to generate new text using the context of text it has already seen or generated. LLMs are typically used via "prompting": the user provides text, and the LLM responds to it. Prompts can take various forms, such as incomplete statements, questions, or instructions. RAG applications that let users ask questions about a body of text generally use instruction-following and question-answering LLMs. In RAG, the user's question or instruction is combined with information retrieved from an external data source, forming the new, augmented prompt.
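As a concrete illustration of that last step, here is a minimal sketch of prompt augmentation in Python. The function name, the prompt template, and the example inputs are all hypothetical; a real application would retrieve the chunks with Vector Search, as described below:

    def augment_prompt(question: str, retrieved_chunks: list[str]) -> str:
        # Join the retrieved passages into a single context block.
        context = "\n\n".join(retrieved_chunks)
        # Instruct the LLM to answer from the supplied context rather than from memory.
        return (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\nAnswer:"
        )

    augmented = augment_prompt(
        "What is the parental leave policy?",
        ["Employees are eligible for twelve weeks of paid parental leave."],
    )
    # `augmented` is sent to an instruction-following LLM in place of the raw question.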
An effective RAG application uses Vector Search to identify relevant text for a user's prompt. An embedding model translates each text into a numeric vector that encapsulates its meaning; the user's query is converted into a comparable vector, so the stored texts most similar, and therefore most relevant, to the query can be identified mathematically. However, an embedding model may not capture exactly the meaning you intend, so it is essential to test and evaluate every component of a RAG application. Vector databases are optimized for storing and retrieving vector data efficiently while managing permissions, metadata, and data integrity. LLMs on their own, by contrast, are not reliable knowledge sources and may respond with made-up answers, or hallucinations. One way to mitigate this is to provide explicit reference information to the LLM, for example by copying and pasting reference documents into ChatGPT or another LLM. RAG with Vector Search automates this approach, addressing the limitations of LLM-only usage by supplying additional context for the LLM to use when formulating an answer.
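To make the mathematical comparison concrete, the sketch below ranks stored texts by the cosine similarity between their vectors and the query's vector. The three-dimensional vectors are made-up toy values; a real embedding model produces vectors with hundreds or thousands of dimensions:

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity is near 1.0 when two vectors point the same way,
        # i.e., when the texts they encode have similar meaning.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Toy "embeddings" of two stored texts (hypothetical values).
    doc_vectors = {
        "parental leave policy": np.array([0.9, 0.1, 0.0]),
        "expense report how-to": np.array([0.1, 0.8, 0.3]),
    }
    query_vector = np.array([0.85, 0.15, 0.05])  # embedding of the user's query

    # Rank stored texts by similarity; the top results become the retrieved context.
    ranked = sorted(doc_vectors,
                    key=lambda name: cosine_similarity(query_vector, doc_vectors[name]),
                    reverse=True)
    print(ranked[0])  # -> "parental leave policy"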
RAG applications enable the integration of proprietary and up-to-date data and improve the accuracy of LLM responses. They provide access to internal documents and communications, reducing the occurrence of hallucinations and allowing for human verification. RAG also enables fine-grained data access control, allowing LLMs to securely reference confidential or personal data based on the user's access credentials. It equips LLMs with context-specific information, enabling applications that LLMs alone cannot support reliably. RAG is particularly useful in question-answering systems, customer service, content generation, and code assistance. For instance, a large e-commerce company uses Databricks for an internal RAG application that allows HR to query hundreds of employee policy documents. RAG systems can also streamline customer service by providing personalized responses to customer queries, enhancing the customer experience and reducing response times. Additionally, RAG can enhance code completion and Q&A systems by intelligently searching and retrieving information from code bases, documentation, and external libraries.
RAG with Vector Search is a process that involves retrieving information from an external source, augmenting the user's prompt with that information, and using an LLM to generate a response based on the augmented prompt. Data preparation is a continuous process that involves parsing raw input documents into text format, splitting the documents into chunks, and embedding the text chunks. The choice of chunk size depends on the source documents, the LLM, and the RAG application's goals.
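One common chunking strategy, though far from the only one, is a fixed-size window with overlap, sketched below. The chunk size and overlap values are arbitrary placeholders that a real application would tune against its documents and LLM:

    def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
        # Slide a fixed-size window across the text; the overlap preserves
        # context that would otherwise be severed at chunk boundaries.
        chunks = []
        start = 0
        while start < len(text):
            chunks.append(text[start:start + chunk_size])
            start += chunk_size - overlap
        return chunks

    # Each chunk is then passed to the embedding model and stored with its vector.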
Embeddings are numeric vectors, series of numbers generated from a text by an embedding model, a type of language model that encodes the nuanced, context-specific meaning of each text. Because embeddings are numbers, they can be mathematically compared to each other, which in turn allows the meanings of the original texts to be compared.
Embeddings are stored in a specialized vector database, which efficiently stores and searches vector data. Vector databases often incorporate update mechanisms so that newly added chunks can be searched right away.
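Full-featured vector databases add persistence, permissions, and metadata management on top of the search itself, but the core nearest-neighbor lookup can be sketched with a similarity-search library such as FAISS. In this sketch the dimension, the random vectors, and the number of results are placeholders for real embeddings and a tuned configuration:

    import numpy as np
    import faiss  # similarity-search library; stands in for a full vector database

    dim = 384                       # placeholder; must match the embedding model's output size
    index = faiss.IndexFlatIP(dim)  # exact inner-product search (cosine, once vectors are normalized)

    chunk_vectors = np.random.rand(1000, dim).astype("float32")  # stand-ins for real chunk embeddings
    faiss.normalize_L2(chunk_vectors)  # normalize in place so inner product equals cosine similarity
    index.add(chunk_vectors)

    query_vector = np.random.rand(1, dim).astype("float32")      # stand-in for the embedded user query
    faiss.normalize_L2(query_vector)
    scores, ids = index.search(query_vector, 5)  # row indices of the 5 most similar chunks

Overall, RAG with Vector Search is a valuable tool for generating effective and relevant responses.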