Sunday, March 30, 2025

A previous article1 described how the formation of a UAV swarm flows through space and time along waypoints and a trajectory. While the shape of the formation in these cases is known, its size depends on the number of units in the formation, the minimum distance between units, the presence of external obstacles and constraints, and the margin that must be maintained from those constraints. An earlier prototype2 also described the ability to spread drones as close to the constraints as allowed using self-organizing maps, which essentially draw each unit toward the nearest real-world element that imposes a constraint, such as when drones fly through tunnels by following the walls. This establishes the maximum boundary of the space the UAV swarm occupies, with the core provided by the waypoints and trajectory that each unit of the swarm can follow one after the other in sequence if the constraints are too rigid or unpredictable. Progress along the trajectory spanning the waypoints continues to be tracked with the help of the center of the formation. Given this minimum-maximum combination and the various thresholds for the factors cited, the size of the shape of the UAV swarm at a point in time can be determined.
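
To make that minimum-maximum reasoning concrete, here is a small sizing sketch in Python. The packing bound, the slack factor and the fallback behaviour are illustrative assumptions, not the actual sizing rule from the earlier articles:

import math

def formation_radius(n_units, min_separation, distance_to_constraint, margin):
    # lower bound: room to pack n_units on a disc at the minimum pairwise separation
    lower = min_separation * math.sqrt(n_units / math.pi)
    # upper bound: the clearance left after keeping the required margin from the constraint
    upper = distance_to_constraint - margin
    if upper < lower:
        return None   # constraints too tight: fall back to streaming through in sequence
    return min(upper, 1.5 * lower)   # 1.5 is an arbitrary slack factor for this example

# e.g. formation_radius(80, 2.0, distance_to_constraint=25.0, margin=3.0) -> about 15.1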

This article argues that the vectorization, clustering and model do not just apply to the UAV swarm formation in space; they also apply to maintaining a balance between constraints and sizing and to determining the quality of the formation, using Vector-Search-as-a-judge. The idea is borrowed from LLM-as-a-judge3, which helps to continuously evaluate and monitor many AI applications built on various LLMs for specific domains, including Retrieval-Augmented Generation (RAG) based chatbots. With automated evaluation reaching over 80% agreement with human judgements on a simple 1-to-5 grading scale, the balance between constraints and sizing can be consistently evaluated and even enforced. It may not be on par with human grading and might require several auto-evaluation samples, but these can be conducted virtually, without any actual UAV swarm flights. A good choice of hyperparameters is sufficient to ensure reproducibility, single-answer grading, and reasoning about the grading process. Emitting metrics for correctness, comprehensiveness and readability is sufficient in this regard. The overall workflow for this judge also resembles the self-organizing map in terms of data preparation, indexing of relevant data, and information retrieval.
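
As a minimal sketch of how such a judge could be wired up (assuming the OpenAI Python SDK and an OPENAI_API_KEY; the model name, prompt wording and summary fields are placeholders rather than a prescribed design):

import json
from openai import OpenAI

client = OpenAI()

JUDGE_INSTRUCTIONS = (
    "You are grading a proposed UAV swarm formation against its constraints. "
    "Return JSON with integer scores from 1 (worst) to 5 (best) for the keys "
    "'correctness', 'comprehensiveness' and 'readability', plus a short 'reasoning' string."
)

def grade_formation(formation_summary, constraints_summary):
    """Single-answer grading with reasoning, as described above."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",                       # placeholder model name
        temperature=0,                             # fixed hyperparameter for reproducibility
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": JUDGE_INSTRUCTIONS},
            {"role": "user", "content": f"Constraints:\n{constraints_summary}\n\nFormation:\n{formation_summary}"},
        ],
    )
    return json.loads(response.choices[0].message.content)

# The returned scores can be emitted as telemetry and fed back into the formation model.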

As with all AI models, it is important to ensure AI safety and security4, to include a diverse set of data, and to maintain proper separation between the read-write and read-only accesses needed by the model and the judge. Emitting the gradings as telemetry and feeding them back into the loop the model uses when deciding on formation shape and size, albeit optional, can ensure that the requirement of remaining within the imposed constraints is always met.

The shape and size of the UAV formation are deterministic at a point in time, but how they change over time depends on the selection of waypoints between source and destination as well as the duration permitted for the swarm to move collectively, or to stream through and regroup at a waypoint. In the earlier work, a smooth trajectory was formed between the waypoints, and each unit could adhere to it while tolerating small formation variations.
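
One common way to obtain such a smooth trajectory through waypoints is a cubic spline; the earlier article does not prescribe the interpolation method, so the following is just one illustrative choice with made-up waypoints:

import numpy as np
from scipy.interpolate import CubicSpline

waypoints = np.array([[0, 0, 10], [50, 20, 12], [120, 15, 15], [200, 40, 12]], dtype=float)
t = np.linspace(0, 1, len(waypoints))          # parameter along the route
trajectory = CubicSpline(t, waypoints, axis=0) # smooth curve through every waypoint

samples = trajectory(np.linspace(0, 1, 200))   # dense samples for the formation center to track
# Each unit follows these samples with its own offset, tolerating small formation variations.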

Perhaps the biggest contribution of vectorizing all the constraints in a landscape is that a selection of waypoints offering the least resistance for the UAV swarm to pass through while keeping its shape and size can be determined by a metric that is the inverse of the one used for the self-organizing maps.
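
A sketch of that inverse metric, where the names, clearances and scoring rule are all illustrative assumptions:

import numpy as np

def waypoint_resistance(candidate, constraint_points, required_clearance):
    # distance to the nearest constraint is what the self-organizing map minimized;
    # its inverse, scaled by the clearance the formation needs, acts as "resistance"
    nearest = np.min(np.linalg.norm(constraint_points - candidate, axis=1))
    return required_clearance / max(nearest, 1e-6)

def select_waypoints(candidates, constraint_points, required_clearance, k=5):
    scores = [waypoint_resistance(c, constraint_points, required_clearance) for c in candidates]
    return [candidates[i] for i in np.argsort(scores)[:k]]   # the k least-resistance waypoints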

#Codingexercise

https://1drv.ms/w/c/d609fb70e39b65c8/Echlm-Nw-wkggNaVNQEAAAAB63QJqDjFIKM2Vwrg34NWVQ?e=grnBgD


Saturday, March 29, 2025

 Measuring RAG performance:

Since a RAG application has many aspects that affect its retrieval or generation quality, there must be ways to measure its performance, yet this remains one of the most challenging parts of setting up a RAG application. It is often helpful to evaluate each step of the RAG application creation process independently: both the model and the knowledge base must be effective.

Evaluating the retrieval step, for instance, involves identifying the relevant records that should be retrieved to address each prompt. A precision-and-recall metric such as the F-score is helpful for benchmarking and improvement. The generation of answers to those prompts can also be evaluated, so that it is free of hallucinations and incorrect responses. Leveraging another LLM to provide prompts and to check responses can also be helpful; this technique is known as LLM-as-a-judge. The scores resulting from this technique must be simple and in a small range, say 1-5, with a higher rating indicating a response that is true to the context.
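
For instance, a minimal sketch of such a retrieval benchmark, where retrieved and relevant are sets of document ids (the ids are made up):

def retrieval_f1(retrieved, relevant):
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# retrieval_f1({"doc1", "doc3", "doc7"}, {"doc1", "doc2", "doc3"})
# -> precision 0.67, recall 0.67, f1 0.67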

RAG isn’t the only approach to equipping models with new information, but any approach involves trade-offs between cost, complexity and expressive power. Cost comes from the inventory and bill of materials. Complexity means technical difficulty, usually reflected in the time, effort and expertise required. Expressiveness refers to the model’s ability to generate diverse, inclusive, meaningful and useful responses to prompts.

Besides RAG, prompt engineering offers an alternative way to guide a model’s outputs towards a desired result. Large and highly capable models are often required to understand and follow complex prompts, and they entail serving or per-token costs. Prompt engineering is especially useful when public data is sufficient and there is no need for proprietary or recent knowledge.
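
As a tiny illustration of prompt engineering (no retrieval involved), a few-shot prompt can steer the format of the output; the wording and the acronym examples below are arbitrary, not a prescribed template:

FEW_SHOT_PROMPT = """You are a flight-operations assistant. Answer in one short sentence.

Q: What does ETA stand for?
A: Estimated time of arrival.

Q: What does UAV stand for?
A:"""

# The string above is sent to the model as-is; the worked examples steer both the
# format and the brevity of the answer without supplying any external data.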

Improving overall performance also requires the model to be fine-tuned. Fine-tuning has a special meaning in the context of large language models, where it refers to taking a pretrained model and adapting it to a new task or domain by adjusting some or all of its weights on new data. This is a necessary step for building a chatbot on, say, medical texts.

While RAG infuses data into the overall process, it does not change the model. Fine-tuning, by contrast, changes a model’s behavior, so the model need no longer behave as it originally did. It is also not a straightforward process and may not be as reliable as RAG in generating relevant responses.


Friday, March 28, 2025

#codingexercise: https://1drv.ms/w/c/d609fb70e39b65c8/EYKwhcLpZ3tAs0h6tU_RYxwBxeAeg1Vg2DH7deOt-niRhw?e=qbXLag


Thursday, March 27, 2025

RAG with vector search involves retrieving information using a vector database, augmenting the user’s prompt with that information, and generating a response from the augmented prompt using an LLM. Each of these steps can be implemented by a variety of approaches, but we will go over the mainstream ones.

1. Data Preparation: This is all about ingesting data into a vector database, usually with the help of connectors. It is not a one-time task, because the vector database must be updated regularly to keep providing high-quality information to the embeddings model; otherwise the application starts to sound outdated. A great aspect of the RAG process is that the LLM weights do not need to be adjusted as data is ingested. Common stages in this step include parsing the input documents; splitting the documents into chunks, which incidentally can affect output quality; using an embedding model to convert each chunk into a high-dimensional numerical vector; storing and indexing the embeddings, which results in a vector index that boosts search efficiency; and recording metadata that can participate in filtering. The value of embeddings for RAG lies in the similarity scores they articulate between the meanings of the original texts. A minimal end-to-end sketch of all four steps follows this list.

2. Retrieval is all about getting the relevant context. After preprocessing the original documents, we have a vector database storing chunks, embeddings and metadata. In this step, the user provides a prompt, which the application uses to query the vector database; the relevant results augment the original prompt in the next step. Querying the vector database relies on similarity scores between the vector representing the query and the vectors in the database. There are many ways to improve search results, including hybrid search, reranking, summarized-text comparison, contextual chunk retrieval, prompt refinement, and domain-specific tuning.

3. Augmenting the prompt with the retrieved context equips the model with both the question and the information needed to address it. The structure of the new prompt that combines the retrieved texts with the user’s prompt can impact the quality of the result.

4. Generation of the response is done with the help of an LLM and follows the retrieval and augmentation steps. Some LLMs are quite good at following instructions, but many require post-processing. Another important consideration is whether the RAG system should have memory of previous prompts and responses. One way to enhance generation is to add multi-turn conversation ability, which allows the user to ask follow-up questions.
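
Below is a minimal end-to-end sketch of the four steps above, assuming the sentence-transformers, faiss and openai Python packages are installed and an OPENAI_API_KEY is set; the chunk size, model names, prompt template and top-k value are illustrative choices rather than prescribed settings:

import faiss
from openai import OpenAI
from sentence_transformers import SentenceTransformer

# --- 1. Data preparation: parse, chunk, embed, index, record metadata ---
def chunk(text, size=500, overlap=50):
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

documents = ["...parsed document text...", "...another parsed document..."]
chunks = [c for doc in documents for c in chunk(doc)]
metadata = [{"chunk_id": i} for i in range(len(chunks))]         # available for filtering

embedder = SentenceTransformer("all-MiniLM-L6-v2")               # embedding model (placeholder choice)
embeddings = embedder.encode(chunks, normalize_embeddings=True)  # unit vectors -> cosine similarity
index = faiss.IndexFlatIP(embeddings.shape[1])                   # the vector index
index.add(embeddings)

# --- 2. Retrieval: similarity search for the relevant context ---
def retrieve(prompt, k=4):
    query = embedder.encode([prompt], normalize_embeddings=True)
    _, ids = index.search(query, k)
    return [chunks[i] for i in ids[0] if i != -1]

# --- 3. Augmentation: combine the retrieved texts with the user's prompt ---
def augment(prompt, retrieved):
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved))
    return ("Answer the question using only the context below. "
            "If the context is insufficient, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {prompt}")

# --- 4. Generation: call the LLM, keeping memory for multi-turn conversations ---
client = OpenAI()
history = []                                                     # previous prompts and responses

def generate(prompt):
    messages = history + [{"role": "user", "content": augment(prompt, retrieve(prompt))}]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = response.choices[0].message.content
    history.extend([{"role": "user", "content": prompt},
                    {"role": "assistant", "content": answer}])
    return answer

A follow-up question simply calls generate() again; because the earlier turns stay in history, the model can resolve references to them.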

Last but not least, RAG performance must be measured. Sometimes other LLMs are used to judge response quality with simple scores, such as a range of 1-5. Prompt engineering plays a significant role in such cases, guiding a model’s output towards the desired result. Fine-tuning can further enhance the model’s expressiveness and accuracy.


Tuesday, March 25, 2025

 The vectors generated by embedding models are often stored in a specialized vector database. Vector databases are optimized for storing and retrieving vector data efficiently. Like traditional databases, vector databases can be used to manage permissions, metadata and data integrity, ensuring secure and organized access to information. They also tend to include update mechanisms so newly added texts are indexed and ready to use quickly.
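
To illustrate how metadata and permissions can ride along with each embedding, here is a tiny in-memory stand-in for a vector database record; real vector databases expose this through their own APIs, so every name below is an assumption:

import numpy as np

records = [
    {"id": 1, "vector": np.random.rand(384), "text": "chunk from the ops manual",
     "source": "ops-manual", "acl": {"team-a"}},
    {"id": 2, "vector": np.random.rand(384), "text": "chunk from the wiki",
     "source": "wiki", "acl": {"team-a", "team-b"}},
]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_vector, user_groups, source=None, k=3):
    # permission and metadata filtering happen before the similarity ranking
    candidates = [r for r in records
                  if r["acl"] & user_groups and (source is None or r["source"] == source)]
    return sorted(candidates, key=lambda r: cosine(r["vector"], query_vector), reverse=True)[:k]

# e.g. search(np.random.rand(384), user_groups={"team-b"}, source="wiki") returns only record 2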

The difference that a vector database and Retrieval-Augmented Generation make might be easier to explain with an example. When a chatbot powered by the Llama 2 LLM is asked about an acronym that was not part of its training text, it tends to guess, responding with an incorrect expansion and elaborating on what that might be. It does not even hint that it might be making things up. This is often referred to as hallucination. But if a RAG system is set up with access to documentation that explains what the acronym stands for, the relevant information is indexed and becomes part of the vector database, and the same prompt now yields more pertinent information. With RAG, the LLM provides correct answers.

When the prompt is supplied with the relevant documents that contain an answer, which is referred to as augmenting the prompt, the LLM can leverage that information from the vector database and provide more compelling, coherent and knowledgeable answers, as opposed to the hallucination referred to above. By automating this process, chat responses become more satisfactory every time. This requires the additional step of building a retrieval system backed by a vector database, and it may involve extra data processing and management of the generated vectors. RAG has the added benefit of letting the LLM consolidate multiple sources of data into a readable output tailored to the user's prompt. RAG applications can also incorporate proprietary data, which sets them apart from the public data that most LLMs are trained on. The data can be kept up to date, so the LLM is not restricted to the point in time at which it was trained. RAG reduces hallucinations and allows the LLM to provide citations and query statistics, making the processing more transparent to users. As with all retrieval systems, fine-grained data access control also brings its own advantages.

There are four steps for building Retrieval-Augmented Generation (RAG):

1. Data Augmentation

a. Objective: Prepare data for a real-time knowledge base and contextualization in LLM queries by populating a vector database.

b. Process: Integrate disparate data using connectors, transform and refine raw data streams, and create vector embeddings from unstructured data. This step ensures that the latest version of proprietary data is instantly accessible for GenAI applications.

2. Inference

a. Objective: Connect relevant information with each prompt, contextualizing user queries and ensuring GenAI applications handle responses accurately.

b. Process: Continuously update the vector store with fresh sensor data. When a user prompt comes in, enrich and contextualize it in real-time with private data and data retrieved from the vector store. Stream this information to an LLM service and pass the generated response back to the web application.

3. Workflows

a. Objective: Parse natural language, synthesize necessary information, and use reasoning agents to determine the next steps to optimize performance.

b. Process: Break down complex queries into simpler steps using reasoning agents, which interact with external tools and resources. This involves multiple calls to different systems and APIs, processed by the LLM to give a coherent response. Stream Governance ensures data quality and compliance throughout the workflow.

4. Post-Processing

a. Objective: Validate LLM outputs and enforce business logic and compliance requirements to detect hallucinations and ensure trustworthy answers.

b. Process: Use frameworks like BPML or Morphir to perform sanity checks and other safeguards on data and queries associated with domain data. Decouple post-processing from the main application to allow different teams to develop independently. Apply complex business rules in real time to ensure accuracy and compliance, and use querying for deeper logic checks (a minimal sketch follows this list).

These steps collectively ensure that RAG systems provide accurate, relevant, and trustworthy responses by leveraging real-time data and domain-specific context.
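
As an illustration of the post-processing step, here is a small, decoupled validation sketch; the two checks are crude stand-ins for real business rules expressed in frameworks such as BPML or Morphir, and every name is an assumption:

def post_process(answer, retrieved_chunks):
    checks = {
        "non_empty": bool(answer.strip()),
        # crude grounding check: the answer should share vocabulary with the retrieved context
        "grounded": any(word.lower() in answer.lower()
                        for chunk in retrieved_chunks for word in chunk.split()),
    }
    return {"answer": answer, "passed": all(checks.values()), "checks": checks}

# result = post_process(llm_answer, retrieved_chunks)
# if not result["passed"]: escalate the prompt and answer for human review.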

Reference:

An earlier article: https://1drv.ms/w/c/d609fb70e39b65c8/EVqYhXoM2U5GpfzPsvndd1ABWXzGeXD1cixxJ9wRsWRh3g?e=aVoTd1


Monday, March 24, 2025

 RAG:

The role of RAG is to create a process for combining a user’s prompt with relevant external information to form a new, expanded prompt for a large language model (LLM). The expanded prompt enables the LLM to provide more relevant, timely and accurate responses than direct querying based only on embeddings of domain-specific data. Its importance lies in providing real-time, contextualized, and trustworthy data for UAV swarm applications. If LLMs can be considered as encapsulating the business logic of AI applications, then RAG and knowledge bases can be considered data platform services.

LLMs are machine learning models that can interpret, manipulate, and generate text-based content. They are trained on massive text datasets from diverse sources, including books, text scraped from the internet, and code repositories. During training, the model learns statistical relationships between words and phrases, enabling it to generate new text using the context of text it has already seen or generated. LLMs are typically used via "prompting," which is text that a user provides to an LLM and that the LLM responds to. Prompts can take various forms, such as incomplete statements, questions, or instructions, and such prompts are even scarcer when dealing with multimodal data from UAV swarm sensors. RAG applications that enable users to ask questions about text generally use instruction-following and question-answering LLMs. In RAG, the user's question or instruction is combined with information retrieved from an external data source, forming the new, augmented prompt. This helps to overcome hallucinations, keep information up to date, and incorporate domain-specific knowledge.

The four steps for building Retrieval-Augmented Generation (RAG) are:

1. Data Augmentation

a. Objective: Prepare data for a real-time knowledge base and contextualization in LLM queries by populating a vector database.

b. Process: Integrate disparate data using connectors, transform and refine raw data streams, and create vector embeddings from unstructured data. This step ensures that the latest version of UAV swarm proprietary data is instantly accessible for GenAI applications.

2. Inference

a. Objective: Connect relevant information with each prompt, contextualizing user queries and ensuring GenAI applications handle responses accurately.

b. Process: Continuously update the vector store with fresh sensor data. When a user prompt comes in, enrich and contextualize it in real-time with private data and data retrieved from the vector store. Stream this information to an LLM service and pass the generated response back to the web application.

3. Workflows

a. Objective: Parse natural language, synthesize necessary information, and use reasoning agents to determine the next steps to optimize performance.

b. Process: Break down complex queries into simpler steps using reasoning agents, which interact with external tools and resources. This involves multiple calls to different systems and APIs, processed by the LLM to give a coherent response. Stream Governance ensures data quality and compliance throughout the workflow.

4. Post-Processing

a. Objective: Validate LLM outputs and enforce business logic and compliance requirements to detect hallucinations and ensure trustworthy answers.

b. Process: Use frameworks like BPML or Morphir to perform sanity checks and other safeguards on data and queries associated with UAV swarms. Decouple post-processing from the main application to allow different teams to develop independently. Apply complex business rules in real time to ensure accuracy and compliance, and use querying for deeper logic checks.

These steps collectively ensure that RAG systems provide accurate, relevant, and trustworthy responses by leveraging real-time data and domain-specific context.