Sunday, March 23, 2025

The four steps for building a Retrieval-Augmented Generation (RAG) pipeline are:

1. Data Augmentation

a. Objective: Prepare data for a real-time knowledge base and contextualization in LLM queries by populating a vector database.

b. Process: Integrate disparate data using connectors, transform and refine raw data streams, and create vector embeddings from unstructured data. This step ensures that the latest version of UAV swarm proprietary data is instantly accessible for GenAI applications.

2. Inference

a. Objective: Connect relevant information with each prompt, contextualizing user queries and ensuring GenAI applications handle responses accurately.

b. Process: Continuously update the vector store with fresh sensor data. When a user prompt comes in, enrich and contextualize it in real-time with private data and data retrieved from the vector store. Stream this information to an LLM service and pass the generated response back to the web application (see the sketch after this list).

3. Workflows

a. Objective: Parse natural language, synthesize necessary information, and use reasoning agents to determine the next steps to optimize performance.

b. Process: Break down complex queries into simpler steps using reasoning agents, which interact with external tools and resources. This involves multiple calls to different systems and APIs, processed by the LLM to give a coherent response. Stream Governance ensures data quality and compliance throughout the workflow.

4. Post-Processing

a. Objective: Validate LLM outputs and enforce business logic and compliance requirements to detect hallucinations and ensure trustworthy answers.

b. Process: Use frameworks like BPML or Morphir to perform sanity checks and other safeguards on data and queries associated with UAV swarms. Decouple post-processing from the main application to allow different teams to develop independently. Apply complex business rules in real-time to ensure accuracy and compliance, and use querying for deeper logic checks.

These steps collectively ensure that RAG systems provide accurate, relevant, and trustworthy responses by leveraging real-time data and domain-specific context.
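As a concrete illustration, here is a minimal sketch of steps 1 (Data Augmentation) and 2 (Inference), assuming a hypothetical embed() helper backed by any embedding model, a plain in-memory list standing in for the vector database, and a placeholder call_llm() in place of a real LLM service.

```python
# Minimal sketch of steps 1 (Data Augmentation) and 2 (Inference).
# embed() and call_llm() are hypothetical placeholders for an embedding
# model and an LLM service; a plain list stands in for the vector database.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return a unit-length embedding vector for `text`."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

vector_store = []  # list of (chunk_text, embedding) pairs

def augment_data(chunks):
    """Step 1: populate the vector store with embeddings of raw chunks."""
    for chunk in chunks:
        vector_store.append((chunk, embed(chunk)))

def call_llm(augmented_prompt):
    """Placeholder for streaming the augmented prompt to an LLM service."""
    return f"[LLM response to: {augmented_prompt[:60]}...]"

def answer(prompt, k=3):
    """Step 2: retrieve the k most similar chunks and contextualize the prompt."""
    q = embed(prompt)
    ranked = sorted(vector_store, key=lambda item: float(q @ item[1]), reverse=True)
    context = "\n".join(chunk for chunk, _ in ranked[:k])
    augmented = f"Context:\n{context}\n\nQuestion: {prompt}"
    return call_llm(augmented)
```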


Saturday, March 22, 2025

This is a review of Retrieval-Augmented Generation (RAG) in a more general sense than the specific application to UAV swarms discussed earlier. RAG is a process of combining a user's prompt with relevant external information to form a new, expanded prompt for a large language model (LLM). The expanded prompt enables the LLM to provide more relevant, timely, and accurate responses.

LLMs are machine learning models that can interpret, manipulate, and generate text-based content. They are trained on massive text datasets from diverse sources, including books, text scraped from the internet, and code repositories. During the training process, the model learns statistical relationships between words and phrases, enabling it to generate new text using the context of text it has already seen or generated. LLMs are typically used via "prompting," which is text that a user provides to an LLM and that the LLM responds to. Prompts can take various forms, such as incomplete statements, questions, or instructions. RAG applications that enable users to ask questions about text generally use instruction-following and question-answering LLMs. In RAG, the user's question or instruction is combined with information retrieved from an external data source, forming the new, augmented prompt.

An effective RAG application uses Vector Search to identify relevant text for a user's prompt. An embedding model translates each text into a numeric vector, encapsulating its meaning. This process converts the user's query to a comparable vector, allowing for mathematical comparison and identifying the most similar and relevant texts. These vectors represent the meanings of the text from which they are generated, enabling retrieval of the text most relevant to the user's query. However, embedding models may not capture the exact meaning desired, so it's essential to test and evaluate every component of a RAG application. Vector databases are optimized for storing and retrieving vector data efficiently, managing permissions, metadata, and data integrity. LLMs, however, are not reliable as knowledge sources and may respond with made-up answers or hallucinations. To mitigate these issues, explicit information can be provided to the LLM, such as copying and pasting reference documents into ChatGPT or another LLM. Implementing RAG with Vector Search can address the limitations of LLM-only approaches by providing additional context for the LLM to use when formulating an answer.

RAG applications enable the integration of proprietary data, up-to-date information, and enhanced accuracy of LLM responses. They provide access to internal documents and communications, reducing the occurrence of hallucinations and allowing for human verification. RAG also enables fine-grained data access control, allowing LLMs to securely reference confidential or personal data based on user access credentials. It equips LLMs with context-specific information, enabling applications that LLMs alone may not generate reliably. RAG is particularly useful in question-answering systems, customer service, content generation, and code assistance. For instance, a large e-commerce company uses Databricks for an internal RAG application, allowing HR to query hundreds of employee policy documents. RAG systems can also streamline the customer service process by providing personalized responses to customer queries, enhancing customer experience and reducing response times. Additionally, RAG can enhance code completion and Q&A systems by intelligently searching and retrieving information from code bases, documentation, and external libraries. 

RAG with Vector Search is a process that involves retrieving information from an external source, augmenting the user's prompt with that information, and generating a response with an LLM based on the user's prompt and the retrieved information. Data preparation is a continuous process, involving parsing raw input documents into text format, splitting documents into chunks, and embedding the text chunks. The choice of chunk size depends on the source documents, the LLM, and the RAG application's goals.
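A minimal sketch of the chunking step, assuming a simple fixed-size character splitter with overlap; real pipelines often split on sentence or section boundaries instead, and the sizes below are illustrative only.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50):
    """Split a parsed document into overlapping chunks of roughly chunk_size characters."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks
```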

Embeddings are numeric vectors, or series of numbers, generated from a text by an embedding model (a type of language model); they encode the nuanced and context-specific meaning of each text. They can be mathematically compared to each other, allowing for a better understanding of how the meanings of the original texts relate.
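A small illustration of how two embedding vectors can be compared mathematically, assuming both come from the same embedding model; cosine similarity is one common measure, and the vector values below are made up for the example.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, near 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# two vectors from an embedding model (values here are illustrative only)
v_query = np.array([0.12, -0.40, 0.88])
v_chunk = np.array([0.10, -0.35, 0.90])
print(cosine_similarity(v_query, v_chunk))  # close to 1.0 -> semantically similar
```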

Embeddings are stored in a specialized vector database, which is optimized for storing and searching vector data. Vector databases often incorporate update mechanisms to allow for easy searching of newly added chunks. Overall, RAG with Vector Search is a valuable tool for generating effective and relevant responses.
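As one concrete (assumed) option, the FAISS library provides an in-process vector index that supports incrementally adding newly embedded chunks; a managed vector database would expose similar upsert and query operations. A minimal sketch:

```python
import numpy as np
import faiss  # one example of a vector index library (assumed available)

dim = 384
index = faiss.IndexFlatIP(dim)   # inner-product index; use normalized vectors for cosine
chunk_texts = []                 # parallel list mapping row id -> chunk text

def add_chunks(texts, embeddings):
    """Incrementally add newly embedded chunks so they are immediately searchable."""
    index.add(np.asarray(embeddings, dtype="float32"))
    chunk_texts.extend(texts)

def search(query_embedding, k=3):
    """Return the k most similar chunks with their similarity scores."""
    scores, ids = index.search(np.asarray([query_embedding], dtype="float32"), k)
    return [(chunk_texts[i], float(s)) for i, s in zip(ids[0], scores[0]) if i != -1]
```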

Friday, March 21, 2025

 Emerging trends:

Constructing an incremental “knowledge base” of a landscape from drone imagery merges ideas from simultaneous localization and mapping (SLAM), structure-from-motion (SfM), and semantic segmentation. Incremental SLAM and 3D reconstruction is suggested in the ORB-SLAM2 paper by Mur-Artal and Tardós (2017), where a 3D map is built by estimating camera poses and reconstructing scene geometry from monocular, stereo, or RGB-D inputs. Such a SLAM framework can also be extended by fusing in semantic cues to enrich the resulting map with object and scene labels. The idea of including semantic information in 3D reconstruction is demonstrated by SemanticFusion by McCormac et al. (2017), where a convolutional neural network (CNN) performs semantic segmentation and the system fuses the semantic labels into a surfel-based 3D map, thereby transforming a purely geometric reconstruction into a semantically rich representation of a scene. SemanticFusion helps to label parts of the scene, turning a raw point cloud or mesh into a knowledge base where objects, surfaces, and even relationships can be recognized and queried.

SfM, on the other hand, helps to stitch multi-view data into a consistent 3D model, and its techniques are particularly relevant for drone applications. Incremental SfM pipelines can populate information about a 3D space as data arrives in the pipeline, and the drones can “walk the grid” around an area of interest to make sure sufficient data is captured to build the 3D model from 0 to 100%, with progress that can even be tracked. A semantic layer is not part of SfM processing itself, but semantic segmentation or object detection can be layered on independently over the purely geometric data. Layering on additional modules for, say, object detection, region classification, or even reasoning over scene changes makes it possible to start with basic geometric layouts and optionally build toward a comprehensive knowledge base. Algorithms that process this sensor data, whether images or LiDAR, must operate in real time rather than as periodic batch analysis. They can, however, be dedicated to specific domains such as urban monitoring, agricultural surveying, or environmental monitoring for additional context-specific knowledge.
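A simplified sketch of the label-fusion idea behind systems like SemanticFusion, assuming per-frame class probabilities from a segmentation CNN have already been associated with a map element such as a surfel or point; the recursive Bayesian update shown here is a common formulation rather than the paper's exact implementation.

```python
import numpy as np

def fuse_labels(prior: np.ndarray, frame_probs: np.ndarray) -> np.ndarray:
    """Fuse a new frame's per-class probabilities into a map element's label
    distribution (element-wise product followed by renormalization)."""
    posterior = prior * frame_probs
    return posterior / posterior.sum()

# classes: [ground, vegetation, building, water]
surfel_label = np.full(4, 0.25)                   # uninformative prior
for frame in [np.array([0.1, 0.7, 0.1, 0.1]),     # CNN outputs from successive drone frames
              np.array([0.2, 0.6, 0.1, 0.1])]:
    surfel_label = fuse_labels(surfel_label, frame)

print(surfel_label.argmax())  # -> 1, i.e. the surfel is labeled "vegetation"
```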


Thursday, March 20, 2025

An earlier article1 described the creation and usage of a Knowledge Base for LLMs. One of the ideas emphasized there is the end-to-end service expectation from the system, not just the provisioning of a vector database. In this regard, it is important to call out that semantic similarity and embeddings alone do not cut it to capture the nuances of a query. In vector databases, each data point (document, image, or any object) is often stored along with metadata, structured information that provides additional context. For example, metadata could include attributes like timestamp, author, location, and category. During a vector search, filters can be applied on this metadata to narrow down the results, ensuring only relevant items are retrieved. This is particularly helpful when the dataset is large and diverse. This technique is sometimes referred to as “metadata filtering”.
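A minimal sketch of metadata filtering, assuming each stored record carries an embedding plus a metadata dictionary; a real vector database would expose this as a filter clause on the query rather than a Python loop.

```python
import numpy as np

records = [
    {"text": "Trail running shoe", "embedding": np.array([0.9, 0.1]),
     "metadata": {"gender": "female", "brand": "Columbia"}},
    {"text": "Leather dress shoe", "embedding": np.array([0.2, 0.8]),
     "metadata": {"gender": "male", "brand": "Acme"}},
]

def filtered_search(query_vec, filters, k=5):
    """Keep only records whose metadata matches every filter, then rank by similarity."""
    candidates = [r for r in records
                  if all(r["metadata"].get(key) == value for key, value in filters.items())]
    candidates.sort(key=lambda r: float(np.dot(query_vec, r["embedding"])), reverse=True)
    return candidates[:k]

results = filtered_search(np.array([0.8, 0.2]), {"gender": "female", "brand": "Columbia"})
```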

Some examples of where this makes a difference include:

1. Product recommendations: This case involves an e-commerce vector search where product embeddings are used to find similar items. If a customer searches for “lightweight hiking shoes,” the vector embeddings find semantically similar products. Adding a metadata filter like gender: female or brand: Columbia ensures the results align with specific requirements.

2. Content Moderation or compliance: Imagine a company using vector search to identify similar documents across various teams. By filtering metadata like department: legal or classification: confidential, only the relevant documents are retrieved. This prevents retrieving semantically similar but irrelevant documents from unrelated teams or departments.

3. Geospatial Search: A travel app uses vector embeddings to recommend destinations based on a user’s travel history and preferences. Using metadata filters for location: within 100 miles ensures the recommendations are regionally relevant.

4. Media Libraries: In a vector search for images, combining embeddings with metadata like resolution: >=1080p or author: John Doe helps surface high-quality or specific submissions.

And some examples where it doesn’t:

1. Homogeneous Datasets: If the dataset lacks meaningful metadata (e.g., all records have the same category or timestamp), filtering doesn’t add value because the metadata doesn’t differentiate between records.

2. Highly Unstructured Queries: For a generic query like “artificial intelligence” in a research database, metadata filtering might not help much if the user is looking for broad, cross-disciplinary results. Overly restrictive filters could exclude valuable documents.

3. When Metadata is Sparse or Inaccurate: If the metadata is inconsistently applied or missing in many records, relying on filters can lead to incomplete or skewed results.

Another technique that improves query responses is “contextual embeddings”. This improves retrieval accuracy and cuts retrieval failures, particularly when combined with re-ranking. It combines the well-known Retrieval-Augmented Generation technique of semantic search using embeddings with lexical search using sparse retrievers like BM25. The entire knowledge base is split into chunks. Both the TF-IDF encodings and the semantic embeddings are generated. Parallel lexical and semantic searches are run. The results are then combined and ranked. The most relevant chunks are located, and the response is generated with enhanced context. This enhancement over multimodal embeddings and GraphRAG2 is inspired by Anthropic and a Microsoft Community blog.
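A sketch of the hybrid retrieval step described above, assuming the lexical (e.g., BM25) and semantic (embedding) retrievers have each already produced a ranked list of chunk ids; reciprocal rank fusion is one common way to combine the two lists before re-ranking and response generation.

```python
def reciprocal_rank_fusion(lexical_ranked, semantic_ranked, k=60, top_n=5):
    """Merge two ranked lists of chunk ids; chunks ranked highly by either retriever win."""
    scores = {}
    for ranked in (lexical_ranked, semantic_ranked):
        for rank, chunk_id in enumerate(ranked):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# lexical_ranked / semantic_ranked would come from BM25 and vector search respectively,
# i.e., chunk ids ordered by descending score from each retriever.
lexical_ranked = ["c3", "c1", "c7", "c2"]
semantic_ranked = ["c1", "c4", "c3", "c9"]
print(reciprocal_rank_fusion(lexical_ranked, semantic_ranked))  # c1 and c3 surface first
```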

#Codingexercise

https://1drv.ms/w/c/d609fb70e39b65c8/EdJ3VDeiX2hGgAjzKHaFVoYBTCOvDz2W8EjTCUg08hyWkQ?e=BDjivM


Wednesday, March 19, 2025

Use of RAG in creating KB

Gen AI has created a new set of applications that require a different data architecture than traditional systems, one that includes both structured and unstructured data. Applications like chatbots can perform satisfactorily only with information from diverse data sources. A chatbot requires an LLM to respond with information from a knowledge base, typically a vector database. The underlying principle in a chatbot is Retrieval-Augmented Generation. The LLM could be a newer model such as GPT-3.5 or GPT-4 to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge.

As with all LLMs, it is important to ensure AI safety and security1, to include a diverse set of data, and to maintain the proper separation of the read-write and read-only accesses needed between the model and the judge. A feedback loop that emits the gradings as telemetry, and their inclusion in the feedback loop for the model when deciding on formation shape and size, albeit optional, can help ensure that the imposed constraints are always met.

Evaluating the quality of chatbot responses must take into account both the knowledge base and the model involved. LLM-as-a-judge evaluates the quality of a chatbot as an external entity. Although it suffers from limitations (it may not be on par with human grading, it may require several auto-evaluation samples, its responsiveness varies across chatbot prompts, and slight variations in the prompt or problem can drastically affect its performance), it can still agree with human grading on over 80% of judgements. This is achieved by using a 1-5 grading scale, using GPT-3.5 to save costs when there is one grading example per score, and using GPT-4 when there are no examples to understand the grading rules.
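A minimal sketch of the LLM-as-a-judge setup, assuming a hypothetical call_judge_llm() wrapper around GPT-3.5 or GPT-4; the 1-5 rubric and the single example per score mirror the approach described above.

```python
JUDGE_PROMPT = """You are grading a chatbot answer on a 1-5 scale.
5 = fully correct and grounded in the provided context, 1 = irrelevant or fabricated.
Example of a 3: partially correct but missing key details from the context.

Context: {context}
Question: {question}
Answer: {answer}

Respond with a single integer from 1 to 5."""

def call_judge_llm(prompt):
    """Hypothetical wrapper around a judge model (e.g., GPT-3.5 or GPT-4)."""
    return "4"  # placeholder response for the sketch

def grade(context, question, answer):
    """Ask the judge model for a 1-5 score; the score can be emitted as telemetry."""
    prompt = JUDGE_PROMPT.format(context=context, question=question, answer=answer)
    return int(call_judge_llm(prompt).strip())
```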


Tuesday, March 18, 2025

The use of a Large Language Model (LLM) for building a knowledge base (KB) seems to be a tribal art, but in fact it is applicable here just as it is to the vast collections of domain-specific text across many industries. A knowledge graph captures relationships between entities, so both the nodes and the edges are important to discover, and there is no estimate of precision and recall to begin with. We take as a specific example one application of an LLM to build a KB with IText2KB. This is a zero-shot method for constructing incremental, topic-independent knowledge graphs from unstructured data using large language models, without the need for extensive post-processing, which is one of the main challenges in constructing knowledge graphs. Other challenges generally include the unstructured data type, which might result in lossy processing and require advanced NLP techniques for meaningful insights, few-shot learning, and cross-domain knowledge extraction. NLP techniques, in turn, face limitations, including reliance on pre-defined entities and extensive human annotation.

This approach consists of four modules: Document Distiller, Incremental Entities Extractor, Incremental Relations Extractor, and Neo4j graph integrator. The Document Distiller uses LLMs, specifically GPT-4, to rewrite documents into semantic blocks, guided by a flexible schema to enhance graph construction. The Incremental Entities Extractor iteratively builds a global entity set by matching local entities from documents with previously extracted global entities. The Incremental Relations Extractor utilizes global document entities to extract both stated and implied relations, with variations based on the context provided. The approach is adaptable to various use cases, as the schema can be customized based on user preferences. The final module integrates the extracted entities and relations into a Neo4j database to visualize the knowledge graph. This forms a zero-shot technique because there are no predefined examples or ontologies.
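A simplified sketch of the incremental entity-matching step, assuming entity-name embeddings and a cosine-similarity threshold for deduplication; the actual Incremental Entities Extractor is LLM-driven, so this only illustrates the merge logic against the global entity set.

```python
import numpy as np

def merge_entities(global_entities, local_entities, embed, threshold=0.9):
    """Match each locally extracted entity against the global set by cosine similarity;
    reuse the closest global entity if similar enough, otherwise add a new node."""
    for name in local_entities:
        v = embed(name)  # `embed` is any embedding function returning a vector
        best, best_sim = None, 0.0
        for g_name, g_vec in global_entities.items():
            sim = float(np.dot(v, g_vec) / (np.linalg.norm(v) * np.linalg.norm(g_vec)))
            if sim > best_sim:
                best, best_sim = g_name, sim
        if best is None or best_sim < threshold:
            global_entities[name] = v   # new global entity
        # else: `name` is treated as a duplicate of `best`
    return global_entities
```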

The effectiveness of this technique, which has broad applicability, can best be described by metrics such as the schema consistency score across documents, where a high score reflects high performance; the information consistency metric, where higher consistency is desirable; triplet extraction precision, which is higher for local context-specific entities than for global entities and affects the richness of the graph; the false discovery rate, which should be as low as possible for a successful entity-resolution process; and the cosine similarity estimated for merging entities and relationships and removing duplicates. This method outperforms on all these metrics. The results from experiments with documents such as CVs, scientific articles, and websites have also emphasized effective data refinement and the impact of document chunk size on KG construction.


Monday, March 17, 2025

Every use case listed in the scenarios targeted by the UAV swarm cloud automation maps to a set of direct and current commercial players and companies who could benefit, and many across industries are catching up to the emerging AI trends, deploying applications with LLMs and RAG. So, explaining the benefits of cloud-based pipelines that continuously analyze drone imagery to build a knowledge base of a landscape would not be lost on many of them. But the ideal partner for this venture would be someone who engages deeply from the start, so that field tests are not only practical but routine. The value of the software will be better articulated through the voice of the customer rather than the founder, and such a partnership will likely be a win-win for both from the get-go. This article explains not only the selection of but also the method of engagement with such a partner.

Drone imagery is popular today in many defense-industry applications by virtue of remote-operated drones. The use of UAV swarms, however, is better suited to surveying, remote sensing, disaster preparedness and response such as wildfires, and applications that make use of LiDAR data. Power line and windmill monitoring companies are especially suited to making use of a fleet of drones. Besides, there are over ten LiDAR companies that are public in the US, and many more across Europe and Asia, that make use of a fleet of drones, photogrammetry, and LiDAR data. Those that are using simultaneous localization and mapping (SLAM), structure-from-motion (SfM), and semantic segmentation with CNNs are possibly building their own knowledge bases, so it would not hurt to show them one that is built in the cloud in an incremental, observable, and near real-time manner.

The right way to build this software is also iterative, with stakeholder input. We leverage an agile, sprint-based approach to build it. Keeping a community edition or open-source offering opens engagement with the partner while drawing a developer audience from source code and community platforms, including marketplaces for public clouds as well as journals, newsletters, podcasts, and social media platforms where we can find contacts and leads. A specific milestone could be pegged as presenting a PoC at the AI Infra summit.

Aside from the technical aspects, a winning business plan could target a market that's both well-defined and large, which can be an advantage when fundraising or getting the attention of an investor. Polishing the business plan and addressing weaknesses prevents investors from having to micromanage, an unhealthy situation. VCs also make it known through social media and other marketing avenues that they are funding startups, but in casting a net, it is important to establish a shared reality of success. Consistently proving that the founding idea is going to work will have a snowball effect. While an experienced founder may bank on VC contacts, the first-time founder may face an obstacle course of promising leads that go nowhere and may have to rely on angel investors. The pitch deck and follow-up calls must be rehearsed and never emailed or done offline. Controlling the narrative, reframing the questions, and answering them on our terms are in our hands. Creating a contract might be necessary, but it cannot be relied upon. Developing a sales funnel during external engagements is important. From the start, an open-source GitHub repository for curating ideas and implementations should be made available.