Building a vector database and search for positioning drones involves the following steps:
The first step is to import all the required packages and libraries; this sample uses Python:
import warnings
warnings.filterwarnings('ignore')
from datasets import load_dataset
from pinecone import Pinecone, ServerlessSpec
from DLAIUtils import Utils
import DLAIUtils
import os
import time
import torch
from tqdm.auto import tqdm
We assume the elements are mapped as embeddings in a 384-dimensional dense vector space.
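As a minimal sketch of where the model comes from (assuming the all-MiniLM-L6-v2 sentence-transformers model, which produces 384-dimensional embeddings; the model choice is an assumption, not part of the original code), the encoder can be loaded like this:
from sentence_transformers import SentenceTransformer

# assumed model choice: all-MiniLM-L6-v2 encodes text into 384-dimensional vectors
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)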
A sample query would appear like this:
query = 'what is node nearest this element?'
xq = model.encode(query)
xq.shape
(384,)
The next step is to set up the Pinecone vector database and upsert embeddings into it. The database indexes the vectors, which makes search and retrieval easy: vectors are compared by similarity and the closest matches are returned.
utils = Utils()
PINECONE_API_KEY = utils.get_pinecone_api_key()
pinecone = Pinecone(api_key=PINECONE_API_KEY)  # instantiate the client
INDEX_NAME = 'drone-elements'  # assumed name for the index
# recreate the index from scratch if it already exists
if INDEX_NAME in [index.name for index in pinecone.list_indexes()]:
    pinecone.delete_index(INDEX_NAME)
print(INDEX_NAME)
pinecone.create_index(name=INDEX_NAME,
                      dimension=model.get_sentence_embedding_dimension(),
                      metric='cosine',
                      spec=ServerlessSpec(cloud='aws', region='us-west-2'))
index = pinecone.Index(INDEX_NAME)
print(index)
The next step is to create embeddings for all the elements in the sample space and upsert them to Pinecone in batches.
batch_size = 200
vector_limit = 10000
elements = elements[:vector_limit]  # cap the number of vectors to upsert

for i in tqdm(range(0, len(elements), batch_size)):
    i_end = min(i + batch_size, len(elements))
    # assign an id, an embedding, and a metadata payload to each element in the batch
    ids = [str(x) for x in range(i, i_end)]
    metadata = [{'text': text} for text in elements[i:i_end]]
    xc = model.encode(elements[i:i_end])
    records = list(zip(ids, xc, metadata))
    index.upsert(vectors=records)

index.describe_index_stats()
The query can then be run against the index and the top matches returned.
def run_query(query):
    embedding = model.encode(query).tolist()
    results = index.query(top_k=10, vector=embedding, include_metadata=True, include_values=False)
    for result in results['matches']:
        # each match carries its similarity score and the original text metadata
        print(f"{round(result['score'], 2)}: {result['metadata']['text']}")

run_query('what is node nearest this element?')
With this, the embeddings-based search over elements is ready. In Azure, Cosmos DB offers a similar semantic search and can serve as a comparable vector database.
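As a hedged sketch of the Cosmos DB route (assuming a Cosmos DB for MongoDB vCore collection with a vector index on a field named vectorContent; the connection string, database, and collection names are placeholders), a nearest-neighbor query goes through the $search aggregation stage:
import pymongo

cosmos_client = pymongo.MongoClient("<CosmosDBConnectionString>")
collection = cosmos_client["dronedb"]["elements"]  # assumed database and collection names

def cosmos_vector_search(query_embedding, k=3):
    # Cosmos DB for MongoDB vCore exposes vector search through the
    # $search aggregation stage with a cosmosSearch clause
    pipeline = [{
        "$search": {
            "cosmosSearch": {
                "vector": query_embedding,  # list of floats matching the index dimension
                "path": "vectorContent",    # assumed name of the indexed vector field
                "k": k,
            },
            "returnStoredSource": True,
        }
    }]
    return list(collection.aggregate(pipeline))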
The following code outlines the steps using Azure AI Search:
# configure the vector store settings; the vector name is the name of the search index
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores.azuresearch import AzureSearch

endpoint: str = "<AzureSearchEndpoint>"
key: str = "<AzureSearchKey>"
index_name: str = "<VectorName>"
credential = AzureKeyCredential(key)
client = SearchClient(endpoint=endpoint,
                      index_name=index_name,
                      credential=credential)

# create embeddings (the azure_* values come from the Azure OpenAI deployment)
embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(
    azure_deployment=azure_deployment,
    openai_api_version=azure_openai_api_version,
    azure_endpoint=azure_endpoint,
    api_key=azure_openai_api_key,
)

# create the vector store backed by the Azure AI Search index
vector_store = AzureSearch(
    azure_search_endpoint=endpoint,
    azure_search_key=key,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

# run a query and return the top three matches
docs = vector_store.similarity_search(
    query=userQuery,
    k=3,
    search_type="similarity",
)

# optionally persist the matches; `collections` is assumed to be a pymongo
# collection handle, and the Document objects are converted to plain dicts first
collections.insert_many([{"content": d.page_content, "metadata": d.metadata} for d in docs])
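Note that the search index must be populated before similarity_search can return anything. As a sketch under the same assumptions (reusing the elements list from the Pinecone section), documents can be upserted through the vector store's add_texts method:
# upsert the element descriptions before running similarity_search;
# the 'source' metadata key is an illustrative assumption
vector_store.add_texts(
    texts=elements,
    metadatas=[{'source': 'drone-elements'} for _ in elements],
)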
reference: https://github.com/ravibeta/Node-Element-Predictions