With the surge of data science and analytics projects, many data scientists are asked to build a chatbot application over their data. This article covers some of the ways to do that. We assume these data scientists use a workspace to bring their compute and data together. Let us say this is a Databricks workspace where the data is available via the catalog and Delta Lake, and a compute cluster has been provisioned as dedicated to this effort. The example/tutorial we refer to is published in the official Databricks documentation, and we compare it with the ease of exporting the user interface to an app service.
Part 1.
The Databricks example separates the model and the user interface as follows:
Step 1. Set up the environment:
%pip install transformers sentence-transformers faiss-cpu
Step 2. Load the data into a Delta table:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Chatbot").getOrCreate()
# Load your data
data = [
    {"id": 1, "text": "What is Databricks?"},
    {"id": 2, "text": "How to create a Delta table?"}
]
df = spark.createDataFrame(data)
df.write.format("delta").save("/mnt/delta/chatbot_data")
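To confirm the write, the table can be read back from the same path; a minimal sketch:
# Read the Delta table back to verify the write
df_check = spark.read.format("delta").load("/mnt/delta/chatbot_data")
df_check.show()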
Step 3. Generate embeddings using a pre-trained model:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = [row['text'] for row in data]
embeddings = model.encode(texts)
# Save embeddings
import numpy as np
np.save("/dbfs/mnt/delta/embeddings.npy", embeddings)
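As an aside (our sketch, not part of the original tutorial), the texts and embeddings can also be kept together in a Delta table so everything lives in the lakehouse; /mnt/delta/chatbot_embeddings is an illustrative path:
import pandas as pd
emb_pdf = pd.DataFrame({
    "id": [row["id"] for row in data],
    "text": texts,
    "embedding": [e.tolist() for e in embeddings],  # each vector stored as a list of doubles
})
# Illustrative path, not from the tutorial
spark.createDataFrame(emb_pdf).write.format("delta").mode("overwrite").save("/mnt/delta/chatbot_embeddings")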
Step 4. Use FAISS to perform vector search over the embeddings:
import faiss
# Load embeddings
embeddings = np.load("/dbfs/mnt/delta/embeddings.npy")
# Create FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
# Save the index
faiss.write_index(index, "/dbfs/mnt/delta/faiss_index")
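Later sessions can reload the persisted index instead of rebuilding it; faiss.read_index is the counterpart of the write_index call above:
# Reload the persisted index from DBFS
index = faiss.read_index("/dbfs/mnt/delta/faiss_index")
print(index.ntotal)  # number of vectors stored in the index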
Step 5. Create a function to handle user queries and return relevant responses:
def chatbot(query):
    # Embed the query and retrieve the nearest stored text
    query_embedding = model.encode([query])
    D, I = index.search(query_embedding, k=1)  # D: distances, I: row indices
    response_id = I[0][0]
    response_text = texts[response_id]
    return response_text
# Test the chatbot
print(chatbot("Tell me about Databricks"))
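If more than one candidate response is wanted, the same search call generalizes to top-k retrieval. A minimal sketch (chatbot_topk and the k parameter are our own illustration, not part of the tutorial):
def chatbot_topk(query, k=3):
    # Return up to k nearest stored texts, closest first
    query_embedding = model.encode([query])
    D, I = index.search(query_embedding, k=k)
    # FAISS pads the result with -1 when k exceeds the number of stored vectors
    return [texts[i] for i in I[0] if i != -1]
print(chatbot_topk("Tell me about Databricks"))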
Step 6. Deploy the chatbot in one of two ways:
Option a) a Databricks widget:
dbutils.widgets.text("query", "", "Enter your query")
query = dbutils.widgets.get("query")
if query:
    response = chatbot(query)
    print(f"Response: {response}")
else:
    print("Please enter a query.")
Option b) a REST API:
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/chatbot', methods=['POST'])
def chatbot_endpoint():
    query = request.json['query']
    response = chatbot(query)
    return jsonify({"response": response})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Step 7. Test the chatbot:
For option a), use the widgets to interact with the notebook:
# Display the widgets
dbutils.widgets.text("query", "", "Enter your query")
query = dbutils.widgets.get("query")
if query:
    response = chatbot(query)
    displayHTML(f"<h3>Response:</h3><p>{response}</p>")
else:
    displayHTML("<p>Please enter a query.</p>")
For option b), make a web request:
curl -X POST http://<your-databricks-url>:5000/chatbot -H "Content-Type: application/json" -d '{"query": "Tell me about Databricks"}'
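The same call can be made from Python if curl is not at hand (a sketch; the placeholder host is unchanged):
import requests
resp = requests.post(
    "http://<your-databricks-url>:5000/chatbot",  # replace with your host, as in the curl example
    json={"query": "Tell me about Databricks"},
)
print(resp.json()["response"])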
Part 2.
The app service example pairs the query logic and the user interface in this way.
The code hosting the model and completing the results of the query comprises the following:
import openai, os, requests
openai.api_type = "azure"
# Azure OpenAI on your own data is only supported by the 2023-08-01-preview API version
openai.api_version = "2023-08-01-preview"
# Azure OpenAI setup
openai.api_base = "https://azai-open-1.openai.azure.com/" # Add your endpoint here
openai.api_key = os.getenv("OPENAI_API_KEY") # Add your OpenAI API key here
deployment_id = "mdl-gpt-35-turbo" # Add your deployment ID here
# Azure AI Search setup
search_endpoint = "https://searchrgopenaisadocs.search.windows.net" # Add your Azure AI Search endpoint here
search_key = os.getenv("SEARCH_KEY") # Add your Azure AI Search admin key here
search_index_name = "undefined" # Add your Azure AI Search index name here
def setup_byod(deployment_id: str) -> None:
    """Sets up the OpenAI Python SDK to use your own data for the chat endpoint.

    :param deployment_id: The deployment ID for the model to use with your own data.

    To remove this configuration, simply set openai.requestssession to None.
    """

    class BringYourOwnDataAdapter(requests.adapters.HTTPAdapter):

        def send(self, request, **kwargs):
            request.url = f"{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}"
            return super().send(request, **kwargs)

    session = requests.Session()

    # Mount a custom adapter which will use the extensions endpoint for any call using the given `deployment_id`
    session.mount(
        prefix=f"{openai.api_base}/openai/deployments/{deployment_id}",
        adapter=BringYourOwnDataAdapter()
    )

    openai.requestssession = session
setup_byod(deployment_id)
message_text = [{"role": "user", "content": "What are the differences between Azure Machine Learning and Azure AI services?"}]
completion = openai.ChatCompletion.create(
    messages=message_text,
    deployment_id=deployment_id,
    dataSources=[  # camelCase is intentional, as this is the format the API expects
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": search_endpoint,
                "key": search_key,
                "indexName": search_index_name,
            }
        }
    ]
)
print(completion)
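print(completion) dumps the entire response object; to read just the generated answer, a sketch assuming the pre-1.0 openai SDK response shape used above:
# The pre-1.0 SDK response supports dict-style access
answer = completion["choices"][0]["message"]["content"]
print(answer)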
The user interface side is simpler, with code to host it on the app service as a React web app:
npm install @typebot.io/js @typebot.io/react
import { Standard } from "@typebot.io/react";
const App = () => {
  return (
    <Standard
      typebot="basic-chat-gpt-civ35om"
      style={{ width: "100%", height: "600px" }}
    />
  );
};
This concludes the creation of a chatbot function using the workspace.