With the surge of data science and analytics projects, many data scientists are asked to build a chatbot application over their data. This article covers some of the ways to do that. We assume these data scientists use a workspace to bring their compute and data together. Let us say this is a Databricks workspace where the data is available via the catalog and Delta Lake, and a compute cluster has been provisioned as dedicated to this effort. The example/tutorial we refer to is published in the official Databricks documentation, and we compare it with the ease of exporting the user interface to an app service.
Part 1.
The Databricks example separates the model and the user interface as follows:
Step 1. Set up the environment:
%pip install transformers sentence-transformers faiss-cpu
Step 2. Load the data into a Delta table:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Chatbot").getOrCreate()
# Load your data
data = [
    {"id": 1, "text": "What is Databricks?"},
    {"id": 2, "text": "How to create a Delta table?"}
]
df = spark.createDataFrame(data)
df.write.format("delta").save("/mnt/delta/chatbot_data")
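To confirm the write, the table can be read back from the same path; a minimal sketch:
# Read the Delta table back to verify the write
df_check = spark.read.format("delta").load("/mnt/delta/chatbot_data")
df_check.show()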
Step 3. Generate embeddings using a pre-trained model:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = [row['text'] for row in data]
embeddings = model.encode(texts)
# Save embeddings
import numpy as np
np.save("/dbfs/mnt/delta/embeddings.npy", embeddings)
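As an aside (our sketch, not part of the original tutorial), the texts and embeddings can also be kept together in a Delta table so everything lives in the lakehouse; /mnt/delta/chatbot_embeddings is an illustrative path:
import pandas as pd
emb_pdf = pd.DataFrame({
    "id": [row["id"] for row in data],
    "text": texts,
    "embedding": [e.tolist() for e in embeddings],  # each vector stored as a list of doubles
})
# Illustrative path, not from the tutorial
spark.createDataFrame(emb_pdf).write.format("delta").mode("overwrite").save("/mnt/delta/chatbot_embeddings")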
Step 4. Use FAISS to perform vector search over the embeddings:
import faiss
# Load embeddings
embeddings = np.load("/dbfs/mnt/delta/embeddings.npy")
# Create FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
# Save the index
faiss.write_index(index, "/dbfs/mnt/delta/faiss_index")
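Later sessions can reload the persisted index instead of rebuilding it; faiss.read_index is the counterpart of the write_index call above:
# Reload the persisted index from DBFS
index = faiss.read_index("/dbfs/mnt/delta/faiss_index")
print(index.ntotal)  # number of vectors stored in the index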
Step 5. Create a function to handle user queries and return relevant responses:
def chatbot(query):
    # Embed the query and retrieve the nearest stored text
    query_embedding = model.encode([query])
    D, I = index.search(query_embedding, k=1)  # D: distances, I: row indices
    response_id = I[0][0]
    response_text = texts[response_id]
    return response_text
# Test the chatbot
print(chatbot("Tell me about Databricks"))
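If more than one candidate response is wanted, the same search call generalizes to top-k retrieval. A minimal sketch (chatbot_topk and the k parameter are our own illustration, not part of the tutorial):
def chatbot_topk(query, k=3):
    # Return up to k nearest stored texts, closest first
    query_embedding = model.encode([query])
    D, I = index.search(query_embedding, k=k)
    # FAISS pads the result with -1 when k exceeds the number of stored vectors
    return [texts[i] for i in I[0] if i != -1]
print(chatbot_topk("Tell me about Databricks"))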
Step 6. Deploy the chatbot in one of two ways:
Option a) a Databricks widget:
dbutils.widgets.text("query", "", "Enter your query")
query = dbutils.widgets.get("query")
if query:
    response = chatbot(query)
    print(f"Response: {response}")
else:
    print("Please enter a query.")
Option b) a REST API:
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/chatbot', methods=['POST'])
def chatbot_endpoint():
    query = request.json['query']
    response = chatbot(query)
    return jsonify({"response": response})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Step 7. Test the chatbot:
For option a), use the widgets to interact with the notebook:
# Display the widgets
dbutils.widgets.text("query", "", "Enter your query")
query = dbutils.widgets.get("query")
if query:
    response = chatbot(query)
    displayHTML(f"<h3>Response:</h3><p>{response}</p>")
else:
    displayHTML("<p>Please enter a query.</p>")
For option b), make a web request:
curl -X POST http://<your-databricks-url>:5000/chatbot -H "Content-Type: application/json" -d '{"query": "Tell me about Databricks"}'
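The same call can be made from Python if curl is not at hand (a sketch; the placeholder host is unchanged):
import requests
resp = requests.post(
    "http://<your-databricks-url>:5000/chatbot",  # replace with your host, as in the curl example
    json={"query": "Tell me about Databricks"},
)
print(resp.json()["response"])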
Part 2.
The app service example pairs the query logic and the user interface in this way.
The code hosting the model and completing the results of the query comprises the following:
import openai, os, requests
openai.api_type = "azure"
# Azure OpenAI on your own data is only supported by the 2023-08-01-preview API version
openai.api_version = "2023-08-01-preview"
# Azure OpenAI setup
openai.api_base = "https://azai-open-1.openai.azure.com/" # Add your endpoint here
openai.api_key = os.getenv("OPENAI_API_KEY") # Add your OpenAI API key here
deployment_id = "mdl-gpt-35-turbo" # Add your deployment ID here
# Azure AI Search setup
search_endpoint = "https://searchrgopenaisadocs.search.windows.net" # Add your Azure AI Search endpoint here
search_key = os.getenv("SEARCH_KEY") # Add your Azure AI Search admin key here
search_index_name = "undefined" # Add your Azure AI Search index name here
def setup_byod(deployment_id: str) -> None:
    """Sets up the OpenAI Python SDK to use your own data for the chat endpoint.

    :param deployment_id: The deployment ID for the model to use with your own data.

    To remove this configuration, simply set openai.requestssession to None.
    """

    class BringYourOwnDataAdapter(requests.adapters.HTTPAdapter):

        def send(self, request, **kwargs):
            request.url = f"{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}"
            return super().send(request, **kwargs)

    session = requests.Session()

    # Mount a custom adapter which will use the extensions endpoint for any call using the given `deployment_id`
    session.mount(
        prefix=f"{openai.api_base}/openai/deployments/{deployment_id}",
        adapter=BringYourOwnDataAdapter()
    )

    openai.requestssession = session
setup_byod(deployment_id)
message_text = [{"role": "user", "content": "What are the differences between Azure Machine Learning and Azure AI services?"}]
completion = openai.ChatCompletion.create(
    messages=message_text,
    deployment_id=deployment_id,
    dataSources=[  # camelCase is intentional, as this is the format the API expects
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": search_endpoint,
                "key": search_key,
                "indexName": search_index_name,
            }
        }
    ]
)
print(completion)
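print(completion) dumps the entire response object; to read just the generated answer, a sketch assuming the pre-1.0 openai SDK response shape used above:
# The pre-1.0 SDK response supports dict-style access
answer = completion["choices"][0]["message"]["content"]
print(answer)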
The user interface side is simpler, with code to host it on the app service as a React web app:
npm install @typebot.io/js @typebot.io/react
import { Standard } from "@typebot.io/react";
const App = () => {
  return (
    <Standard
      typebot="basic-chat-gpt-civ35om"
      style={{ width: "100%", height: "600px" }}
    />
  );
};
This concludes the creation of a chatbot function using the workspace.