Monday, September 2, 2024

With the surge of data science and analytics projects, many data scientists are asked to build a chatbot application over their data. This article covers some of the ways to do that. We assume these data scientists use a workspace to bring their compute and data together; say a Databricks workspace, where the data is available via the catalog and Delta Lake, and a compute cluster has been provisioned and dedicated to this effort. The example/tutorial we refer to is published in the official Databricks documentation, and we compare it with the ease of exporting the user interface to an app service.

Part 1.

The example for Databricks separates the model and the user interface in this way:

Step 1. Set up the environment:

%pip install transformers sentence-transformers faiss-cpu

Step 2. Load the data into a Delta table:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Chatbot").getOrCreate()

# Load your data
data = [
    {"id": 1, "text": "What is Databricks?"},
    {"id": 2, "text": "How to create a Delta table?"}
]
df = spark.createDataFrame(data)
df.write.format("delta").save("/mnt/delta/chatbot_data")


Step 3. Generate embeddings using a pre-trained model:

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')
texts = [row['text'] for row in data]
embeddings = model.encode(texts)

# Save embeddings
np.save("/dbfs/mnt/delta/embeddings.npy", embeddings)


Step 4. Use FAISS to perform a vector search over the embeddings:

import faiss
import numpy as np

# Load embeddings
embeddings = np.load("/dbfs/mnt/delta/embeddings.npy")

# Create FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Save the index
faiss.write_index(index, "/dbfs/mnt/delta/faiss_index")
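For intuition, the lookup that IndexFlatL2 performs is a brute-force squared-L2 search over all stored vectors. A minimal NumPy-only sketch of the same computation (brute_force_l2_search is an illustrative helper, not part of the tutorial):

```python
import numpy as np

def brute_force_l2_search(embeddings: np.ndarray, query: np.ndarray, k: int = 1):
    """Return (distances, indices) of the k nearest rows by squared L2
    distance, mirroring what faiss.IndexFlatL2.search computes."""
    d2 = ((embeddings - query) ** 2).sum(axis=1)  # squared L2 distance to each row
    idx = np.argsort(d2)[:k]                      # indices of the k smallest distances
    return d2[idx], idx

emb = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
dists, ids = brute_force_l2_search(emb, np.array([0.9, 1.1]), k=1)
print(ids[0])  # index of the nearest stored vector
```

FAISS does the same thing, but with SIMD-optimized scans (and, for other index types, approximate structures that avoid touching every vector).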


Step 5. Create a function to handle user queries and return relevant responses:

def chatbot(query):
    query_embedding = model.encode([query])
    D, I = index.search(query_embedding, k=1)
    response_id = I[0][0]
    response_text = texts[response_id]
    return response_text

# Test the chatbot
print(chatbot("Tell me about Databricks"))


Step 6. Deploy the chatbot as one of the following:

Option a) a Databricks widget:

dbutils.widgets.text("query", "", "Enter your query")
query = dbutils.widgets.get("query")

if query:
    response = chatbot(query)
    print(f"Response: {response}")
else:
    print("Please enter a query.")


Option b) a REST API:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/chatbot', methods=['POST'])
def chatbot_endpoint():
    query = request.json['query']
    response = chatbot(query)
    return jsonify({"response": response})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)


Step 7. Test the chatbot:

For option a), use the widgets to interact with the notebook:

# Display the widgets
dbutils.widgets.text("query", "", "Enter your query")
query = dbutils.widgets.get("query")

if query:
    response = chatbot(query)
    displayHTML(f"<h3>Response:</h3><p>{response}</p>")
else:
    displayHTML("<p>Please enter a query.</p>")


For option b), make a web request:

curl -X POST http://<your-databricks-url>:5000/chatbot -H "Content-Type: application/json" -d '{"query": "Tell me about Databricks"}'
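The same request can also be issued from Python. A minimal stdlib-only sketch, assuming the Flask server from option b) is reachable at whatever base URL you substitute (build_chatbot_request and ask_chatbot are illustrative helpers, not part of the tutorial):

```python
import json
from urllib import request as urlrequest

def build_chatbot_request(base_url: str, query: str):
    """Assemble the URL, headers, and JSON body for the /chatbot endpoint."""
    url = f"{base_url}/chatbot"
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"query": query}).encode("utf-8")
    return url, headers, body

def ask_chatbot(base_url: str, query: str) -> str:
    """POST the query to the Flask endpoint and return its 'response' field."""
    url, headers, body = build_chatbot_request(base_url, query)
    req = urlrequest.Request(url, data=body, headers=headers, method="POST")
    with urlrequest.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]

# Example (requires the server from Step 6, option b, to be running):
# print(ask_chatbot("http://<your-databricks-url>:5000", "Tell me about Databricks"))
```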


Part 2. 

The example for the app service leverages the following query and user interface. The code hosting the model and completing the results of the query comprises the following:

import openai, os, requests

openai.api_type = "azure"
# Azure OpenAI on your own data is only supported by the 2023-08-01-preview API version
openai.api_version = "2023-08-01-preview"

# Azure OpenAI setup
openai.api_base = "https://azai-open-1.openai.azure.com/"  # Add your endpoint here
openai.api_key = os.getenv("OPENAI_API_KEY")  # Add your OpenAI API key here
deployment_id = "mdl-gpt-35-turbo"  # Add your deployment ID here

# Azure AI Search setup
search_endpoint = "https://searchrgopenaisadocs.search.windows.net"  # Add your Azure AI Search endpoint here
search_key = os.getenv("SEARCH_KEY")  # Add your Azure AI Search admin key here
search_index_name = "undefined"  # Add your Azure AI Search index name here

def setup_byod(deployment_id: str) -> None:
    """Sets up the OpenAI Python SDK to use your own data for the chat endpoint.

    :param deployment_id: The deployment ID for the model to use with your own data.

    To remove this configuration, simply set openai.requestssession to None.
    """

    class BringYourOwnDataAdapter(requests.adapters.HTTPAdapter):

        def send(self, request, **kwargs):
            request.url = f"{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}"
            return super().send(request, **kwargs)

    session = requests.Session()

    # Mount a custom adapter which will use the extensions endpoint for any call using the given `deployment_id`
    session.mount(
        prefix=f"{openai.api_base}/openai/deployments/{deployment_id}",
        adapter=BringYourOwnDataAdapter()
    )

    openai.requestssession = session

setup_byod(deployment_id)

message_text = [{"role": "user", "content": "What are the differences between Azure Machine Learning and Azure AI services?"}]

completion = openai.ChatCompletion.create(
    messages=message_text,
    deployment_id=deployment_id,
    dataSources=[  # camelCase is intentional, as this is the format the API expects
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": search_endpoint,
                "key": search_key,
                "indexName": search_index_name,
            }
        }
    ]
)
print(completion)
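Rather than printing the whole completion object, an application would usually pull out just the assistant's reply. A hedged sketch, assuming the dict-like response shape of the 0.x openai SDK (the sample object below is fabricated for illustration, not real API output; verify the field names against your SDK version):

```python
def extract_reply(completion) -> str:
    """Return the assistant's message content from a ChatCompletion response
    (dict-like shape used by the 0.x openai SDK)."""
    choice = completion["choices"][0]
    return choice["message"]["content"]

# Illustrative response shape (fabricated, not real output):
sample = {"choices": [{"message": {"role": "assistant", "content": "Both are Azure services..."}}]}
print(extract_reply(sample))
```

With the on-your-data extensions endpoint, the same choice object may also carry citation context alongside the message; inspect the raw completion to see what your deployment returns.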


The user interface is simpler, with code to host the app service as a React web app:

npm install @typebot.io/js @typebot.io/react

import { Standard } from "@typebot.io/react";

const App = () => {
  return (
    <Standard
      typebot="basic-chat-gpt-civ35om"
      style={{ width: "100%", height: "600px" }}
    />
  );
};

export default App;


This concludes the creation of a chatbot function using the workspace.

