Part 2 – Reducing operational costs of chatbot model deployment.
This is the second part of the chatbot application discussion.
The following strategies help reduce operational costs for the deployed chat model; without them, even an idle deployment can incur roughly a thousand dollars per month.
Review the App Service plan for the App Service that hosts the chat user interface for its CPU, memory, and storage sizing, and configure it to scale dynamically with load.
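To make the idea of dynamic scaling concrete, the sketch below shows the kind of rule an autoscale policy encodes: map observed CPU utilization to an instance count within the plan's minimum and maximum bounds. The thresholds here are illustrative assumptions, not Azure defaults; in practice this policy is configured in Azure Monitor autoscale rather than written by hand.

```python
# Illustrative autoscale rule: map observed CPU utilization to an instance
# count within the plan's min/max bounds. The 70%/25% thresholds are
# assumptions for the sketch, not Azure defaults.

def desired_instances(cpu_percent: float, current: int,
                      min_count: int = 1, max_count: int = 4) -> int:
    """Scale out above 70% CPU, scale in below 25%, otherwise hold steady."""
    if cpu_percent > 70:
        return min(current + 1, max_count)
    if cpu_percent < 25:
        return max(current - 1, min_count)
    return current

print(desired_instances(85.0, current=1))  # 2 -- scale out under load
print(desired_instances(10.0, current=3))  # 2 -- scale in when idle
```

The same shape of rule can be driven by memory pressure or request-queue length instead of CPU.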
Implement caching to reduce the load on the App Service; Azure Cache for Redis is well suited to this.
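The usual pattern here is cache-aside: check the cache before calling the backend, and populate it with a TTL on a miss. In the sketch below a dictionary with expiry timestamps stands in for Azure Cache for Redis, and `fetch_backend` is a hypothetical backend call; a real deployment would use a Redis client with the same get/set-with-TTL shape.

```python
import time

# Cache-aside sketch: consult the cache before hitting the backend. A dict
# with expiry timestamps stands in for Azure Cache for Redis here.

_cache = {}          # key -> (expires_at, value)
TTL_SECONDS = 300    # assumption: 5-minute freshness window

def get_with_cache(key, fetch_backend):
    entry = _cache.get(key)
    if entry and entry[0] > time.monotonic():
        return entry[1]                                  # cache hit
    value = fetch_backend(key)                           # cache miss
    _cache[key] = (time.monotonic() + TTL_SECONDS, value)
    return value

calls = []
def fetch_backend(key):      # hypothetical expensive backend call
    calls.append(key)
    return f"response for {key}"

get_with_cache("greeting", fetch_backend)
get_with_cache("greeting", fetch_backend)
print(len(calls))  # 1 -- the second lookup was served from the cache
```

Every hit served from the cache is a request the App Service plan does not have to scale for.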
If the user interface ships significant static assets such as JavaScript bundles and images, a Content Delivery Network (CDN) can serve them instead.
CDNs reduce latency and offload traffic from the App Service to distributed edge servers.
It might be hard to envision the model as a database, but the deployment does use vector storage with an index over it; it is not just an embeddings matrix. Choosing the appropriate database tier and SKU and optimizing the queries can help control this cost.
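To illustrate the storage-plus-index point, the sketch below does vector retrieval the naive way: a brute-force cosine scan over every stored embedding. This is exactly the cost a proper index in the database tier (for example an approximate-nearest-neighbor structure such as HNSW) avoids, which is why tier and SKU choices matter as the corpus grows. The vectors are toy values, not real embeddings.

```python
import math

# Naive vector retrieval: cosine similarity over every stored vector.
# A real vector index avoids this full scan; the document names and
# three-dimensional vectors below are toy assumptions.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

index = {
    "doc-pricing":  [0.9, 0.1, 0.0],
    "doc-scaling":  [0.1, 0.9, 0.2],
    "doc-security": [0.0, 0.2, 0.9],
}

def top_k(query, k=2):
    scored = sorted(index.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [doc for doc, _ in scored[:k]]

print(top_k([0.8, 0.2, 0.1]))  # ['doc-pricing', 'doc-scaling']
```

Query optimization here means keeping `k` small, filtering before scoring, and letting the database's index do the ranking rather than scanning in application code.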
Monitoring and alerts help proactively identify performance bottlenecks, resource spikes, and anomalies.
Azure Monitor and Application Insights can track metrics, diagnose issues, and guide resource optimization.
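The sketch below shows the shape of rule such an alert evaluates: flag a metric sample as a spike when it exceeds the rolling mean of the preceding window by a chosen factor. Azure Monitor evaluates rules like this service-side; the window size and factor here are assumptions for illustration.

```python
# Illustrative spike-detection rule of the kind a metric alert evaluates:
# flag a sample that exceeds the rolling mean of the previous `window`
# samples by `factor`. Both parameters are assumptions.

def detect_spikes(samples, window=3, factor=2.0):
    spikes = []
    for i in range(window, len(samples)):
        baseline = sum(samples[i - window:i]) / window
        if samples[i] > baseline * factor:
            spikes.append(i)
    return spikes

cpu = [20, 22, 21, 95, 23, 22, 24]
print(detect_spikes(cpu))  # [3] -- the 95% sample stands out
```

Catching such spikes early is what lets you fix a misbehaving query or runaway scale-out before it shows up on the bill.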
If the chat model experiences idle periods, the associated resources can be stopped or scaled down during those times.
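An idle-period policy can be as simple as comparing the time of the last request against an idle threshold, as in this sketch. The 30-minute threshold is an assumption; pick one that fits your traffic pattern and the cold-start cost of restarting the resources.

```python
from datetime import datetime, timedelta

# Sketch of an idle-period policy: if no request has arrived within the
# threshold, the compute can be stopped or scaled down and restarted on
# demand. The 30-minute threshold is an assumption.

IDLE_THRESHOLD = timedelta(minutes=30)

def should_stop(last_request: datetime, now: datetime) -> bool:
    return now - last_request > IDLE_THRESHOLD

now = datetime(2024, 1, 1, 12, 0)
print(should_stop(datetime(2024, 1, 1, 11, 0), now))   # True -- idle an hour
print(should_stop(datetime(2024, 1, 1, 11, 50), now))  # False -- recent traffic
```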
You don’t need the full OpenAI service APIs; you only need the model APIs. Note the distinctions:
Azure OpenAI Model API: the API to the GPT models, used for text similarity, chat, and traditional completion tasks.
Azure OpenAI Service API: encompasses not just the models but also the security, encryption, deployment, and management functionality used to deploy models, manage endpoints, and control access.
Azure OpenAI Search API: allows the chatbot model to retrieve from various data sources.
Storing the vectors and embeddings and querying the search APIs does not require the service API. The model APIs are a must, so include them in the deployment, but trim the data sources to just your own data.
Sample deployment:
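The original sample is not reproduced above. As an illustrative sketch of the model-API-only approach, the snippet below constructs the request a client sends to a deployed chat model's endpoint; the resource name, deployment name, and API version are placeholder assumptions, and no service-management calls are involved.

```python
import json

# Sketch of a request to the model API of a deployed chat model.
# Resource name, deployment name, and api-version are assumptions.

resource = "my-openai-resource"   # assumption: your Azure OpenAI resource
deployment = "gpt-35-turbo"       # assumption: your model deployment name
api_version = "2024-02-01"        # assumption: a supported API version

url = (f"https://{resource}.openai.azure.com/openai/deployments/"
       f"{deployment}/chat/completions?api-version={api_version}")

body = {
    "messages": [
        {"role": "system", "content": "You answer questions about our docs."},
        {"role": "user", "content": "How do I reduce idle costs?"},
    ],
    "max_tokens": 256,
}

print(url)
print(json.dumps(body)[:40])
```

The request would be sent as an HTTPS POST with an `api-key` header; everything else in the deployment (endpoint management, access control) stays with the service API you are trimming away.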