Part 2 – Reducing the operational costs of chatbot model deployment
This is the second part of the chatbot application discussion here. The following strategies help reduce the operational costs of a deployed chat model; without them, even an idle deployment can incur about a thousand dollars per month.
1. The app service plan for the app service that hosts the chat user interface must be reviewed for CPU, memory, and storage.
2. The plan should be set to scale dynamically.
3. Caching mechanisms must be implemented to reduce the load on the app service. Azure Cache for Redis can help in this regard.
4. If the user interface has significant assets such as JavaScript files and images, a Content Delivery Network (CDN) could be leveraged.
5. CDNs reduce latency and offload traffic from the app service to distributed edge servers.
6. It might be hard to envision the model as having a database, but vector storage is used and there is an index as well; it is not just an embeddings matrix. Choosing the appropriate database tier and SKU and optimizing the queries can help with the cost.
7. Monitoring and alerts can help proactively identify performance bottlenecks, resource spikes, and anomalies.
8. Azure Monitor and Application Insights can track metrics, diagnose issues, and help optimize resource usage.
9. If the chat model experiences idle periods, the associated resources can be stopped or scaled down during those times.
10. You don’t need the OpenAI service APIs. You only need the model APIs. Note the following:
a. Azure OpenAI Model API: the API to the GPT models used for text similarity, chat, and traditional completion tasks.
b. Azure OpenAI service API: this encompasses not just the models but also the security, encryption, deployment, and management functionality to deploy models, manage endpoints, and control access.
c. Azure OpenAI search API: this allows the chatbot model to retrieve data from various data sources.
11. Storing the vectors and embeddings and querying the search APIs does not require the service API. The model APIs are a must, so include them in the deployment, but trim the data sources to just your data.
12. Sample deployment:
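As a rough illustration of the dynamic scaling in step 2, the decision an autoscale rule makes can be sketched as follows. The 70%/25% CPU thresholds and the instance limits are hypothetical placeholders, not Azure defaults; in Azure they would live in an Azure Monitor autoscale setting on the app service plan:

```python
# Illustrative autoscale rule for step 2. Thresholds and limits are
# assumptions; tune them against the plan's observed CPU metrics.
def scale_decision(cpu_percent: float, current_instances: int,
                   min_instances: int = 1, max_instances: int = 5) -> int:
    """Return the instance count an autoscale rule would target:
    scale out above 70% CPU, scale in below 25%, otherwise hold."""
    if cpu_percent > 70 and current_instances < max_instances:
        return current_instances + 1
    if cpu_percent < 25 and current_instances > min_instances:
        return current_instances - 1
    return current_instances
```

Pairing a scale-in rule with the scale-out rule matters for cost: without it, a single traffic spike would ratchet the plan up permanently.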
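The caching in step 3 follows a simple look-aside pattern: answer repeated questions from the cache instead of re-invoking the model or the backend. A minimal in-process sketch is below; in production the get/set calls would go to Azure Cache for Redis rather than a local dictionary, and the TTL value is an assumption:

```python
import time

class TTLCache:
    """Minimal in-process stand-in for the look-aside caching pattern.
    The same get/set shape would be used against Azure Cache for Redis."""
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

def answer(question, cache, model_call):
    """Serve repeated questions from cache; only call the model on a miss."""
    cached = cache.get(question)
    if cached is not None:
        return cached
    result = model_call(question)
    cache.set(question, result)
    return result
```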
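For the query optimization in step 6, one concrete lever is bounding the top-k of each vector query: a smaller k (and a pre-filtered index) bounds the work, and therefore the cost, per query. A stdlib-only sketch, with a hypothetical id-to-embedding mapping standing in for the real vector index:

```python
import heapq
import math

def top_k(query, vectors, k=3):
    """Return the ids of the k stored vectors most similar to `query`
    by cosine similarity. `vectors` is a hypothetical mapping of
    document id -> embedding list, standing in for a vector index."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    # heapq.nlargest keeps only k candidates in memory at a time.
    return [doc_id for _, doc_id in
            heapq.nlargest(k, ((cosine(query, v), doc_id)
                               for doc_id, v in vectors.items()))]
```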
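The spike detection behind steps 7 and 8 can be approximated with a rolling-baseline rule: flag any sample that jumps well above the recent mean. In Azure this would be an Azure Monitor metric alert rule; the window and threshold below are illustrative:

```python
import statistics

def spike_alerts(values, window=5, threshold=3.0):
    """Return the indices where a metric exceeds the mean of the
    preceding `window` samples by more than `threshold` standard
    deviations -- the resource-spike condition step 7 alerts on."""
    alerts = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.pstdev(baseline)
        if stdev > 0 and values[i] > mean + threshold * stdev:
            alerts.append(i)
    return alerts
```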
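Step 9's idle-period scale-down amounts to a schedule. A sketch, assuming a hypothetical 08:00–20:00 busy window; the result would feed a scheduled autoscale profile or a stop/start automation job:

```python
from datetime import datetime, time

# Hypothetical business-hours window; adjust to the chatbot's observed traffic.
BUSY_START = time(8, 0)
BUSY_END = time(20, 0)

def desired_instances(now: datetime, busy: int = 3, idle: int = 0) -> int:
    """Return how many instances to run at a given moment. Dropping to
    zero outside business hours is what eliminates the idle-time spend
    described in step 9."""
    return busy if BUSY_START <= now.time() < BUSY_END else idle
```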
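For steps 10 and 11, calling the model API directly needs only the data-plane endpoint of a deployed model plus an api-key header, none of the broader service-management surface. A sketch that builds that endpoint; the resource name, deployment name, and api-version in the test are placeholders for your own values:

```python
def chat_completions_url(resource: str, deployment: str,
                         api_version: str = "2024-02-01") -> str:
    """Build the Azure OpenAI chat-completions endpoint for a deployed
    model. A POST here with an api-key header and a JSON messages body
    exercises only the model API, not the wider service API."""
    return (f"https://{resource}.openai.azure.com/openai/deployments/"
            f"{deployment}/chat/completions?api-version={api_version}")
```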