MLOps:
Most machine learning deployments follow one of two
patterns – online inference and batch inference. Both demonstrate MLOps principles and best
practices for developing, deploying, and monitoring machine learning models at
scale. Development and deployment are distinct from one another; although
the model may be containerized and retrieved for execution during deployment, it
can be developed independently of how it is deployed. This separates the concerns
of developing the model from the requirements of serving the online
and batch workloads. The distinction also serves the needs of the model itself,
regardless of the technology stack and the underlying resources used during
these two phases, which are typically created in the public cloud.
For example, developing and training a model might require significant compute,
but not as much as executing it for predictions and outlier detection, activities
that are hallmarks of production environments. The workloads that consume
the model may also vary, not just between batch and online processing but even
from one batch processing stack to another; yet the common operations of
collecting MELT data (metrics, events, logs, and traces) and the
associated resources stay the same. These include a GitHub repository, Azure
Active Directory, cost management dashboards, Key Vaults, and, in this case,
Azure Monitor. The resources, and the practices associated with them for the purposes
of security and performance, are left out of this discussion; the standard
DevOps guides from the public cloud providers call them out.
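As a concrete illustration, here is a minimal sketch of wiring MELT collection into a scoring service, assuming the azure-monitor-opentelemetry distro; the connection string, logger names, and model artifact are placeholders rather than part of any particular deployment:

    import logging
    import joblib
    from azure.monitor.opentelemetry import configure_azure_monitor
    from opentelemetry import metrics, trace

    # One-time setup: routes logs, metrics, and traces to Azure Monitor.
    configure_azure_monitor(
        connection_string="InstrumentationKey=<placeholder>",  # hypothetical
    )

    logger = logging.getLogger("scoring")        # logs (and events)
    tracer = trace.get_tracer("scoring")         # traces
    meter = metrics.get_meter("scoring")
    prediction_counter = meter.create_counter(   # metrics
        "predictions", description="Number of predictions served"
    )

    model = joblib.load("model.pkl")             # hypothetical model artifact

    def score(features):
        with tracer.start_as_current_span("score"):   # one span per request
            prediction_counter.add(1)
            logger.info("scoring request received")
            return model.predict([features])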
Online workloads target the model via API calls and usually
require the model to be hosted in a container and exposed through API management
services; a minimal scoring endpoint is sketched below.
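This sketch assumes FastAPI and a joblib-serialized scikit-learn model; the route, request schema, and artifact path are illustrative:

    import joblib
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    model = joblib.load("model.pkl")  # hypothetical artifact baked into the image

    class Features(BaseModel):
        values: list[float]           # flat feature vector for one prediction

    @app.post("/score")
    def score(features: Features):
        prediction = model.predict([features.values])
        return {"prediction": prediction.tolist()}

An API management service would sit in front of this container to handle authentication, throttling, and versioned routing.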
services. Batch workloads, on the other hand, require an orchestration tool to co-ordinate
the jobs consuming the model. Within the deployment phase, it is a usual
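A minimal sketch of a batch scoring job that such an orchestrator would schedule, assuming pandas, joblib, and illustrative paths:

    import joblib
    import pandas as pd

    def run_batch(input_path: str, output_path: str) -> None:
        model = joblib.load("model.pkl")        # hypothetical model artifact
        frame = pd.read_parquet(input_path)     # one partition of input data
        frame["prediction"] = model.predict(frame)
        frame.to_parquet(output_path)           # results land in the datastore

    if __name__ == "__main__":
        run_batch("input/day1.parquet", "scored/day1.parquet")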
Within the deployment phase, it is usual
practice to host more than one environment, such as staging and production – both of
which are served by CI/CD pipelines that flow the model from development to
its usage. A manual approval is required to advance the model from the staging
environment to production.
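As one illustration of such a promotion step, the following sketch uses the MLflow model registry (the registry behind Databricks, and one that Azure ML also supports); the model name, version, and stage conventions are assumptions, and the approval itself lives in the pipeline, not in this code:

    from mlflow.tracking import MlflowClient

    client = MlflowClient()

    def promote(model_name: str, version: str) -> None:
        # Called only after the manual approval gate in the CI/CD pipeline;
        # archives whatever version was previously serving in production.
        client.transition_model_version_stage(
            name=model_name,
            version=version,
            stage="Production",
            archive_existing_versions=True,
        )

    promote("demand-forecast", "3")  # hypothetical model name and version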
A well-developed model is usually a composite
handling three distinct activities: serving the prediction, detecting
data drift in the features, and detecting outliers in the features, as in the sketch below.
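This sketch uses deliberately simple stand-ins for the drift and outlier checks; the baseline statistics and thresholds are illustrative assumptions:

    import numpy as np

    class CompositeModel:
        def __init__(self, model, train_mean: np.ndarray, train_std: np.ndarray):
            self.model = model            # the fitted predictor
            self.train_mean = train_mean  # per-feature statistics from training
            self.train_std = train_std

        def predict(self, features: np.ndarray):
            return self.model.predict(features)

        def drift(self, features: np.ndarray) -> np.ndarray:
            # Flags features whose batch mean shifted more than two baseline
            # standard deviations; a stand-in for a proper drift test.
            shift = np.abs(features.mean(axis=0) - self.train_mean)
            return shift > 2 * self.train_std

        def outliers(self, features: np.ndarray) -> np.ndarray:
            # Flags rows with any feature beyond three standard deviations.
            z = np.abs((features - self.train_mean) / self.train_std)
            return (z > 3).any(axis=1)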
Mature MLOps also includes processes for explainability, performance profiling, versioning,
pipeline automation, and the like. Depending on the resources used for DevOps and
the environment, typical artifacts include Dockerfiles, templates, and
manifests.
While parts of this MLOps solution can be
internalized by studios and launch platforms, organizations like to invest in
specific compute, storage, and networking for their needs. Databricks, Kubernetes, and
Azure ML workspaces are common choices for compute; storage accounts and datastores
serve storage; and separate subnets are used for networking. Outbound
internet connectivity from the code hosted and executed in MLOps is usually not
required, but it can be provisioned by adding a NAT gateway to the
subnet where the code is hosted.