This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Service Fabric as per the summary here. This article discusses architectural approaches for AI and ML in a multitenant solution.
Both conventional and modern applications can be multitenant without sacrificing their focus on core business requirements. The infrastructure and architecture approaches differ from single-tenant designs, but the overall purpose of the technology remains the same. A modern multitenant application can offer AI/ML-based capabilities to any number of tenants. These tenants remain isolated from one another and cannot see each other's data, but the curated model they use can be shared; it may have been developed from a comprehensive training set, perhaps in a pipeline different from where the model is currently hosted.
When a multitenant application needs to provide data and models for AI/ML purposes, it must consider the requirements around both training and deployment, because the compute resources required for training are significantly different from those required for inference. For example, an AI/ML model targeting TensorFlow might be authored with Keras, a high-level API that uses TensorFlow as its backend. Keras can help author the model and deploy it to an environment such as Colab, where the model can be trained on a GPU. Once training is done, the model can be loaded and run almost anywhere else, including a browser: the power of TensorFlow.js is its ability to load the model and make predictions in the browser itself. As with most ML examples, the data is split into a 70% training set and a 30% test set, with the split taken over a randomly shuffled set so that any ordering in the data does not bias it. With the model and training/test sets defined, it is straightforward to evaluate the model and run inference, and the model can also be saved and restored. Training executes faster when a GPU is added to the compute.
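As a minimal sketch of this workflow (the synthetic dataset, layer sizes, and file name are illustrative assumptions, not part of any particular tenant solution), the following Keras code performs a random 70/30 split, trains a small model, evaluates it on the held-out set, and saves it so it can be reloaded elsewhere, for example after conversion for the browser:

```python
# Minimal sketch: train a small Keras model with a random 70/30 split.
# The dataset, architecture, and hyperparameters are illustrative assumptions.
import numpy as np
import tensorflow as tf

# Synthetic data stands in for a tenant's curated training data.
features = np.random.rand(1000, 10).astype("float32")
labels = (features.sum(axis=1) > 5.0).astype("float32")

# Shuffle first, then split 70% training / 30% test.
indices = np.random.permutation(len(features))
split = int(0.7 * len(features))
train_idx, test_idx = indices[:split], indices[split:]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training runs on a GPU automatically when one is visible (e.g., in Colab).
model.fit(features[train_idx], labels[train_idx], epochs=5, batch_size=32)

# Evaluate on the held-out 30% and save the model for reuse elsewhere.
print(model.evaluate(features[test_idx], labels[test_idx]))
model.save("model.keras")  # could later be converted for TensorFlow.js
```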
When the model is trained, training can be done in batches of a predefined size, and the number of passes over the entire training dataset, called epochs, can also be set up front. These are hyperparameters that tune the model. Every model trades off speed against accuracy metrics such as mean average precision: generally, the higher the precision, the lower the speed. It is helpful to visualize training with a live chart that updates with the loss after each epoch. Usually there will be a downward trend in the loss, which is what is meant by saying the model is converging.
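Continuing the earlier sketch (and assuming its `model`, `features`, and index arrays), the epoch-by-epoch loss can be pulled from the Keras `History` object and plotted to check for that downward trend:

```python
# Sketch: plot the training loss after each epoch to watch for convergence.
# Reuses `model`, `features`, and `train_idx` from the previous sketch.
import matplotlib.pyplot as plt

history = model.fit(
    features[train_idx], labels[train_idx],
    epochs=10,       # number of full passes over the training set
    batch_size=64,   # examples consumed per gradient update
    verbose=0,
)

# A downward trend in this curve is what "the model is converging" refers to.
plt.plot(history.history["loss"])
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.show()
```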
Training the model might take a long time, say about four hours. Once the test data has been evaluated, the model's effectiveness can be measured using precision and recall: precision is the fraction of the model's positive inferences that were indeed positive, while recall is the fraction of truly positive examples that the model identified.
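Under the same assumptions as the sketches above (a sigmoid output thresholded at 0.5, which is itself an illustrative choice), precision and recall on the held-out set can be computed like this:

```python
# Sketch: precision and recall on the held-out test set.
# The 0.5 threshold is an illustrative assumption for a sigmoid output.
from sklearn.metrics import precision_score, recall_score

predictions = (model.predict(features[test_idx]) > 0.5).astype("float32").ravel()

# Precision: of the examples predicted positive, how many truly are.
# Recall: of the truly positive examples, how many were predicted positive.
print("precision:", precision_score(labels[test_idx], predictions))
print("recall:", recall_score(labels[test_idx], predictions))
```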
In a multitenant application, the tenancy model affects each stage of the AI/ML model lifecycle, and the overall solution provides accurate results only when the model is trained, deployed, and run correctly at each of those stages.
One of the best practices around AI/ML models is to treat them as just as sensitive as the raw data that trained them. Tenants must understand how their data is used to train the model and how a model trained on other tenants' data may be used for inference on their workloads.
There are three common approaches for working with AI/ML models: 1. tenant-specific models, 2. shared models, and 3. tuned shared models. These approaches parallel the patterns for sharing other resources between tenants.
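To make the distinction concrete, here is a hypothetical routing sketch (the directory layout and function name are assumptions for illustration, not an Azure or TensorFlow API) showing how an inference service might prefer a tenant-specific or tuned shared model when one exists and fall back to the shared model otherwise:

```python
# Hypothetical sketch of tenant-aware model selection; the paths and
# registry shape are illustrative assumptions, not a real service layout.
import os
import tensorflow as tf

SHARED_MODEL_PATH = "models/shared/model.keras"

def load_model_for_tenant(tenant_id: str) -> tf.keras.Model:
    """Prefer a tenant-specific (or tuned shared) model; else use the shared one."""
    tenant_path = os.path.join("models", "tenants", tenant_id, "model.keras")
    if os.path.exists(tenant_path):
        return tf.keras.models.load_model(tenant_path)
    return tf.keras.models.load_model(SHARED_MODEL_PATH)
```

The design choice here mirrors the three approaches: a tenant-specific or tuned shared model isolates one tenant's behavior, while the shared fallback amortizes training cost across all tenants.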