Tuesday, April 13, 2021

 Applications of Data Mining to Reward points collection service 

Continuation of discussion in terms of Machine Learning deployments 

Machine learning algorithms are a tiny fraction of the overall code used to realize prediction systems in production. As noted in the paper "Hidden Technical Debt in Machine Learning Systems" by Sculley, Holt, and others, the machine learning code consists mainly of the model; all the other components, such as configuration, data collection, feature extraction, data verification, process management tools, machine resource management, serving infrastructure, and monitoring, make up the rest of the stack. These components usually form hybrid stacks, especially when the model is hosted on-premises. Public clouds do offer ready pipelines and automation with better management and monitoring programmability than on-premises setups, but it is usually easier for startups to embrace public clouds than for established large companies that have significant investments in their inventory, DevOps, and datacenters. 
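The proportion described above can be sketched as a pipeline in which the model is just one small step among many supporting stages. All function and stage names below are hypothetical stand-ins, not taken from any real framework:

```python
# Illustrative sketch: the model is one small step in a larger pipeline.
# Every name here is a hypothetical stand-in, not a real framework API.

def validate(rows):
    # Data verification: drop records with missing fields.
    return [r for r in rows if all(v is not None for v in r.values())]

def extract_features(rows):
    # Feature extraction: turn raw records into numeric vectors.
    return [[float(r["points"]), float(r["visits"])] for r in rows]

def train_model(X):
    # The actual "ML code": a mean threshold standing in for a real learner.
    avg = sum(x[0] for x in X) / len(X)
    return lambda x: x[0] > avg

def monitor(name, value):
    # Monitoring hook: in production this would emit a metric, not print.
    print(f"{name}={value}")

rows = [{"points": 120.0, "visits": 4}, {"points": 30.0, "visits": 1},
        {"points": None, "visits": 2}]
clean = validate(rows)
X = extract_features(clean)
model = train_model(X)
monitor("training_rows", len(X))
```

Even in this toy, the model function is a single line of logic surrounded by validation, feature extraction, and monitoring scaffolding.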

Some of the other advantages in deploying machine learning models to the public cloud include the following: 

1) Ready-made automation for machine learning pipelines that can be monitored 24x7. 

2) Ability to span on-premises and public cloud with a virtual hybrid cloud. 

3) Elasticity of computing resources for machine learning workloads, including support for GPUs. 

4) Consistency built into machine learning deployments. 

5) Machine learning deployments can have variable workloads during the lifetime of the model; cloud resources are better able to scale up and down as needed. 

6) ML solutions can take advantage of all the data at once in the cloud without waiting for the Extract-Transform-Load (ETL) jobs that became a necessity with warehouses. Virtual data warehouses are also available in the cloud if they must be used. 

7) Cloud security is robust, securing data at rest as well as in transit and reducing the onus of maintaining data in the cloud.  

8) Cost is transparent in the pay-as-you-go mode of billing, and various tools are available to monitor usage and costs. 

9) Rate-limiting technologies are numerous in addition to native techniques in the cloud, and these can prevent cost overruns during experimentation. 

10) A free tier is available for quick-and-dirty prototyping in the public cloud, which helps surface hidden costs before production. 
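As an illustration of point 9, a client-side token bucket is one common way to cap request rates during experimentation. The sketch below is generic and not tied to any particular cloud API:

```python
import time

class TokenBucket:
    """Client-side rate limiter: allows up to `rate` calls per second,
    with bursts of up to `capacity` calls."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum stored tokens
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A tight loop of 20 attempts exhausts the burst capacity almost immediately.
bucket = TokenBucket(rate=5, capacity=5)
allowed = sum(1 for _ in range(20) if bucket.allow())
```

Wrapping each experimental call to a paid API behind `allow()` bounds spend even when a runaway loop fires thousands of requests.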

The following chart makes a comparison of all the data mining algorithms including the neural networks: https://1drv.ms/w/s!Ashlm-Nw-wnWxBFlhCtfFkoVDRDa?e=aVT37e 

Thank you. 

 

 

 

Monday, April 12, 2021

 Applications of Data Mining to Reward points collection service 

Continuation of discussion in terms of Machine Learning deployments 

Machine learning algorithms are a tiny fraction of the overall code used to realize prediction systems in production. As noted in the paper "Hidden Technical Debt in Machine Learning Systems" by Sculley, Holt, and others, the machine learning code consists mainly of the model; all the other components, such as configuration, data collection, feature extraction, data verification, process management tools, machine resource management, serving infrastructure, and monitoring, make up the rest of the stack. These components are hybrid in nature when they are deployed on-premises. Public clouds lead the way in standardizing deployment, monitoring, and operations for machine learning deployments. Yet not all development teams are empowered to transition to the public cloud, because usage costs are difficult to articulate upfront to management and billing follows a pay-as-you-go model. A Continuous Integration / Continuous Deployment (CI/CD) pipeline, ML tests, and model tuning become responsibilities of the development team even when the team is folded into the business service team for faster turnaround in deploying artificial intelligence models to production. In-house automation and development of machine learning pipelines and monitoring systems do not compare to those from the public clouds, which make automation and programmability easier. Yet the transition from in-house solutions to public cloud ML pipelines lags. We review some of the arguments against this migration: 

First, the ML pipeline is a newer technology compared to traditional software development stacks, and management often prefers that developers have more freedom to explore options on-premises at lower cost. Even high-technology large companies with significant investments in hybrid clouds and their own datacenters argue against the use of public cloud technologies. This is not merely a business point of view; it is also founded on the technical reason that in-house solutions can be better customized to the ML model development those companies are pursuing. Also, experimentation can exceed the limits allowed by a free tier. The cost is not always clear, and it usually comes down to an argument over the justification of numbers for both options, but the cost is generally considered to favor the hybrid cloud. 
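The cost argument can be made concrete with back-of-the-envelope arithmetic. All dollar figures below are hypothetical placeholders, not quotes from any provider; the point is only the shape of the comparison, in which pay-as-you-go spend scales with usage while on-premises spend amortizes a fixed outlay:

```python
# Hypothetical back-of-the-envelope comparison; every figure is made up.
gpu_hour_cloud = 3.00          # assumed pay-as-you-go price per GPU-hour
hours_per_month = 200          # assumed experimentation load
cloud_monthly = gpu_hour_cloud * hours_per_month

server_cost = 20000.0          # assumed one-time on-prem GPU server purchase
amortize_months = 36           # assumed depreciation horizon
ops_monthly = 300.0            # assumed power, space, and admin overhead
onprem_monthly = server_cost / amortize_months + ops_monthly

# At this light usage the cloud comes out cheaper; the crossover point
# moves as the monthly hours grow.
print(cloud_monthly, round(onprem_monthly, 2))
```

The exercise shows why the argument is so sensitive to assumptions: changing the monthly hours or the depreciation horizon flips which option looks cheaper.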

Second, event processing systems such as Apache Spark and Apache Kafka make it easier to replace the Extract-Transform-Load solutions that proliferated with data warehouses. It is true that much of the training data for ML pipelines comes from a data warehouse, and ETL worsened data duplication and drift, making it necessary to add workarounds in business logic. With a cleaner event-driven system, it becomes easier to migrate to immutable data, write-once business logic, and real-time data processing. Event processing systems are also easier to develop on-premises, even as a staging step before any attempt to deploy them to the cloud. 
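The idea of immutable data with write-once business logic can be sketched without any particular broker. The toy consumer below folds an append-only stream of reward events into per-customer totals, the same kind of logic that could run against a Kafka topic; the event fields are illustrative:

```python
from collections import defaultdict

# Append-only event log: events are never updated in place, only appended.
events = [
    {"customer": "alice", "points": 50},
    {"customer": "bob",   "points": 20},
    {"customer": "alice", "points": 30},
]

def fold_points(stream):
    """Write-once business logic: derive current totals by replaying the log.
    Re-running over the same immutable log always yields the same state."""
    totals = defaultdict(int)
    for event in stream:
        totals[event["customer"]] += event["points"]
    return dict(totals)

totals = fold_points(events)
```

Because corrections are appended as new events rather than edited in place, the same fold can be replayed at any time to rebuild state, removing the duplication and drift that ETL workarounds were patching over.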

Third, machine learning models are end products. They can be hosted in a variety of environments, not just the cloud. Some ML users would like to load the model in client applications, including those on mobile devices. The model-as-a-service option is rather narrow, and the model does not have to be made available over the internet in all cases, especially when the network hop would be costly for real-time processing. Much IoT traffic involves heavy streaming data from edge devices, and experts agree that in such cases an online on-premises system will outperform any public cloud option. Internet TCP relays are on the order of 250-350 milliseconds, whereas the ingestion rate for real-time analysis can be upwards of thousands of events per second. 
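The mismatch in that last sentence can be quantified with simple arithmetic, using the figures from the text (300 ms as a mid-range relay latency, 1,000 events per second as a modest real-time ingestion rate):

```python
# Using the figures from the text above.
relay_latency_s = 0.300      # mid-range internet TCP relay latency (250-350 ms)
events_per_second = 1000     # a modest real-time ingestion rate

# Events that arrive while a single round trip to the cloud is in flight:
backlog_per_round_trip = relay_latency_s * events_per_second

# Sequential request/response over such a relay caps throughput far below
# the ingestion rate unless requests are heavily batched or pipelined:
max_sequential_rps = 1 / relay_latency_s
```

Roughly 300 events pile up during every round trip, while sequential calls over the relay could answer only a few per second, which is why an on-premises path close to the edge devices wins for this workload.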

  

The following chart makes a comparison of all the data mining algorithms including the neural networks: https://1drv.ms/w/s!Ashlm-Nw-wnWxBFlhCtfFkoVDRDa?e=aVT37e 

Thank you. 

 

 

 

Sunday, April 11, 2021

 

Applications of Data Mining to Reward points collection service

Continuation of discussion in terms of Machine Learning deployments

Machine learning algorithms are a tiny fraction of the overall code used to realize prediction systems in production. As noted in the paper "Hidden Technical Debt in Machine Learning Systems" by Sculley, Holt, and others, the machine learning code consists mainly of the model; all the other components, such as configuration, data collection, feature extraction, data verification, process management tools, machine resource management, serving infrastructure, and monitoring, make up the rest of the stack. These components usually form hybrid stacks, especially when the model is hosted on-premises. Public clouds do offer ready pipelines and automation with better management and monitoring programmability than on-premises setups, but it is usually easier for startups to embrace public clouds than for established large companies that have significant investments in their inventory, DevOps, and datacenters.

Hybrid stacks are not the only concern; there are a few others as well. Architectural patterns are harder to enforce with machine learning deployments. Traditional web application deployments have a significant and growing ecosystem of infrastructure, tools, and processes to benefit from, but machine learning systems are not always equivalent to a predictive web service. Many models are trained and tested with little or no requirement for outside-world connectivity or programmability. Again, the public clouds lead the way in standardizing deployment, monitoring, and operations for machine learning deployments.
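The point that a model need not be a web service can be illustrated with a purely offline train-and-evaluate loop. The toy threshold "model" below stands in for a real learner, and nothing in it requires network connectivity:

```python
# Offline train/evaluate loop with no serving endpoint; the threshold
# "model" is a stand-in for a real learner.
train = [(10, 0), (20, 0), (80, 1), (90, 1)]   # (feature, label) pairs
test  = [(15, 0), (85, 1)]

def fit(data):
    # Learn a split point halfway between the two class means.
    lo = [x for x, y in data if y == 0]
    hi = [x for x, y in data if y == 1]
    return (sum(lo) / len(lo) + sum(hi) / len(hi)) / 2

def accuracy(threshold, data):
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

threshold = fit(train)
acc = accuracy(threshold, test)
```

The trained artifact here is just the number `threshold`; shipping it inside a mobile or embedded client is as valid a deployment as exposing it behind an HTTP endpoint.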

Lastly, the machine learning field is still emerging, and development teams continuously experiment with algorithms, data, and technology stacks before establishing a process that lets them switch between use cases and production deployments. A Continuous Integration / Continuous Deployment (CI/CD) pipeline, ML tests, and model tuning become responsibilities of the development team even when the team is folded into the business service team for faster turnaround in deploying artificial intelligence models to production. Public clouds make it easy to monitor, troubleshoot, and update models in production deployments, but the development team remains responsible for the number and scale of such deployments.
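The ML tests mentioned above often take the form of a quality gate in the CI/CD pipeline: a candidate model must beat the current baseline on held-out data before deployment proceeds. A minimal, framework-free sketch, with illustrative accuracy numbers and margin:

```python
# Minimal CI quality gate: block deployment unless the candidate model
# clearly beats the baseline on held-out data. Numbers are illustrative.
def quality_gate(candidate_acc, baseline_acc, min_margin=0.01):
    """Return True only if the candidate improves on the baseline by at
    least `min_margin`, guarding against noise-level 'improvements'."""
    return candidate_acc >= baseline_acc + min_margin

deploy = quality_gate(candidate_acc=0.87, baseline_acc=0.84)    # passes
hold = quality_gate(candidate_acc=0.845, baseline_acc=0.84)     # blocked
```

Wiring a check like this into the pipeline keeps the deploy decision with the development team, as the text notes, while making the criterion explicit and reviewable.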

 

 

The following chart makes a comparison of all the data mining algorithms including the neural networks: https://1drv.ms/w/s!Ashlm-Nw-wnWxBFlhCtfFkoVDRDa?e=aVT37e

Thank you.