Tuesday, April 20, 2021

 Matrix factorization for recommendation systems.  

What is a recommendation system?  

A recommendation system presents choices to a user who is making a purchase or a request. For example, it can show the user the choices that others have made so that the user is better informed, or it can show the user other items that are like the one being requested.

Where can I use a recommendation system?  

Recommendations include suggestions for online shopping, interesting websites to visit, or items such as movies or music. Since a recommendation system can be based on user preferences or item similarity, the choices are not restricted to any domain. This kind of recommendation system, when applied to any catalog in any business, performs collaborative filtering.
There are certain filtering cases where divulging which users hold which preferences is helpful to the end-user. At other times, it is preferable to use item-based similarity. Similarity scores are computed in both cases. All other considerations being the same, the item-based approach is better for sparse datasets, while user-based and item-based approaches perform similarly on dense datasets.

How does it work?   

To make a recommendation, first a group sharing similar tastes is found, and then the preferences of that group are used to make a ranked list of suggestions. This technique is called collaborative filtering. A common data structure for keeping track of people and their preferences is a nested dictionary. This dictionary could use a quantitative ranking, say on a scale of 1 to 5, to denote the preferences of the people in the selected group. To find similar people to form a group, we use some form of similarity score. One way to calculate this score is to treat the items that two people have ranked in common as axes of a chart and plot each person as a point; people who are close together on the chart can then form a group.
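As a minimal sketch in Python, assuming a hypothetical nested dictionary of ratings, the similarity score can be computed from the items two people have ranked in common:

```python
import math

# Hypothetical ratings: a nested dictionary of person -> {item: score on a 1-5 scale}.
ratings = {
    "Alice": {"Inception": 5, "Up": 3, "Heat": 4},
    "Bob":   {"Inception": 4, "Up": 3, "Coco": 5},
    "Cara":  {"Up": 1, "Heat": 5, "Coco": 2},
}

def similarity(prefs, a, b):
    """Score based on the items persons a and b have ranked in common."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    # Euclidean distance in the space whose axes are the commonly ranked items.
    distance = math.sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    # Map the distance into (0, 1]; identical tastes give a score of 1.
    return 1.0 / (1.0 + distance)

print(similarity(ratings, "Alice", "Bob"))   # close tastes, score near 1
print(similarity(ratings, "Alice", "Cara"))  # distant tastes, score near 0
```

People whose pairwise scores exceed a chosen threshold can then form the group whose preferences are averaged into the ranked list of suggestions.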

What is matrix factorization?   

User choices are often represented as a matrix. For example, Netflix provided a training dataset of over 100 million ratings from nearly half a million users on over seventeen thousand movies. Given this matrix of ratings between m users and n movies, matrix factorization decomposes it into an m * k user matrix and a k * n movie matrix, so that their product approximates the original ratings and the recommendation system can leverage the k latent features shared between the two entities.

How does Matrix factorization improve recommendation systems?   

Matrix factorization is, at its root, a technique for solving a linear system of equations. Decomposition helps solve for x in the system denoted by Ax = b, where A is the given matrix and b is the set of recommendations from the participants. The benefit of matrix factorization for recommendations is that it can be applied to sparse matrices, where users do not rate all movies and do so only selectively.

Given a user rating of movies in a matrix, it can be decomposed into a product of a user embedding matrix and a movie embedding matrix, where an embedding refers to the latent features corresponding to that entity. The user latent features and movie latent features can then be looked up for a given movie-user rating. The embedding matrices can also serve as input to other neural net layers, some of which can be non-linear and introduce further refinements to the predictions. Other matrix factorization techniques that utilize the initial matrix form include principal component analysis and singular value decomposition. The former brings out the salient contributors in the ratings while the latter brings out the salient preferences of a user, such as 50% comedy fan + 50% romance fan, where that user is only sensitive to these latent factors.

When the model is trained on these ratings by users on movies, it can be used to predict the rating of a new movie by finding how similar that movie is to the others, based on the user preferences.
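A small sketch of this idea, assuming a toy 4 x 4 ratings matrix (0 marks a missing rating) and made-up hyperparameters; gradient descent learns the m * k and k * n factors so that their product fills in the blanks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse ratings matrix: 0 marks a missing (unrated) entry.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

m, n = R.shape
k = 2                                     # number of latent features
U = rng.normal(scale=0.1, size=(m, k))    # m * k user factors
V = rng.normal(scale=0.1, size=(k, n))    # k * n movie factors

lr, reg = 0.01, 0.02                      # assumed learning rate and regularization
for _ in range(5000):
    for i in range(m):
        for j in range(n):
            if R[i, j] > 0:               # train only on the observed ratings
                err = R[i, j] - U[i, :] @ V[:, j]
                U[i, :] += lr * (err * V[:, j] - reg * U[i, :])
                V[:, j] += lr * (err * U[i, :] - reg * V[:, j])

print(np.round(U @ V, 2))                 # predictions, including the missing entries
```

The predicted values at the zero positions are the recommendations the model would surface.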

What are the different types of Matrix Factorization?  

Matrix factorizations differ in the form of the factors, such as a lower triangular and an upper triangular matrix, or a matrix and its transpose, and so on. They also differ in the method by which the factors are computed. One form that performs well for a variety of tasks is the Cholesky factorization, which has advantages such as the following (a short solver sketch appears after the list):

  • Simpler solving
  • Fewer iterations
  • Transformation from uncorrelated to correlated normal variables
  • More efficient than LU and QR decomposition
  • Best for positive definite matrices
  • Always a good bet for sparse data
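A short sketch of solving a system with Cholesky, assuming a small symmetric positive definite matrix and SciPy's solver:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# A hypothetical symmetric positive definite system A x = b.
A = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([2.0, 1.0, 3.0])

c, low = cho_factor(A)        # A = L L^T, roughly half the work of an LU factorization
x = cho_solve((c, low), b)    # two triangular solves: L y = b, then L^T x = y

print(x, np.allclose(A @ x, b))  # the residual check prints True
```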

How is QR decomposition different from LU decomposition?  

QR decomposition is used to solve the least squares problem, which arises for a system of linear equations denoted by Xb = Y when the inverse of matrix X cannot be calculated; instead, an approximation is taken such that the sum of squared deviations is minimized. The QR form is one where Q is orthogonal and R is the upper triangular matrix. LU decomposition results in a lower triangular matrix and an upper triangular matrix. Gaussian elimination is a popular way to compute an LU decomposition. It involves adding multiples of one row to another, swapping two rows, and scaling a row to arrive at the lower and upper triangular matrices.
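A sketch contrasting the two, assuming a made-up tall system for least squares and a random square system for LU:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)

# Least squares via QR: X is tall (more equations than unknowns), so it has no inverse.
X = rng.normal(size=(6, 2))
Y = X @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=6)  # noisy observations
Q, R = np.linalg.qr(X)             # X = Q R with R upper triangular
b = np.linalg.solve(R, Q.T @ Y)    # minimizes the sum of squared deviations ||X b - Y||^2
print(b)                           # close to [2, -1]

# LU for a square system A x = y.
A = rng.normal(size=(3, 3))
y = rng.normal(size=3)
lu, piv = lu_factor(A)             # Gaussian elimination with row swaps (pivoting)
x = lu_solve((lu, piv), y)
print(np.allclose(A @ x, y))       # True
```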

 

 

Monday, April 19, 2021

Quantum computing:  

Introduction: Certain computation tasks can be executed exponentially faster on quantum computers than on classical computers. The noted physicist Richard Feynman suggested that it is exponentially costly to simulate quantum systems with classical computers. He envisioned a few benchmarks. First, the computational space must be large and the error rate must be small so that a quantum speedup can be observed. Second, a problem that is hard for classical computers must be easy for quantum computers. Benchmarks framed by such questions can be executed on a superconducting qubit processor. Algorithms that take full advantage of quantum computing are still a nascent field.


In quantum computing, the basic unit of quantum information is a qubit, a two-state quantum mechanical system. In classical systems a bit must be 0 or 1. Quantum mechanics allows the qubit to be in a coherent superposition of both states simultaneously, which allows a qubit to hold much more information than just two discrete states. This property is fundamental to quantum mechanics and quantum computing. For example, the polarization of a single photon can have both vertical polarization and horizontal polarization, and the spin of an electron can be taken as spin up and spin down. A first step in quantum data storage was demonstrated with the coherent transfer of a superposition state in an electron spin "processing" qubit to a nuclear spin "memory" qubit.

Two classical bits of information, such as 00, 01, 10, or 11, can be transmitted by sending only one qubit from a sender, say Alice, to a receiver, say Bob. This is called superdense coding. It requires Alice to perform one of four quantum gate operations to prearrange the measurement Bob makes. Bob can then decode the information being sent by receiving Alice's qubit, operating on the pair, and measuring both. Since both qubits are required to decode the information, it mitigates the risks of spoofing, tampering, repudiation, information disclosure, denial of service, and elevation of privilege, which together are referred to as the STRIDE model of threat analysis. An eavesdropper, Eve, cannot figure out Alice's qubit or Bob's qubit, and any such attempt would alert both Alice and Bob.
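A minimal numpy simulation of superdense coding, assuming the usual convention that Alice's qubit comes first in the tensor product; this is a sketch of the protocol, not production quantum code:

```python
import numpy as np

# Single-qubit gates and the CNOT (control = first qubit).
I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# Shared Bell pair: H then CNOT on |00>; Alice holds the first qubit, Bob the second.
ket00 = np.array([1.0, 0.0, 0.0, 0.0])
bell = CNOT @ np.kron(H, I) @ ket00

# Alice encodes two classical bits with one of four gate operations on her qubit alone.
encode = {"00": I, "01": X, "10": Z, "11": Z @ X}

for bits, gate in encode.items():
    sent = np.kron(gate, I) @ bell          # Alice's single qubit travels to Bob
    decoded = np.kron(H, I) @ CNOT @ sent   # Bob operates on the pair and measures
    measured = format(int(np.argmax(np.abs(decoded) ** 2)), "02b")
    print(bits, "->", measured)             # each line prints matching bit pairs
```

Each of Alice's four gate choices steers Bob's measurement to a distinct pair of bits, which is why one transmitted qubit carries two classical bits.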


A system of n components requires only n bits in classical physics whereas, in quantum physics, it requires 2^n complex numbers; the space they span is referred to as the Hilbert space. The quantum state of a qubit can be represented as a linear superposition of its basis vectors, usually called the ket0 and ket1 vectors. This transforms the original information on n components into the high-dimensional linear vector space known as the Hilbert space.
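A few lines, assuming numpy, that show the basis vectors, an equal superposition, and the exponential growth of the state vector:

```python
import numpy as np

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])

# A qubit in an equal superposition of its basis vectors.
psi = (ket0 + ket1) / np.sqrt(2)

# n qubits live in a 2^n-dimensional Hilbert space: the state vector
# doubles in length with each added qubit.
state = psi
for n in range(2, 6):
    state = np.kron(state, psi)
    print(n, "qubits ->", state.size, "amplitudes")
```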

Representing vectors and matrices and using factorization for matrices are well established in classical systems, and these techniques can be repurposed in the Hilbert space. Newer algorithms for factoring a number, like Shor's algorithm, are faster than their predecessors, but realizing the full potential of quantum computing for Shor's algorithm still requires certain obstacles to be overcome with fault-tolerant logical qubits.


Sunday, April 18, 2021

 

The design of reward point service push notifications:

Introduction: Reward points are a form of currency. If we have an account, watching the reward points accumulate on a mobile device is a thrill, as appreciations come from individuals and businesses for various monitored activities. This article explores the design of a notification service for sending notifications on reward points to millions of users.

Description: We discussed three considerations for the update of reward points: 1. the data model, 2. push ranking, and 3. push release. The data model is simple for this reward points service. It consists of a reward point information data structure that has additional attributes such as the last updated timestamp, the last few activities that generated the update, and the redeemable options if any. We focus on just the balance, the timestamp, and the activities for the push object.

The activities are ranked not just chronologically but also on criteria set for ranking based on weighted features of the reward points. The intimacy of the users who send the reward points can also be encoded in the central repository so that it can be weighed into the ranking for the activities that need to be sent. We could simplify the update for the notifications by excluding the activities, because there can be millions of users and we would otherwise not be selective on either the push side from the server or the pull side from the mobile devices. Instead, we can consider the updates to be tackled independently for the account balance and for the activities that generate the updates. When there are only a few hundred users, the release may be simpler, but when there are millions of users, the updates must be economical and must be handled with selectivity.

The account balance is merely a data artifact, but its synchronization can be represented in the control plane with the help of a resource. This resource can get out of sync as the data is updated, but the resource representation in the control plane allows it to be synchronized on the target device. The activity can be triggered both by the mobile device that wants to check in and by the server that wants to push the notifications to the user. The synchronization of resources in the control plane is well established with orchestration frameworks that take away the overhead of writing a synchronization service. Instead, all data activities can be monitored to trigger control plane notifications. The operational engineering aspects of the reward points service can be maintained by this state orchestration framework if we have only one resource to be decided. The clients can choose to check in at selective times, and the server can be selective about which clients to update.

When the push objects consist of activities, there are options to reduce the number of connections to the central repository by denormalizing the data, even if there is some redundancy. The selection of activities can be determined by improvements to the ranking algorithm. Core indicators such as grantor stickiness, retention rate, and advertising revenue can determine whether the ranking algorithm is behaving correctly. The resource updates can be sent immediately to the device, which reduces the impact of the ranking algorithm. Similarly, the resource updates can be pulled only when the device wakes up. This optimizes the write update because the data does not need to be sent out. Both methods work well in certain situations. The process of sending data to all devices is a broadcast operation. This can happen in both writing and loading, and it can be made selective as well. The broadcast can be limited during both pull and push. Disabling the writes to all devices can significantly reduce the cost, with devices loading these updates only when reading.
It is also helpful to keep track of which devices are active over a period so that only those devices get preference.
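A sketch of the push-ranking idea described above, where the feature names and weights are illustrative assumptions rather than a prescribed scheme:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical activity record: points granted, when, and sender intimacy.
@dataclass
class Activity:
    points: int
    timestamp: datetime
    intimacy: float        # closeness of the sender, encoded in the repository

WEIGHTS = {"points": 0.5, "recency": 0.3, "intimacy": 0.2}  # assumed weights

def score(activity, now):
    age_hours = (now - activity.timestamp).total_seconds() / 3600
    recency = 1.0 / (1.0 + age_hours)          # newer activities score higher
    return (WEIGHTS["points"] * activity.points / 100
            + WEIGHTS["recency"] * recency
            + WEIGHTS["intimacy"] * activity.intimacy)

now = datetime.now(timezone.utc)
activities = [
    Activity(80, now - timedelta(hours=1), 0.9),
    Activity(20, now - timedelta(days=2), 0.1),
    Activity(50, now - timedelta(hours=6), 0.5),
]
# Rank not just chronologically but on the weighted features above.
ranked = sorted(activities, key=lambda a: score(a, now), reverse=True)
for a in ranked:
    print(round(score(a, now), 3), a.points, a.intimacy)
```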

The control plane resource reconciliation can be done with a controller pattern that tracks at least one resource type. These objects have a spec field that represents the desired state; the actual reward point balance is the current state. The controller is a control loop that watches the state of the resource and then makes, or requests, changes where needed. There can be several controllers, one per resource type. Each controller tries to move the current state to the desired state by reading the spec field. The controller might carry out the action itself, or it might relay a request to the API server. The control plane resource object can be created, updated, or deleted just like any other resource with a representational state transfer (REST) API design. The controller can also set additional states on the resource, such as marking it as finished.
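A minimal sketch of such a controller in Python, with a hypothetical reward-points resource; a production controller would watch an API server rather than an in-memory object:

```python
import time

# A hypothetical reward-points resource with a desired state (spec) and
# an observed state (status), mirroring the controller pattern above.
class RewardPointsResource:
    def __init__(self, spec_balance):
        self.spec = {"balance": spec_balance}   # desired state
        self.status = {"balance": 0}            # current state

def reconcile(resource, push_to_device):
    """One pass of the control loop: converge status toward spec."""
    desired = resource.spec["balance"]
    if resource.status["balance"] != desired:
        push_to_device(desired)                 # relay the change, e.g., via an API server
        resource.status["balance"] = desired    # record the new current state
        resource.status["synchronized"] = True  # an additional state set by the controller

def control_loop(resource, push_to_device, interval_seconds=5, iterations=3):
    for _ in range(iterations):                 # bounded here; a real loop runs forever
        reconcile(resource, push_to_device)
        time.sleep(interval_seconds)

resource = RewardPointsResource(spec_balance=120)
control_loop(resource, lambda balance: print("push:", balance), interval_seconds=0)
```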

Conclusion: The reward points service requires only a resource representation in the control plane for the synchronization to span any number of devices.

Saturday, April 17, 2021

 Writing an automation to synchronize devices using a queue:

Introduction: When devices access ingress APIs directly, there is no necessity for a queue. Yet queues provide the ability to hold requests while background workers complete the task. The former is synchronous and the latter asynchronous. APIs can also support an interface that is asynchronous, and when devices call them directly, they can do so in parallel. Devices can also register callbacks, which is a popular programming interface. But the onus is on the devices to be savvy about using the interface to synchronize their state, and the server has no knowledge of the client-side processing other than honoring the requests. Queues formalize the definition of the asynchronous behavior in the system. Each request is encapsulated in a message that can be journaled and audited. A queue supports retries and requeuing, which gives multiple stages of workflows an opportunity to complete. This article describes the usage of the latter approach whenever it is appropriate.

Description: We will need a database, a message broker, and a parallel-task library for the purposes of automating the synchronization requests. The message broker is deployed to a cluster so that it can be highly available by virtue of the elasticity of the nodes servicing the cluster. The queue can support millions of requests of a few hundred bytes each. The state of the devices to be reconciled is kept in a database, and the state can be changed both by the processing of the requests in the queue and by administrative actions on the database. The database has no exposure to the devices other than through the queue. This enables the database to be the source of truth for the device state. The message broker can serve internal and external devices, and the update to the state is bidirectional. When the devices wake up, they can request their state to be refreshed. This optimizes the write update because the data does not need to be sent out eagerly. If the queue sends messages back to the devices, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which devices to update. Both methods work well in certain situations. The fan-out happens in both writing and loading, and it can be made selective as well. The fan-out can be limited during both pull and push. Disabling the writes to all devices can significantly reduce the cost, with devices loading these updates only when reading. It is also helpful to keep track of which devices are active over a period so that only those devices get preference.

The library that automates the translation of states to messages and back supports parallelization so that each worker can take one message or device state at a time and perform the conversion. The translation between state and message is a one-to-one mapping, and each worker is assigned ownership of its translation so that there is no overlap between the tasks executed by the workers. The conversion can happen multiple times, so the workers can support multiple-stage workflows independent of the devices simply by constructing internal messages for other workers to pick up. All the activities of the workers are logged with the timestamp of the message, the identity of the device for which the state is being synchronized, and the identity of the worker. These logs are stored so that they can be indexed and searched on these identifiers for troubleshooting purposes.
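A sketch of these workers, assuming Python's standard library as a stand-in for the message broker, the parallel-task library, and the database:

```python
import json
import logging
import queue
import threading

logging.basicConfig(format="%(asctime)s %(message)s", level=logging.INFO)

# In-memory stand-ins: a real deployment would use a clustered broker,
# and the device states would live in the database of record.
broker = queue.Queue()
device_states = {"device-1": "stale", "device-2": "stale"}

def translate(state_message):
    """One-to-one translation between a message and a device state."""
    body = json.loads(state_message)
    device_states[body["device"]] = body["state"]
    return body

def worker(worker_id):
    while True:
        message = broker.get()          # each message is owned by exactly one worker
        if message is None:             # sentinel: no more work
            break
        body = translate(message)
        # Log the timestamp, the device identity, and the worker identity.
        logging.info("worker=%s device=%s state=%s",
                     worker_id, body["device"], body["state"])

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for device in device_states:
    broker.put(json.dumps({"device": device, "state": "synchronized"}))
for _ in threads:
    broker.put(None)
for t in threads:
    t.join()
print(device_states)
```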

The workers can also execute web requests to target the devices directly. They have access to the message broker, the database and the devices. The background jobs that create these workers can be scheduled or periodic or in some cases polled from the queue so that a message on arrival can be associated with a worker.

This completes the system of using background workers to perform automation of device synchronization. With a one-to-one mapping between messages and workers and having several workers, it becomes easy to scale the system to handle a large number of devices.


Friday, April 16, 2021

 The design of reward point service push notifications:

Introduction: Reward points are a form of currency. If we have an account, watching the reward points accumulate on a mobile device is a thrill, as appreciations come from individuals and businesses for various monitored activities. This article explores the design of a notification service for sending notifications on reward points to millions of users.

Description: The updates of reward points on a client can get out of sync with the central repository that accumulates them for all users across different publishers. The frontend for the client is trivial if it displays only the balance of the reward points along with some additional metadata, so we focus instead on the backend. There are many aspects to consider, but we will focus on only a few at this time. There are three cases to consider: 1. the data model, 2. push ranking, and 3. push release. The data model is simple for this reward points service. It consists of a reward point information data structure that has additional attributes such as the last updated timestamp, the last few activities that generated the update, and the redeemable options if any. We focus on just the balance, the timestamp, and the activities for the push object.

The activities are ranked not just chronologically but also on criteria set for ranking based on weighted features of the reward points. The intimacy of the users who send the reward points can also be encoded in the central repository so that it can be weighed into the ranking for the activities that need to be sent. We could simplify the update for the notifications by excluding the activities, because there can be millions of users and we would otherwise not be selective on either the push side from the server or the pull side from the mobile devices. Instead, we can consider the updates to be tackled independently for the account balance and for the activities that generate the updates. When there are only a few hundred users, the release may be simpler, but when there are millions of users, the updates must be economical and must be handled with selectivity.

The account balance is merely a data artifact, but its synchronization can be represented in the control plane with the help of a resource. This resource can get out of sync as the data is updated, but the resource representation in the control plane allows it to be synchronized on the target device. The activity can be triggered both by the mobile device that wants to check in and by the server that wants to push the notifications to the user. The synchronization of resources in the control plane is well established with orchestration frameworks that take away the overhead of writing a synchronization service. Instead, all data activities can be monitored to trigger control plane notifications. The operational engineering aspects of the reward points service can be maintained by this state orchestration framework if we have only one resource to be decided. The clients can choose to check in at selective times, and the server can be selective about which clients to update.

When the push objects consist of activities, there are options to reduce the number of connections to the central repository by denormalizing the data, even if there is some redundancy. The selection of activities can be determined by improvements to the ranking algorithm. Core indicators such as grantor stickiness, retention rate, and advertising revenue can determine whether the ranking algorithm is behaving correctly. The resource updates can be sent immediately to the device, which reduces the impact of the ranking algorithm. Similarly, the resource updates can be pulled only when the device wakes up. This optimizes the write update because the data does not need to be sent out. Both methods work well in certain situations. The process of sending data to all devices is a broadcast operation. This can happen in both writing and loading, and it can be made selective as well. The broadcast can be limited during both pull and push. Disabling the writes to all devices can significantly reduce the cost, with devices loading these updates only when reading. It is also helpful to keep track of which devices are active over a period so that only those devices get preference.


Thursday, April 15, 2021

 The refactoring of old code to new microservices.  

With the introduction of microservices, it became easy to host not only a dedicated database but also a dedicated database server instance, and to separate the concerns for each functionality that the user interface comprises. When we use microservices with Mesos-based clusters and shared volumes, we can even have many copies of the server for high availability and failover. This works well for small and segregated data, but larger companies often make massive investments in their data, standardizing tools, processes, and workflows to better manage it. In such cases consumers of the data don't talk to the database directly but via a service that sits behind, say, even a message bus. If the consumers proliferate, they end up creating and sharing many different instances of services for the same data, each with its own view rather than the actual table. APIs for these services are more domain-based rather than implementing a query-friendly interface that lets you directly work with the data. As services are organized, data may get translated or massaged as it makes its way from one to another. I have seen several forms of organizing the services, from service-oriented architecture at the enterprise level down to fine-grained individual microservices. It is possible to have a bouquet of microservices that can take care of most data processing for the business requirements. Data may even be at most one or two fields of an entity along with its identifier for such services. This works very well to alleviate the onus and rigidity that comes with organization, the interactions between the components, and the various chores that need to be performed to keep it flexible to suit changing business needs. The microservices are independent, so they stand by themselves as if radiating out from the data for their respective functionalities. This is already business-friendly because each service can now be modified and tested independently of others.

The transition to microservices from legacy monolithic code is not straightforward. The functionalities must be separated beyond components, and in the process of doing so, we cannot risk regression. Tests become a way to scope out behavior at boundaries such as interface and class interactions. Adequate test coverage helps guarantee backward compatibility for the system as it is refactored. The microservices are independently testable both in terms of unit tests and end-to-end tests. Services usually have a REST interface, which makes it easy to invoke them from clients and comes with the benefit of browser-based developer tools. The data store does not need to be divided between services. In some cases, only a data access service is required, which the other microservices can call. The choice and design of microservices stem from the minimal functionalities that need to be separated and articulated. If the services don't need to be refactored at a finer level, they can remain encapsulated in a singleton.
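As a minimal sketch, assuming Flask and an in-memory stand-in for the data store, a data access service that the other microservices call might look like this:

```python
# A hypothetical data-access microservice that other microservices call
# instead of talking to the database directly.
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for the dedicated data store; an entity here carries little more
# than an identifier and a field or two, as described above.
ORDERS = {"42": {"id": "42", "status": "shipped"}}

@app.get("/orders/<order_id>")
def get_order(order_id):
    order = ORDERS.get(order_id)
    if order is None:
        return jsonify(error="not found"), 404
    return jsonify(order)

if __name__ == "__main__":
    app.run(port=5000)  # invokable from any client, including browser dev tools
```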

The rule of thumb for the refactoring of the code is to follow the Don't Repeat Yourself (DRY) principle, which is defined as "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system." This calls for every algorithm or piece of logic that has been copied and pasted for different usages to be consolidated at a single point of maintenance. This improves flexibility because an enhancement, such as the use of a new data structure, can be made in one place, and it also reduces the bugs that creep in when similar changes must be made in several places. This principle also reduces the code when it is refactored, especially if the old code had several duplications. It provides a way to view the minimal skeleton of the microservices when aimed at the appropriate scope and breadth. Even inter-service calls can be reduced with this principle.

Good microservices are not only easy to discover from their APIs but also easy to read about from their documentation, which can be autogenerated from the code with markdown. Different tools are available for this purpose, and both the approach of using microservices and the enhanced comments describing the APIs provide sufficient information for the documentation.

 

Wednesday, April 14, 2021

Applications of Data Mining to Reward points collection service

A continuation of the discussion in terms of machine learning deployments.

Machine learning algorithms are a tiny fraction of the overall code that is used to realize prediction systems in production. As noted in the paper "Hidden Technical Debt in Machine Learning Systems" by Sculley, Holt, and others, the machine learning code comprises mainly the model, while all the other components, such as configuration, data collection, feature extraction, data verification, process management tools, machine resource management, serving infrastructure, and monitoring, make up the rest of the stack. All these components are usually hybrid stacks in nature, especially when the model is hosted on-premises. Public clouds do have a pipeline and relevant automation with better management and monitoring programmability than on-premises, but it is usually easier for startups to embrace public clouds than for established large companies that have significant investments in their inventory, DevOps, and datacenters.

Deployments can be on Infrastructure as a Service, Platform as a Service, Software as a Service, clusters, containers, computer grids, or computer clouds. The choice depends on convenience, cost, and effectiveness. A web server can easily be hosted on a virtual machine, and since the VM is hosted in the cloud, it can use cloud features such as global availability, a managed environment, load balancing, and disaster recovery. PaaS can be used where an application runtime is needed because stacks like OpenShift, Cloud Foundry, Pivotal, or Azure provide runtime environments where the hosting and deployment of applications is managed. Software as a Service offerings are complete applications by themselves. These can be accessed from any client, including mobile devices. A machine learning model hosted on SaaS can be used remotely from any handheld device with IP connectivity and a browser. Grids and supercomputers are specialized systems that were required for scientific computations. With newer algorithms for commodity servers in the cloud computing framework and with the creation of higher-capability clusters, using advanced machine learning algorithms has become easier.

Applications and resources can be managed efficiently with the help of system center tools. They can work across clouds and networks. For example, tools can help monitor resources on-premises as well as in the public cloud with the help of agents deployed to these resources. The agents collect all the information necessary to provide drilldowns as well as summaries at hierarchical scopes. A virtual private cloud provides isolation from other resources. This comes in handy for development and test before deploying to production systems, where the operations engineering charts and graphs can now specify the expectations around the metrics.

 

The following chart compares the data mining algorithms discussed, including neural networks: https://1drv.ms/w/s!Ashlm-Nw-wnWxBFlhCtfFkoVDRDa?e=aVT37e

Thank you.