Saturday, January 30, 2021

Writing a sequential model using TensorFlow.js:


Introduction: TensorFlow.js is a machine learning framework for JavaScript applications. It helps us build models that can be used directly in the browser or on a Node.js server. We use this framework to build an application that can recognize drawings of different types using a sequential model.

Description: The model chosen is a recurrent neural network, which is used for finding groups via paths in sequences. A sequence-clustering algorithm is like a conventional clustering algorithm, but instead of finding groups based on similar attributes, it finds groups based on similar paths in a sequence. A sequence is a series of events; for example, a series of web clicks by a user is a sequence. A sequence can also be compared to the IDs of any sortable data maintained in a separate table. Usually there is support for a sequence column, where support is a metric based on probabilities. The sequence data has a nested table that contains a sequence ID, which can be of any sortable data type.

The JavaScript application loads the model before using it for prediction. When enough training images have been processed, the model learns the characteristics of the drawings that correspond to their labels. Then, as it runs through the test data set, it can predict the label of a drawing. TensorFlow provides the Keras API, which helps author the model and deploy it to an environment such as Colab, where it can be trained on a GPU. Once training is done, the model can be loaded and run anywhere else, including a browser. The power of TensorFlow.js is in its ability to load the model and make predictions in the browser itself.
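As a minimal sketch of that last step, loading a converted model and predicting in the browser might look like the following; the model URL, the 28x28 input shape, and the helper name are assumptions for illustration.

```javascript
// Minimal sketch: load a converted Layers model and classify one drawing in the browser.
// The model URL and the 28x28 grayscale input shape are illustrative assumptions.
import * as tf from '@tensorflow/tfjs';

async function classifyDrawing(pixels /* Float32Array of length 28 * 28 */) {
  const model = await tf.loadLayersModel('https://example.com/model/model.json');
  const input = tf.tensor4d(pixels, [1, 28, 28, 1]);   // a batch of one grayscale image
  const probabilities = model.predict(input);          // softmax output over the classes
  const classIndex = (await probabilities.argMax(-1).data())[0];
  return classIndex;
}
```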

The labeling of drawings starts with a sample of, say, a hundred classes. The data for each class is available on Google Cloud as NumPy arrays, with some number N of images per class. The dataset is pre-processed for training: it is converted into batches, and the model outputs probabilities over the classes.
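A minimal sketch of that pre-processing, assuming the raw data arrives as flat 28x28 grayscale bitmaps with integer class labels (both assumptions for illustration), might be:

```javascript
// Convert raw bitmaps and labels into tensors ready for training.
import * as tf from '@tensorflow/tfjs';

function toTrainingTensors(bitmaps, labels, numClasses) {
  const n = labels.length;
  // Normalize pixel values to [0, 1] and shape them as [n, 28, 28, 1].
  const xs = tf.tensor4d(Float32Array.from(bitmaps, v => v / 255), [n, 28, 28, 1]);
  // One-hot encode the labels to match a softmax output of size numClasses.
  const ys = tf.oneHot(tf.tensor1d(labels, 'int32'), numClasses);
  return {xs, ys};
}
```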

As with most ML examples, the data is split into a 70% training set and a 30% test set. There is no order to the data, so the split is taken at random.

TensorFlow makes it easy to construct this model using the Keras API. It can only produce meaningful output after the model is trained, and in this case the model can be trained only after the training data has labels assigned, which might be done by hand. The model works better with fewer parameters. It might contain three convolutional layers and two dense layers. Keras.Sequential() instantiates the model; a pooling size is specified for each of the convolutional layers, and the layers are stacked onto the model. The model is compiled with a loss function, an optimizer such as tf.train.AdamOptimizer(), and a metric such as top-k categorical accuracy. The summary of the model can be printed for review. With a chosen number of epochs and a batch size, the model can then be trained.
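A minimal TensorFlow.js sketch of such a model follows; the filter counts, kernel sizes, and the 28x28 input shape are illustrative assumptions rather than prescribed values.

```javascript
// Build and compile a small sequential model: three convolutional layers,
// two dense layers, an Adam optimizer, and a categorical cross-entropy loss.
import * as tf from '@tensorflow/tfjs';

function buildModel(numClasses = 100) {
  const model = tf.sequential();
  model.add(tf.layers.conv2d({inputShape: [28, 28, 1], filters: 16, kernelSize: 3, activation: 'relu'}));
  model.add(tf.layers.maxPooling2d({poolSize: 2}));
  model.add(tf.layers.conv2d({filters: 32, kernelSize: 3, activation: 'relu'}));
  model.add(tf.layers.maxPooling2d({poolSize: 2}));
  model.add(tf.layers.conv2d({filters: 64, kernelSize: 3, activation: 'relu'}));
  model.add(tf.layers.maxPooling2d({poolSize: 2}));
  model.add(tf.layers.flatten());
  model.add(tf.layers.dense({units: 128, activation: 'relu'}));
  model.add(tf.layers.dense({units: numClasses, activation: 'softmax'}));
  model.compile({
    optimizer: tf.train.adam(),
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy'],
  });
  model.summary();   // print the layer-by-layer summary for review
  return model;
}
```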

With the model and the training/test sets defined, it is now easy to evaluate the model and run inference. The model can also be saved and restored. Execution is faster when a GPU is available.
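A minimal sketch of evaluation, saving, and restoring might look like the following; the storage key is hypothetical, and the xsTest/ysTest tensors are assumed to have been prepared as above.

```javascript
// Evaluate the trained model on held-out data, then save and restore it.
import * as tf from '@tensorflow/tfjs';

async function evaluateAndSave(model, xsTest, ysTest) {
  const [loss, accuracy] = model.evaluate(xsTest, ysTest);
  console.log('test loss:', loss.dataSync()[0], 'test accuracy:', accuracy.dataSync()[0]);

  await model.save('localstorage://drawing-classifier');   // persist in the browser
  return tf.loadLayersModel('localstorage://drawing-classifier');
}
```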

Training is done in batches of a predefined size, and the number of passes over the entire training dataset, called epochs, can also be set up front. A batch size of 256 and a step count of 5 could be used; these are called model tuning parameters. Every model trades off speed, mean average precision, and output: the higher the precision, the lower the speed. It is helpful to visualize training with a chart that is updated with the loss after each epoch. Usually there is a downward trend in the loss, which is referred to as the model converging.
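A minimal sketch of such a training run, using the tuning parameters mentioned above and a per-epoch callback that could feed a live loss chart, might be:

```javascript
// Train with a batch size of 256 for a handful of epochs, recording the loss
// after each epoch so it can be plotted.
async function train(model, xs, ys) {
  const lossHistory = [];
  await model.fit(xs, ys, {
    batchSize: 256,
    epochs: 5,
    callbacks: {
      onEpochEnd: (epoch, logs) => {
        lossHistory.push(logs.loss);   // one point per epoch for the loss chart
        console.log(`epoch ${epoch}: loss = ${logs.loss.toFixed(4)}`);
      },
    },
  });
  return lossHistory;
}
```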

Training the model might take a long time, say about four hours. When the test data has been evaluated, the model's effectiveness can be measured using precision and recall: precision is the fraction of the model's positive inferences that were indeed positive, and recall is the fraction of the actual positives that the model found.
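As a small worked sketch of those two metrics (the counts below are hypothetical):

```javascript
// Compute precision and recall from raw counts.
function precisionRecall(truePositives, falsePositives, falseNegatives) {
  const precision = truePositives / (truePositives + falsePositives); // correct among predicted positives
  const recall = truePositives / (truePositives + falseNegatives);    // found among actual positives
  return {precision, recall};
}

console.log(precisionRecall(80, 20, 40)); // { precision: 0.8, recall: 0.667 } approximately
```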

Conclusion: TensorFlow.js is becoming a standard for implementing machine learning models. Its usage is simple, but the choice of model and the preparation of data take significantly more time than setting it up, evaluating it, and using it.

Similar article: https://1drv.ms/w/s!Ashlm-Nw-wnWxRyK0mra9TtAhEhU?e=TOdNXy 

Friday, January 29, 2021

Writing a regressor using TensorFlow.js:


Introduction: TensorFlow.js is a machine learning framework for JavaScript applications. It helps us build models that can be used directly in the browser or on a Node.js server. We use this framework to build an application that can detect objects in images using a regressor rather than a classifier.

Description: A classifier groups entries based on their similarity to one another, and images can be compared this way as well, but that says nothing about the position of an object within an image. A regressor instead uses bounding boxes of varying sizes that span the image until it finds a portion of the image that matches an object. The object itself is specified as a bounding box within an image. Both the training and the test data are images, but the training set carries a bounding box and a label for each image while the test set does not.

The JavaScript application uses the labels from the images to train the model. When enough training images have been processed, the model learns the characteristics of the object to be detected. Then, as it runs through the test data set, it can predict the bounding box and the label whenever a similar object is found in a test image.
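For illustration of what that inference step looks like in the browser, the following sketch uses the pre-trained coco-ssd model from tfjs-models rather than a custom-trained detector; a custom model would be loaded and invoked along the same lines.

```javascript
// Run an object detector on an image element and log each predicted label,
// confidence score, and bounding box.
import * as cocoSsd from '@tensorflow-models/coco-ssd';

async function detectObjects(imageElement) {
  const model = await cocoSsd.load();
  const predictions = await model.detect(imageElement);
  for (const p of predictions) {
    console.log(p.class, p.score.toFixed(2), p.bbox);   // bbox is [x, y, width, height]
  }
  return predictions;
}
```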

As with most ML examples, the data is split into a 70% training set and a 30% test set. There is no order to the data, so the split is taken at random.

The model chosen is an object detection model. It specifies the bounding box by its top-left and bottom-right coordinates, expressed as horizontal and vertical offsets. The size of the image is known beforehand in terms of width and height, and the bounding boxes are guaranteed to lie within the image. The object name, the filename, and the file type can optionally be attached to each image so that it can be looked up in a collection. The output consists of a label and a bounding box. A label map file specifies the objects to be detected; in this case, only one object is specified.
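One training annotation under these conventions might be represented as follows; the field names and values are illustrative rather than a required schema.

```javascript
// A sketch of a single annotation: image metadata, one label from the label map,
// and a bounding box given by its top-left and bottom-right offsets.
const annotation = {
  filename: 'frame_0001.jpg',
  fileType: 'image/jpeg',
  width: 640,                 // image size known beforehand
  height: 480,
  label: 'object',            // the single entry in the label map
  box: {
    xmin: 120, ymin: 80,      // top-left corner
    xmax: 310, ymax: 260,     // bottom-right corner, guaranteed inside the image
  },
};
```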

TensorFlow makes it easy to construct this model using an API. It can only produce meaningful output after the model is trained, and in this case the model can be trained only after the training data has labels assigned, which might be done by hand. The API expects the data to be converted into TFRecord, a simple format for storing a sequence of binary records.

With the model and the training/test sets defined, it is now easy to evaluate the model and run inference. The model can also be saved and restored. Execution is faster when a GPU is available.

Training is done in batches of a predefined size, and the number of passes over the entire training dataset, called epochs, can also be set up front. A batch size of 90 and a step count of 7000 could be used; these are called model tuning parameters. Every model trades off speed, mean average precision, and output: the higher the precision, the lower the speed. It is helpful to visualize training with a chart that is updated with the loss after each epoch. Usually there is a downward trend in the loss, which is referred to as the model converging.

Training the model might take a long time, say about four hours. When the test data has been evaluated, the model's effectiveness can be measured using precision and recall: precision is the fraction of the model's positive inferences that were indeed positive, and recall is the fraction of the actual positives that the model found.

Conclusion: TensorFlow.js is becoming a standard for implementing machine learning models. Its usage is simple, but the choice of model and the preparation of data take significantly more time than setting it up, evaluating it, and using it.

Thursday, January 28, 2021

The API layer for use by mobile applications...

  

This is a continuation from the previous post:


  • Applications that allow user-defined scripts to be run, along with a rich suite of scriptable objects and APIs, arguably improve automation. For over two decades and with varying technologies, Microsoft has shown an example of improving scriptability and automation: first there was the Component Object Model, then Visual Basic that could use those objects, and finally PowerShell. Many applications, services, and instances can allow extensibility via user-defined scripts. These can be invoked from the command line as well as from other scripts, which also helps with testing.

  • Lastly, the importance of customer feedback cannot be overstated in customer-facing clients such as mobile applications and designer interfaces. Usability engineering can make it more convenient to navigate pages, use controls, and view dashboards, but the customers dictate the workflow. The prioritization of software features via customer feedback rests primarily with the product management team, not the engineering team.

The set of considerations mentioned so far has been technological. The discussion that follows adds some more based on the business domain. Decades of effort in streamlining the data access layer across business domains have given rise to expertise and maturity in web architectures. Yet businesses continue to develop home-grown stacks for their respective business applications, some under the requirement to adopt container orchestration technologies and others with the excuse of developing independently testable microservices. Finance companies are heavily invested in shared data and indexes and need extract-transform-load as a core part of their services. Retail companies, whether for clothing or beverages, are increasingly invested in the point-of-sale experience. Telecommunication companies are required by law to meet compliance obligations both for subscriber records and for anti-trust regulation. Because of these different requirements, organizations come up with a portfolio of solutions and then retrofit standardization and consistency across their applications, especially when technical-debt mitigation is permitted. All the considerations mentioned so far have proven useful across industries regardless of their technology choices, such as which authentication to support. Therefore, customization and business-domain requirements should not be allowed to circumvent or ignore them.

Certain improvements are driven by their business priority and severity. Even if the technology and architecture are determined up front, their implementations may take shape differently from one another. The end state of the implementation at the time of release may not be on par with what was once on the whiteboard, but the techniques suggested here have stood the test of time and landscape. Perhaps the single most important contributor has been their popularity with the developer community, demonstrated, for example, by the adoption of GraphQL/REST over SOAP and of containers over virtual machines. Developers also find public cloud architecture convenient for the v1 products of many companies across sectors and verticals. Developer community forums have also been a significant source of information for this document.

The following section is a note about testing. Mobile applications are written with the help of simulators that have internet access. It is common for the developer to test the user interface as it is being developed. Automating tests for mobile applications is rather cumbersome. One way to mitigate this difficulty is to separate the front-end testing using a mobile view that can be called from desktop browsers. Usually, the web interface displayed by a backend is the response to a web request, and that request can take additional query parameters to ask for the mobile rendering. Then there is browser-driven user-interface automation, which can even run headless and drive through the workflows using the web controls. A combination of these approaches can thoroughly test the mobile application from a user's point of view, as in the sketch below.
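A minimal sketch of that headless, browser-driven approach using Puppeteer follows; the URL, the view=mobile query parameter, and the selectors are hypothetical.

```javascript
// Drive the mobile view of a web backend from a headless desktop browser.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({headless: true});
  const page = await browser.newPage();
  await page.emulate(puppeteer.devices['iPhone X']);           // mobile viewport and user agent
  await page.goto('https://example.com/orders?view=mobile');   // ask the backend for the mobile rendering
  await page.click('#submit-order');                           // drive the workflow via web controls
  await page.waitForSelector('.order-confirmation');           // assert the expected outcome appears
  await browser.close();
})();
```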

Lastly, standard practice for web applications, whether documented via enterprise application blocks or cloud computing guidance, is a great place for reference and resources, but we should note that the spirit in which they are written suggests our implementations can be lean, so that we incur a lower total cost of ownership in development and operations.

Wednesday, January 27, 2021

The API layer for use by mobile applications...

 

This is a continuation from the previous post:

  • The entire software consuming the dependencies, regardless of its organization, may also be considered a service with its own APIs. Writing command-line convenience tools to drive this software with an API or a sequence of APIs can become very useful for diagnostics and automation. Consider the possibility of writing automations with scripts rather than being restricted to writing and testing code. Automation has its own benefits, but not having to resort to code widens the audience. Along the lines of automation, we mentioned convenience, but we could also cite security and policy enforcement. Since the tools are designed to run independently and with very little encumbrance, they can participate in workflows previously unimagined with the existing codebase and processes.

  • API versioning is mentioned as a best practice for clients and consumers to upgrade. There is one very useful property of the HTTPS protocol whereby a request can be redirected, so that users may at the very least be kept informed, or their calls translated to newer versions in full-service solutions (see the redirect sketch after this list). Versioning is probably the only way to provide an upgrade path to users. If we do not want to take on the onus of backward compatibility, the choice to offer both old and new becomes clearer. It is also true that not all customers or consumers can move quickly to adopt new APIs, and the web service then takes on the onus of maintaining the earlier behavior in some fashion. As web services mature and become increasingly large and unwieldy, they have no other way to maintain all the earlier behavior without at least relieving the older code base of newer features and providing those in a new offering.

  • As services morph from old to new, they may also change their dependencies, both in version and in the kind of service. While the user interface may be delivered as part of the response body, it may be better to separate the UI as a layer from the responses so that the service behind the API can be consolidated while the UI is customized for different clients. This keeps customization logic away from the services while enabling the clients to vary across plugins, browsers, applications, and devices. Some responses may continue to carry their UI because they accept no other form of data transfer, but the XMLHttpRequest that transfers the data from the browser to the server is still a client-side technology and doesn't need to be part of the server response body. Another reason servers choose to include user-facing forms and controls in their responses is to enforce same-origin policies and strict client registrations and redirections. By requiring some of their APIs to be internal, they also restrict others from making similar calls. APIs do have natural address, binding, and contract properties that allow their endpoints to be secured, whereas client technologies do not require such hard and fast rules. Moreover, with services relaying data via calls, strict origin registration and control may still be feasible.

  • There is more freedom with visualizations displayed by clients built for the desktop rather than for mobile applications. This allows curated libraries for charts and graphs, visualizations, analytics, and time-series reporting. While some may be client libraries, others may be dedicated stacks by themselves. If the treatment of data prior to rendering requires a dedicated stack, it can even be written as a separate microservice specific to desktop clients. Many off-the-shelf products are built to facilitate solution integration via services, which ties in well with empowering desktop clients that are not resource-starved.
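As a minimal sketch of the versioning-by-redirect idea from the second bullet above, an Express service might redirect old clients to the newer route; the route shapes and version numbers here are hypothetical.

```javascript
// Redirect v1 calls to the v2 endpoint so older clients are at least kept informed;
// a full-service alternative would translate the call instead of redirecting it.
const express = require('express');
const app = express();

app.get('/api/v1/accounts/:id', (req, res) => {
  res.redirect(308, `/api/v2/accounts/${req.params.id}`);   // 308 preserves the request method
});

app.get('/api/v2/accounts/:id', (req, res) => {
  res.json({id: req.params.id, version: 'v2'});
});

app.listen(3000);
```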

Tuesday, January 26, 2021

The API layer for use by mobile applications...

  This is a continuation from the previous post:

  • It is easier to add workflows to existing software by reading data and transforming it for analysis and reporting, given that online processing needs to be as efficient and streamlined as possible so that it remains available to customers. Finally, the organization may find it easier to make value propositions with vertical silos instead of horizontal modifications. While databases used to form the shared data infrastructure and services were differentiated for the value propositions, today organizations prefer to move from an infrastructure view to a microservice model, introducing new services and retiring old ones.

  • Regardless of the intention, scale, scope, and impact of the changes to the code base, most improvements suffer from the malaise that there are not enough tests and that it is too expensive, even prohibitive, to have adequate surface coverage for acceptance. It is widely believed that tests are investments in feature launches as well as in customer satisfaction.

  • Tests are just as much a currency as any other artifact. The presence of a well-written test can enforce expectations of the software as early as design time and all the way through the software development cycle. In this regard, one of the improvements we could consider is increasing API-based testing for the service. Generally, an n-tier service has tests at all levels: database unit tests at the back end, API-level tests at the service layer, and front-end tests for the front end. Of these, the design of the APIs and their tests offers the most benefit in the trade-off between the value of tests and technical debt. At the same time, all non-functional aspects such as security and performance can still be covered. Yet APIs are hard to keep tidy in the face of ever-increasing improvements and business changes to the software.

  • A web application will rely heavily on backend services that may work exclusively for specific purposes such as phone contact lookup, address lookup, authentication and auditing, and utilities such as issuing tokens for logged-in users, among many others. Each of these services may also be hosted in different environments that are used to prepare and test the code prior to its release to production. Consequently, these services can be counted as dependencies of the software. To keep track of the dependencies and to troubleshoot issues with the software, we could consider command-line tools that make Application Programming Interface (API) calls to a single service. These tools help use the service in isolation while also providing the command-line convenience of getting information on individual resources such as an account or a number lookup. While curl, a popular command-line tool, can call services via APIs hosted over HTTPS, most enterprise services are secured, and some of these APIs often require a preamble where one API is called before another; a tool like the one sketched below can help with this. Such tools come in handy for triage and for working in isolation from the tests and code. Moreover, they are for the convenience of getting data or setting state, not for operations or monitoring, which generally have their own infrastructure based on impact radius.
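A minimal sketch of such a convenience tool follows; the endpoints, parameters, and the use of node-fetch are hypothetical, and curl could perform the same preamble and call in two invocations.

```javascript
// A small command-line tool: perform the preamble (obtain a token) and then
// call the API of interest for a single resource.
const fetch = require('node-fetch');

async function lookupAccount(accountId) {
  // Preamble: obtain a token from the auth service first.
  const tokenResponse = await fetch('https://auth.example.com/oauth/token', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({
      clientId: process.env.CLIENT_ID,
      clientSecret: process.env.CLIENT_SECRET,
    }),
  });
  const {accessToken} = await tokenResponse.json();

  // Then call the service in isolation for one resource.
  const accountResponse = await fetch(`https://api.example.com/accounts/${accountId}`, {
    headers: {Authorization: `Bearer ${accessToken}`},
  });
  console.log(await accountResponse.json());
}

lookupAccount(process.argv[2]);
```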

Monday, January 25, 2021

The API Layer for use by mobile applications.

 This is a continuation from the previous post:

·        Organization also comes with access routines and controls. Row-level security and role-based access control continue to dominate for this purpose. Label-based controls and organization-spanning resource-tree access security are increasingly taken on at the cloud level, so there is less to contend with inside the business implementations.

·        Identity and access management, including credentials, keys, ciphers, messages, and rotations, is increasingly handled by dedicated and almost ubiquitous platforms and products. Neither the membership provider nor the mechanism or protocol is specific to the business.

·        Plugins are an easy way to bring functionality into a platform. They also allow isolation while working with the application, but they need to conform to the plugin architecture. This is like separation of concerns, but tighter, with consistency requirements. When customizations by way of plugins have been around for a while, the platform can consolidate some of their functionality and offer it for use by the plugins.

·        The way the mobile applications evolve depends on the backend services just as much as on the business requirements from the point of sale. The mobile applications, therefore, merely act as facilitators.

·        APIs, like most customer-facing software interfaces, suffer from the same liabilities over time. Fortunately, versioning and retirement policies can be enforced for all clients, with the added advantage that more and more clients accept HTTPS-based APIs over other forms of interface. As devices, platforms, and ecosystems change, APIs present an evergreen investment that stands the test of time better than most. Moreover, they lend themselves more naturally to automation, diagnostics, and monitoring than most other software.

·        As software matures, many engineers choose to be cautious about the improvements they make, often requiring very little change to existing data paths or processes. They do this for several reasons. First, code reuse and the original design decisions work in their favor. Second, altering existing code paths often runs the risk of regression. Third, the cost of churn generally rises higher than the benefits. Also, much of the change is usually in the form of technical debt, a term for the corrections needed when features are released by taking shortcuts, which creates instant debt.