Friday, May 4, 2018

Introduction:  
We were discussing full-service options for NLP programming for any organization. I used an example, http://shrink-text.westus2.cloudapp.azure.com:8668/add, to discuss points in favor of such a service. Here we try to illustrate that a file uploader, for instance, is an improvement over raw text processing.
There are several reasons why full service is more appealing than out-of-box capabilities. For example, connectors to data sources will need to be authored. Automation and scheduling of jobs and intakes are necessary for continuous processing. Error handling, reports and notifications will be required for administration purposes anyway.
Full-service options also involve compute and storage handling for all the near- and long-term needs surrounding the processes for the NLP task involved. Artifacts produced as a result of the processing may need to be archived. Aged artifacts may need to be tiered. Retrieval systems can be built on top of the collections made so far. At any time, a full-service solution at the very least provides answers to questions that generally take a lot of effort with the boxed solutions.
Moreover, it is not just the effort involved with out-of-box features; it is the complete ownership of associated activities and the convenience brought into the picture. The availability of queuing services and asynchronous processing for all backlogs adds further value to the full service. Reports and dashboards become more meaningful with full-service solutions. The impact on and feedback from the audience improve with a full-service solution, which goes a long way toward improving customer satisfaction.
#codingexercise
Tests for https://ideone.com/6ms4Vz

Thursday, May 3, 2018

Introduction:  
The previous article introduced some of the essential steps in getting a small, lightweight eCommerce website up and running in minimal time. We mentioned that the users for this website may need to be recognized, their transactions on the website may need to be remembered, and the processing for each order may need to be transparent to the user. This works well for in-house software development across industries for a variety of applications using off-the-shelf products, cloud services, and development frameworks and tools. Web applications and services fall in this category. Most software engineering development in industries such as finance, retail, telecommunications and insurance involves a significant amount of domain expertise, and picking the right set of tools, resources and processes is easy for the business sponsors and their implementers.
However, when the domain remains the same but we apply new computational capabilities that require significant new knowledge and expertise, such as machine learning, it is slower to onboard new team members and expect them to realize the applications. In such cases, the machine learning toolkit provider may only be able to put out samples, community news and updates. The companies are then best served by a white-glove service that not only brings in the expertise but also delivers on the execution. First, it reduces the time to implementation because the skills, resources and best practices are available. Second, the challenges are no longer unknown and have been dealt with earlier in other companies. These together argue for specialized consultancy services in machine learning development in most verticals. Even web application development started out this way in many organizations before in-house employees assumed all application development effort. Some organizations may want to have both: the expediency to realize near-term goals and the investment to build long-term capabilities.
I have a sample application, http://shrink-text.westus2.cloudapp.azure.com:8668/add, to illustrate an example. Suppose we want to use this as a sample within a particular domain; then we would need to justify it over, say, SQL Server's text classification capability. Here we need not even argue that the above processing avoids moving text data into SQL Server before it can be used. Instead, we focus on the model tuning and customization we can do for the same algorithm as in SQL Server, as well as model enhancement with other algorithms, say from an R package, while operating on data in transit in both cases.
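For illustration, here is a minimal sketch of such a pipeline, with scikit-learn standing in purely as a placeholder for the SQL Server or R algorithms mentioned above; the documents and labels are made up. The point is that the text is classified in memory, on data in transit, without first landing in a database.
# Minimal sketch: text classification on in-memory documents (data in transit).
# scikit-learn is assumed here only for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical documents and labels, for illustration only.
docs = ["refund my order", "great product, thank you", "the item arrived broken"]
labels = ["complaint", "praise", "complaint"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)
print(model.predict(["the package was damaged"]))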
#codingexercise
Tests for https://ideone.com/6ms4Vz

Wednesday, May 2, 2018

Introduction:  
This article narrates some of the essential steps in getting a small lightweight eCommerce website up and running in minimal time. We assume that the users for this website will be recognized, their transactions on the website will be remembered and the processing for each order will be transparent to the user.  
The registration of the user occurs with a membership provider – this can be an ASP.NET membership provider, a third-party identity provider such as login with Google, or an IAM vendor that honors authentication and authorization protocols such as OAuth or SAML.
Assuming a simple Python Django application suffices as a middle-tier REST service API, we can rely on Django's native support for different authentication backends, such as the model-based authentication backend or the remote-user backend. To support automatic recognition of Google users, we just include the following markup in the user interface:
<div class="g-signin2" data-onsuccess="onSignIn"></div> 
function onSignIn(googleUser) { 
  var profile = googleUser.getBasicProfile(); 
  console.log('ID: ' + profile.getId()); // Do not send to your backend! Use an ID token instead. 
  console.log('Name: ' + profile.getName()); 
  console.log('Image URL: ' + profile.getImageUrl()); 
  console.log('Email: ' + profile.getEmail()); // This is null if the 'email' scope is not present. 
} 
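On the Django side, the authentication backends mentioned earlier are enabled in settings; a minimal sketch follows, and which backends to list depends on the deployment.
# settings.py sketch: Django's model-based backend plus the remote-user backend.
# RemoteUserBackend also expects django.contrib.auth.middleware.RemoteUserMiddleware
# to be present in MIDDLEWARE.
AUTHENTICATION_BACKENDS = [
    "django.contrib.auth.backends.ModelBackend",
    "django.contrib.auth.backends.RemoteUserBackend",
]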
The transactions on the website for a recognized user are maintained with the help of session management.

Django has native support for session management and, in addition, allows us to write our own middleware.
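As a minimal sketch of that native support, a session behaves like a per-visitor dictionary; the view and "cart" key below are hypothetical.
# Sketch of Django's session support; assumes the default SessionMiddleware
# is enabled in settings.MIDDLEWARE. The view and "cart" key are illustrative.
from django.http import JsonResponse

def add_to_cart(request, item_id):
    cart = request.session.get("cart", [])   # per-visitor data persisted by Django
    cart.append(item_id)
    request.session["cart"] = cart           # reassign so the session is marked modified
    return JsonResponse({"cart_size": len(cart)})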
The order history is maintained in the form of relevant details from the orders in the Order table. Creation, update and deletion of orders are tracked in this table. The status field on the order table is progressive, in the form of initialized, processing, completed and canceled. Timestamps are maintained for creation as well as modification.
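One possible Django model for such an Order table is sketched below; the field names are illustrative rather than taken from an actual schema.
from django.db import models

class Order(models.Model):
    STATUS_CHOICES = [
        ("initialized", "Initialized"),
        ("processing", "Processing"),
        ("completed", "Completed"),
        ("canceled", "Canceled"),
    ]
    user_id = models.CharField(max_length=64)                 # recognized user
    status = models.CharField(max_length=16, choices=STATUS_CHOICES, default="initialized")
    created_at = models.DateTimeField(auto_now_add=True)      # set when the order is created
    modified_at = models.DateTimeField(auto_now=True)         # updated on every save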
Sample App: http://shrink-text.westus2.cloudapp.azure.com:8668/add 
#codingexercise 
https://ideone.com/6ms4Vz

Tuesday, May 1, 2018

Today we discuss the AWS Database Migration Service (DMS). This service allows consolidation, distribution, and replication of databases. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. It supports almost all the major brands of databases. It can also perform heterogeneous migrations, such as from Oracle to Microsoft SQL Server.
When the databases are different, the AWS Schema Conversion Tool is used. The steps for conversion include: assessment, database schema conversion, application conversion, scripts conversion, integration with third-party applications, data migration, functional testing of the entire system, performance tuning, integration and deployment, training and knowledge transfer, documentation and version control, and post-production support. The Schema Conversion Tool assists with the first few steps, up to the data migration step. Database objects such as tables, views, indexes, code, user-defined types, aggregates, stored procedures, functions, triggers, and packages can be moved with the SQL from the Schema Conversion Tool. This tool also provides an assessment report and an executive summary. As long as the tool has the drivers for the source and destination databases, we can rely on the migration automation performed this way. Subsequently, configuration and settings need to be specified on the target database. These settings include performance, memory, the assessment report, etc. The number of tables, schemas, users, roles and permissions determines the duration of the migration.
The DMS differs from the Schema Conversion Tool in that it is generally used for data migration rather than schema migration.
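As a rough sketch of the data-migration side, a replication task can be created with boto3; the endpoint and instance ARNs below are placeholders, not real resources.
import json
import boto3

dms = boto3.client("dms")
task = dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-sqlserver-demo",
    SourceEndpointArn="arn:aws:dms:...:endpoint:source",    # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:target",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:instance",  # placeholder
    MigrationType="full-load-and-cdc",  # keeps the source operational during migration
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
print(task["ReplicationTask"]["Status"])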
#codingexercise
Sierpinski triangle:
double GetCountRepeated(int n)
{
    double result = 1;
    // each iteration triples the running count and adds two
    for (int i = 0; i < n; i++)
    {
        result = 3 * result + 2;
    }
    return result;
}
This can also be written recursively, as sketched below.
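A minimal recursive sketch of the same recurrence, written here in Python for brevity: f(0) = 1 and f(n) = 3*f(n-1) + 2.
def get_count_repeated(n):
    # base case: a single triangle before any subdivision
    if n == 0:
        return 1
    return 3 * get_count_repeated(n - 1) + 2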
Another: https://ideone.com/F6QWcu
And finally, the text summarization app: http://shrink-text.westus2.cloudapp.azure.com:8668/add

Monday, April 30, 2018

I have been working on a user interface for customers to upload content to a server for some processing. Users may even have to upload large files, and both the upload and the processing may take time. I came across an interesting jQuery plugin for the purpose of showing a progress bar.
The jQuery-File-Upload plugin gives an example as follows:
$(function () {
    $('#fileupload').fileupload({
        dataType: 'json',
        done: function (e, data) {
            $.each(data.result.files, function (index, file) {
                $('<p/>').text(file.name).appendTo(document.body);
            });
        }
    });
});

The upload progress bar is indicated this way:
$('#fileupload').fileupload({
     :
    progressall: function (e, data) {
        var progress = parseInt(data.loaded / data.total * 100, 10);
        $('#progress .bar').css(
            'width',
            progress + '%'
        );
    }
});

Notice that the callback depends on data, and this notion can be carried over to the server side where, given a request id, the server can indicate progress:
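// Note: stopProgressCheck, progressServiceURL, requestId and statusTimerID
// are assumed to be declared elsewhere on the page.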
function updateProgress() {
        if (stopProgressCheck) return;
        var webMethod = progressServiceURL + "/GetProgress";
        var parameters = "{'requestId':'" + requestId + "'}";

        $.ajax({
            type: "POST",
            url: webMethod,
            data: parameters,
            contentType: "application/json; charset=utf-8",
            dataType: "json",
            success: function (msg) {
                if (msg.d != "NONE") { //add any necessary checks
                    //add code to update progress bar status using value in msg.d
                    statusTimerID = setTimeout(updateProgress, 100); //set time interval as required
                }
            },
            error: function (x, t, m) {
                alert(m);
            }
       }); 
    }
Courtesy : https://stackoverflow.com/questions/24608335/jquery-progress-bar-server-side
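On the server side, a minimal counterpart could look like the following sketch. Flask is assumed here purely for illustration and is not the original backend; it also assumes the client posts valid JSON, and the in-memory progress map stands in for whatever the processing workers actually update.
from flask import Flask, jsonify, request

app = Flask(__name__)
progress_by_request = {}  # requestId -> percent complete, updated by the workers

@app.route("/GetProgress", methods=["POST"])
def get_progress():
    request_id = request.get_json(force=True).get("requestId")
    percent = progress_by_request.get(request_id)
    # Mirror the client's contract above: "NONE" when there is nothing to report.
    return jsonify({"d": "NONE" if percent is None else str(percent)})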
An alternative to this seems to be to use the HTTP 102 (Processing) status code.
#application : https://1drv.ms/w/s!Ashlm-Nw-wnWtkN701ndJWdxfcO4

Sunday, April 29, 2018

We were discussing the benefits of managed RDS instance. Let us now talk about the cost and performance optimization in RDS.
RDS supports multiple engines including Aurora, MySQL, MariaDB, PostgreSQL, Oracle and SQL Server. Being a managed service, it handles provisioning, patching, scaling, replicas, backup/restore and scaling up for all of these engines. It supports multiple Availability Zones. As a managed service it lowers TCO and allows more focus on differentiation, which makes it attractive for managed instances big or small.
The storage type may be selected between GP2 and IO1. The former is general purpose and the latter is for consistently high performance.
Depending on the volume size, the burst rate and the IOPS rate on GP2 need to be monitored. GP2 imposes a burst-credit limit, and as long as we have credit against this limit, general-purpose GP2 serves well.
Compute or memory can be scaled up or down. Storage can be scaled up to 16 TB. There is no downtime for storage scaling.
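As a rough provisioning sketch with boto3 (identifiers, credentials and sizes below are placeholders, and the call assumes suitable IAM permissions):
import boto3

rds = boto3.client("rds", region_name="us-west-2")
rds.create_db_instance(
    DBInstanceIdentifier="orders-db",
    Engine="postgres",
    DBInstanceClass="db.m4.large",
    AllocatedStorage=100,      # GB; can be scaled later without downtime
    StorageType="gp2",         # or "io1" with Iops=... for provisioned IOPS
    MultiAZ=True,              # synchronous standby in another Availability Zone
    MasterUsername="admin_user",
    MasterUserPassword="replace-with-a-secret",
)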
Failovers are automatic. Replication is synchronous. Multi-AZ deployment is inexpensive and enabled with one click. Read replicas relieve pressure on the source database with additional read capacity.
Backups are managed with automated and manual snapshots. Transaction logs are stored every 5 minutes. There is no penalty for backups. Snapshots can be copied across regions. A backup can be restored to an entirely new database instance.
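For illustration, the snapshot operations above map to boto3 calls like the following; the identifiers and account number are placeholders.
import boto3

rds_east = boto3.client("rds", region_name="us-east-1")
rds_east.create_db_snapshot(
    DBInstanceIdentifier="orders-db",
    DBSnapshotIdentifier="orders-snap",
)
# (in practice, wait for the snapshot to become available before copying it)

rds_west = boto3.client("rds", region_name="us-west-2")
rds_west.copy_db_snapshot(
    SourceDBSnapshotIdentifier="arn:aws:rds:us-east-1:123456789012:snapshot:orders-snap",
    TargetDBSnapshotIdentifier="orders-snap-copy",
)
rds_west.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="orders-restored",
    DBSnapshotIdentifier="orders-snap-copy",
)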
New volumes can be populated from Amazon S3. A VPC allows network isolation. Resource-level permission control is based on IAM access control. There is encryption at rest and SSL-based protection for data in transit. There is no penalty for encryption. Moreover, key management is centralized, with access control and auditing of key activity.
Access grants and revokes are maintained with an IAM user  for everyone including the admin. Multi-factor authentication may also be setup. 
CloudWatch may provide help with monitoring. The metrics usually involve those for CPU, storage and memory, swap usage, reads and writes, latency and throughput, and replica lag. CloudWatch alarms are similar to on-premises monitoring tools. Additional performance insights can be gained by measuring active sessions, identifying sources of bottlenecks with an available tool, discovering problems with log analysis, and windowing of timelines.
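A small sketch of pulling one of these metrics from CloudWatch with boto3; the instance identifier is a placeholder.
from datetime import datetime, timedelta
import boto3

cw = boto3.client("cloudwatch", region_name="us-west-2")
stats = cw.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",   # similar calls work for FreeStorageSpace, ReplicaLag, etc.
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "orders-db"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 2))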
Billing for storage is usually in the form of GB-months.
#application : https://1drv.ms/w/s!Ashlm-Nw-wnWtkN701ndJWdxfcO4

Saturday, April 28, 2018

We were discussing the benefits of managed RDS instance:

The managed database instance types offer a range of CPU and memory selections. Moreover, their storage is scalable on demand. Automated backups have a retention period of up to 35 days, and manual snapshots are stored in S3 for durability. An Availability Zone is a physically distinct, independent infrastructure. Deploying across multiple Availability Zones comes with database synchronization, so the deployment is better prepared for failures. Read replicas help offload read traffic. The entire database may be snapshotted and copied across regions for greater durability. The provisioned compute, storage and IOPS constitute the bill.

Performance is improved by offloading read traffic to replicas, putting a cache in front of RDS, and scaling up the storage or resizing the instances. CloudWatch alerts and DB event notifications enable databases to be monitored.

In short, RDS allows developers to focus on app optimization with schema design, query construction, query optimization while allowing all infrastructure and maintenance to be wrapped under managed services.

RDS alone may not scale in a distributed manner. Therefore, software such as ScaleBase allows creation of a distributed relational database where database instances are scaled out. A single-instance database can now be transformed into a multiple-instance distributed relational database. The benefits of such a distributed database include massive scale, instant deployment, keeping all the RDS benefits of the single instance, automatic load balancing (especially with replica lag and the splitting of reads and writes), and finally increased ROI with no application code changes required.

Does a multi-model cloud database instance lose fidelity and performance compared to a dedicated relational database?
The answer is probably no, because a cloud scales horizontally, and what the database server did to manage partitions the cloud does too. A matrix of database servers as a distributed database model comes with coordination activities. A cloud database seamlessly provides a big table. Can the service-level agreement of a big table match the service-level agreement of a distributed query on a SQL server? The answer is probably yes, because the partitions of data and the corresponding processing are now flattened.

Are developers encouraged to use cloud databases as their conventional development database, which they then move to production? This answer is also probably yes: technology that does not require a change of habit is more likely to be adopted, and all the tenets of cloud-scale processing only improve traditional processing. Moreover, queries are standardized in language, as opposed to writing custom map-reduce logic and maintaining a library of those as a distributable package for NoSQL users.

#codingexercise https://ideone.com/Ar5cOO