Cluster computing

Monday, October 7, 2013

Combining user accounts with client registration pages on an API implementation web site brings the developer and the user together. Development requires an account to test the user page, and a single registration combines the two steps. Besides, both kinds of accounts are the company's assets. A client registration page collects details such as the account used for registering, a display name that others will see, and an e-mail address to which validations are sent. Validations are required to confirm that the data entry is not automated. Other details include the password for the account (with password security requirements enforced), the name of the application, its web site, and a description of what the application will do. The application description is important because it states what the application intends to do. We use it to validate the client application when it is being approved.
A callback URL is also registered. This callback URL is used in all OAuth client validations and is therefore critical to OAuth logins from the client. The OAuth spec requires that client redirect URLs be validated, so the registered value has to be right.
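The callback check can be sketched as follows. This is only a minimal sketch: the in-memory registry, the client id and the example URLs are hypothetical, and requiring https with no fragment is my assumption about a reasonable registration policy, not something taken from a particular server.

```python
from urllib.parse import urlsplit

# Hypothetical registry: client_id -> callback URL captured at registration.
REGISTERED_CALLBACKS = {
    "demo-client": "https://app.example.com/oauth/callback",
}

def is_valid_callback(url: str) -> bool:
    """Registration-time check (assumed policy): absolute https URL, no fragment."""
    parts = urlsplit(url)
    return parts.scheme == "https" and bool(parts.netloc) and not parts.fragment

def redirect_allowed(client_id: str, redirect_uri: str) -> bool:
    """Login-time check: the redirect must match the registered callback exactly."""
    registered = REGISTERED_CALLBACKS.get(client_id)
    return registered is not None and redirect_uri == registered
```

Exact string comparison is the simplest rule to reason about; anything looser (prefix or wildcard matching) makes it easier to redirect a login to a page the client never registered.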

Sunday, October 6, 2013

I'm listing the various data mining algorithms.
These are as follows:
A classification algorithm predicts one or more discrete variables, based on the other attributes in the dataset.
A regression algorithm predicts one or more continuous variables, such as profit or loss, based on the other attributes in the dataset.
A segmentation algorithm divides data into groups, or clusters, of items that have similar properties.
An association algorithm finds correlations between different attributes in a dataset. This is used for creating association rules.
A sequence analysis algorithm summarizes frequent sequences or episodes in data.
An algorithm can create a mining model that comprises:
a set of clusters that describe groupings within the data set
a decision tree that predicts an outcome
a mathematical model that forecasts sales
a set of rules that describe how items are grouped together.
Algorithms can be picked based on the purpose at hand.
A decision tree algorithm can be used to predict a discrete or a continuous attribute. It can also be used to find groups of common items in transactions.
A Naive Bayes algorithm works best to predict a discrete attribute. A neural network algorithm could be used too. A clustering algorithm can also predict a discrete attribute, but it is better suited for grouping similar items. A sequence clustering algorithm can be used to find groups of similar items as well as to predict a sequence.
A time series algorithm and a linear regression algorithm work best to predict a continuous attribute.
An association algorithm works well to find groups of common items by establishing correlations between attributes.
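The choices above can be written down as a small lookup table. This is only a transcription of the notes into Python; the task strings and the two helper functions are names I made up for illustration.

```python
# Task -> candidate algorithm families, transcribed from the notes above.
ALGORITHMS_BY_TASK = {
    "predict a discrete attribute": [
        "decision tree", "naive bayes", "neural network", "clustering"],
    "predict a continuous attribute": [
        "decision tree", "time series", "linear regression"],
    "find groups of common items": ["association", "decision tree"],
    "find groups of similar items": ["clustering", "sequence clustering"],
    "predict a sequence": ["sequence clustering"],
}

def candidates(task: str) -> list:
    """Return the algorithm families suggested for a task (empty if unknown)."""
    return ALGORITHMS_BY_TASK.get(task, [])

def tasks_for(algorithm: str) -> list:
    """Return, in sorted order, the tasks an algorithm family is suggested for."""
    return sorted(t for t, algos in ALGORITHMS_BY_TASK.items() if algorithm in algos)
```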

Token Ring with requests

In the previous post, we described a token ring. The token continues to circulate whether or not there is demand for the shared resource. If the participants pass the token only in response to requests, the ring is more efficient. There are now two kinds of messages: tokens and requests. Tokens circulate in one direction and requests circulate in the opposite direction.
A more general token-based approach uses a spanning tree to pass the token. The advantage of the spanning tree is that it can be applied to any graph.
It works this way:
1) Every process knows where the token is.
2) Requests are sent towards the token.
3) The token travels along the same path as the requests, but in the opposite direction.
A process finds the token by following the edges to the root of the tree. The tree is directed and the token is always at the root.
A process sends only one request at a time and may need to keep a list of pending requests from its children. The key question is how to make sure that a request gets closer to the token with each forward. For this we maintain a metric, or invariant, such that each forward reduces the metric (say, along a timeline). We could rely on the costs we calculated between neighbors in the minimum spanning tree. The tokens and the requests follow the edges of the spanning tree. If we were to use the entire graph instead of the minimum spanning tree, we could find a shorter path. However, for that too we must treat the graph as acyclic. We keep the concept of an invariant and point the edges towards the token. The tree is thus generalized to a partial order.
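The tree scheme above can be sketched as a small single-threaded simulation. This is my own simplification, not code from the book: messages are modelled as direct function calls, and the names Node, holder, pending and asked are hypothetical. Each node keeps one pointer toward the token, forwards at most one request, and the edge flips whenever the token crosses it.

```python
from collections import deque

class Node:
    def __init__(self, name):
        self.name = name
        self.holder = self      # neighbour in the direction of the token (self = token is here)
        self.pending = deque()  # who is waiting at this node: itself or a neighbour
        self.asked = False      # True once a request has been forwarded toward the token
        self.using = False      # True while this node is in the critical section

def find_token(node):
    """Follow the directed edges to the root, which always holds the token."""
    while node.holder is not node:
        node = node.holder
    return node

def pass_token(node):
    """If node holds an idle token, hand it to the oldest pending requester."""
    if node.holder is not node or node.using or not node.pending:
        return
    nxt = node.pending.popleft()
    if nxt is node:
        node.using = True
        return
    node.holder = nxt           # the edge now points at the token's new position
    node.asked = False
    nxt.holder = nxt            # token arrives at nxt
    nxt.asked = False
    if node.pending:            # others still wait here: ask for the token back
        node.asked = True
        nxt.pending.append(node)
    pass_token(nxt)

def request(node, requester=None):
    """Queue a request at node and forward at most one request toward the token."""
    node.pending.append(node if requester is None else requester)
    if node.holder is node:
        pass_token(node)
    elif not node.asked:
        node.asked = True
        request(node.holder, node)

def release(node):
    """Leave the critical section and serve the next pending request, if any."""
    node.using = False
    pass_token(node)

# A three-node chain a - b - c with the token initially at a.
a, b, c = Node("a"), Node("b"), Node("c")
b.holder, c.holder = a, b
request(c)  # the request travels c -> b -> a; the token travels a -> b -> c
```

After the call, every edge points at c, so any process can still find the token by walking toward the root.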
The token tree examples above remind me of the dining philosophers problem.
The problem is stated this way. Five philosophers are sitting around a table. Between each pair of adjacent philosophers is a single fork, and in order to eat a philosopher must hold both adjacent forks.
The philosophers transition between three states: thinking, hungry and eating. Safety requires that neighbors never eat at the same time. Progress requires that every hungry philosopher eventually eats.
One solution is to require permission to eat from all of one's neighbors. With this solution, however, deadlock is possible because each process could wait for the next one in a cycle.
The solution can be improved by grabbing all forks at once and releasing them after a finite time. However, even with this solution some philosophers could starve.
A better solution is to break the symmetry. Here one of the philosophers grabs the left fork first while all the others try to grab the right fork first. With the symmetry broken, some of the philosophers can eat while the others eat in subsequent cycles, thus guaranteeing progress.
If we generalize to an undirected graph, we can break the symmetry by giving the edges a direction. When we do this, we must be careful not to introduce a cycle; otherwise we reintroduce the symmetry.
We can add a structure on the graph such that it is acyclic. All the edges point the same way, typically up, and thus we draw a partial order where the edges represent priority. Conflicts for a shared resource are resolved based on this priority. After a process wins and eats, its priority is reduced so that the neighbors get a chance to eat.
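The priority scheme can be sketched as a round-based simulation. The rounds, the ring layout and the function names are my assumptions for illustration; the rule itself is the one above: a hungry philosopher eats only if it has priority over every hungry neighbor, and after eating it yields priority on all of its edges. Yielding on every edge makes that philosopher a sink, so no cycle can appear.

```python
def ring_edges(n):
    """Orient every ring edge from the lower index to the higher one (acyclic)."""
    return {(min(i, (i + 1) % n), max(i, (i + 1) % n)) for i in range(n)}

def neighbors(p, n):
    return [(p - 1) % n, (p + 1) % n]

def step(priority, hungry, n):
    """Run one round; an edge (u, v) in priority means u has priority over v."""
    eaters = {p for p in hungry
              if all((p, q) in priority for q in neighbors(p, n) if q in hungry)}
    for p in eaters:
        for q in neighbors(p, n):
            # After eating, p yields to both neighbours: flip both of its edges.
            priority.discard((p, q))
            priority.add((q, p))
    return eaters

n = 5
priority = ring_edges(n)
hungry = set(range(n))  # assume everyone stays hungry
history = [step(priority, hungry, n) for _ in range(n)]
```

With this starting orientation each philosopher eats exactly once in the first five rounds, and no two neighbors ever eat in the same round.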

Saturday, October 5, 2013

Token based solutions to Mutual Exclusion Layer in distributed systems

When a collection of processes share a common resource, they may require mutually exclusive access to it. The problem is to design a program that acts as a "mutual exclusion layer" that resolves the conflicts between the processes.
A trivial solution could be to write a single central program that maintains a queue of outstanding requests. Requests are granted one at a time. This guarantees safety and progress.
However, the centralized program could be a bottleneck.
So let us step back and consider a more general problem of which mutual exclusion is an example. This is the problem of maintaining the value of a distributed variable.
Let us say the variable x needs to be updated. Different processes keep a local copy and manage it with broadcasts.
Each process keeps as part of its state the following:
a copy of x,
a logical clock,
a queue of modifying requests (with their logical timestamps),
a list of known times, one for each other process.
A process executes a request when that request has the minimum logical time of all the queued requests and all the known times are later than that time.
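The execution rule can be sketched as a single predicate. As an assumption on my part, timestamps are (clock, process id) pairs so that ties between equal clock values are broken deterministically; the function and variable names are hypothetical.

```python
def can_execute(request_time, queue_times, known_times):
    """Return True when the request is the earliest one queued and every
    other process is already known to be at a later logical time."""
    is_earliest = request_time == min(queue_times)
    others_are_later = all(t > request_time for t in known_times.values())
    return is_earliest and others_are_later

# Example state at process p1: two queued requests and the latest
# timestamp heard from each of the other processes.
queue = [(3, "p1"), (5, "p2")]
known = {"p2": (6, "p2"), "p3": (4, "p3")}
ready = can_execute((3, "p1"), queue, known)
```

If some process had last been heard from at an earlier time, the request would have to wait, because that process might still send a request with a smaller timestamp.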
The processes are connected in a well-defined, fixed topology; that is, the processes are finite in number and well connected. We will consider the processes as arranged in a ring.
Now let's consider the use of a single indivisible token. A token is a very useful concept. It can neither be created nor destroyed.
A process is allowed to enter the critical section only if it holds the token. Every process that tries to enter the critical section eventually gets the token.
In a token ring the token moves constantly in a clockwise direction.
If a process wants to enter the critical section, it simply waits for the token.
The algorithm to use is therefore:
To use resource:
    hungry = true
When token arrives:
    if hungry
        use resource
        hungry = false
    send token in the clockwise direction
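The same loop as a runnable sketch, under simplifying assumptions of mine: processes are numbered 0 to n-1 clockwise, the token starts at process 0, and using the resource takes no time.

```python
def token_ring(n, hungry, rounds=1):
    """Pass a single token clockwise around n processes and return the
    order in which hungry processes used the resource."""
    hungry = set(hungry)
    used = []
    holder = 0                      # the token starts at process 0 (assumption)
    for _ in range(n * rounds):
        if holder in hungry:        # when the token arrives: use it if hungry
            used.append(holder)
            hungry.discard(holder)
        holder = (holder + 1) % n   # send the token clockwise
    return used

order = token_ring(5, {3, 1})
```

Here order is [1, 3]: the token visits every process on each lap, including the three that never asked for it.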

Courtesy: Introduction to Distributed Systems,
a book by Dr. Paul Sivilotti

I had a chance to review the Urban Airship developer guide. They have Push APIs that let us select the audience, define the notification payload, specify the device types, and deliver the notification.
Authentication is based on an application key and secret. The application secret is restricted to certain low-security APIs. JSON format is supported. The base URL is go.urbanairship.com. The push response has push_ids; the ids are GUIDs. Devices have their pushTokens. Audience selectors include device IDs, segments, location, logical operators, etc., and are consolidated into a single attribute. The audience selector helps in cases such as broadcast. Scheduling is done with a separate endpoint, /api/schedules.
A push object describes everything about the push, including the audience and the push payload. A push payload is composed of up to five attributes: audience, notification, device_types, options and message.
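A push object along these lines can be assembled as plain JSON. The five attribute names come from the guide as summarized above; the helper function, the tag selector and the alert text are illustrative assumptions and should be checked against the Urban Airship reference before use. Nothing is sent over the network here.

```python
import json

# The five top-level attributes named in the guide.
ALLOWED_KEYS = {"audience", "notification", "device_types", "options", "message"}

def build_push(audience, notification, device_types, **extra):
    """Assemble a push object, rejecting any attribute outside the five."""
    unknown = set(extra) - ALLOWED_KEYS
    if unknown:
        raise ValueError("unknown push attributes: %s" % sorted(unknown))
    push = {
        "audience": audience,
        "notification": notification,
        "device_types": device_types,
    }
    push.update(extra)
    return push

# Illustrative values only; the selector and alert shapes are assumptions.
payload = build_push(
    audience={"tag": "beta-testers"},
    notification={"alert": "Hello"},
    device_types=["ios", "android"],
)
body = json.dumps(payload)
```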
Device information APIs are based on the device_token and include information such as last registration, badge, quiet time, etc.
Devices don't have to be on a specific platform. They can be iOS, Android, etc.
The location API for Urban Airship is interesting. We can search for a location by name, and the results can also be filtered by boundary type such as city, province or country. A location can be searched by latitude and longitude as well. Specifying the boundary type, such as city or postal code, along with the latitude and longitude is recommended.
Device registration is done with device tokens for iOS, APIDs for Android and PINs for BlackBerry.
The reports API gives the number of billable users for the month, broken out by iOS and Android.
The Tag API can be used for creating, deleting or updating tags for an application.

Friday, October 4, 2013

I want to mention how dependencies can be tested with a test plan. Suppose the API implementation we provide depends on external web services, the dependency communicates directly with the client, and we then get notified by the client. In such a case, we have a circular flow of data in which the flow is unidirectional. There is no way to send the data in the reverse direction to see whether the path is the same and whether all participants are working correctly. There may be latency before the data comes back once we have passed it to the dependency. During this latency, there is no way to guarantee that the data made it to the client. Therefore it's important to enable a mechanism to detect and diagnose issues.
Since we are talking about only three participants, one way to add this diagnosability would be to add duplex communication between the API server, the client and the dependency.
The API server needs to check with both the dependency and the client.

When making a call to the downstream dependency, the API needs to check whether the response was forwarded. With whatever capability the dependency provides, we should look for a suitable check. Suppose the dependency provides only CRUD functionality for an object. Then we can use the update to see if the object was created.
Similarly, the API server can check the client. If the client requests to register a piece of information, we can look into that information to see if it originated from our server. This is important because this way we know that the data has completed the round trip.
The server should maintain a database of the domain objects, if only so that it can compare what was provided to it with what it sent over.
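The round-trip comparison can be sketched with an in-memory log standing in for the database. The class name, the marker scheme and the payload fields are hypothetical; the idea is only that the server remembers what it sent to the dependency and checks what the client later registers against it.

```python
import uuid

class RoundTripLog:
    """Remember what was sent to the dependency and confirm it when it returns."""

    def __init__(self):
        self._sent = {}

    def record_sent(self, payload: dict) -> str:
        """Store a copy of the outgoing object and return a marker to attach to it."""
        marker = uuid.uuid4().hex
        self._sent[marker] = dict(payload)
        return marker

    def confirm(self, marker: str, payload: dict) -> bool:
        """True only if the client registers exactly what we sent under this marker."""
        expected = self._sent.get(marker)
        if expected is None or expected != payload:
            return False
        del self._sent[marker]
        return True

    def outstanding(self) -> list:
        """Markers still in flight; candidates for a timeout or a diagnostic check."""
        return list(self._sent)

log = RoundTripLog()
marker = log.record_sent({"object_id": 42, "state": "created"})
```

Anything left in outstanding() after the expected latency has passed marks where to start diagnosing.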

I had a chance to review the Passbook programming guide from Apple and I will mention the salient features of the protocol here. The APIs allow for registering a device to receive push notifications for a pass, getting the serial numbers of passes associated with a device, getting the latest version of a pass, unregistering a device, and logging errors. These are used by Passbook to communicate with the web server of the company that generates the passes. The application that creates the passes may also be made available for download by the company.