Saturday, April 2, 2016

Background Tasks

Problem statement:

We have a self-service file share provisioning portal that creates and maintains file shares on remote storage clusters, which are named by their location. Each cluster has a driver, which we call the connector, that exposes a set of SSH commands for these operations. An API layer translates each user portal request into a command for the target connector. The API layer is reached through a gateway. We have frequently seen timeouts from different sources: sometimes the gateway times out, sometimes the connector takes too long, and sometimes the SSH connection to a remote cluster takes too long. So far we have worked around this with an upper bound on the timeout. In this post, we look for an alternative solution using an asynchronous mechanism.

Solution:

The current interaction diagram looks like this:

In this interaction, the API is on hold until the resource is created.

If instead the API could make a promise that is fulfilled later, it could be relieved right away.

This introduces two challenges:

1) Earlier, the properties of the object could not be changed and no operations could be performed on it until it had been created and marked active. Moreover, we send an email to the customer when the share has been created.

2) Earlier, the object creation was scoped within a transaction, so we had a clean state at both the beginning and the end.

We overcome these by introducing an interim state that the object holds before it is marked active. We call this state pending. Since each resource is identified by its location and file share name, the API can detect repeated requests to create the same object by looking for an existing record in the database and checking its state. If the object is pending, the API can return an error such as bad request or method not allowed. If the object has already been created, a bad request error is returned with a different error message. No other operations are permitted unless the object is active. Therefore the create call is idempotent in effect: repeating it never creates a second share.
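The state check described above can be sketched roughly as follows. This is a minimal illustration and not the portal's actual code: the `shares` dictionary stands in for the database table, and the names `ShareState`, `BadRequest`, `create_share` and `resize_share` are all hypothetical.

```python
from enum import Enum


class ShareState(Enum):
    PENDING = "pending"
    ACTIVE = "active"


class BadRequest(Exception):
    """Stands in for a bad request error returned by the API layer."""


# Hypothetical in-memory stand-in for the database table,
# keyed by (location, share_name).
shares = {}


def create_share(location, name):
    """Record a create request as pending, rejecting repeated requests."""
    key = (location, name)
    state = shares.get(key)
    if state is ShareState.PENDING:
        raise BadRequest(f"share {name!r} at {location!r} is still being created")
    if state is ShareState.ACTIVE:
        raise BadRequest(f"share {name!r} at {location!r} already exists")
    shares[key] = ShareState.PENDING
    return ShareState.PENDING


def resize_share(location, name, size_gb):
    """Example of a non-create operation: it requires an active share."""
    if shares.get((location, name)) is not ShareState.ACTIVE:
        raise BadRequest(f"share {name!r} at {location!r} is not active")
    return size_gb
```

However many times `create_share` is called for the same location and name, there is at most one record, and the two error messages let the caller tell a pending share from an existing one.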

In order to receive a notification, the API layer can spawn workers that separate the send and receive operations into different tasks. This lets the send operation relieve the caller immediately, while the receive operation runs as a background task. On completion of this task, the user can be emailed that the share is ready. If the share creation fails, the share entry is removed from the database so that it is left in a clean state and the create operation can be retried.

This background task is implemented as the processing of a message queue. During the send operation, the database record is marked pending and a message is placed on the message queue. The message queue processor reads the message, issues a command to the connector, and waits for the response. When the connector responds, the processor marks the record as active or deletes the entry. Only the processor can delete the entry. It then notifies the API, the portal, or the user via HTTP or email. Both the processor and the API have to check the entries in the database. There is an understanding between the API and the processor that redundant create messages may appear, but there can only be one file share and one corresponding entry in the database.
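A rough sketch of this send and receive split follows, assuming an in-process `queue.Queue` in place of a real message broker and a plain dictionary in place of the database. The function names `enqueue_create` and `process_one`, and the `connector` and `notify` callables, are hypothetical and only illustrate the flow.

```python
import queue

PENDING, ACTIVE = "pending", "active"


def enqueue_create(db, tasks, location, name):
    """Send side: mark the record pending, queue the work, return at once."""
    key = (location, name)
    if key in db:
        return False  # a record already exists; do not queue another create
    db[key] = PENDING
    tasks.put(key)
    return True


def process_one(db, tasks, connector, notify):
    """Receive side: run one queued create and settle the record."""
    key = tasks.get()
    try:
        if db.get(key) != PENDING:
            return  # redundant message: record already settled or removed
        try:
            connector(*key)  # blocks until the remote cluster answers
        except Exception as exc:
            del db[key]  # only the processor deletes entries
            notify(key, f"failed: {exc}")
        else:
            db[key] = ACTIVE
            notify(key, "ready")
    finally:
        tasks.task_done()
```

In a deployment, `process_one` would presumably run in a loop inside a worker, and `notify` would send the email or HTTP callback. The check at the top of `process_one` is what lets redundant create messages pass through without producing a second share.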

The background processor or spool:

The interaction diagram now looks like this:

Choice of a background task processor versus concurrent APIs:

In the description above, a background task processor has been proposed. The same task separation could have been achieved with the concurrency features of the language. However, the background task processor decouples the work from both the caller and the recipient of the results. Moreover, it can handle different types of tasks. A message queue is used because it allows retries on failures and can scale as a cluster.

Conclusion:

APIs can become more responsive with a background task processor and a new state for the resource.
