Background Tasks
Problem statement:
We have a self-service file share provisioning portal that
creates and maintains file shares on remote storage clusters, which are named
by their location. Each cluster has a driver that we call the connector, and it
exposes a set of SSH commands for these operations. An API layer translates
a request from the user portal into a command for the target connector. The API
layer is reached through a gateway. We have frequently seen timeouts from
different sources: sometimes the gateway times out, sometimes the connector
takes too long, and sometimes the SSH connection to a remote cluster takes too
long. So far we have worked around this by setting an upper bound on the
timeout. In this post, we look for an alternative solution using an
asynchronous mechanism.
The current interaction diagram looks like this:
In this interaction, the API is on hold until the resource is
created.
If instead a promise could be made that is fulfilled
later, the API could be released right away.
This introduces two challenges:
1) Earlier, the object was created before the call returned. Its properties
cannot be changed and no operations can be performed on the object until it has
been created and marked active. Moreover, we send an email to the customer when
the share has been created.
2) Earlier, object creation was scoped within a transaction, so we had a clean
state to begin or end with.
We overcome these by introducing an interim state before the object is
marked active. We call this state pending. Since each resource is identified by
its location and file share name, the API can detect repeated requests to
create the same object by looking for an existing record in the database and
checking its state. If the object is pending, the API can
return an error such as bad request or method not allowed. If the object is
already created, the bad request error is returned with a different error
message. No other operations are permitted unless the object is active. The
create call is therefore idempotent.
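The state check described above can be sketched roughly as follows. This is only an illustration: the in-memory `ShareStore` stands in for the real database, and the names `State`, `ShareStore`, `create_share`, the status code 202 for an accepted request, and the message strings are all assumptions made for the example, not part of our API.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    ACTIVE = "active"


class ShareStore:
    """Hypothetical in-memory stand-in for the database table."""

    def __init__(self):
        self._rows = {}

    def get(self, location, name):
        # A share is identified by its location and file share name.
        return self._rows.get((location, name))

    def put(self, location, name, state):
        self._rows[(location, name)] = state


def create_share(store, location, name):
    """Return (status, message) without waiting for the connector."""
    state = store.get(location, name)
    if state is State.PENDING:
        return 400, "share creation is already in progress"
    if state is State.ACTIVE:
        return 400, "share already exists"
    # No record yet: mark it pending. Placing the message on the
    # queue would happen here in the real flow.
    store.put(location, name, State.PENDING)
    return 202, "share creation accepted"
```

Repeating the same create request leaves exactly one record in the store, whichever state it is in.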
To receive a notification, the API layer can spawn
workers that run the send and receive operations as separate tasks. This
lets the send operation release the caller immediately, while the receive
operation runs as a background task. On completion of this task, the user
can be emailed that the share is ready. If the share creation fails, the share
entry is removed from the database so that it is left in a clean state and
the create operation can be retried.
This background task runs as the processing of a message
queue. During the send operation, the database record is marked pending and a
message is placed on the message queue. The message queue processor reads the
message, issues a command to the connector, and waits for the response. When
the connector responds, the processor marks the record as active or deletes the
entry. Only the processor can delete the entry. It then notifies the API, the
portal, or the user via HTTP or email. Both the processor and the API have to
check the entries in the database. There is an understanding between the API
and the processor that redundant create messages may appear, but there can be
only one file share and one corresponding entry in the database.
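A minimal sketch of the processor side, under some loud assumptions: a plain `dict` plays the database, `queue.Queue` from the standard library plays the message queue, and `run_connector` and `notify` are hypothetical callables standing in for the SSH command and the HTTP or email notification. A real processor would block on the queue instead of draining it once.

```python
import queue


def process_messages(tasks, db, run_connector, notify):
    """Drain the queue once, resolving each pending record."""
    while True:
        try:
            key = tasks.get_nowait()  # key is (location, share_name)
        except queue.Empty:
            return
        if db.get(key) != "pending":
            # Redundant create message: the record was already
            # resolved, so the connector is not called again.
            continue
        if run_connector(key):
            db[key] = "active"
            notify(key, "share is ready")
        else:
            # Only the processor deletes entries, so that the
            # create can be retried from a clean state.
            del db[key]
            notify(key, "share creation failed")
```

Because the processor rechecks the record before calling the connector, a duplicate message for the same share is simply skipped.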
The background processor or spool:
The interaction diagram now looks like this:
Choice of a background task processor versus concurrent
APIs:
In the above description, a background task processor has
been proposed. The same task separation could have been achieved within the API
using the concurrency features of the language. However, the background task
processor separates the work from both the caller and the recipient of the
results. Moreover, it can handle different types of tasks. A message queue also
allows retries on failures and can scale as a cluster.
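To illustrate the retry point, here is a rough sketch in which a failed message is put back on the queue with an attempt counter. The limit of three attempts, the `(payload, attempt)` message shape, the `handler` callable, and the use of `TimeoutError` as the failure signal are all assumptions made for the example; a real message queue product would normally provide its own retry and dead-letter settings.

```python
import queue

MAX_ATTEMPTS = 3  # assumed limit, not taken from our setup


def drain_with_retries(tasks, handler, max_attempts=MAX_ATTEMPTS):
    """Process (payload, attempt) messages, requeueing on timeout.

    Returns the payloads that were given up on.
    """
    dead = []
    while True:
        try:
            payload, attempt = tasks.get_nowait()
        except queue.Empty:
            return dead
        try:
            handler(payload)
        except TimeoutError:
            if attempt + 1 < max_attempts:
                # Put the message back for another try.
                tasks.put((payload, attempt + 1))
            else:
                dead.append(payload)
```

The caller that placed the message is not involved in any of these retries, which is the separation the background processor is meant to give us.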