Cluster computing: Building a file watcher service

Tuesday, May 7, 2013

Building a file watcher service

Why is a file watcher service a bad idea?
Many applications rely on a file watcher. Files dropped into a watched folder are picked up almost instantly and queued for processing. This approach has several advantages. First, the files are visible in the file explorer, so no special tools are needed to see what work was requested. Second, the files can be arbitrarily large and can hold a variety of data types, structured or semi-structured. Third, file processing is asynchronous, so the producer and the consumer neither depend on nor block each other. Fourth, the approach is simple and relies only on basic, everyday file operations, which makes it very popular. Fifth, delayed or heavy background processing can work on a copy of the file or on the original without contention or dependency on anyone else. Lastly, the system can scale because the tasks are partitioned by data.
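To make the drop-and-pick-up pattern concrete, here is a minimal polling sketch in Python. It is only an illustration under stated assumptions: the function name `pick_up_files` and the `drop_dir` and `work_dir` folders are hypothetical, and a real service would more likely use the operating system's change notifications than a polling loop.

```python
import os
import shutil


def pick_up_files(drop_dir, work_dir):
    """Claim every regular file in drop_dir by moving it into work_dir.

    Returns the new paths in name order, ready to be queued for processing.
    (Hypothetical helper: the folder layout is an assumption, not the
    original post's design.)
    """
    os.makedirs(work_dir, exist_ok=True)
    claimed = []
    for name in sorted(os.listdir(drop_dir)):
        src = os.path.join(drop_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories and anything that is not a file
        dst = os.path.join(work_dir, name)
        shutil.move(src, dst)  # moving the file out of the drop folder claims it
        claimed.append(dst)
    return claimed
```

A watcher built on this would call `pick_up_files` in a loop with a short sleep between passes and hand each returned path to a worker. Because each file is moved out of the drop folder before processing, the producer is never blocked by the consumer.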
So what could go wrong? First, file locks are notorious for the dreaded "this file cannot be moved because another program is using it" error message. Second, the software that works on different file types may come with its own limitations, such as maximum file size, file conversion or translation, and file handling. Third, file handling goes through native operating system methods and is vulnerable to many kinds of exceptions and errors. In fact, the scheduler or task handling the file operation may have to deal with difficult error handling, including exceptions that require retries and user intervention.
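The retry burden described above might look something like the following sketch. It assumes that a locked file surfaces as an `OSError` (for example a `PermissionError`) when the move is attempted; the helper name `move_with_retry` and its parameters are hypothetical.

```python
import os
import time


def move_with_retry(src, dst, attempts=5, delay=0.5, sleep=time.sleep):
    """Try to move src to dst, backing off when the OS reports an error.

    Returns True on success and False once every attempt has failed, so the
    caller can escalate to user intervention. (Hypothetical helper; treating
    every OSError as retryable is an assumption.)
    """
    for attempt in range(attempts):
        try:
            os.replace(src, dst)
            return True
        except FileNotFoundError:
            raise  # retrying will not bring back a missing source file
        except OSError:
            # Assumed to cover "file in use" style failures; wait, then retry
            # with an exponentially growing delay.
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))
    return False
```

Even this small helper has to decide which errors are worth retrying, how long to wait, and what to do when it gives up, which is exactly the kind of handling a file watcher pushes onto the application.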
So what could be a good replacement? One alternative is to use a database in place of the file store and let the database handle the binary data as blob columns or FILESTREAM. This keeps all the data together and portable. Another approach is to use a message queue such as MSMQ, which has robust features for error handling and recovery, such as retries and dispatch. A third approach is to use services such as WCF that translate requests into messages and let the transport handle reliability and robustness. In fact, such services can scale well in an SOA model.
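As a rough illustration of the database alternative, the sketch below stores each work item as a blob row instead of a dropped file. It uses Python's built-in `sqlite3` module purely as a stand-in for a server database with FILESTREAM; the table name `work_items`, its columns, and the function names are all hypothetical, and the claim step assumes a single consumer.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the store and create the (hypothetical) work_items table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS work_items ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL,"
        " payload BLOB NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def submit(conn, name, payload):
    """Producer side: store the bytes as a pending work item, return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO work_items (name, payload) VALUES (?, ?)",
            (name, payload),
        )
    return cur.lastrowid


def claim_next(conn):
    """Consumer side: mark the oldest pending item as processing.

    Returns (id, name, payload) or None when nothing is pending.
    Assumes one consumer; several consumers would need stronger locking.
    """
    with conn:
        row = conn.execute(
            "SELECT id, name, payload FROM work_items"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE work_items SET status = 'processing' WHERE id = ?",
            (row[0],),
        )
    return row
```

The producer calls `submit` where it would otherwise have dropped a file, and the consumer calls `claim_next` where it would otherwise have watched the folder. The status column takes over the role that moving or locking a file played, and the transaction handles the failure cases.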
