Problem statement
An email campaign management system lets a user send automated emails to many recipients. The content of the email together with its broadcast is referred to as a campaign. A sample use case helps describe the problem and this solution. Say a job seeker wants to mail a standard cover letter and resume, as a self-introduction and advertisement, to all of their acquaintances. In this case, the letter and the resume become part of the campaign, and the user may want to change both the campaign and the target audience. Being able to do so from a web interface keeps the interaction minimal and less error-prone. The content can be uploaded as files, while the email recipients can be added from the browser. Once the content and the intended group of recipients are in place, the job seeker can click a button to mail the recipients using SMTP.
Role of a Database:
A database is useful for keeping a table of contacts and for supporting create, update, and delete operations, independent of the purpose for which these contacts are collected. Such a table suits an online transaction processing system, and the interface to it follows standard conventions.
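As a minimal sketch of such a table, the following uses SQLite from the Python standard library. The table name `contacts`, its columns, and the helper function names are assumptions made for illustration; any relational database would serve equally well.

```python
import sqlite3


def open_contacts(path=":memory:"):
    """Open (or create) the contacts database; names here are illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " email TEXT NOT NULL UNIQUE)"
    )
    return conn


def add_contact(conn, name, email):
    """Create a contact and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO contacts (name, email) VALUES (?, ?)", (name, email)
        )
    return cur.lastrowid


def update_email(conn, contact_id, email):
    """Update the email address of an existing contact."""
    with conn:
        conn.execute(
            "UPDATE contacts SET email = ? WHERE id = ?", (email, contact_id)
        )


def delete_contact(conn, contact_id):
    """Delete a contact by id."""
    with conn:
        conn.execute("DELETE FROM contacts WHERE id = ?", (contact_id,))


def list_emails(conn):
    """Return all email addresses in insertion order."""
    rows = conn.execute("SELECT email FROM contacts ORDER BY id")
    return [row[0] for row in rows]
```

The web interface would call helpers like these when the campaign manager adds or edits recipients in the browser.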
Role of a Message Broker:
A message broker is useful for sending messages to multiple recipients with retries and a dead-letter queue. It also journals the messages and activities for later review. Messaging protocols are well known and can be scripted with a variety of libraries and packages.
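To make the retry and dead-letter behaviour concrete, here is a small in-process sketch that uses `queue.Queue` as a stand-in for a real broker. The retry limit `MAX_ATTEMPTS`, the `(message, attempts)` tuple format, and the list used as a journal are all assumptions; a production broker provides these features through its own configuration.

```python
import queue

MAX_ATTEMPTS = 3  # assumed retry limit, not taken from any particular broker


def drain(work, dead_letters, handler, journal):
    """Process every message on `work`; retry failures, then dead-letter them.

    Each queue entry is a (message, attempts_so_far) tuple.
    """
    while True:
        try:
            message, attempts = work.get_nowait()
        except queue.Empty:
            return
        try:
            handler(message)
            journal.append(("sent", message))
        except Exception as exc:
            attempts += 1
            journal.append(("failed", message, str(exc)))
            if attempts < MAX_ATTEMPTS:
                work.put((message, attempts))  # put back for another try
            else:
                dead_letters.put(message)  # give up, keep for later review
```

The journal records every attempt, so a failed recipient can be reviewed later together with the reason for the failure.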
Role of a user interface:
The user interface is intended only for one user – the campaign manager. The campaign manager can not only feed the recipients and the content but also review the activity and progress as the campaign is mailed out.
Design:
A system that enables campaigns to be generated is a good candidate for automation as a background job. Let us look at this job in a bit more detail:
1) First, we need a way to specify the criteria for selecting the email recipients. This can be done with a set of logical conditions combined with ‘or’ and ‘and’ operators. Each condition is a rule, and the rules may be ordered. They are best expressed as a versioned stored procedure. If the data to filter resides in a table, say a customers table, then the stored procedure sits as close to the data as possible.
2) The message may need to be formatted and prepared for each customer, so the messages can be put in a queue where they are filled in and made ready. A message broker is very useful in this regard. Preparing the emails is followed by sending them out, so there may be separate queues for preparation and mailing, and the queues may be chained.
3) The number of recipients may be quite large and the mails for each of them may need to be prepared and sent out. This calls for parallelization. One way to handle this parallelization would be to have workers spawned to handle the load on the queue. Celery is a good example of such a capability, and it works well with a message broker.
4) A web interface to generate campaigns can be useful for administrators to interact with the system.
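The chained queues and the worker pool from the steps above can be sketched with threads and in-process queues from the standard library. This is a stand-in for a broker and a task library such as Celery, not a use of Celery itself; the function names, the `STOP` sentinel, and the default of four workers are assumptions.

```python
import queue
import threading

STOP = object()  # sentinel that tells a worker to exit


def worker(inbox, handle, outbox=None):
    """Take items from `inbox`, apply `handle`, pass results to `outbox`."""
    while True:
        item = inbox.get()
        if item is STOP:
            break
        result = handle(item)
        if outbox is not None:
            outbox.put(result)


def run_stage(inbox, handle, outbox=None, workers=4):
    """Start a pool of workers for one stage and return the threads."""
    threads = [
        threading.Thread(target=worker, args=(inbox, handle, outbox))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    return threads


def stop_stage(inbox, threads):
    """Send one sentinel per worker and wait for the stage to finish."""
    for _ in threads:
        inbox.put(STOP)
    for thread in threads:
        thread.join()


def run_campaign(recipients, prepare, send, workers=4):
    """Chain a preparation queue into a mailing queue."""
    prepare_q, mail_q = queue.Queue(), queue.Queue()
    preparers = run_stage(prepare_q, prepare, mail_q, workers)
    mailers = run_stage(mail_q, send, None, workers)
    for recipient in recipients:
        prepare_q.put(recipient)
    stop_stage(prepare_q, preparers)  # every prepared email is now on mail_q
    stop_stage(mail_q, mailers)
```

Stopping the preparation stage before the mailing stage ensures that the mailers see every prepared email before they see their sentinel.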
The data flow begins with the administrator defining the campaign. At the very least, this consists of: a) the email recipients, b) the mail template, c) the data sources from which to populate the template, and d) the schedule on which to send out the mails.
The email recipients need not always be specified explicitly, especially if they number in the millions. The recipients may already be listed in a database somewhere, in which case only selection criteria are needed to filter the full list. Such criteria are best expressed as a stored procedure. Translating user-defined criteria into a stored procedure is not very hard. The user is given a set of constraints, logical operators, and value inputs, and these are joined to form predicates that are entered as-is into the body of a stored procedure. Each time the criteria are executed through the stored procedure, the result set forms the recipients' list. When the criteria change, the stored procedure changes, and this results in a new version. Since the criteria and the stored procedure are independent of message preparation and mailing, they can be maintained offline from the mailing process.
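The translation of criteria into a predicate might look like the sketch below. SQLite has no stored procedures, so the sketch builds a parameterized query instead; on a database server the same predicate could form the body of a stored procedure. The `customers` table, the allowed columns, and the nested-tuple format for criteria are assumptions made for illustration.

```python
import sqlite3

ALLOWED_COLUMNS = {"city", "age", "subscribed"}  # hypothetical columns
ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def build_predicate(criteria):
    """Turn nested criteria into a SQL predicate and its parameters.

    A criterion is either (column, operator, value) or
    ("and" | "or", [criterion, ...]).
    """
    head = criteria[0]
    if head in ("and", "or"):
        if not criteria[1]:
            raise ValueError("a group needs at least one condition")
        parts, params = [], []
        for child in criteria[1]:
            sql, child_params = build_predicate(child)
            parts.append(f"({sql})")
            params.extend(child_params)
        return f" {head.upper()} ".join(parts), params
    column, operator, value = criteria
    if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported condition: {criteria!r}")
    # Only whitelisted names reach the SQL text; values stay parameters.
    return f"{column} {operator} ?", [value]


def select_recipients(conn, criteria):
    """Run the predicate and return the matching email addresses."""
    where, params = build_predicate(criteria)
    query = f"SELECT email FROM customers WHERE {where} ORDER BY email"
    return [row[0] for row in conn.execute(query, params)]
```

Restricting column names and operators to a fixed set, and passing user-supplied values as parameters, keeps the input from the web interface out of the SQL text itself.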
The mailing process commences with the email list determined as above. The next step is to acquire the data needed for each template. For example, the template may refer to resources that the recipients may have, but the list of resources may need to be pulled from another database. Ideally, this data is also obtained through SQL queries, whose results a task then uses to populate the contents of the email. Since this happens on a per-email basis, it can be parallelized across a worker pool where each worker grabs an email to prepare. An email consists of a recipient and content. Initially, the template is dropped on the queue with just the email recipient filled in. The task then converts the template into the actual email message before putting it on the queue for dispatch. The dispatcher simply mails out the prepared email with SMTP.
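A minimal sketch of the preparation task and the dispatcher follows, using `string.Template`, `email.message.EmailMessage`, and `smtplib` from the standard library. The function names, the `$placeholder` template syntax, and the default SMTP host and port are assumptions; a real deployment would also need authentication and TLS settings for its mail server.

```python
import smtplib
from email.message import EmailMessage
from string import Template


def prepare_email(subject, body_template, sender, recipient, data):
    """Fill the template with per-recipient data and return a ready message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    # substitute() raises KeyError when the data lacks a placeholder,
    # so an incomplete email never reaches the dispatch queue.
    message.set_content(Template(body_template).substitute(data))
    return message


def dispatch(message, host="localhost", port=25):
    """Mail out one prepared message over SMTP (host and port are assumed)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(message)
```

Here `prepare_email` plays the role of the task that converts the template into the actual message, and `dispatch` is the step that consumes the dispatch queue.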
The task-parallel library may hide the message broker behind the parallelization. Celery, for example, does not ship its own broker; it connects to one such as RabbitMQ or Redis and also allows the status of the enqueued items to be tracked. However, working directly with a fully-fledged message broker and a worker pool is preferred because it gives much more control over the queue and the messages permitted on it. Moreover, journaling and logging can be automated. Messages may be purged from the queue so that the automation stops on user demand.
Therefore, data flows from the data sources into the emails, which are then mailed out. The task that prepares the emails needs access to the database tables and stored procedures that determine who the recipients are and what the message is. Since the tasks act on individual emails, they scale out easily.
Intelligent Routines:
The ability to form groups of recipients based on classification rules adds intelligence to the system and does away with manual data entry. The classifiers for the groups depend on a set of rules that can be specified independently. Each rule can be added via the user interface, and the campaign can then be mailed out to the recipients that the rule selects.
Monitoring the progress:
The message broker helps with queue statistics, where the number of messages still on the queue indicates the progress. When a message has been processed, the status of the item reflects this. Subsequent read-only queries on the status give an indication of the progress.
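One way to picture the read-only status queries is a status table that the workers update as they go. The sketch below assumes a hypothetical `outbox` table with `campaign_id`, `recipient`, and `status` columns, and status values of `queued`, `sent`, and `failed`; a real broker would expose comparable counts through its own management interface.

```python
import sqlite3


def progress(conn, campaign_id):
    """Return {status: count} for one campaign using a read-only query."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM outbox"
        " WHERE campaign_id = ? GROUP BY status",
        (campaign_id,),
    )
    return dict(rows.fetchall())


def percent_complete(counts):
    """Share of items that are no longer waiting, as a percentage."""
    total = sum(counts.values())
    done = counts.get("sent", 0) + counts.get("failed", 0)
    return 100.0 * done / total if total else 0.0
```

The user interface can poll `progress` periodically to show the campaign manager how far the mailing has advanced.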
Testing:
Each piece of content and each group should be verified independently. A campaign should go through a dry run before being mass-mailed to the intended recipients.
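A dry run can be as simple as a flag on the mailing step that reports what would be sent without contacting the mail server. The function name `mail_campaign`, the `dry_run` flag, and the report format are assumptions; `send` is any callable that delivers a single message.

```python
def mail_campaign(messages, send, dry_run=True):
    """Send each message, or only report what would be sent in a dry run."""
    report = []
    for message in messages:
        if dry_run:
            # Nothing leaves the system; record the intended recipient only.
            report.append(("would send", message["To"]))
        else:
            send(message)
            report.append(("sent", message["To"]))
    return report
```

Defaulting `dry_run` to `True` means a mass mailing has to be requested explicitly, which guards against sending an unverified campaign by accident.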
Conclusion:
Implementation of an email campaign system allows the flexibility to customize all parts of the campaign process even beyond the capabilities of off-the-shelf automation systems.
Reference: paper titled “Queues are databases” by Jim Gray
Alternatives:
Instead of a database, issue-tracking software such as Jira can also be used together with the message broker. For example: https://github.com/ravibeta/PythonExamples/blob/master/seemq.py