Cluster computing

Friday, June 7, 2013

I want to discuss a technical problem I came across today. I will change the problem domain and entities so that I can present only the salient points. Consider a hypothetical web service with multiple data providers. These data providers are not isolated: they may serve data from their own source, and they may also read or update data held by other providers. The existing data providers are required, and more can be added at any time. Adding or removing data providers is seamless and does not hamper the web service's operations. Data providers can fail, but a failure does not affect the overall operation because the data owned by any node is redundant, with at least three copies. The web service, however, can go online or offline and is a single point of failure. We would like to discuss the replacement strategy for the web service and, in particular, the test plan around it. For example, we would like to know whether any of the customer queries to the web service has regressed. How do we go about the testing?
The data comes from different sources and the web service maintains state in its own database. Typically the web server and the database server are provisioned on separate virtual machines, because the web server requires more CPU and the database server requires more memory and storage. In this case, however, let us assume that the web server and the database are treated together and hosted on the same virtual machine. It is this virtual machine that we want to replace.
Initially we could treat the whole system as a black box and test it with different workloads on the web server. Most of these can be capture-and-replay workloads taken from the previous web server. However, that is not sufficient in itself because the workloads may not detect every regression. So we look at the changes, scope them to the layers and components they touch, and then design specific tests around them. For example, these tests could be breadth and depth oriented on the data that they operate on. They could also cover edge cases: if the queries use strings, then try lower case, upper case, different Unicode characters and long strings. We may also need performance and security tests. The service level agreement (SLA) could include metrics for performance and availability. Since several components of the system can fail, availability could be interpreted cumulatively over the quorum of available resources. This could be a weighted mean, since the web server is also involved. However, adding or removing data providers does not affect availability unless the number of providers falls short of the minimum needed.
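The capture-and-replay idea, together with the string edge cases, can be sketched in a few lines of Python. This is only an illustration under assumptions: `query_variants`, `find_regressions`, `old_service` and `new_service` are hypothetical names, and the two services are stand-in functions rather than real endpoints.

```python
from typing import Callable, List, Tuple


def query_variants(term: str) -> List[str]:
    """Case, Unicode, length and empty-string variations of one query string."""
    return [
        term.lower(),
        term.upper(),
        term + "\u00e9\u4e2d",  # append non-ASCII characters
        term * 1000,            # very long string
        "",                     # empty string
    ]


def find_regressions(
    captured: List[Tuple[str, str]],
    service: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Replay captured (query, old response) pairs and return every mismatch."""
    regressions = []
    for query, expected in captured:
        actual = service(query)
        if actual != expected:
            regressions.append((query, expected, actual))
    return regressions


# Hypothetical stand-ins for the previous and the replacement web service.
def old_service(query: str) -> str:
    return query.strip().lower()


def new_service(query: str) -> str:
    return query.lower()  # assumed regression: whitespace is no longer trimmed


# Capture responses from the old service, then replay them against the new one.
captured = [(q, old_service(q)) for q in query_variants(" Widget")]
regressions = find_regressions(captured, new_service)
print(len(regressions), "of", len(captured), "replayed queries differ")
```

Comparing responses for exact equality is the simplest choice. A real replay harness would probably need to ignore fields that legitimately change between runs, such as timestamps.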
Changes to the system could also span layers, in which case we may have to test each layer in isolation as well as end to end. For example, we can test one layer by checking its results against the data from the layers below it. In addition, we can test end to end across different data providers.
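As a rough sketch of the two styles of test, the Python below checks one layer against the data of the layer beneath it and then runs the same layer end to end over several providers. The classes `DictProvider` and `QueryLayer` are hypothetical and in-memory. They only stand in for whatever the real layers are.

```python
from typing import Dict, List, Optional


class DictProvider:
    """Hypothetical in-memory data provider standing in for a real one."""

    def __init__(self, records: Dict[str, str]) -> None:
        self.records = records

    def get(self, key: str) -> Optional[str]:
        return self.records.get(key)


class QueryLayer:
    """Hypothetical web service layer that asks each provider in turn."""

    def __init__(self, providers: List[DictProvider]) -> None:
        self.providers = providers

    def lookup(self, key: str) -> Optional[str]:
        for provider in self.providers:
            value = provider.get(key)
            if value is not None:
                return value
        return None


# Isolated test: the layer's answers must match the lower layer's own data.
stub = DictProvider({"a": "1", "b": "2"})
layer = QueryLayer([stub])
isolated_ok = all(layer.lookup(k) == v for k, v in stub.records.items())

# End-to-end test: several providers, some holding redundant copies.
providers = [
    DictProvider({"a": "1"}),
    DictProvider({"b": "2"}),
    DictProvider({"a": "1", "c": "3"}),
]
service = QueryLayer(providers)
end_to_end = {k: service.lookup(k) for k in ["a", "b", "c", "missing"]}
print(isolated_ok, end_to_end)
```

The isolated check localizes a failure to one layer, while the end-to-end check exercises the interaction between providers that the isolated check cannot see.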
