Saturday, January 2, 2016

Today we continue discussing AWS billing.
I referred to CloudWatch monitoring and SNS notifications as one way to populate a historical table of usage metrics. But there are some disadvantages to this:
1) It requires an elaborate setup and incurs further costs.
2) For a cloud provider, it adds a lot of overhead when there are many hundreds of customer accounts.
3) Most of the usage is still under the free tier, where detailed monitoring adds little benefit.
On the other hand, we can poll for the data we are most interested in, such as the number of instances and their image types. Furthermore, we can switch on CloudWatch monitoring only for those accounts whose usage is significantly higher than the free tier.
Therefore we wrote some sample scripts to illustrate:
1) how to get the instance information
2) how to poll for the information using cron jobs
3) what the datapoints from the usage metrics look like
4) how to design a table to store such data
5) how to perform aggregation over this data

These scripts are shared at: https://github.com/ravibeta/PHPSamples. A few illustrative sketches along these lines follow below.
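For item 1, a minimal sketch using the AWS SDK for PHP v3 could look like the following. It assumes the SDK is installed via Composer and that credentials come from the environment; the region is a placeholder, and this is an illustration rather than the actual PHPSamples script.

<?php
// Sketch: list instances with their type, image id and state using the AWS SDK for PHP v3.
require 'vendor/autoload.php';

use Aws\Ec2\Ec2Client;

$ec2 = new Ec2Client([
    'region'  => 'us-east-1',   // assumption: adjust to the account's region
    'version' => 'latest',
]);

$result = $ec2->describeInstances();
foreach ($result['Reservations'] as $reservation) {
    foreach ($reservation['Instances'] as $instance) {
        echo $instance['InstanceId'] . ' '
           . $instance['InstanceType'] . ' '
           . $instance['ImageId'] . ' '
           . $instance['State']['Name'] . "\n";
    }
}

For item 2, the same script can simply be scheduled from cron, for example every five minutes with an entry such as */5 * * * * php /path/to/poll_instances.php (the path is hypothetical), each run appending to the history table described further below.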

Note that both the poller and the SNS notifications essentially provide the same kind of data:
1) they cover only the meters that have been configured
2) the data can be accumulated for longer than the fourteen days' worth of history that the APIs provide, as the poller sketch below suggests
3) the datapoints for advanced metrics, although they arrive already filtered and aggregated at the source, can also be easily computed from the datapoints of the essential metrics
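To make this concrete, here is a sketch of how the poller could pull datapoints for one essential metric with CloudWatchClient::getMetricStatistics from the AWS SDK for PHP; the region, instance id and period are placeholders.

<?php
// Sketch: fetch CPUUtilization datapoints for one instance over the last day.
require 'vendor/autoload.php';

use Aws\CloudWatch\CloudWatchClient;

$cw = new CloudWatchClient([
    'region'  => 'us-east-1',   // assumption
    'version' => 'latest',
]);

$result = $cw->getMetricStatistics([
    'Namespace'  => 'AWS/EC2',
    'MetricName' => 'CPUUtilization',
    'Dimensions' => [['Name' => 'InstanceId', 'Value' => 'i-0123456789abcdef0']], // placeholder
    'StartTime'  => strtotime('-1 day'),
    'EndTime'    => time(),
    'Period'     => 300,        // five-minute granularity
    'Statistics' => ['Average', 'Maximum'],
]);

// Each datapoint carries a timestamp, a unit and the requested statistics.
foreach ($result['Datapoints'] as $dp) {
    echo $dp['Timestamp']->format('c') . ' avg=' . $dp['Average'] . ' max=' . $dp['Maximum'] . "\n";
}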

Hence the use of a table helps with the history.
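As a sketch of items 4 and 5 from the list above, the history table and a sample aggregation over it could look like the following; the table name, columns and the MySQL connection details are assumptions rather than the actual PHPSamples schema.

<?php
// Sketch: a usage-metrics history table and a daily-average aggregation over it.
$pdo = new PDO('mysql:host=localhost;dbname=billing', 'user', 'password'); // placeholders
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec("
    CREATE TABLE IF NOT EXISTS usage_metrics (
        id          BIGINT AUTO_INCREMENT PRIMARY KEY,
        account_id  VARCHAR(32)  NOT NULL,
        instance_id VARCHAR(32)  NOT NULL,
        metric_name VARCHAR(64)  NOT NULL,
        ts          DATETIME     NOT NULL,
        value       DOUBLE       NOT NULL,
        unit        VARCHAR(16)  NULL,
        UNIQUE KEY uq_point (instance_id, metric_name, ts)
    )
");

// Aggregation: average value per instance, per metric, per day.
$stmt = $pdo->query("
    SELECT instance_id, metric_name, DATE(ts) AS day, AVG(value) AS avg_value
    FROM usage_metrics
    GROUP BY instance_id, metric_name, DATE(ts)
    ORDER BY day
");
foreach ($stmt as $row) {
    echo $row['instance_id'] . ' ' . $row['metric_name'] . ' '
       . $row['day'] . ' ' . $row['avg_value'] . "\n";
}

The unique key on (instance_id, metric_name, ts) lets the poller be re-run safely without duplicating datapoints.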
As with all historical tables, we may need to archive and trim it. This can be done by moving one record at a time from the source table to the archive table. The move identifies a record according to an archival policy (say, older than a certain timeframe), inserts it into the archive table if it is not already there, and then deletes it from the source table once the insertion has succeeded.
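A sketch of that move, one record at a time inside a transaction, might look like this; the 90-day policy and the usage_metrics_archive table (assumed to have the same columns and unique key as usage_metrics above) are illustrative assumptions, and the INSERT IGNORE is MySQL syntax.

<?php
// Sketch: archive one record older than the retention window, then trim it
// from the source table only after the insert has succeeded.
$pdo = new PDO('mysql:host=localhost;dbname=billing', 'user', 'password'); // placeholders
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
try {
    // Pick one record that satisfies the archival policy (older than 90 days here).
    $row = $pdo->query("
        SELECT * FROM usage_metrics
        WHERE ts < DATE_SUB(NOW(), INTERVAL 90 DAY)
        ORDER BY ts LIMIT 1
    ")->fetch(PDO::FETCH_ASSOC);

    if ($row) {
        // Insert into the archive only if it is not already there.
        $pdo->prepare("
            INSERT IGNORE INTO usage_metrics_archive
                (account_id, instance_id, metric_name, ts, value, unit)
            VALUES (:account_id, :instance_id, :metric_name, :ts, :value, :unit)
        ")->execute([
            ':account_id'  => $row['account_id'],
            ':instance_id' => $row['instance_id'],
            ':metric_name' => $row['metric_name'],
            ':ts'          => $row['ts'],
            ':value'       => $row['value'],
            ':unit'        => $row['unit'],
        ]);

        // Delete from the source table now that the archive copy exists.
        $pdo->prepare("DELETE FROM usage_metrics WHERE id = :id")
            ->execute([':id' => $row['id']]);
    }
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}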

#codejam
We now discuss another puzzle from Code Jam, one that involves load testing.
The problem is described this way:
Let us say a site can support L people at a time without errors, but the site may have P visitors as the peak number of all visitors it may encounter. Since the site may not be able to support that many visitors, the site has to determine, within a factor of C, how many people it can support. This means there is an integer a such that you know the site can support a people, but you know the site can't support a * C people.
Note that this C is the target that we will use; it is given with each input of L and P. The goal is to determine the number of load tests you need to run in the worst case before knowing, within a factor of C, how many people the site can support.
Let us say this number is x.
The strategy we use is that if we know the site supports a, then a * C is the most we can claim; if P is greater than a * C, we need to do more load tests to refine what a and a * C can be. Each refinement, or iteration, increments x by 1.
Logarithmic or exponential scaling may not always work, and a 10x strategy might also not be sufficient. Depending on L and P and how large they can get, we need to determine a suitable strategy.
Choosing the strategy depends on the range given by L and P. We want to scale from L to P in the minimum number of steps, and we take whichever strategy gives us this minimum number of steps.
We essentially have two categories of problems, which determine whether we need a strategy that takes big steps or small steps:
1) L and P are small
2) L and P are large
We observe that since L and P are known along with C, each load test can pick an arbitrary number of visitors with which to bound a and a * C. The outcome of a load test is binary, so for the worst case we always count the outcome that leaves the larger part of the range still to be narrowed when finding x, the number of iterations.

Now we take some examples to see how this works. We will directly use the examples given:
Input (4 test cases, each giving L, P and C):
 Case 1:   L = 50    P = 700    C = 2
 Case 2:   L = 19    P = 57     C = 3
 Case 3:   L = 1     P = 1000   C = 2
 Case 4:   L = 24    P = 97     C = 2
The output for each case is x, the worst-case number of load tests needed.

In case #2, we already know the site can support 19 people, and 19 and 57 are exactly a factor of three apart, so we don't need to do any testing. Hence x is 0.
In case #4, we can first test 48, since it is a factor of 2 above 24. If 48 fails, we are done with one test. In the worst case 48 passes, and because 48 * 2 = 96 is still less than 97, we need one more test to narrow the window between 48 and 97. Hence x is 2.
There is a lower bound and an upper bound to the window: a certain number (a) that we can say is supported and a certain number (a * C) that we can say is not supported.
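One way to compute x along these lines is to test near the geometric mean of the current window, so that either outcome splits the remaining range roughly evenly, and to count the worse of the two outcomes at every step. The sketch below follows that idea; it is an illustration of the strategy discussed here, not a verbatim contest solution.

<?php
// Worst-case number of load tests needed to know, within a factor of $c,
// how many people the site can support, given it supports $low and the peak is $high.
function loadTests($low, $high, $c) {
    // Bounds already within a factor of C: nothing left to test.
    if ($low * $c >= $high) {
        return 0;
    }
    // Test near the geometric mean so a pass and a fail leave ranges of
    // comparable size, then take the worse of the two outcomes.
    $mid = (int) floor(sqrt((float) $low * (float) $high));
    if ($mid <= $low)  { $mid = $low + 1; }
    if ($mid >= $high) { $mid = $high - 1; }
    return 1 + max(loadTests($low, $mid, $c), loadTests($mid, $high, $c));
}

// The four sample cases from above.
foreach ([[50, 700, 2], [19, 57, 3], [1, 1000, 2], [24, 97, 2]] as $i => $case) {
    list($l, $p, $c) = $case;
    echo 'Case #' . ($i + 1) . ': ' . loadTests($l, $p, $c) . "\n";
}

Running this over the sample cases reproduces, for instance, the 0 of case #2 and the 2 of case #4 discussed above.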

#context switch and revert to aws billing
The CloudWatch monitoring subscriber can itself be hosted on an AWS instance, which conveniently dogfoods the subscription. Let us say we host a web server that receives the JSON POST callbacks from SNS. This subscriber can parse all the incoming messages to fill the metrics table, and doing so increases the utilization of an instance that is itself being watched, adding validity to the setup. Furthermore, the web server is required to have an Internet-addressable endpoint, and this is facilitated by AWS instances.
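A sketch of such an endpoint follows. It relies on the documented SNS HTTP/HTTPS message format, where the POST body is JSON with a Type of either SubscriptionConfirmation or Notification; the alarm_events table and the connection details are assumptions, and a real endpoint should also verify the SNS message signature, which this sketch omits.

<?php
// Sketch: a web endpoint that receives SNS POST callbacks for CloudWatch alarms
// and records them for the metrics history.
$body    = file_get_contents('php://input');
$message = json_decode($body, true);

if ($message === null) {
    http_response_code(400);
    exit;
}

if ($message['Type'] === 'SubscriptionConfirmation') {
    // Confirm the subscription by visiting the SubscribeURL that SNS sends.
    file_get_contents($message['SubscribeURL']);
} elseif ($message['Type'] === 'Notification') {
    // For CloudWatch alarm notifications, the Message field is itself a JSON
    // document with fields such as AlarmName, NewStateValue and Trigger.MetricName.
    $alarm = json_decode($message['Message'], true);
    $pdo = new PDO('mysql:host=localhost;dbname=billing', 'user', 'password'); // placeholders
    $pdo->prepare("
        INSERT INTO alarm_events (alarm_name, metric_name, new_state, received_at)
        VALUES (:alarm_name, :metric_name, :new_state, NOW())
    ")->execute([
        ':alarm_name'  => $alarm['AlarmName'],
        ':metric_name' => $alarm['Trigger']['MetricName'],
        ':new_state'   => $alarm['NewStateValue'],
    ]);
}

http_response_code(200);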


 
