Monday, December 14, 2015

AWS Metrics
AWS metrics relevant to compute are provided by:
1. Amazon ECS Metrics for clusters, which monitor CPU and memory utilization across the cluster as a whole and across the services in the cluster.
2. Amazon EC2 Metrics for compute instances, which monitor CPU utilization, disk reads/writes, network traffic and status checks.
Amazon ECS Metrics:
CPUUtilization - The percentage of CPU units that are used in the cluster or service.
Cluster CPU utilization (metrics that are filtered by ClusterName without ServiceName) is measured as the total CPU units in use by Amazon ECS tasks on the cluster, divided by the total CPU units that were registered for all of the container instances in the cluster.
Service CPU utilization (metrics that are filtered by ClusterName and ServiceName) is measured as the total CPU units in use by the tasks that belong to the service, divided by the total number of CPU units that are reserved for the tasks that belong to the service.
Units: Percent
MemoryUtilization - The percentage of memory that is used in the cluster or service.
Cluster memory utilization (metrics that are filtered by ClusterName without ServiceName) is measured as the total memory in use by Amazon ECS tasks on the cluster, divided by the total amount of memory that was registered for all of the container instances in the cluster.
Service memory utilization (metrics that are filtered by ClusterName and ServiceName) is measured as the total memory in use by the tasks that belong to the service, divided by the total memory that is reserved for the tasks that belong to the service.
Units: Percent
Dimensions for Amazon ECS Metrics are also provided to refine the metrics returned for the Amazon ECS resources. These include:
ClusterName
This dimension filters the data requested for all resources in a specified cluster. All Amazon ECS metrics are filtered by ClusterName.
ServiceName
This dimension filters the data requested for all resources in a specified service within a specified cluster.
Amazon EC2 Metrics:
The following metrics are available from each EC2 instance.
CPUCreditUsage
(Only valid for T2 instances) The number of CPU credits consumed during the specified period.
CPUCreditBalance
(Only valid for T2 instances) The number of CPU credits that an instance has accumulated.
CPUUtilization
The percentage of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application on a selected instance.
DiskReadOps
Completed read operations from all ephemeral disks available to the instance in a specified period of time.
DiskWriteOps
Completed write operations to all ephemeral disks available to the instance in a specified period of time.
DiskReadBytes
Bytes read from all ephemeral disks available to the instance.
DiskWriteBytes
Bytes written to all ephemeral disks available to the instance.
NetworkIn
The number of bytes received on all network interfaces by the instance.
NetworkOut
The number of bytes sent out on all network interfaces by the instance.
Dimensions for Amazon EC2 Metrics allow you to filter the EC2 instance data. These include:
ImageId : This dimension filters the data requested for all instances running this EC2 Amazon Machine Image (AMI).
InstanceId: This dimension filters the data you request for the identified instance only.
InstanceType: This dimension filters the data you request for all instances running with this specified instance type.
AWS metrics relevant to storage are provided by Amazon Simple Storage Service (S3) metrics:
BucketSizeBytes - The amount of data in bytes stored in a bucket in the Standard storage class or in the Reduced Redundancy Storage (RRS) class.
NumberOfObjects - The total number of objects stored in a bucket.
And the corresponding dimensions used to filter the data include:
BucketName - This dimension filters the data requested for the identified bucket only.
StorageType - This dimension filters the data you have stored in a bucket by the type of storage. The types are StandardStorage for the Standard storage class, ReducedRedundancyStorage for the Reduced Redundancy Storage (RRS) class, and AllStorageTypes.
Sample API call:
http://monitoring.amazonaws.com/?Action=ListMetrics
&Version=2010-08-01
&AWSAccessKeyId=<AWS Access Key Id>
&SignatureVersion=2
&SignatureMethod=HmacSHA256
&Timestamp=2010-11-17T05%3A13%3A00.000Z

Sample error response
This is specified as:
<ErrorResponse>
<Error>
<Type>Sender</Type>
<Code>InvalidParameterValue</Code>
<Message>Invalid nextToken</Message>
</Error>
<RequestId>cb1c4191-0e5d-11e2-9c15-6f306828daaa</RequestId>
</ErrorResponse>

SDK Libraries are available in Java, PHP, Python, Ruby and .NET.

Sample to get metrics
To get statistics, we use the GetMetricStatistics API; a variety of statistics are available, though this metric data is retained for only 14 days.
Example: Call GetMetricStatistics with the following parameters to get the maximum CPU utilization per hour for an EC2 instance over a 3-day range:

MetricName = CPUUtilization
Period = 3600
Statistics list includes Maximum
Dimensions (Name=InstanceId, Value="<your-instance-id>")
Namespace = AWS/EC2
StartTime = 2011-01-09T23:18:00
EndTime = 2011-01-12T23:18:00
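A rough C# sketch of this call with the AWS SDK for .NET might look like the following; the class and property names are from memory and can vary between SDK versions, and the instance id is a placeholder.

using System;
using System.Collections.Generic;
using Amazon.CloudWatch;
using Amazon.CloudWatch.Model;

var cloudwatch = new AmazonCloudWatchClient();
var response = cloudwatch.GetMetricStatistics(new GetMetricStatisticsRequest
{
    Namespace = "AWS/EC2",
    MetricName = "CPUUtilization",
    Period = 3600,                                   // one-hour buckets
    Statistics = new List<string> { "Maximum" },
    Dimensions = new List<Dimension>
    {
        new Dimension { Name = "InstanceId", Value = "<your-instance-id>" }
    },
    StartTime = DateTime.Parse("2011-01-09T23:18:00"),
    EndTime = DateTime.Parse("2011-01-12T23:18:00")
});
foreach (var point in response.Datapoints)
    Console.WriteLine("{0}: {1}%", point.Timestamp, point.Maximum);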

Sample to create alarms
To create or update an alarm, use the PutMetricAlarm API (mon-put-metric-alarm command).
To list any or all of the currently configured alarms, and to list alarms in a particular state, use the DescribeAlarms API (mon-describe-alarms command).
To disable and enable alarms, use the DisableAlarmActions and EnableAlarmActions APIs (mon-disable-alarm-actions and mon-enable-alarm-actions commands).
To test an alarm by setting it to any state, use the SetAlarmState API (mon-set-alarm-state command).
Finally, to view an alarm's history use the DescribeAlarmHistory API (mon-describe-alarm-history command).
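As an illustration, a minimal sketch of creating a CPU alarm with the AWS SDK for .NET might look like this; the names are from memory and may differ by SDK version, and the SNS topic ARN and instance id are placeholders.

using System.Collections.Generic;
using Amazon.CloudWatch;
using Amazon.CloudWatch.Model;

var cloudwatch = new AmazonCloudWatchClient();
cloudwatch.PutMetricAlarm(new PutMetricAlarmRequest
{
    AlarmName = "HighCpuAlarm",
    Namespace = "AWS/EC2",
    MetricName = "CPUUtilization",
    Statistic = Statistic.Average,
    Period = 300,                        // evaluate over 5-minute periods
    EvaluationPeriods = 2,
    Threshold = 80,
    ComparisonOperator = ComparisonOperator.GreaterThanThreshold,
    Dimensions = new List<Dimension>
    {
        new Dimension { Name = "InstanceId", Value = "<your-instance-id>" }
    },
    AlarmActions = new List<string> { "<your-sns-topic-arn>" }
});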

#codingexercise
String ReverseWordsInASentence(String sentence)
{
    // Split on spaces and fold the words back together in reverse order (needs System.Linq).
    var words = sentence.Split(' ');
    var reversedSentence = words.Aggregate((sent, next) => next + " " + sent);
    return reversedSentence;
}
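For example, given the sketch above:

Console.WriteLine(ReverseWordsInASentence("the quick brown fox")); // prints "fox brown quick the"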

Sunday, December 13, 2015

Billing System for IT Cloud Provider


Billing is useful for consumers to know what usage they make of system resources. Although a private cloud provider may not require payments from the consumer, a report of the various system resource usages together with some analysis charts and graphs makes for a very presentable and actionable memo. The consumer can then directly participate in capacity planning and conservation.

 


This is where a software billing management system comes in useful. The billing system for a cloud provider is dedicated to the collection and processing of usage data as it pertains to end-consumer billing. Detailed metrics for system resources are drawn from pay-per-use monitors that are either available internally or via external providers. By external provider we mean cloud-provider vendor software that may already have its own telemetry. These gather runtime usage data that is stored in a repository. When the usage data is coupled with different pricing policies, different bills can be generated. Pricing policies can include different schemes, such as resource-based or consumer-based, and models can vary from traditional pay-per-use to flat-rate or pay-per-allocation models, or their combinations.

Usage may also be combined with quotas. When quotas are exceeded, they can appear on the bills with notations for blocking further usage requests by these cloud consumers.

 


Different providers provide their own mechanisms for usage monitoring and billing. Two examples cited here for providers are Openstack and AWS.

In the case of Openstack, we have Ceilometer. Ceilometer is OpenStack’s telemetry service. Its source code is available at https://github.com/openstack/ceilometer. It collects all kinds of monitoring and metering information so that no two agents need to be written to collect the same data. While monitoring and metering are different, a typical consumer of Ceilometer’s metering data is the billing system, for which it acts as a unique and central point of contact to acquire all of the information collected across all OpenStack core components. The project strives to be efficient in terms of CPU and network costs. It supports both push notifications from the existing services and a pull model by polling the infrastructure. Deployers can configure the type of data collected. Publishers are available as a Python implementation. A REST API is also available to expose the data collected by the metering system.

 

Bills, even if written off, can be delivered by mail. Their layout and crispness speak to the metering and usage, giving the customer a good indication of what is costing how much.

 

With regard to AWS billing, there is a console built into AWS Billing and Cost Management that already provides well-known and documented cost analysis. Bills can be generated and exported as CSVs to S3 buckets that can feed into different applications.

 

As a consolidator over cloud providers, we can maintain a consistent, common user interface that accesses the same information from the underlying providers when available, adding new providers as needed.

 

The design is one of a pull model where each bill is generated by a pull of the metrics, usage and rates from the underlying providers and our own database.

 


 

Central to the discussion on billing is the notion of a database for a customer. This is proposed in spite of the direct API access available to different cloud providers because a database promotes an ecosystem that leads to different workflows. The APIs are still used to populate the data, except that the data is now organized in a time-series manner and analyzed per customer or resource.

 

UI / Console    <-- pull --   API/Database   <-- push --   Provider 1
Bills (emails)  <-- pull --   API/Database   <-- push --   Provider 2

 


Openstack Billing provider:


Ceilometer measures the usage of resources in terms of disk usage, network usage and CPU usage. For each instance, the usage details for the following meters will be shown:

1. Disk Read Bytes
2. Disk Write Bytes
3. Disk Write Requests
4. Disk Read Requests
5. Network outgoing bytes
6. Network incoming bytes
7. Network outgoing packets
8. Network incoming packets
9. CPU

 

Ceilometer arranges all the data in terms of resources and their meters.

We can get a list of all the resources using

GET /v2/resources/ and a specific resource with

GET /v2/resources/(resource_id)

where each resource contains links, such as a URL to identify itself and a URL for its meters, e.g. http://localhost:8777/v2/meters/volume?q.field=resource_id&q.value=bd9431c1-8d69-4ad3-803a-8d4a6b89fd36

 

GET /v2/meters/ returns all known meters, and

GET /v2/meters/(meter_name) returns the samples for a specific meter, which even comes with statistics via

GET /v2/meters/(meter_name)/statistics

 

Samples and statistics can also be drawn from

GET /v2/samples

and GET /v2/samples/(sample_id)
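For illustration, a small C# sketch that pulls the statistics for one meter over this REST API; the endpoint host, the cpu_util meter name and the Keystone token are assumptions for the example.

using System;
using System.Net.Http;

var http = new HttpClient();
// Ceilometer's API sits behind Keystone authentication.
http.DefaultRequestHeaders.Add("X-Auth-Token", "<keystone-token>");
string url = "http://localhost:8777/v2/meters/cpu_util/statistics";
string json = http.GetStringAsync(url).Result;   // JSON array of statistics buckets
Console.WriteLine(json);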

 

Tables can be created in the following manner (a sketch in code follows below):

1) RatePlan
   a. Id
   b. Name
   c. CPU cost per second
   d. Disk cost per MB
   e. Network cost per MB
   f. Cost for flavors

2) Instance-bill, with the following attributes
   a. Resource_id
   b. Rateplan_id
   c. Date and time at which the bill was calculated
   d. Resource_bill
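A minimal C# sketch of these two tables as plain record types, together with a hypothetical per-instance bill calculation; the field names, units and the pricing formula are assumptions for illustration only.

using System;

class RatePlan
{
    public int Id;
    public string Name;
    public double CpuCostPerSecond;
    public double DiskCostPerMB;
    public double NetworkCostPerMB;
    public double FlavorCost;           // cost for flavors
}

class InstanceBill
{
    public string ResourceId;
    public int RatePlanId;
    public DateTime CalculatedAt;       // date and time at which the bill was calculated
    public double ResourceBill;
}

// Usage figures (cpuSeconds, diskMB, networkMB) are assumed to come from the Ceilometer meters above.
InstanceBill ComputeBill(string resourceId, RatePlan plan,
                         double cpuSeconds, double diskMB, double networkMB)
{
    return new InstanceBill
    {
        ResourceId = resourceId,
        RatePlanId = plan.Id,
        CalculatedAt = DateTime.UtcNow,
        ResourceBill = plan.FlavorCost
                     + cpuSeconds * plan.CpuCostPerSecond
                     + diskMB * plan.DiskCostPerMB
                     + networkMB * plan.NetworkCostPerMB
    };
}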

 

 

Amazon Billing Provider


AWS operates on a pay-as-you-go model, so the bill reflects actual usage. This also means that we need to check account activity every so often or get alerts through alarms set up on Amazon CloudWatch metrics. Amazon CloudWatch provides monitoring of estimated charges using billing alerts, delivered via Amazon SNS, a well-known publish-subscribe service for mobile and enterprise messaging. The billing alerts sent by Amazon CloudWatch are based on estimates stored as CloudWatch metrics, and this metric data is retained for 14 days. The data includes variations of the following:

Estimated charges : Total

Estimated charges : by service

Estimated charges : by Linked Account

Estimated charges : by Linked Account and Service

Note that these are estimates and not predictions. They approximate the cost and do not take trends or potential changes in the AWS usage pattern into account.

The use of these alerts is unique for the consolidated billing for the private cloud provider. Essentially the alerts populate a dynamic view (in-memory) or table (stored) for each customer, account and service.

The table can have the following attributes

-          Current period cost by service

-          Cumulative cost by service

With this table, we can generate reports at customer-account and service granularity. If we keep a time-series collection of these costs, we can pull historical reports as well. One thing that should stand out is that CloudWatch already manages most of the cost estimation and can provide different kinds of analysis built into the alerts. Therefore the table we keep for the data we pull for our customers is merely a staging store. The bulk of the calculation and processing is already done for us.
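A minimal sketch of such a staging record, with assumed field names, might be:

using System;

class EstimatedChargesRecord
{
    public string CustomerAccount;    // linked account
    public string Service;            // e.g. AmazonEC2, AmazonS3
    public DateTime AsOf;             // when the estimate was captured
    public double CurrentPeriodCost;  // current period cost by service
    public double CumulativeCost;     // cumulative cost by service
}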

The relevant APIs to populate the staging store in this case would be as follows:

Services/cloudwatch.class.php  - __construct ( $options ) :

Constructs a new instance of AmazonCloudWatch:

put_metric_alarm ( $alarm_name, $metric_name, $namespace, $statistic, $period, $evaluation_periods, $threshold, $comparison_operator, $opt )

Creates or updates an alarm and associates it with the specified Amazon CloudWatch metric. Optionally, this operation can associate one or more Amazon Simple Notification Service resources with the alarm.

delete_alarms ( $alarm_names, $opt )

Deletes all specified alarms

describe_alarm_history ( $opt )

Retrieves history for the specified alarm. Filter alarms by date range or item type. If an alarm name is not specified, Amazon CloudWatch returns histories for all of the owner’s alarms.

enable_alarm_actions ( $alarm_names, $opt )

Enables actions for the specified alarms.

disable_alarm_actions ( $alarm_names, $opt )

Disables actions for the specified alarms. When an alarm’s actions are disabled the alarm’s state may change, but none of the alarm’s actions will execute.

list_metrics ( $opt )

Returns a list of valid metrics stored for the AWS account owner. Returned metrics can be used with GetMetricStatistics to obtain statistical data for a given metric.

 

There is no need to merge the bills from different providers since we strive to be more granular in itemizing the costs. They can appear as different sections, one from each provider. This saves the trouble of artificially merging into a common set of metrics when it may not be available as such from each provider. And even if it is available, the syntax, semantics and nuances may vary. Our customers recognize services and instances of compute and storage. As long as we keep those as granular as possible in the bills, the customer has the luxury of aggregating them or seeing a summary wherever possible in one of the analysis charts.

 

Note that the providers may also maintain their own billing databases that provide a rich query API. This might even argue for removing our database and performing the query on demand when bills are requested by the user in an interactive mode or during batch processing. However, such databases rarely store more than a few weeks of data, leaving our system to poll and store every so often to build up a time series for longer analysis. In addition, the user account or plan may change frequently, and the user may even want notifications when she is exceeding the free tier of usage. Additionally, one provider may prefer a notification model over a polling model, while another may offer only the latter to pull the data. A staging store for the billing data relevant to consumer bills enables us to talk to different providers. We can keep the staging store as minimal as necessary, relying more on getting the information via APIs, but it does not seem avoidable.

 


Most of the cloud providers mentioned as examples already provide a rich set of APIs for their usage, as shown. We just have to focus on the service that extracts, transforms and loads the database for use by the administrators and the consumers. Reports and publishing as emails can be done by pulling the data from this database.

 


This software is easy to build with the existing frameworks that we already have both in terms of the data providers as well as the infrastructure for hosting it.

Friday, December 11, 2015

Today we continue to review the paper by Dekel, Gilad-BachRach, Shamir and Xiao which described optimal distributed online prediction using mini-batches. In this section we review serial online prediction using mini-batches.
We saw with the analysis of the parallel  DMB algorithm in the earlier post that the expected regret was based on the variance of the stochastic gradients. The dependency on variance implies that we can average the stochastic gradients over mini-batches to reduce the variance. 
In the serial mini-batch algorithm we do so. Let us take a look. 
In the previous section, the update rule was applied after each input was received. In the serial mini-batch algorithm, however, it is applied only periodically, hence the notion of mini-batches. Here b is a user-defined batch size, and every b consecutive inputs are considered a batch.
The algorithm now becomes:
for j = 1, 2, ... do
    initialize avg-gradient-j = 0
    for s = 1 to b do
        define i = (j - 1)b + s;
        predict wj;
        receive input zi sampled i.i.d. from unknown distribution;
        suffer loss f(wj, zi);
        gradient-i = delta-w f(wj, zi);
        avg-gradient-j = avg-gradient-j + (1/b) gradient-i;
    end
    set (wj+1, aj+1) = phi(aj, avg-gradient-j, alpha-j)
end

The prediction remains constant for the duration of each batch and is updated only when a batch ends. While processing the b inputs in batch j, the algorithm calculates and accumulates gradients and defines the average gradient. Each such mini batch computes one average gradient. The
appeal of the serial mini-batch setting is that the update rule is used less frequently, which may have computational benefits.
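A toy C# sketch of this loop, assuming a plain gradient step for the update rule phi and hypothetical helpers HasMoreInput, ReceiveInput, SufferLoss and Gradient:

// b = batch size, dim = dimension of the predictor; both assumed given.
double[] w = new double[dim];                     // current predictor w_j
for (int j = 1; HasMoreInput(); j++)
{
    var avgGradient = new double[dim];            // avg-gradient-j
    for (int s = 1; s <= b; s++)
    {
        double[] z = ReceiveInput();              // z_i sampled i.i.d.
        SufferLoss(w, z);                         // suffer loss f(w_j, z_i)
        double[] g = Gradient(w, z);              // gradient of f at w_j
        for (int d = 0; d < dim; d++)
            avgGradient[d] += g[d] / b;           // accumulate the average gradient
    }
    double alpha = Math.Sqrt(j);                  // step-size parameter, of order sqrt(j)
    for (int d = 0; d < dim; d++)
        w[d] -= avgGradient[d] / alpha;           // update only once per batch of b inputs
}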

#codingexercise
int max(List<int> numbers)
{
int max = INT_MIN;
for (int i = 0; i < numbers.Count; i++)
    if (numbers[i] > max)
        max = numbers[i];
return max;

}

Thursday, December 10, 2015

Today we continue to review the paper by Dekel, Gilad-BachRach, Shamir and Xiao, which described optimal distributed online prediction using mini-batches. In this section we discuss their analysis of the no-communication parallel solution. Using the abstract notation Psi(variance, m) for the expected regret bound over m iterations, they proceed with the naive no-communication solution. Here each of the k nodes in the parallel system applies the same serial update rule to its own sub-stream of the high-rate inputs, and no communication takes place between them. If the total number of examples processed by the k nodes is m, then it is distributed evenly as at most ceil(m/k) inputs to each of the nodes. As we recall from the introduction, each such input is independent and identically distributed (i.i.d.) from the original distribution, with the same variance bound for the stochastic gradients. Therefore each node suffers an expected regret of at most Psi(variance, ceil(m/k)) on its portion of the input stream, and the total regret bound is obtained by simply summing over the k nodes. Note that this summation is possible because the inputs are i.i.d. The expected regret for the entire system is therefore k times Psi(variance, ceil(m/k)). The authors mention two theorems:
The first states that if the loss function is L-smooth and convex in the prediction for each z in Z, the stochastic gradients have bounded variance sigma^2 as above, and D is the square root of half the maximum prediction error, then using the iteration-dependent parameter alpha-j = L + (sigma/D) * sqrt(j), where j is the iteration number, the expected regret is bounded (up to a constant regret term) by D^2 L + 2 D sigma sqrt(m). This holds for both the examples cited earlier by the authors, projected gradient descent and the dual averaging method. Using this form of the bound in the no-communication parallel system analysis, the expected regret is bounded by 2 k D^2 L + 2 D sigma k sqrt(m/k) = 2 k D^2 L + 2 D sigma sqrt(km). Comparing this bound to the one for the ideal serial solution (k = 1), we see that it is approximately sqrt(k) times worse in the leading term. This is the price of no communication in this parallel system. However, the authors improve on it by avoiding the sqrt(k) cost in their distributed mini-batch algorithm.
Note that, as with most mini-batch algorithms, we trade off between the full-batch and the stochastic method because we can reset after every m/k inputs. While we use the summation form after every k batches, we do not go as far as extrapolating the results of the previous mini-batch to the current one.
#codingexercise
int min(List<int> numbers)
{
int min = INT_MAX;
for (int i = 0; i < numbers.Count; i++)
    if (numbers[i] < min)
        min = numbers[i];
return min;
}
void BillingProvider(Customer c, Dictionary<Metrics, KeyValuePair<double, double>> usage)
{
// Metrics is assumed to be an enum of metric names; each value pairs a quantity with its rate.
double total = 0;
foreach (var kvp in usage)
{
    double sum = kvp.Value.Key * kvp.Value.Value;   // quantity * rate
    Console.WriteLine("{0} costs {1}", kvp.Key, sum);
    total += sum;
}
Console.WriteLine("Total={0}", total);
}

Tuesday, December 8, 2015

Today we continue discussing the paper by Dekel, Gilad-BachRach, Shamir and Xiao which described optimal distributed online prediction using mini-batches. Their method can use any gradient-based update rule by treating the serial online prediction algorithm as a black box and converting it into a parallel or distributed online prediction algorithm. Their regret bounds for smooth loss functions are variance-based.
They provide two examples of update rules that fit this template.
The first is the projected gradient descent update rule, where the next prediction is found by applying the Euclidean projection onto the set W to the current prediction minus the scaled gradient. The scaling factor is a decaying learning rate, the inverse of the iteration-dependent parameter alpha-j, which is typically of order sqrt(j) where j is the current iteration. In this example, the auxiliary state vector aj that summarizes all of the necessary information about the past is simply the current prediction wj, and it is used as-is to produce the next prediction.
The other example given is the dual averaging method. Its update rule chooses the next prediction by minimizing the inner product of the sum of past gradients with the prediction, taken together with a monotonically increasing sequence of positive numbers times a strongly convex auxiliary function.
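As a rough illustration of the first update rule, a single projected-gradient step might look like this in C#; ProjectOntoW stands in for the Euclidean projection onto the set W and is assumed to exist.

// w_{j+1} = Proj_W( w_j - (1/alpha_j) * g_j ), with alpha_j of order sqrt(j).
double[] ProjectedGradientStep(double[] w, double[] g, int j)
{
    double alpha = Math.Sqrt(j);              // iteration-dependent step-size parameter
    var next = new double[w.Length];
    for (int d = 0; d < w.Length; d++)
        next[d] = w[d] - g[d] / alpha;        // gradient step scaled by 1/alpha_j
    return ProjectOntoW(next);                // hypothetical projection onto W
}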
#codingexercise

Given two arrays, the first one containing a number sequence and the second one containing numbers that can be found in the sequence, find the smallest window in the number sequence that has all the elements from the second array:
int FindSmallestWindow(List<int> haystack, List<int> needle)
{
// Parameter validation omitted; assume no duplicates in haystack or needle
// and that every needle element occurs somewhere in haystack.
// Map each value in haystack to the index where it occurs.
var indexes = new SortedList<int, int>();
for (int i = 0; i < haystack.Count; i++)
    indexes.Add(haystack[i], i);
// With unique elements, the smallest window containing all needle elements
// spans from the smallest to the largest index at which they occur.
int min = haystack.Count - 1;
int max = 0;
for (int i = 0; i < needle.Count; i++)
{
    int index = indexes[needle[i]];
    if (index < min) min = index;
    if (index > max) max = index;
}
if (min > max) { int temp = min; min = max; max = temp; }
int range = max - min + 1;
// Example: haystack = [1,5,2,8,3], needle = [2,3] -> window spans indices 2..4, size 3.
return range;
}
// Brute-force alternative: for every start position, scan forward until all
// needle elements have been seen and keep the smallest window size found.
// (Needs System.Linq for Contains/Min/Max.)
int GetWindow(int[] haystack, int[] needle)
{
int minrange = haystack.Length; // default value
for (int start = 0; start < haystack.Length; start++)
{
var found = new Dictionary<int, int>();
for (int i = start; i < haystack.Length; i++)
{
if (needle.Contains(haystack[i]))
{
found[haystack[i]] = i; // remember where this needle element was last seen
if (found.Count == needle.Length)
{
int min = found.Values.Min();
int max = found.Values.Max();
if (max - min + 1 < minrange)
    minrange = max - min + 1;
break; // move on to the next start position
}
}
}
}
return minrange;
}

Monday, December 7, 2015

#codingexercise
Given a binary tree, find the largest path size from one leaf to another:
// Collect the depth of every leaf, then add the two largest depths.
// Note: this assumes the two deepest leaves lie in different subtrees of the root;
// a fully general solution would compute the diameter at every node.
int getLargestPathSize(node* root)
{
var leafdepths = new List<int>();
setdepth(root, 0, ref leafdepths);
leafdepths.Sort();
leafdepths.Reverse();
if (leafdepths.Count >= 2)
  return leafdepths[0] + leafdepths[1];
return 0;
}

void setdepth(node* root, int level, ref List<int> leafdepths) {
  if (!root) return;
  int curLevel = level + 1;
  setdepth(root->left, curLevel, ref leafdepths);
  setdepth(root->right, curLevel, ref leafdepths);
  if (root->left == null && root->right == null)
    leafdepths.Add(curLevel);   // record this leaf's depth
}


Dekel, Gilad-BachRach, Shamir and Xiao described optimal distributed online prediction using mini-batches. They focus on large-scale and high-rate prediction problems. The stochastic online prediction problem is one where a stream of inputs z1, z2, … zm is sampled independently from a fixed unknown distribution over a sample space Z. Before observing each sample zi, a point wi is predicted from the set W. After making the prediction wi, we observe zi and suffer the loss f(wi, zi), where f is a predefined loss function. Then zi is used to improve the prediction mechanism for the future. The quality of predictions is measured using the notion of regret, which is the difference between the cumulative loss of our predictions and the cumulative loss of the fixed predictor that is optimal with respect to the underlying distribution. Since regret relies on stochastic variables, it is itself a random variable.

Furthermore, they bound this regret with O(sq.rt(m)) which is the same as in the optimal for a serial gradient based algorithm. And their method can use any gradient-based update rule by treating the serial online prediction as a black box and convert it into a parallel or distributed online prediction algorithm.

Their method is therefore succinctly put as

For j = 1, 2, … do

     predict wj;

     receive input zj sampled i.i.d from unknown distribution

     suffer loss f(wj, zj)

     define gradient gj = delta f(wj,zj)

     compute (wj+1, aj+1) = phi (aj, gj, alpha-j);

end

Notice the last step is an unspecified update rule where we take the following three arguments:

-          an auxiliary state vector aj that summarizes all of the necessary information about the past

-          a gradient gj of the loss function evaluated at wj

-          and an iteration dependent parameter alpha-j as stepsize

to compute the   next predictor and the new auxiliary state predictor.

This is very similar in nature to what I tried to do by evaluating the data once as it arrives using a stochastic gradient based summary, but initializing the current residual with a similar update to treat the new data while reducing it completely for the previous iteration's data. For example, we can initialize the residual for the next iteration with the weighted average of the conventional initial residual and the previous initial residual. And we can measure the loss with the sum of squares.

 

Friday, December 4, 2015

The authors of the paper on Optimal Distributed Online Prediction Using Mini-Batches provide a method that can be called a distributed mini-batch algorithm, a method of converting any serial gradient-based online prediction algorithm into a parallel or distributed algorithm. This method has two important properties:
It can use any gradient based update rule for serial online prediction as a black box and convert it into a parallel or distributed online  prediction algorithm.
If the loss function f(w,z) is smooth in w, the prediction, then this method attains an asymptotically optimal regret bound of O(sqrt(m)). Moreover, the coefficient of the dominant term sqrt(m) is the same as in the serial bound, and independent of k and of the network topology.
The notion of mini-batches is that we don't process the entire data set at once but in iterations. Meanwhile the gradient is not updated after the entire dataset but after a certain number of iterations. It is not the same as the stochastic method, which can update the gradient after every iteration. Hence it is a cross between the batch and stochastic gradient methods.
They average a mini-batch of stochastic gradients computed at each predictor, and show that the standard deviation of those stochastic gradients appears in the coefficient of the dominant term of the regret bound.
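A tiny sketch of that averaging step (names assumed), where gradients computed at the same predictor by k workers are combined before a single update:

// Averaging k stochastic gradients divides the variance by k, which is what
// lets the distributed regret bound keep the serial coefficient.
// (Needs System.Collections.Generic for List.)
double[] AverageGradients(List<double[]> workerGradients)
{
    int k = workerGradients.Count;
    int dim = workerGradients[0].Length;
    var avg = new double[dim];
    foreach (var g in workerGradients)
        for (int d = 0; d < dim; d++)
            avg[d] += g[d] / k;
    return avg;
}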
#codingexercise
Given two single-digit integer arrays, where the elements of the first taken together form a number, maximize the number by replacing its digits with elements of the second array.
void MaximizeWithReplace(List<int> input, List<int> replace)
{
    // Try the largest replacement digits first.
    replace.Sort();
    replace.Reverse();
    for (int i = 0; i < replace.Count; i++)
    {
        bool replaced = false;
        // Replace the first (most significant) digit that is smaller than the candidate.
        for (int j = 0; j < input.Count; j++)
        {
            if (input[j] < replace[i])
            {
                input[j] = replace[i];
                replaced = true;
                break;
            }
        }
        if (!replaced) break; // no digit can be improved any further
    }
}