Thursday, April 16, 2015

Today we continue our discussion on streaming proxy design. We were reviewing the assumptions for building an analytical model. The first assumption is that the popularity of media objects follows a Zipf-like distribution. The second assumption was that request arrivals occur at a known average rate, independent of time. The third assumption was that the clients view the media objects completely. The first assumption gives us a probability set pi. The second assumption gives us a sampling process where the sum of the pi equals 1. The third assumption gives us a mean arrival rate of lambda times pi for object i. We also reviewed the added definition of startup length. This is a percentage alpha of the full length of the media object. And beta is the percentage of the total cache space reserved for all the startup lengths.
This helps us determine the delayed startup ratio. To build the model, we consider the ideal case where the cache space is allocated to the most popular objects: the startup lengths of the first t most popular objects remain in the reserved portion of the cache, and, extending this, the rest of the cache is used for the q most popular objects beyond their startup lengths. Therefore the delayed startup ratio can be expressed as the sum of the mean arrival rates of the (t+1)th through the Nth media objects over that of all N media objects.
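To make this concrete, here is a minimal sketch of the model, assuming in addition that every object has the same full length (a simplification not stated in the discussion); all parameter names are illustrative:

// Delayed startup ratio d(t): the fraction of requests that go to objects
// whose startup lengths are not in the reserved cache. Because the mean
// arrival rate of object i is lambda * pi, lambda cancels and
// d = sum of pi for i = t+1 .. N.
function delayedStartupRatio(N, theta, alpha, beta, cacheSize, objectLength) {
    // Assumption 1: Zipf-like popularity, pi = fi / sum(fj), fi = 1 / i^theta.
    var f = [], total = 0;
    for (var i = 1; i <= N; i++) {
        f.push(1 / Math.pow(i, theta));
        total += f[i - 1];
    }
    var p = f.map(function (fi) { return fi / total; });

    // The reserved space (beta * cacheSize) holds the startup lengths
    // (alpha * objectLength) of the t most popular objects.
    var t = Math.floor((beta * cacheSize) / (alpha * objectLength));

    var d = 0;
    for (var j = t + 1; j <= N; j++) {
        d += p[j - 1];
    }
    return d;
}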

#codingexercise
Double GetOddNumberRangeSumCubeRootPowerSixteen(Double[] A)
{
    if (A == null) return 0;
    return A.OddNumberSumCubeRootPowerSixteen();
}

Tuesday, April 14, 2015

Today we continue our discussion on streaming proxy design. We were saying that in reality the cache size may be bounded. We may not be able to cache the number of segments required for continuous delivery of a media object if there are lots of media objects. One way to overcome this is to prioritize the media objects. Given a higher priority to reducing proxy jitter, the proxy can choose to evict segments of an object whose cached data length is larger than its prefetching length, so that prefetching of its uncached segments is always in time. Even if the segments of popular objects are evicted, the overall proxy jitter reduces at the cost of a little byte hit ratio. Thus we have seen that the byte hit ratio can be traded for less proxy jitter. A small sketch of this eviction preference follows.
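As a rough sketch (the field names are assumed, not from the paper), a jitter-first cache would pick its eviction victims from objects cached beyond their prefetching length:

// Victim selection favoring low proxy jitter: objects cached beyond their
// prefetching length can lose surplus segments without risking late
// prefetching of their uncached segments.
function pickEvictionVictims(objects) {
    return objects.filter(function (o) {
        return o.cachedLength > o.prefetchingLength;
    });
}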
This conflict between improving the byte hit ratio and reducing proxy jitter led the authors to revise the principle of designing a better proxy around proxy jitter. They noted that segment-based proxy caching strategies always perform well on the byte hit ratio but not so well on the delayed startup ratio. This is explained further when evaluating the adaptive lazy segmentation based scheme.
We also look at the tradeoff between the byte hit ratio and the delayed startup ratio. We could see some conflicting interests in this tradeoff from the previous discussion, but now we build an analytical model. This analytical model is based on the following assumptions:
1) The popularity of the objects follows a Zipf-like distribution.
2) The request arrival interval process follows a Poisson distribution with a mean arrival rate lambda.
3) The clients view the requested objects completely. This is to simplify the analysis and does not affect the conclusion.
After we build the model, we will evaluate it with analytical results.
Then we review the authors' improved adaptive lazy segmentation strategy.
Assumption 1) describes the probability set pi as the fraction of fi to the total of all the fi, where fi is the inverse of i raised to the power theta and i varies from 1 to N, the total number of objects. Theta is the skew factor and is positive.
Assumption 2) describes the sampling process as independent samples drawn from the aggregate arrival interval process based on the probability set pi, where the sum of the pi equals 1.
Assumption 3) lets us calculate the mean arrival rate of object i as lambda times pi. A small simulation sketch of these assumptions follows.
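As a rough illustration (all names here are illustrative, and the sampler is a simplification of the model rather than anything from the paper), the sampling process can be simulated as follows:

// Sketch of assumptions 1 and 2: requests arrive as a Poisson process with
// aggregate rate lambda, and each request independently picks object i with
// probability pi, so object i sees a mean arrival rate of lambda * pi.
function sampleRequest(p, lambda) {
    // Exponential inter-arrival time for a Poisson process with rate lambda.
    var interArrival = -Math.log(1 - Math.random()) / lambda;
    // Sample an object index from the Zipf-like probability set p.
    var r = Math.random(), cum = 0;
    for (var i = 0; i < p.length; i++) {
        cum += p[i];
        if (r <= cum) return { objectIndex: i + 1, interArrival: interArrival };
    }
    return { objectIndex: p.length, interArrival: interArrival };
}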
In addition to these assumptions, the following definitions make it easier to calculate the delayed startup ratio.
The startup length is the length of the beginning part of an object. If this part is cached, there is no startup delay. Let alpha be this percentage of the full object length, and let beta be the percentage of the total cache space reserved for caching the startup lengths of objects.

Monday, April 13, 2015

Today we continue our discussion on the design of streaming proxies. We were calculating the number of segments for continuous delivery, considering both the uniformly segmented and the exponentially segmented media. We now review the tradeoff between low proxy jitter and a high byte hit ratio, and then the byte hit ratio versus the delayed startup ratio. In theory, we know the minimum number of segments that must always be cached in the proxy to guarantee proxy-jitter-free delivery. In practice, this is difficult for each and every media object because cache size is limited. One way to overcome this is to determine whether a media object is popular. Popular objects are always cached to reduce network traffic and server load. If an object is popular enough, all its segments can be cached in the proxy, even beyond the prefetching length. On the other hand, if an object is not popular enough, some segments may get evicted and only a few of its segments cached. This can contribute to proxy jitter. Given a higher priority to reducing proxy jitter, the proxy can choose to evict segments of an object whose cached data length is larger than its prefetching length, so that the prefetching of its uncached segments can always be in time. If popular objects do get evicted, this costs some byte hit ratio. Thus we can ensure the availability of all segments for media objects that are popular.

Sunday, April 12, 2015

We continue the coverage of some JavaScript libraries, resuming with the next library in the list: groupie. groupie provides the semantics of a group, where all functions are executed at once, and a chain, where they are executed in the declared order. The function registrations for group or chain are similar to the previous ones we have seen from the list.
The next library in the list is continuables, which exposes the semantics that a unit, a continuable, can be fulfilled. Consequently, many of them can be grouped or chained. What sets this library apart is its ease of use for node.js developers. For example:
var async_fn = function(val) {
    var continuable = continuables.create();
    process.nextTick(function() {
        continuable.fulfill(val);
    });
    return continuable;
};
Now the continuables can be chained. If the chain ends with an error, it will be thrown. To prevent this, the continuable must return something. Therefore error and success cases can be differentiated based on the presence of return values, and separate callbacks for these two states can be invoked via continuables.either.
Slide exposes semantics similar to the Async library; functions registered with it should not throw an error but instead pass it to the callback. node.js has similar constructs at a lower level, but this library is purportedly easier. The convention introduces two kinds of functions: actors that take action, and callbacks that get results. Callbacks handle all errors, which is why the error is the first argument. Callbacks can trap and call other callbacks. Actors pass a callback as the last argument; actors must not throw, and their return values are ignored. The library has a construct called asyncMap, which is similar to the group functionality mentioned earlier: essentially it waits for all the registered actors and the callback to complete. It also has a chain construct that enables one-by-one continuation. A hedged sketch of asyncMap follows.
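As a rough illustration of the actor/callback convention (the exact signatures here are assumptions, not verified against the slide library):

// Hedged sketch: asyncMap applies an actor to each item and collects the
// results into a single callback, error first.
var asyncMap = require('slide').asyncMap;
var fs = require('fs');

function readAll(files, cb) {
    asyncMap(files, function (file, done) {
        fs.readFile(file, 'utf8', done); // actor: callback passed last
    }, function (er, contents) {          // callback: error argument first
        if (er) return cb(er);
        cb(null, contents);
    });
}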
Step is another library that enables parallel execution in addition to serial, with similar error handling.

Saturday, April 11, 2015



There are quite a few patterns in the use of JavaScript. Here we cover a few.

1) Async. This pattern flattens the nested callbacks usually seen when making one call after the other.

a. For example, if we have

getMoney(function(err, success) {
    if (err || !success)
        throw new Error('Oops!');
    callSweetheart(function(err, success) {
        // ... and the nesting continues ...
    });
});

b. Then this can be serialized with

async.chain(function getMoney(callback) {
        earn();
        callback(null);
    },
    function callSweetheart(callback) {
        dial();
        callback(null);
    },
    function(err) { // callback
        if (err) {
            console.log(err);
        }
    });

c. Chain also does more than serialization and consolidation of callbacks. It passes the result of one function as a parameter into the next. The parameters are entirely dependent on the previous function, except for the last one, which must be a callback.

d. Async.series is also available. This takes multiple functions in series. Each task takes a callback as a parameter. When all the functions have run, or if there is an error, the final function is called with the combined results of all tasks in the order they were run:

var counter = 0;

async.series([
    function(done) {
        console.log(counter++); // == 0
        done(null, 1);
    },
    function(done) {
        console.log(counter++); // == 1
        done(null, 2, 3);
    }],
    function(err, one, two) {
        console.log(err); // == null
        console.log(one); // == 1
        console.log(two); // == [2, 3]
    }
);

e. Async.parallel is also available. The tasks may not run in the same order as they appear in the array. 

2) Flow.js defines capabilities very similar to the above. flow.exec is a convenience function that defines a flow and executes it immediately, passing no arguments to the first function.

a. flow.exec(function() {
        doSomething(this);
    }, function(err) {
        if (err) throw err;
        doSomethingElse(this);
    }, function(err, result) {
        if (err) throw err;
        console.log(result);
    });


b. Sometimes a step in a flow may need to initiate several asynchronous tasks and wait on all of them before proceeding to the next step. This is called multiplexing and is achieved by passing this.MULTI() instead of this as the callback parameter.

flow.exec(function() {
    doSomething(this);
}, function(param) {
    doSomethingDifferent(param1, this.MULTI());
    doSomethingDifferent(param2, this.MULTI());
}, function() {
    okWeAreDone();
});

c. There is another convenience function called serialForEach, which can be used to apply an asynchronous function to each element in an array of values serially; a hedged sketch follows.
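A minimal sketch of serialForEach, assuming a signature of (array, iterator, error handler, completion handler); the exact argument order is an assumption, not verified against flow.js:

flow.serialForEach([1, 2, 3], function (val) {
    doSomethingAsync(val, this); // 'this' advances to the next element
}, function (error) {
    if (error) throw error;      // per-element error handler
}, function () {
    console.log('done with all elements');
});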

3) Next we discuss the following libraries:
funk,
futures,
groupie,
node-continuables,
slide,
step,
node-inflow

Funk is a software module that provides the following syntax to serialize and parallelize callbacks.
var funk = require('funk')('serial'); // ctor
funk.set('foo', 'bar'); // save results to be recalled in run()
// add callbacks to be executed either in series or parallel:
setTimeout(funk.add(function(err, success) {
}), 200);
setTimeout(funk.add(function(err, success) {
}), 100);

setTimeout(funk.nothing(), 200);
setTimeout(funk.nothing(), 100);

funk.run(); // both timeouts will be called

Futures, or FuturesJS, is another asynchronous toolkit.
It provides constructs like:
join
forEachAsync
arrayAsync, with
          -  someAsync
          -  filterAsync
          -  everyAsync
          -  mapAsync
          -  reduceAsync

join calls any number of asynchronous calls together, similar to how pthread_join works or how a promise's then() works.

var join = Futures.join(); // assuming the Futures namespace exposes join()
setTimeout(join.add(), 200);
setTimeout(join.add(), 100);

join.notify(function (index, args) {
    console.log("callback # " + index + " " + args);
});

arrayAsync provides an asynchronous counterpart for each of the Array iteration methods. For example:

filterAsync(['dogs', 'cats', 'octocats'], function (next, element) {
    isLiked(element, function (likesIt) { // isLiked is a placeholder async test
        next(likesIt);
    });
}).then(function (newArr) {
    displayLikes(newArr);
});

Friday, April 10, 2015

Today we continue reading the paper on the design of streaming proxy systems.
We discussed the uniformly and exponentially segmented media objects.
We talked about prefetching and the minimum buffer size for such media. The minimum buffer size ensures low resource usage. The prefetching gives the scheduling point. It doesn't mean that jitter can be avoided in all cases. The uniformly segmented media object has an advantage over the exponentially segmented object: it enables in-time prefetching that can begin at a later stage. Even so, continuous media streaming is not guaranteed. One suggestion is to keep enough segments cached. This leads us to define the prefetching length as the minimum length of data that must be cached in the proxy in order to guarantee continuous delivery when Bs > Bt, where Bs is the media encoding rate and Bt is the average network bandwidth. Prefetching is not necessary when Bs < Bt. The prefetching length aggregates cached segment lengths without breaks. From it we calculate the number of segments m needed for continuous delivery: in the case of uniformly segmented media objects each segment length is the same, while in the case of exponentially segmented media objects each cached segment length is twice that of the previous one; a small sketch follows. We then review the tradeoff between low proxy jitter and a high byte hit ratio, and then the byte hit ratio versus the delayed startup ratio.
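A minimal sketch of that calculation, assuming the prefetching length Lp has already been determined and L1 is the length of the first segment (names are illustrative):

// Smallest number of cached segments m whose total length covers the
// prefetching length Lp. Uniform: every segment has length L1, so the total
// of m segments is m * L1. Exponential: lengths L1, 2*L1, 4*L1, ... so the
// total of m segments is (2^m - 1) * L1.
function segmentsForContinuousDelivery(Lp, L1, exponential) {
    if (exponential) {
        return Math.ceil(Math.log(Lp / L1 + 1) / Math.LN2);
    }
    return Math.ceil(Lp / L1);
}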
#codingexercise

Double GetEvenNumberRangeSumCubeRootPowerFourteen(Double[] A)
{
    if (A == null) return 0;
    return A.EvenNumberSumCubeRootPowerFourteen();
}
We will take a short break now.

Thursday, April 9, 2015

Today we will continue our discussion on the design of streaming proxy systems. We were discussing active prefetching. Prefetching schemes can reduce proxy jitter by fetching uncached segments before they are accessed. We discussed the cases of both uniformly segmented and exponentially segmented media objects. For the uniformly segmented scheme, the segments take an equal amount of time to stream. Consequently, segments up to the ratio Bs/Bt can cause proxy jitter. This threshold is determined based on the latest point at which a segment needs to be fetched. Recall that this position is determined such that the time it takes to prefetch the segment does not exceed the time it takes to deliver the rest of the cached data plus the fetched data. The minimum buffer size is calculated accordingly as (1 - Bt/Bs) L1. This holds for all ranges, namely the first cached segment, the cached segments up to the threshold, and the cached segments at the threshold and after.
In the case of the exponentially segmented object, a similar analysis can be done. Here we assume Bs <= 2 Bt; when that is not so, no prefetching of uncached segments can be in time for exponentially segmented objects. If n is the number of cached segments, then for n = 0, prefetching up to the (1 + log2(1/(2 - Bs/Bt)))th segment is necessary to avoid proxy jitter thereafter. The minimum buffer size is calculated by using this threshold in the same kind of calculation as above. For n > 0 and less than the threshold, the proxy starts to prefetch the threshold segment once the client starts to access the object. The jitter is unavoidable between the (n+1)th segment and the threshold segment, and the minimum buffer size is Li times Bt/Bs, where Li is the length of the threshold segment. For n larger than the threshold, the prefetching of the (n+1)th segment starts when the client accesses the first 1 - (2^n/(2^n - 1))(Bt/Bs - 1) portion of the n cached segments. The minimum buffer size is L(n+1) * Bt/Bs and increases exponentially for later segments.
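A minimal sketch of these thresholds, taking the formulas at face value from this and the earlier posts (names are illustrative):

// Latest segment that must be prefetched to avoid proxy jitter thereafter:
// roughly Bs/Bt segments for uniform segmentation, and
// 1 + log2(1 / (2 - Bs/Bt)) for exponential segmentation (needs Bs <= 2*Bt).
function prefetchThreshold(Bs, Bt, exponential) {
    if (!exponential) {
        return Math.ceil(Bs / Bt);
    }
    if (Bs > 2 * Bt) {
        return Infinity; // no prefetching of uncached segments can be in time
    }
    return 1 + Math.ceil(Math.log(1 / (2 - Bs / Bt)) / Math.LN2);
}

// Minimum buffer size for the uniform case, per the text: (1 - Bt/Bs) * L1.
function minBufferUniform(Bs, Bt, L1) {
    return (1 - Bt / Bs) * L1;
}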