Monday, December 15, 2014

#codingexercise
Decimal GetEvenNumberRangeMode(Decimal[] A)
{
    if (A == null) return 0;
    return A.EvenNumberRangeMode();
}

Today we will continue to discuss the WRL system for trace generation. We previously discussed the effects of a direct-mapped versus associative second-level cache. We now review the line size in the first- and second-level caches. When the line size was doubled or quadrupled with a 512K second-level cache, there were reductions in the cumulative total CPI, and this was independent of the second-level cache size. The improvement was due to a decreased contribution from the instruction cache; the data cache contribution remained constant. The improvement from doubling the line size was the same as the improvement from increasing the second-level cache size eightfold. Doubling the line length of the second-level cache from 128 bytes to 256 bytes made no difference, perhaps because there is too much conflict for long line sizes to be beneficial.
We next review the write buffer's contribution. The performance of the write buffer varied from program to program, mainly because the write buffer is sensitive to both the proportion and the distribution of writes among the instructions executed. A write buffer entry is retired every six cycles; if writes arrive more frequently than that, or arrive bunched together, performance degrades. The relationship between the proportion of writes in a program and write buffer performance is clear: when the proportion of writes jumps, the CPI contributed by the write buffer shows a corresponding jump.
There was also a case where the percentage of writes was frequently in the 15-20% range yet the write buffer CPI was usually zero. If writes are uniformly distributed and their rate stays below a threshold, the write buffer never fills and a subsequent write is never delayed; above the threshold there may be some delays. Since the second-level cache is pipelined into three stages, the processing of a single miss gives the write buffer a chance to retire two entries. If enough misses are interspersed between the writes, the write buffer may work smoothly. Thus the first-level data cache miss ratio is the third factor that comes into play.
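To make the threshold intuition concrete, here is a minimal sketch (my own, not from the report) that simulates a write buffer retiring one entry every six cycles and counts the stall cycles incurred when a write finds the buffer full; the buffer depth and retire interval are illustrative assumptions.

// Minimal write buffer model: one entry retires every retireInterval cycles;
// a write that finds the buffer full stalls the processor until an entry retires.
int SimulateWriteBufferStalls(bool[] isWrite, int bufferDepth = 4, int retireInterval = 6)
{
    int occupancy = 0, stalls = 0, cycle = 0;
    foreach (bool write in isWrite)
    {
        cycle++;
        if (cycle % retireInterval == 0 && occupancy > 0) occupancy--; // periodic retire
        if (!write) continue;
        while (occupancy >= bufferDepth)            // buffer full: stall until a retire
        {
            cycle++; stalls++;
            if (cycle % retireInterval == 0) occupancy--;
        }
        occupancy++;                                // enqueue this write
    }
    return stalls;
}

With uniformly spaced writes arriving less often than one per six cycles this returns zero, matching the observation above; bunched writes produce stalls even at the same overall proportion.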
Although this report doesn't study it, the effects of entropy on the cache distribution hierarchy could be relevant. The Shannon entropy is defined as H(X) = -Σ P(x) log2 P(x), the sum taken over the values x.
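For reference, a small sketch of computing this quantity over an observed access distribution (the method name and the use of double[] probabilities are my own choices):

// Shannon entropy H(X) = -sum over x of P(x) * log2 P(x)
double ShannonEntropy(double[] p)
{
    double h = 0.0;
    foreach (double px in p)
        if (px > 0)                     // 0 * log2(0) is treated as 0
            h -= px * Math.Log(px, 2);
    return h;
}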

Sunday, December 14, 2014

Data forwarders rely on one of the following means:
file monitoring
pipe monitoring
network port monitoring
settings/registry monitoring
etc.
However, there are some protocols that also exchange data, such as ftp, ssh, and http; although these run over network ports, they have to be filtered.
Take ssh, for example: if there were a daemon that could forward the relevant information from a remote computer via ssh, then we wouldn't need a data forwarder on the remote machine, and the footprint of the forwarder would be even smaller. An ssh data forwarder can thus be seen as expanding a forwarder's reach to data.
Now let us consider the ssh forwarder code. This may be set up as a listener script so that on every incoming connection using a particular login, we echo the relevant data back. In this case, the script can be as small as a bash script that reads the relevant data, writes it out as the response to the incoming connection, and closes the connection.
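The idea above is a bash script; to stay with the language of the other snippets on this blog, a minimal sketch of such a responder in C# could be a console program that sshd runs as a forced command for the dedicated login: whatever it writes to standard output is returned over the connection, and the connection closes when it exits. The log path is a hypothetical data source.

// Run by sshd as a forced command (command="..." in authorized_keys for the login):
// stdout travels back to the client over the ssh connection.
using System;
using System.IO;

class SshResponder
{
    static void Main()
    {
        const string path = "/var/log/app/metrics.log";   // hypothetical data to forward
        if (File.Exists(path))
            foreach (string line in File.ReadLines(path))
                Console.WriteLine(line);                  // echo the relevant data back
    }   // process exit closes the connection
}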
Since the data is forwarded only on an ssh connection, this model works well with monitoring mechanisms that can poll periodically. Most monitoring mechanisms are set up on either a publisher-subscriber model or a polling model.
SSH connections have the additional benefit that they are generally considered more secure than plain TCP connections, because the data is not sent in the clear. Furthermore, the script deployed to the remote computer follows the usual mechanisms for securing this access, so it is likely to meet less resistance in deployment. On the other hand, consider the installation of a binary as part of a product package on the remote machine: that may raise a lot of questions in some cases, such as licensing and fees.
Lastly, the commands that determine the data to be forwarded and send it are typical shell commands, so there is more transparency in what script is deployed.
#codingexercise
Decimal GetEvenNumberRangeMax(Decimal[] A)
{
    if (A == null) return 0;
    return A.EvenNumberRangeMax();
}
#codingexercise
Decimal GetEvenNumberRangeMean(Decimal[] A)
{
    if (A == null) return 0;
    return A.EvenNumberRangeMean();
}

Saturday, December 13, 2014

Today we continue to review the WRL system for trace generation. We read up on the effect of a direct-mapped versus associative second-level cache. When associativity is increased, the miss ratio can decrease, but overall performance may not improve and may even degrade. If the cache is very large, it is likely to have a low miss ratio already, so increasing the associativity might provide little or no benefit. Associativity helps the miss ratio, but it adds to the cost of every reference to the cache.
If we ignore this cost, we can bound the benefits of associativity with a comparison between direct-mapped and fully associative caches. The results showed that a fully associative second-level cache, measured at four different sizes, was minimally effective for large caches but more effective for small ones.
A more useful measure of the benefits of associativity is the effect on the total CPI as a function of the per-reference cost of the associativity.
We know that the CPI for the associative case will be a delta more than the CPI for the direct-mapped case. This delta is the product of the first-level miss ratio and the difference in the average cost per reference to the second-level cache. This difference can be elaborated in terms of the cost of a hit in the associative cache (ha), the cost of a hit in the direct-mapped cache (hd), the cost of a second-level cache miss, which is the same for direct-mapped and associative (m), the miss ratio for a second-level associative cache (Ma), and the miss ratio for a second-level direct-mapped cache (Md).
In this case, the delta in the average cost per reference is the difference between the average cost per reference in the associative case and in the direct-mapped case. The first of these is the sum of the cost of an access that hits in the associative second level and the cost of a miss from that level: the hit cost ha weighted by one minus the miss ratio Ma, plus the miss ratio Ma times the cost m of a second-level cache miss, which is the same in the direct-mapped and associative cases. The second term is elaborated the same way for the direct-mapped second-level cache, using hd and Md. Writing the costs of a hit in terms of a ratio, the delta in the average cost per reference comes out weighted by the difference between the miss ratios of the second-level associative and direct-mapped caches.
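Writing this out explicitly (my notation, using the quantities defined above, with m1 the first-level miss ratio):

delta CPI = m1 x [ (ha(1 - Ma) + m Ma) - (hd(1 - Md) + m Md) ]

In the special case where the hit costs are equal (ha = hd = h), this collapses to m1 (m - h)(Ma - Md); that is, the delta is weighted by the difference in the miss ratios, as stated.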
We can now plot the graph of the cumulative CPI in both cases and see that there is a crossover. The distance between the points at which the crossover occurs is the maximum gain to be expected from the associativity.

#codingexercise
Decimal GetEvenNumberRangeMin(Decimal[] A)
{
    if (A == null) return 0;
    return A.EvenNumberRangeMin();
}

Thursday, December 11, 2014

#codingexercise
Decimal GetEvenNumberRangeMean(Decimal[] A)
{
    if (A == null) return 0;
    return A.EvenNumberRangeMean();
}

Today we will continue to discuss the WRL system for trace generation. We were discussing long traces and their usefulness. This is not limited to multi-level caches: even for single-level caches, varying the cache size and degree of associativity, the effects of trace length were studied and found to be significant. With short traces, there were issues such as high miss ratios, cold-start effects, and erroneous results or predictions depending on where the short run was chosen. The miss ratios are not always lower for long traces, since not all programs behave the same and it depends on where the short run was chosen: some programs terminate before the miss ratio stabilizes, others appear to stabilize early and then take a jump after a spell, and yet others stabilize reasonably well but require a long run. Moreover, the cache warm-up itself needs millions of references.
The WRL system does not need to save traces. To analyze a program with a different cache organization, the program only needs to be rerun with a differently parameterized analysis program. This works well for single-user programs that are deterministic, because they generate identical traces on different runs. For multiple processes this is no longer deterministic, because they get interleaved differently on different runs. This is not bad: if we repeat the runs a few times, we can find the dominant trends. However, the variations may make it difficult to judge the significance of individual peculiarities. To overcome this limitation, the cache analysis program could be modified to simulate multiple caches at the same time; in practice, the variations have been within tolerable limits, so that has not been required.
The contribution of the size of the second-level cache was also studied. The CPI was measured for three sizes of second-level cache: 512K, 4M, and 16M. Except for the early part of a run of the TV program, there was a dramatic reduction in CPI when the cache size was increased to 4M, but a considerably smaller improvement resulted from the jump to 16M. When a program actually references a large part of its address space, a bigger second-level cache helps tremendously; however, most programs don't have such a large working set.
We can get a bit more insight from the existing data by dividing second-level cache misses into three categories: compulsory, capacity, and conflict. Compulsory misses happen the first time an address is referenced and occur regardless of the size of the cache. The capacity miss ratio is the difference between the fully associative miss ratio and the compulsory miss ratio. The conflict miss ratio is the difference between the direct-mapped miss ratio and the fully associative miss ratio.
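As a quick sketch of this decomposition (the inputs are measured miss ratios; the numbers in the usage comment are made-up placeholders, not data from the report):

// Three-C decomposition of second-level cache misses.
void DecomposeMisses(double compulsory, double fullyAssociative, double directMapped)
{
    double capacity = fullyAssociative - compulsory;   // misses due to finite cache size
    double conflict = directMapped - fullyAssociative; // misses due to address mapping
    Console.WriteLine("compulsory = " + compulsory);
    Console.WriteLine("capacity   = " + capacity);
    Console.WriteLine("conflict   = " + conflict);
}
// e.g. DecomposeMisses(0.01, 0.03, 0.05) gives capacity 0.02 and conflict 0.02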
Capacity misses are due to the fixed cache size; they decrease as the cache size increases, but only up to a certain size. Conflict misses result from too many addresses mapping to the same cache lines; these increased between the 512K and 4M cache sizes but dropped tremendously from 4M to 16M, where the interfering blocks are spread out more evenly. This is also where pid hashing reduces the miss ratio for the 16M cache.

#codingexercise
Decimal GetEvenNumberRangeMode(Decimal[] A)
{
    if (A == null) return 0;
    return A.EvenNumberRangeMode();
}
#codingexercise

Decimal GetEvenNumberRangeSum(Decimal[] A)
{
    if (A == null) return 0;
    return A.EvenNumberRangeSum();
}


Wednesday, December 10, 2014

#codingexercise
Decimal GetOddNumberRangeMean(Decimal[] A)
{
    if (A == null) return 0;
    return A.OddNumberRangeMean();
}

Today I will continue to discuss the WRL system.
The results of the study showed that over a long run of a multi-program workload, the miss ratio of the data cache far exceeded those of the instruction cache and the second-level cache, yet the data cache's total contribution to CPI was far smaller than that of the other two caches. Further, the second-level cache contributed twice as much to the overall degradation as either the instruction or the data cache.

Long traces provided better insight than short ones. Some of the variations in CPI came only after a spell of execution, and this is not merely the cache 'warm-up' phase. When the TV program discussed earlier was executed with a long trace, the high points of CPIdata and CPIlevel2 matched the lows in the graph of CPIinst. These corresponded to loops that execute small sections of code while going through large volumes of data. The usefulness of long traces is not limited to large multi-level caches; they can also be used for single-level caches with varying cache size and degree of associativity. The study indicated that long traces are more essential for large caches and higher degrees of associativity.

Address mapping to the second-level cache was also studied. Trace addresses generated for user programs were virtual addresses, while those for the kernel were physical addresses. Address translation is not problematic: if the cache size divided by its associativity is less than or equal to the page size, then the high-order bits of the page offset can be used to identify the appropriate cache line, and these bits are the same for virtual and physical addresses. Misses will then be identical to those occurring in a physical cache where the pid field is replaced by a physical page number field. If the cache is much larger than the page size, as in the case of the second-level cache, the page offset maps to only a tiny proportion of the cache.

Since the cache is large, references from different processes can be spread out, eliminating interprocess conflicts. Using a simple pid hashing algorithm, in which the pid is exclusive-ORed with the part of the address that indexes into the sets of the cache, a study was performed with three sizes of second-level cache. The technique was compared to simple address-modulo-cache-size indexing, and it was found that pid hashing reduces CPIlevel2 only when the cache is large.
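A minimal sketch of the two indexing schemes being compared (the line and set field widths are hypothetical; the point is only the exclusive-OR of the pid into the set index):

// Set index into a large second-level cache.
// lineBits = log2(line size), setBits = log2(number of sets) -- illustrative widths.
uint SetIndex(uint address, uint pid, int lineBits, int setBits, bool pidHashing)
{
    uint setMask = (1u << setBits) - 1;
    uint index = (address >> lineBits) & setMask;  // plain address-modulo-cache-size
    return pidHashing ? index ^ (pid & setMask)    // spread processes across the sets
                      : index;
}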

#codingexercise
Invert a Fibonacci series given a number in the series.
This problem cannot be solved unless you are given the next number in the series, or you are allowed to generate the Fibonacci series up to and including the given number.

void printDecreasingFib(int x, int y)
{
    // x and y are consecutive Fibonacci numbers with x <= y
    Debug.Assert(y >= x);
    Console.WriteLine(y);
    Console.WriteLine(x);
    while (y - x > 0)
    {
        int t = y - x;          // the number preceding x in the series
        y = x;
        x = t;
        Console.WriteLine(x);   // print each earlier number exactly once
    }
}

Tuesday, December 9, 2014

Layered content formats have the advantage that they store items in layers. Consider the png format versus bmp: the former supports transparency (an alpha channel), so images can be stacked and individual layers added or removed, whereas the latter is just opaque per-pixel content. If this layering approach were available to most web pages and views, then we could have CSS that defines a style for each layer.
For example, let us say we want to build a web page for a clothing and accessories store. The web page has to let users choose clothing and dress up a mannequin, where different pieces of clothing can go one on top of another. We can do this today by overlaying images in .png format.
HTML:
<img id="img1" src="img1.png"/>
<img id="img2" src="img2.png"/>
<img id="img3" src="img3.png"/>
and we define the css for these as follows:
img
{
    position:absolute;
    top: 0px;
    left: 0px;
}
Now to cause the images to overlay, we set the z-index in css as follows:
#img1
{
z-index: 10;
}
#img2
{
z-index: 20;
}
#img3
{
z-index: 30;
}
This will overlay 3 on top of 2 on top of 1. Here 3 could be the accessory, 2 could be the mannequin and 1 could be the background.
To change the accessory, we could do :
<img src="accessory1.png" onClick="setAccessory('accessory1.png');"/>
and use javascript like so:
function setAccessory(path){
     document.getElementById('img3').src = path; // img3 is the topmost (accessory) layer
}
To add more accessories, we can just define more layers.
Adobe's Macromedia Dreamweaver has the notion of dynamic layers: any object can be added to the web page by adding it directly to a layer, and then the layer can be moved around.
#codingexercise

Decimal GetDistinctRangeMean(Decimal[] A)
{
    if (A == null) return 0;
    return A.DistinctRangeMean();
}