Thursday, April 2, 2015

Today we continue reading the WRL research report on the Swift Java compiler. We continue discussing the results of the performance studies. Among the general results observed, method inlining, class hierarchy analysis (CHA) and global CSE improved performance across all applications. This is largely because most programs contain many small methods, because the final specifier is often missing even where nothing is overridden, and because CHA facilitates method resolution, inlining and escape analysis.
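To make the CHA point concrete, here is a small hypothetical Java fragment of my own (not code from the report): getX is formally a virtual call because the final specifier is missing, but if class hierarchy analysis sees that no loaded class overrides it, the call can still be resolved and inlined.

class Point {
    private int x;
    int getX() { return x; }       // small method; nothing overrides it, but final is missing
}

class Demo {
    static int sumX(Point[] pts) {
        int sum = 0;
        for (Point p : pts)
            sum += p.getX();       // CHA lets the compiler resolve and inline this virtual call
        return sum;
    }
}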
Specific results were also studied. Field analysis had a large impact on some programs such as mpeg and compress because it eliminates many null checks and bounds checks on some constant-sized buffers and computation arrays. It plays an even more important role in programs such as mtrt where there are many references to neighbouring elements. In db, Swift successfully inlines a vector object contained within a database entry. Object inlining also inlines some of the smaller arrays used by the mtrt program, which helps eliminate checks. The performance of db improves because a significant comparison routine repeatedly generates and uses an enumeration object that can be stack allocated.
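As a rough illustration of what field analysis buys (my own example, not code from the benchmarks), consider a field that is assigned a fixed-size array exactly once: the analysis can prove it is always non-null and always of the same constant length, so the null check and the bounds check in the loop below become provably unnecessary.

class Window {
    private final int[] buf = new int[64];   // constant-sized buffer, assigned exactly once

    int sum() {
        int s = 0;
        for (int i = 0; i < 64; i++)
            s += buf[i];    // null check and bounds check can both be eliminated
        return s;
    }
}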
Method splitting has been found to be very useful in programs like db, because db makes heavy use of the elementAt operation and of file read operations where there is an ensureOpen check on the file handle. It was also noted that stack allocation does not always improve performance because of the size and effectiveness of the JVM heap; it would show better results if the heap size were small.
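A hedged sketch of what method splitting looks like in such code (hypothetical names, not the actual library code): the rarely taken failure path is moved into its own method, so the common path stays tiny and can be inlined at every call site.

class BufferedFile {
    private boolean open = true;
    private int pos;
    private byte[] data = new byte[1024];

    int read() {
        if (!open)
            return readNotOpen();   // rare path split out into a separate method
        return data[pos++];         // small fast path, cheap to inline everywhere
    }

    private int readNotOpen() {
        throw new IllegalStateException("stream closed");
    }
}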
Synchronization optimizations also showed little or no performance improvement, and were significant only in the case of a program that involves synchronization on array and hash data structures.
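For a sense of where synchronization removal applies (again an illustrative sketch of my own, not one of the benchmark programs), consider a synchronized collection that never escapes the method that allocates it; once the synchronized calls are inlined and escape analysis proves the receiver is thread-local, the locking can be removed.

import java.util.Vector;

class Histogram {
    static int countSamples(int[] samples) {
        Vector<Integer> seen = new Vector<>();   // synchronized collection that never escapes
        for (int s : samples)
            seen.addElement(s);                  // the lock taken here is on a thread-local object
        return seen.size();
    }
}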

Wednesday, April 1, 2015

Today we continue reading the WRL research report on the Swift Java compiler. We started discussing the results of the study, comparing general results across a variety of programs. We noted that the Swift-generated code was installed into a fast JVM for the study. The compiler could compile about 2000 lines of code per second; however, it became slower by 20 to 40% when escape analysis was turned on. This is expected because escape analysis requires several recursive passes and has a cascading effect. We now look at the performance of the applications when one or more of several optimizations are disabled. These optimizations include method inlining, class hierarchy analysis, field analysis, object inlining, method splitting, stack allocation, synchronization removal, store resolution, global CSE, global code motion, loop peeling, runtime check elimination, sign extension elimination and branch removal. If any of these terms sound unfamiliar at this point, it is probably better to revisit the previous posts. These features are not really independent, yet we are merely interested in the gain from each of them. All programs listed for comparison earlier were re-run with one of the features turned off, and the slowdown introduced was recorded as a positive value; if there was no slowdown, the value was left blank. Since the features are not mutually independent, the metrics cannot simply be added up across features. The numbers are also merely rough indicators, because performance can change when disabling one or more code optimizations. It was observed that method inlining, class hierarchy analysis and global CSE improved almost all the programs without fail, and more so when the program code involved many small methods, and virtual methods at that.

Tuesday, March 31, 2015

Today we continue reading the WRL research report on the Swift Java compiler. We were discussing register allocation and solving it by means of graph coloring. We will discuss the results of the study on the Swift Java compiler next. The Swift Java compiler was measured on an Alpha workstation which had one 667MHz processor, a 64KB on-chip data cache and a 4MB board cache. The generated code was installed into a high performance JVM. This was necessary so that the results could be properly evaluated under controlled conditions. Only when the baseline is performant can we find the results to be representative of the variables we control; a poor choice of baseline may hide gains from some of the variables or skew the results because of running time variations. In this case, the JVM chosen was already performing some form of CHA, which helps us evaluate the gains from the passes more appropriately. The heap size used was 100 MB. Although the hardware seems less powerful compared to recent processors, the configuration was decent at that time. Moreover, with the JVM baseline established, the trends could be expected to be the same on a different choice of system. The tests were performed on a number of applications from a variety of domains and of varying program size. The initial set of results was taken with all optimizations on. Then results were taken without class hierarchy analysis (CHA). This showed that the use of CHA greatly improves overall performance. The overall speedup of the Swift-generated code without CHA over the fast JVM is marginal, because the JVM is already using some form of CHA to resolve method calls. The results were also compared for simple CHA versus full CHA, and it turned out that the former was only somewhat less performant than the latter, indicating that it is a useful strategy when dynamic loading is present.
Swift compilation could proceed at a rate of about 2000 lines of code per second with all optimizations except escape analysis. Escape analysis slowed down the compilation by about 20-40%.

Monday, March 30, 2015

Today we continue reading the WRL research report on the Swift Java compiler. We were discussing register allocation and solving it by means of graph coloring. To summarize, the steps involved were:
Insert copies
Precolor
Construct the bias graph
Construct the interference graph
Compute coloring order
Color values
If some values failed to be colored:
 - spill uncolored values to the stack
 - repeat from constructing the interference graph
Cleanup

We saw how each of these steps mattered in solving register allocation: specifically, how the copies help when a value can be in more than one register, and how precoloring helps with the register allocation of method parameters and return values. The bias graph establishes edges between values that we would like to color the same. The interference graph has edges between nodes that cannot be colored the same; in doing so, it encapsulates all the possible coloring assignments to the values. We saw how to apply a coloring heuristic where the hard nodes are colored first and the easy nodes last, with difficulty measured by the degree of the nodes in the interference graph. The nodes are then colored in the order computed. The bias graph is used to make an intelligent choice of a color from the set of legal colorings allowed by the interference graph. If the coloring does not succeed, we spill the uncolored values by inserting a spill value just after each definition and a restore value before each use; this makes those nodes easier to color on the next pass. Finally, when the coloring has succeeded, a data flow pass is used to eliminate unnecessary copies.
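To tie the steps together, here is a minimal driver sketch in Java (my own pseudocode-level outline, not Swift's implementation; every type and helper name here is a placeholder):

import java.util.List;

abstract class RegisterAllocator<Method, Value, Graph> {
    void allocate(Method m) {
        insertCopies(m);                                                  // 1. insert copies
        precolor(m);                                                      // 2. precolor fixed registers
        Graph bias = buildBiasGraph(m);                                   // 3. bias graph
        while (true) {
            Graph interference = buildInterferenceGraph(m);               // 4. interference graph
            List<Value> order = coloringOrder(interference);              // 5. coloring order
            List<Value> failed = colorValues(order, interference, bias);  // 6. color values
            if (failed.isEmpty()) break;
            spillToStack(failed, m);                                      // 7. spill and repeat from step 4
        }
        cleanup(m);                                                       // 8. remove unnecessary copies
    }

    abstract void insertCopies(Method m);
    abstract void precolor(Method m);
    abstract Graph buildBiasGraph(Method m);
    abstract Graph buildInterferenceGraph(Method m);
    abstract List<Value> coloringOrder(Graph interference);
    abstract List<Value> colorValues(List<Value> order, Graph interference, Graph bias);
    abstract void spillToStack(List<Value> failed, Method m);
    abstract void cleanup(Method m);
}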
We next look at code generation. Swift's code generation pass translates SSA operations into machine code; the operations remaining in the SSA graph at this point each correspond to zero or one Alpha instructions. Code generation involves computing the stack frame size, emitting the prolog code, emitting code for each block as per the scheduling pass, emitting a branch when the successor is not the immediately following block, emitting the epilog code, and emitting auxiliary information including a list of relocation entries, associated constants, an exception table, and a bytecode map. The branches that are actually necessary are thus identified, and the final code block each branch targets is determined.
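A rough sketch of that emission order (hypothetical types of my own; Swift's actual code generator is tied to its IR and the Alpha backend):

import java.util.List;

abstract class CodeGenerator<Block, Assembler> {
    void generate(List<Block> schedule, Assembler asm) {
        int frameSize = computeStackFrameSize();
        emitProlog(asm, frameSize);
        for (int i = 0; i < schedule.size(); i++) {
            Block b = schedule.get(i);
            emitBlockBody(asm, b);                      // code for the block, in schedule order
            Block next = (i + 1 < schedule.size()) ? schedule.get(i + 1) : null;
            Block succ = successorOf(b);
            if (succ != null && succ != next)
                emitBranch(asm, succ);                  // branch only when the successor does not fall through
        }
        emitEpilog(asm, frameSize);
        emitAuxiliaryInfo(asm);                         // relocations, constants, exception table, bytecode map
    }

    abstract int computeStackFrameSize();
    abstract void emitProlog(Assembler asm, int frameSize);
    abstract void emitBlockBody(Assembler asm, Block b);
    abstract Block successorOf(Block b);
    abstract void emitBranch(Assembler asm, Block target);
    abstract void emitEpilog(Assembler asm, int frameSize);
    abstract void emitAuxiliaryInfo(Assembler asm);
}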

Sunday, March 29, 2015

Today we continue reading the WRL research report on the Swift Java compiler. We were discussing register allocation and solving it by means of graph coloring. Today we continue with the order of coloring. The bias graph is used to make intelligent choices of a color from the set of legal colorings allowed by the interference graph. An uncolored node is given the same color as another node only if the interim nodes between them can also be colored the same. If the coloring does not succeed, then we spill values to the stack. The value corresponding to each node that was not colored is spilled onto the stack by inserting a spill value just after its definition and a restore value before each use. This allows the original value and the newly added restore values to be in a register over a shorter range, and thus they will hopefully be easier to color on the next pass.
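A small sketch of that spilling step (placeholder IR types of my own, not Swift's): the stored copy lives only briefly after the definition, and each restored copy lives only briefly before its use.

abstract class Spiller<Value> {
    void spill(Value v) {
        int slot = allocateStackSlot(v);
        insertAfterDefinition(v, newSpill(v, slot));      // spill value right after the definition
        for (Value use : usersOf(v)) {
            Value restored = newRestore(slot);            // restore value right before each use
            insertBefore(use, restored);
            replaceInput(use, v, restored);               // the use now reads the short-lived restored copy
        }
    }

    abstract int allocateStackSlot(Value v);
    abstract Iterable<Value> usersOf(Value v);
    abstract Value newSpill(Value v, int slot);
    abstract Value newRestore(int slot);
    abstract void insertAfterDefinition(Value def, Value spill);
    abstract void insertBefore(Value use, Value restored);
    abstract void replaceInput(Value use, Value oldInput, Value newInput);
}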
A final cleanup pass is necessary after all the coloring succeeds, to remove copies that have the same source and destination and to remove unnecessary restore operations. This pass does a data flow computation to determine what value each register holds after each instruction. This enables optimizations such as replacing an input value of an instruction with the oldest copy that is still in a register.
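A toy version of that bookkeeping on a straight-line block (a made-up three-field instruction form of my own, not Swift's IR), just to show how tracking what each register holds exposes redundant copies:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CopyCleanup {
    static final class Instr {
        final String op; final int dst; final int src;      // e.g. ("copy", r3, r7)
        Instr(String op, int dst, int src) { this.op = op; this.dst = dst; this.src = src; }
    }

    static List<Instr> clean(List<Instr> block) {
        Map<Integer, Integer> holds = new HashMap<>();       // register -> id of the value it currently holds
        List<Instr> out = new ArrayList<>();
        int nextId = 0;
        for (Instr i : block) {
            if (i.op.equals("copy")) {
                if (!holds.containsKey(i.src))
                    holds.put(i.src, nextId++);               // first sighting of this source value
                Integer val = holds.get(i.src);
                if (val.equals(holds.get(i.dst)))
                    continue;                                 // destination already holds the value: drop the copy
                holds.put(i.dst, val);
            } else {
                holds.put(i.dst, nextId++);                   // any other operation defines a fresh value
            }
            out.add(i);
        }
        return out;
    }
}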
#codingexercise
double GetAllNumberRangeProductCubeRootPowerSeven(double[] A)
{
    if (A == null) return 0;
    // delegate to the corresponding range-product helper on the array
    return A.AllNumberRangeProductCubeRootPowerSeven();
}
#codingexercise
double GetAllNumberRangeProductCubeRootPowerNine(double[] A)
{
    if (A == null) return 0;
    // delegate to the corresponding range-product helper on the array
    return A.AllNumberRangeProductCubeRootPowerNine();
}

Saturday, March 28, 2015

Today we continue our study of the WRL research report on the Swift Java compiler. We were discussing register allocation. We mentioned the construction of the bias graph and the interference graph. Today we discuss the next step, which is the coloring order. We saw that the algorithm proceeds by coloring the hard nodes first and the easy nodes last. The algorithm repeatedly removes from the interference graph the node with the minimum remaining degree; as nodes are removed, the degrees of the remaining nodes change, which is why the minimum remaining degree is recomputed each time. The order of coloring is then the reverse of this removal order.
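Here is a self-contained Java sketch of that ordering heuristic (my own encoding of the graph as an adjacency map keyed by value ids, not Swift's data structures):

import java.util.*;

final class ColoringOrder {
    static List<Integer> compute(Map<Integer, Set<Integer>> interference) {
        Map<Integer, Set<Integer>> g = new HashMap<>();
        interference.forEach((n, adj) -> g.put(n, new HashSet<>(adj)));      // work on a copy
        Deque<Integer> removed = new ArrayDeque<>();
        while (!g.isEmpty()) {
            Integer next = Collections.min(g.keySet(),
                    Comparator.comparingInt((Integer n) -> g.get(n).size())); // minimum remaining degree
            g.remove(next);
            for (Set<Integer> adj : g.values()) adj.remove(next);             // degrees of the rest shrink
            removed.push(next);                                               // reverse of removal order
        }
        return new ArrayList<>(removed);                                      // hard (high-degree) nodes first
    }
}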
To color all the nodes in the order computed, we take them one by one and find the set of possible colorings for each node. The colors of the adjacent nodes in the interference graph are then excluded from that set. Any color from the remaining set is valid; if no color is possible, the uncolored values are spilled to the stack and the interference graph and coloring order are recomputed.
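A small sketch of that per-node step (same toy adjacency-map encoding as above, not the compiler's actual representation):

import java.util.*;

final class GreedyColoring {
    static Map<Integer, Integer> color(List<Integer> order,
                                       Map<Integer, Set<Integer>> interference,
                                       Set<Integer> registers) {
        Map<Integer, Integer> assigned = new HashMap<>();
        for (int node : order) {
            Set<Integer> candidates = new HashSet<>(registers);          // registers that could hold the value
            for (int neighbour : interference.getOrDefault(node, Set.of()))
                candidates.remove(assigned.get(neighbour));               // exclude colors of colored neighbours
            if (candidates.isEmpty())
                continue;                                                 // left uncolored: spilled later
            assigned.put(node, candidates.iterator().next());             // any remaining color is legal
        }
        return assigned;
    }
}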
The bias graph is used to make an intelligent choice of a color from the set of legal colorings. If we represent the edges from the interference graph with solid lines and those from the bias graph with dotted lines, then to color a particular node we do a breadth-first search of the bias graph. If we find a node that is already colored, we color the original node the same color as long as that color is also allowed for the interim nodes; an interim node cannot be colored differently if we are to use the same color for this node and the colored node. If none of the nodes found have a color that can be used for the node we want to color, then we do another BFS over the uncolored nodes in the bias graph. At each node encountered, we intersect the set of possible colors for the node we want to color with the set of colors allowed for the encountered uncolored node. If we are left with a non-empty set, a color is chosen from it for the node we want to color. This method tries to let the maximum number of nodes in the bias graph that are connected to the node we want to color end up with the color we picked.
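Here is a simplified Java sketch that only approximates the search described above: a single BFS over the bias graph that carries along the set of colors still allowed by all interim nodes on the path, and settles for a colored node whose color survives that intersection. The graph encodings are my own, not Swift's.

import java.util.*;

final class BiasedChoice {
    private static final class Path {
        final int at;
        final Set<Integer> allowed;
        Path(int at, Set<Integer> allowed) { this.at = at; this.allowed = allowed; }
    }

    // legalForNode: colors the interference graph still allows for the node being colored
    // legal:        the analogous set for every other node; assigned: colors already chosen
    static Integer choose(int node,
                          Set<Integer> legalForNode,
                          Map<Integer, Set<Integer>> bias,
                          Map<Integer, Set<Integer>> legal,
                          Map<Integer, Integer> assigned) {
        Deque<Path> queue = new ArrayDeque<>();
        Set<Integer> visited = new HashSet<>();
        visited.add(node);
        queue.add(new Path(node, new HashSet<>(legalForNode)));
        while (!queue.isEmpty()) {
            Path p = queue.poll();
            for (int next : bias.getOrDefault(p.at, Set.of())) {
                if (!visited.add(next)) continue;
                Integer c = assigned.get(next);
                if (c != null) {
                    if (p.allowed.contains(c)) return c;        // colored node reached and its color still fits
                    continue;
                }
                Set<Integer> allowed = new HashSet<>(p.allowed);
                allowed.retainAll(legal.getOrDefault(next, Set.of()));
                if (!allowed.isEmpty())
                    queue.add(new Path(next, allowed));          // keep walking through usable interim nodes
            }
        }
        return null;                                             // caller falls back to any legal color
    }
}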

Friday, March 27, 2015

Today we continue our study of the WRL research report on the Swift Java compiler. We were discussing register allocation. We saw that the first step in this algorithm was to insert copies and the second step was to precolor them; we now discuss the remaining steps. Next we construct the bias graph. This is an undirected graph that has values as nodes and edges between nodes which we want to color with the same color. The nodes that we want to color the same are the inputs and outputs of a copy; this therefore eliminates some of the copy insertions from step 1. Next we construct the interference graph. The interference graph has values as nodes and edges between nodes that cannot be assigned the same color because their live ranges overlap. This is the step where we determine all the possible valid assignments of colors to values; hence, with this step, we convert the problem to a graph coloring problem. Graph coloring attempts to color the nodes such that no two nodes that are adjacent in the interference graph have the same color. The interference graph completely encodes the possible legal assignments of colors because all the restrictions are drawn. That said, graph coloring is NP-hard in general, so heuristics are involved.
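To make the interference idea concrete, here is a toy construction of my own. It is a deliberate simplification: each value's live range is modelled as a single interval of instruction indices, which is not how Swift's SSA-based analysis works, but it produces the same kind of graph, one whose edges join values that cannot share a register.

import java.util.*;

final class InterferenceGraph {
    // ranges[v] = {start, end} : the live range of value v, as instruction indices
    static Map<Integer, Set<Integer>> build(int[][] ranges) {
        Map<Integer, Set<Integer>> graph = new HashMap<>();
        for (int v = 0; v < ranges.length; v++) graph.put(v, new HashSet<>());
        for (int a = 0; a < ranges.length; a++)
            for (int b = a + 1; b < ranges.length; b++)
                if (ranges[a][0] <= ranges[b][1] && ranges[b][0] <= ranges[a][1]) {
                    graph.get(a).add(b);                     // live ranges overlap: cannot share a register
                    graph.get(b).add(a);
                }
        return graph;
    }
}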
In the next step, we find the coloring order of all the nodes. A coloring order is selected such that the most connected nodes in the interference graph are colored first. This is referred to as coloring the hard nodes first and then the easy nodes; the difficulty corresponds to the degree of the nodes in the interference graph. The algorithm proceeds by repeatedly removing a node with the minimum degree from the interference graph. On the removal of a node, the corresponding edges are also deleted, so the degrees of the remaining nodes change; hence the algorithm always selects the node with the smallest remaining degree. The algorithm terminates when all the nodes have been removed. Moreover, the order of coloring is the reverse of the order of removal of nodes. This ensures that the nodes with low degree are colored after the nodes with higher degree.
The coloring of each of the nodes in the order computed is a separate step. Here we enumerate all the possible legal colorings of that node; this could be, for example, all the registers that could hold that value, excluding the colors of any neighbouring colored nodes in the original interference graph. If a node cannot be colored, its value is spilled to the stack and the interference graph is reconstructed. The algorithm exits when there are no more values left to be colored.