Saturday, March 28, 2015

Today we continue our study of the WRL research report on the Swift Java compiler. We were discussing register allocation. We mentioned the construction of the bias graph and the interference graph. Today we discuss the next step, which is computing the coloring order. We saw that the algorithm colors the hard nodes first and the easy nodes last. It proceeds by repeatedly removing a node with the minimum degree from the interference graph; since degrees change as nodes and their edges are removed, each removal selects the node with the minimum remaining degree. The coloring order is then the reverse of this removal order.
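As a rough sketch (not the paper's actual code), the minimum-remaining-degree removal and reversed coloring order might look like this in Python; the tie-break by node name is my addition for determinism:

```python
def coloring_order(interference):
    """interference: dict mapping node -> set of neighboring nodes.
    Repeatedly remove a minimum-remaining-degree node, then color in
    the reverse of the removal order (hard, high-degree nodes first)."""
    degrees = {n: len(adj) for n, adj in interference.items()}
    remaining = set(interference)
    removal = []
    while remaining:
        # pick the node with the smallest remaining degree (tie-break by name)
        node = min(remaining, key=lambda n: (degrees[n], n))
        removal.append(node)
        remaining.remove(node)
        for nbr in interference[node]:
            if nbr in remaining:
                degrees[nbr] -= 1  # removing the node also removes its edges
    return list(reversed(removal))
```

For a triangle with a pendant node, the pendant (the easiest node) is removed first and therefore colored last.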
To color the nodes in the computed order, we take them one by one and find the set of possible colors for each node. The colors of the adjacent nodes in the interference graph are excluded from this set. Any remaining color is valid, and if no color is possible, the uncolored values are spilled to the stack and the interference graph and coloring order are recomputed.
The bias graph is used to make an intelligent choice of a color from the set of legal colors. If we represent the edges of the interference graph with solid lines and those of the bias graph with dotted lines, then to color a particular node we do a breadth-first search of the bias graph. If we find a node that is already colored, we give the original node the same color, as long as that color is allowed for the interim nodes on the path; the interim nodes must be able to take the same color if this node and the colored node are to share it. If none of the colored nodes found has a usable color, we do another BFS over the uncolored nodes in the bias graph. At each node encountered, we intersect the set of possible colors for the node we want to color with the set of colors allowed for the encountered uncolored node. If we are left with a non-empty set, a color is chosen from it. This method maximizes the number of nodes in the bias graph connected to the node being colored that can later receive the color we picked.
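A hedged sketch of this bias-directed color choice, with illustrative data structures (dicts of sets; none of these names come from the report). The BFS narrows the usable colors along each path, returns a colored bias neighbor's color when the path permits it, and otherwise falls back to a color shared with uncolored bias neighbors:

```python
from collections import deque

def pick_color(node, allowed, colors, bias, interference):
    """allowed: node -> set of legal colors; colors: already-assigned colors;
    bias / interference: node -> set of neighbors in each graph."""
    # legal colors: allowed for this node, minus colors of interference neighbors
    legal = allowed[node] - {colors[n] for n in interference[node] if n in colors}
    if not legal:
        return None  # the caller would spill this value
    queue = deque([(node, legal)])
    seen = {node}
    fallback = None
    while queue:
        cur, path_colors = queue.popleft()
        for nxt in bias.get(cur, ()):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt in colors:
                # colored bias neighbor: reuse its color if every interim
                # node on the path could also take it
                if colors[nxt] in path_colors:
                    return colors[nxt]
            else:
                narrowed = path_colors & allowed[nxt]
                if narrowed:
                    if fallback is None:
                        fallback = min(narrowed)  # deterministic fallback
                    queue.append((nxt, narrowed))
    return fallback if fallback is not None else min(legal)
```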

Friday, March 27, 2015

Today we continue our study of the WRL research report on the Swift Java compiler. We were discussing register allocation. We saw that the first step in this algorithm was to insert copies and the second step was to precolor values. We now discuss the remaining steps. Next we construct the bias graph. This is an undirected graph that has values as nodes and edges between nodes which we want to color with the same color. The nodes that we want to color the same are the inputs and outputs of a copy, so this step can eliminate some of the copy insertions from step 1. Next we construct the interference graph. The interference graph has values as nodes and edges between nodes that cannot be assigned the same color because their live ranges overlap. This is the step where we determine all the possible valid assignments of colors to values; with it, we convert the problem to a graph coloring problem. Graph coloring attempts to color the nodes such that no two nodes adjacent in the interference graph have the same color. The interference graph completely encodes the legal color assignments because all the restrictions are represented as edges. That said, graph coloring is NP-hard, so heuristics are involved.
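For illustration, a minimal interference graph builder, assuming each value's live range is a simple (start, end) interval of instruction positions (real live ranges in SSA form are more intricate):

```python
def build_interference(live_ranges):
    """live_ranges: dict value -> (start, end) positions.
    Two values interfere when their live ranges overlap."""
    graph = {v: set() for v in live_ranges}
    items = list(live_ranges.items())
    for i, (v1, (s1, e1)) in enumerate(items):
        for v2, (s2, e2) in items[i + 1:]:
            if s1 < e2 and s2 < e1:  # intervals overlap
                graph[v1].add(v2)
                graph[v2].add(v1)
    return graph
```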
In the next step, we compute the coloring order of all the nodes. The order is selected so that the most connected nodes in the interference graph are colored first. This is referred to as coloring the hard nodes first and then the easy nodes; the difficulty corresponds to the degree of a node in the interference graph. The algorithm proceeds by repeatedly removing a node with the minimum degree from the interference graph. On the removal of a node, its edges are also deleted, so the degrees of the remaining nodes change; hence the algorithm always selects the node with the smallest remaining degree. The algorithm terminates when all the nodes have been removed. Moreover, the order of coloring is the reverse of the order of removal. This ensures that the nodes with low degree are colored after the nodes with higher degree.
The coloring of the nodes in the computed order is a separate step. Here we enumerate all the possible legal colors for each node: for example, all the registers that could hold that value, excluding the colors of any neighboring colored nodes in the original interference graph. If a node cannot be colored, its value is spilled to the stack and the interference graph is reconstructed. The algorithm exits when there are no more values left to be colored.
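The overall color-spill-rebuild loop can be sketched as below; `interference_fn` is a hypothetical callback that rebuilds the interference graph over the values not yet spilled, and the hard-nodes-first order is approximated by sorting on degree:

```python
def allocate(interference_fn, values, registers):
    """Color values; spill any that fail and rebuild the interference
    graph, repeating until everything is colored or spilled."""
    spilled = set()
    while True:
        graph = interference_fn(spilled)  # rebuilt after each spill round
        order = sorted((v for v in values if v not in spilled),
                       key=lambda v: len(graph[v]), reverse=True)  # hard first
        colors, failed = {}, []
        for v in order:
            # legal colors: registers not used by already-colored neighbors
            choices = set(registers) - {colors[n] for n in graph[v] if n in colors}
            if choices:
                colors[v] = min(choices)
            else:
                failed.append(v)
        if not failed:
            return colors, spilled
        spilled.update(failed)  # spill to the stack, then recompute
```

With a triangle of mutually interfering values and only two registers, one value is spilled and the other two are colored.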

Thursday, March 26, 2015

Today we continue our study of the WRL research report on the Swift Java compiler. We were discussing the trace scheduling and trace layout algorithms; both of them are greedy algorithms. We next discuss register allocation. Swift uses a modified Briggs-style coloring allocator. Instead of coalescing, Swift's allocator adds a special data structure, the bias graph, to direct coloring and limit the number of copies introduced. Register allocation proceeds by assigning each value a color which represents a particular register assignment. The problem is thereby converted to a graph coloring problem, for which coloring heuristics are available. The coloring problem is defined by the restrictions introduced.
Register allocation proceeds with the following algorithm:
1) Insert Copies
2) Precolor
3) Construct bias graph
4) Construct interference graph
5) Compute coloring order
6) Color values
7) If some values failed to be colored
a) Spill uncolored values to the stack
b) Goto step 4)
8) Clean up
We will discuss each of these steps. A coloring allocator assumes that each value is allocated to exactly one register for its lifetime. Copies are inserted when a value is required to be in more than one register, such as when a value needs to move from one register to another, as with a return value or a method call parameter. Copies are also required for a phi node, because the input values of a phi node may not be assigned to the same register as the phi node itself. In addition, Swift uses the LIMIT algorithm to split the live ranges of values around loops in which they are not referenced: the live range of such a value is split by copies before and after the loop. This helps avoid keeping a register occupied across a loop by a value that is not used within it.
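A toy illustration of the phi copy insertion described above, with made-up block and value names; note that a real compiler would place the copy before the block's terminating branch rather than simply appending it:

```python
def insert_phi_copies(phi_result, phi_inputs, blocks):
    """phi_inputs: list of (predecessor_block, value). Append a copy of
    each input in its predecessor block, so the phi's result and all the
    copies can later be biased toward one register."""
    for pred, value in phi_inputs:
        # phi_result := value at the end of each predecessor
        blocks[pred].append(("copy", phi_result, value))
    return blocks
```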
The next phase is value precoloring. This is a kind of initialization step: the compiler determines which values need to be assigned to particular registers and fixes their color assignments. Values with fixed register assignments include method arguments and return values.
The next stage involves creating the bias graph. This is the data structure introduced by Swift: an undirected graph that has values as nodes and edges between nodes which are to be colored the same. It was introduced to undo as many of the copy operations from the first step as possible, by trying to color the input and output of a copy the same color.
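A minimal sketch of bias graph construction, assuming instructions are tuples whose first element is the opcode and copies have the shape `("copy", dst, src)` (an illustrative encoding, not the report's IR):

```python
def build_bias_graph(instructions):
    """Add a bias edge between the input and output of each copy,
    since we would like them to receive the same color."""
    bias = {}
    for inst in instructions:
        if inst[0] == "copy":  # ("copy", dst, src)
            _, dst, src = inst
            bias.setdefault(dst, set()).add(src)
            bias.setdefault(src, set()).add(dst)
    return bias
```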
#codingexercise
Double GetAllNumberRangeProductCubeRootPowerSix (Double [] A)
{
    if (A == null) return 0;
    return A.AllNumberRangeProductCubeRootPowerSix();
}

Wednesday, March 25, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing trace scheduling, and we saw that it involves a greedy algorithm. We now continue with block layout: how Swift uses profile information to determine a good layout for the traces. Swift uses a simple version of Pettis and Hansen's code layout algorithm. This is also a greedy algorithm that gradually merges blocks/traces into sequences, always merging the two sequences that have the heaviest-weight edge between an element of one and an element of the other. The end result of this algorithm is a single merged sequence, which is the desired layout.
The changes that Swift makes ensure that a loop exit block will be placed at the end of the loop. Swift also modifies the dynamic or static profile information, for example by reducing the weight of the other outgoing edges of a block that has an edge exiting a loop. By keeping the exit at the end, Swift guarantees only one branch per loop iteration. Branches that exit in the middle are given lower priority, since those edges have already been determined to be less important than the remaining edges in a trace.
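A simplified Python sketch of this greedy merging; it only joins two sequences when the edge runs from the tail of one to the head of the other, which is one common simplification of Pettis and Hansen's scheme, and the names are illustrative:

```python
def layout(blocks, edge_weights):
    """Greedy layout: repeatedly merge the two sequences connected by the
    heaviest remaining edge. edge_weights: (src, dst) -> weight."""
    seqs = [[b] for b in blocks]  # every block starts as its own sequence

    def seq_of(b):
        return next(s for s in seqs if b in s)

    # visit edges from heaviest to lightest
    for (src, dst), _ in sorted(edge_weights.items(), key=lambda kv: -kv[1]):
        s1, s2 = seq_of(src), seq_of(dst)
        # merge only if src ends one sequence and dst begins another
        if s1 is not s2 and s1[-1] == src and s2[0] == dst:
            s1.extend(s2)
            seqs.remove(s2)
    return [b for s in seqs for b in s]  # concatenate what remains
```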
#codingexercise
Double GetAllNumberRangeProductCubeRootPowerFour (Double [] A)
{
    if (A == null) return 0;
    return A.AllNumberRangeProductCubeRootPowerFour();
}

Tuesday, March 24, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing trace scheduling and block layout. We saw that this involves a greedy algorithm where the most frequently executed block that is not yet placed in a trace is taken as the first block of a new trace.
The trace is then extended upwards and downwards in the control flow graph. The algorithm proceeds until there are no more blocks to place in a trace. The instruction scheduler operates on one trace at a time. It adds dependences to make sure that a value is scheduled early enough to dominate all of its users that are not in the trace. The scheduler also includes a model of the Alpha 21164 and 21264 pipelines. This lets the scheduler decide when the result of an operation will be ready, based on the latency of the operation and the state of the pipelines. Given this dependence and latency information, the scheduler at each point schedules a value whose dependences have all been satisfied and whose inputs are ready earliest. If the chosen value is a control flow or exception-causing operation, the current basic block is ended. When a value is scheduled, the finite state automaton is updated to reflect its execution. Swift uses the profile information to determine the best layout for the traces.
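Ignoring the pipeline model, the pick-the-earliest-ready-value idea can be sketched as a small greedy list scheduler (assumes an acyclic dependence graph; names are illustrative, not the report's):

```python
def schedule(values, deps, latency):
    """deps: value -> set of values it depends on; latency: value -> cycles.
    Repeatedly pick, among values whose dependences are satisfied, the one
    whose inputs become available earliest."""
    done, order = {}, []  # done: value -> cycle at which its result is ready
    pending = set(values)
    while pending:
        ready = [v for v in sorted(pending) if deps[v] <= set(done)]
        # inputs-available-earliest wins; ties broken by name via sorted()
        v = min(ready, key=lambda v: max((done[d] for d in deps[v]), default=0))
        done[v] = max((done[d] for d in deps[v]), default=0) + latency[v]
        order.append(v)
        pending.remove(v)
    return order
```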
#codingexercise
Double GetAllNumberRangeProductCubeRootSquares (Double [] A)
{
    if (A == null) return 0;
    return A.AllNumberRangeProductCubeRootSquares();
}



Monday, March 23, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing machine-dependent passes and optimizations, including sign-extension elimination. We had begun discussing trace scheduling and block layout. The scheduling process has three main steps: 1) decomposing the CFG into traces, 2) scheduling the instructions within each trace, and 3) determining a good sequence for the layout of the traces. The most frequently executed block that is not yet placed in a trace is taken as the first block of a new trace. The algorithm then tries to extend the trace upwards and downwards in the CFG. If the candidate block has only a single predecessor P in the CFG and P is not yet in a trace, then P is added to the trace, and so on recursively with P's predecessor. The trace is then extended downwards by following the most frequently executed successor edge of B. Similar to the case with P, if that edge from B goes to block S, S is not yet in a trace, and S has only a single predecessor, then S is added to the trace, and so on recursively with S's successor. The algorithm continues until all the blocks have been placed in a trace. The result is a set of traces that are extended basic blocks and that cover the CFG. Next, the instruction scheduler operates on one trace at a time, building the set of all dependences that determine where each instruction can be placed. Although the Swift IR already encodes many of these dependences, the scheduler adds explicit control and antidependences. Control flow and exception-causing operations are still required to be at the end of a basic block. Special "out of trace" dependences are added to ensure that a value that is scheduled will dominate any of its users that are not in the trace.
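The trace-formation step might be sketched as follows, assuming explicit predecessor/successor maps and block execution frequencies (all names are illustrative, not from the report):

```python
def form_traces(blocks, preds, succs, freq, succ_weight):
    """Seed each trace with the most frequently executed unplaced block,
    then grow upward through unique unplaced predecessors and downward
    along the heaviest successor edge."""
    placed, traces = set(), []
    for seed in sorted(blocks, key=lambda b: -freq[b]):
        if seed in placed:
            continue
        trace = [seed]
        placed.add(seed)
        # grow upward: predecessor must be unique and not yet in a trace
        b = seed
        while len(preds[b]) == 1 and preds[b][0] not in placed:
            b = preds[b][0]
            trace.insert(0, b)
            placed.add(b)
        # grow downward: follow the most frequent successor edge
        b = seed
        while succs[b]:
            s = max(succs[b], key=lambda t: succ_weight[(b, t)])
            if s in placed or len(preds[s]) != 1:
                break
            trace.append(s)
            placed.add(s)
            b = s
        traces.append(trace)
    return traces  # extended basic blocks covering the CFG
```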
#codingexercise
Double GetAlternateOddNumberRangeProductCubeRootSquares (Double [] A)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeProductCubeRootSquares();
}

Sunday, March 22, 2015

Today we continue to discuss the WRL research report on the Swift Java Compiler. We were discussing peephole optimizations. Now let us look at some machine-dependent processing phases. The pattern-match generator kicks off the pass that converts the IR to machine-dependent code. Since it is the same generator we covered in the previous post, we look at the other machine-dependent passes: sign-extension elimination, instruction scheduling, register allocation, and code generation.
Sign extension is pervasive in Java code but not always required. For example, when down-casting from int to short, only the lower 16 bits of the cast value are used; in this case, sign extension is not required. To find eliminable sign extensions, one technique computes how many low-order bits each input of a value needs. This is computed via a backward walk on the SSA graph, which gives us the usages. Any operation encountered whose output is equal to one of its inputs on those low-order bits can be replaced with that input. The second technique computes the state of the high-order bits of each value. The state of a value includes the number of known bits and whether those bits are zero or sign-extension bits. If a value has its high 32 bits in a sign-extended state and the input is also in a sign-extended state, then a setxl is a no-op. In such cases, those redundancies can be removed.
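A tiny numeric check (not Swift's actual analysis) of the int-to-short example: since the cast keeps only the low 16 bits, a prior 32-bit sign extension cannot change the result, which is why the extension can be eliminated in this situation.

```python
def to_short(v):
    # model a cast to a 16-bit signed value: keep the low 16 bits, sign-extend
    return ((v & 0xFFFF) ^ 0x8000) - 0x8000

def sign_extend_32(v):
    # model a 32-bit sign extension of the low 32 bits
    return ((v & 0xFFFFFFFF) ^ 0x80000000) - 0x80000000

def extension_is_redundant(v):
    # the down cast produces the same result with or without the extension
    return to_short(v) == to_short(sign_extend_32(v))
```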
We next discuss trace scheduling and block layout. A trace lets instructions from several basic blocks be interleaved during scheduling. Swift has a full instruction scheduler that operates on traces of one or more blocks. The scheduling process involves the following three steps:
1) decomposing the control flow graph into traces,
2) scheduling the instructions within each trace, and
3) determining a good sequence for the layout of the traces.
#codingexercise
Double GetAlternateOddNumberRangeProductCubeRootFourthPower (Double [] A)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeProductCubeRootFourthPower();
}