Tuesday, March 24, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing trace scheduling and block layout. We saw that this involves a greedy algorithm in which the most frequently executed block that is not yet placed in a trace is taken as the first block of a new trace.
The trace is then extended upwards and downwards in the control flow graph. The algorithm proceeds until there are no more blocks to place in a trace. The instruction scheduler operates on one trace at a time. It adds dependences to make sure that a value is scheduled early enough to dominate all of its users that are not in the trace. The scheduler also includes a model of the Alpha 21164 and 21264 pipelines, which lets it decide when the result of an operation will be ready, based on the latency of the operation and the state of the pipelines. Given this dependence and latency information, the scheduler at each point schedules a value whose dependences have all been satisfied and whose inputs are ready, or will be available earliest. If the chosen value is a control-flow or exception-causing operation, the current basic block is ended. When a value is scheduled, the finite-state automaton modeling the pipelines is updated to reflect its execution. Swift uses the profile information to determine the best layout for the traces.
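To make the selection rule concrete, here is a minimal sketch of such a greedy list scheduler in Java. The Val class, its fields, and the latency-only update of ready times are assumptions made for illustration, not the report's data structures; the finite-state pipeline model is reduced to a comment.

import java.util.*;

// Hypothetical IR value with just the information the scheduler needs;
// these names do not come from the Swift report.
class Val {
    List<Val> deps = new ArrayList<>(); // values that must be scheduled first
    int latency = 1;                    // cycles until this value's result is ready
    boolean endsBlock;                  // control-flow or exception-causing op
    int readyAt;                        // earliest cycle its inputs are available
}

class TraceScheduler {
    // Greedy list scheduling over one trace: repeatedly pick a value whose
    // dependences are all satisfied and whose inputs are ready earliest.
    // Assumes the dependence graph is acyclic.
    List<Val> schedule(List<Val> trace) {
        List<Val> order = new ArrayList<>();
        Set<Val> done = new HashSet<>();
        while (done.size() < trace.size()) {
            Val best = null;
            for (Val v : trace) {
                if (done.contains(v) || !done.containsAll(v.deps)) continue;
                if (best == null || v.readyAt < best.readyAt) best = v;
            }
            done.add(best);
            order.add(best);
            int resultReady = best.readyAt + best.latency;
            // a real scheduler would also consult the pipeline automaton here
            for (Val u : trace)
                if (u.deps.contains(best))
                    u.readyAt = Math.max(u.readyAt, resultReady);
            if (best.endsBlock) {
                // a control-flow or exception-causing value ends the basic block
            }
        }
        return order;
    }
}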
#codingexercise
double GetAllNumberRangeProductCubeRootSquares(Double[] A)
{
    if (A == null) return 0;
    return A.AllNumberRangeProductCubeRootSquares();
}



Monday, March 23, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing machine-dependent passes and optimizations, including sign-extension elimination, and started on trace scheduling and block layout. The scheduling process has three main steps: 1) decomposing the CFG into traces, 2) scheduling the instructions within each trace, and 3) determining a good sequence for the layout of the traces. The most frequently executed block that is not yet placed in a trace is taken as the first block of a new trace. The algorithm then tries to extend the trace upwards and downwards in the CFG. If the first block of the trace has only a single predecessor P in the CFG and P is not yet in a trace, then P is added to the trace, and the process recurses on P's predecessor. The trace is then extended downwards by following the most frequently executed successor edge of the last block B. As in the upward case, if that edge from B goes to a block S that is not yet in a trace and S has only a single predecessor, then S is added to the trace, and so on recursively with S's successors. The algorithm continues until all the blocks have been placed in a trace. The result is a set of traces that are extended basic blocks and that cover the CFG.

Next, the instruction scheduler operates on one trace at a time, building the set of all dependences that determine where each instruction can be placed. Although the Swift IR already expresses many of these dependences, the scheduler adds explicit control dependences and antidependences. Control-flow and exception-causing operations are still required to be at the end of a basic block. Special "out of trace" dependences are added to ensure that a scheduled value will dominate any of its users that are not in the trace.
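To make the trace formation concrete, here is a small sketch in Java. The Block fields (frequency, preds, succs, with succs assumed sorted most frequent first) are illustrative choices, not Swift's actual data structures.

import java.util.*;

class Block {
    long frequency;                        // profile-derived execution count
    List<Block> preds = new ArrayList<>();
    List<Block> succs = new ArrayList<>(); // assumed sorted most frequent first
    boolean placed;                        // already assigned to some trace
}

class TraceBuilder {
    List<List<Block>> buildTraces(List<Block> blocks) {
        List<List<Block>> traces = new ArrayList<>();
        while (true) {
            // seed: most frequently executed block not yet in a trace
            Block seed = null;
            for (Block b : blocks)
                if (!b.placed && (seed == null || b.frequency > seed.frequency))
                    seed = b;
            if (seed == null) break;       // every block has been placed
            Deque<Block> trace = new ArrayDeque<>();
            seed.placed = true;
            trace.add(seed);
            // extend upwards through unique, unplaced predecessors
            for (Block b = seed; b.preds.size() == 1 && !b.preds.get(0).placed; ) {
                b = b.preds.get(0);
                b.placed = true;
                trace.addFirst(b);
            }
            // extend downwards along the most frequent successor edge, as long
            // as the successor is unplaced and has a single predecessor
            for (Block b = seed; !b.succs.isEmpty(); ) {
                Block s = b.succs.get(0);
                if (s.placed || s.preds.size() != 1) break;
                s.placed = true;
                trace.addLast(s);
                b = s;
            }
            traces.add(new ArrayList<>(trace));
        }
        return traces;
    }
}

Keeping each block's successor list sorted by edge frequency makes the downward extension a constant-time choice at each step.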
#codingexercise
double GetAlternateOddNumberRangeProductCubeRootSquares(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeProductCubeRootSquares();
}

Sunday, March 22, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing peephole optimizations. Now let us look at some machine-dependent processing phases. The pattern-match generator kicks off the pass that converts the IR to machine-dependent code. Since it is the same generator we covered in the previous post, we look at the other machine-dependent passes: sign-extension elimination, instruction scheduling, register allocation, and code generation.
Sign extensions are pervasive in Java code but often not actually required. For example, when casting an int down to a short, only the lower 16 bits of the cast value may be used; in that case the sign extension is not required. To eliminate sign extensions, one technique computes how many low-order bits each input of a value actually needs. This is determined through a backward walk on the SSA graph, which gives us the usages. Any operation encountered whose output is equal to one of its inputs on those low-order bits can be replaced with that input. The second technique computes the state of the high-order bits of each value; the state includes the number of known bits and whether those bits are zeroes or sign-extension bits. If a value has its high 32 bits in a sign-extended state and its input is also in a sign-extended state, then a sextl is a no-op, and such redundancies can be removed.
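A simplified sketch of the first technique in Java; the Val class, op names, and bit counts are invented for illustration and do not match Swift's IR.

import java.util.*;

class Val {
    String op;                           // e.g. "sextl", "store_short", "add"
    List<Val> inputs = new ArrayList<>();
    int neededLowBits = 0;               // low-order bits some user actually needs
}

class SignExtElim {
    // how many low-order bits 'user' demands of each of its inputs (assumed table)
    static int demand(Val user) {
        switch (user.op) {
            case "store_short": return 16; // only the low 16 bits are stored
            case "store_int":   return 32;
            default:            return 64; // pessimistic: everything is needed
        }
    }

    // backward walk on the SSA graph: visit users before their inputs,
    // accumulating how many low-order bits each value must supply
    static void propagate(List<Val> usersBeforeInputs) {
        for (Val v : usersBeforeInputs)
            for (Val in : v.inputs)
                in.neededLowBits = Math.max(in.neededLowBits, demand(v));
    }

    // a sextl agrees with its input on the low 32 bits, so when no user
    // needs more than those bits, the sextl can be replaced by its input
    static boolean redundant(Val v) {
        return v.op.equals("sextl") && v.neededLowBits <= 32;
    }
}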
We next discuss trace scheduling and block layout. Trace scheduling interleaves instructions from one or more blocks so they can be scheduled together, and Swift has a full instruction scheduler for this purpose. The scheduling process involves the following three steps:
decomposing the control flow graph into traces,
scheduling the instructions within each trace, and
determining a good sequence for the layout of the traces.
#codingexercise
double GetAlternateOddNumberRangeProductCubeRootFourthPower(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeProductCubeRootFourthPower();
}

Saturday, March 21, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler.
We were discussing the pattern-based peephole optimizer. The graph rewriting order is as follows: an SSA value is processed before all of its inputs, with an exception when there are cycles in the value graph. For each value, rules are applied in the order specified by the rules file until one matches and mutates the value; rewriting then proceeds to the next value. Since each pass can change values, the passes are iterated over the entire SSA graph until there are no more changes. The ordering of the rules is somewhat important in ensuring that optimizations are applied in the most effective order.
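The driver implied by this description can be sketched as follows; Rule and SsaValue are assumed shapes, not Swift's classes.

import java.util.*;

interface Rule {
    boolean apply(SsaValue v);  // true if the rule matched and mutated v
}

class SsaValue { /* operation, inputs, users, ... */ }

class RewriteDriver {
    // valuesBeforeInputs: each SSA value appears before all of its inputs
    void run(List<SsaValue> valuesBeforeInputs, List<Rule> orderedRules) {
        boolean changed = true;
        while (changed) {                        // iterate to a fixed point
            changed = false;
            for (SsaValue v : valuesBeforeInputs) {
                for (Rule r : orderedRules) {    // rules-file order
                    if (r.apply(v)) {            // first match mutates v ...
                        changed = true;
                        break;                   // ... then move to next value
                    }
                }
            }
        }
    }
}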
Many of the machine-independent peephole optimizations apply simple algebraic or boolean identities. Taking the length of an array allocated within the method is replaced by the allocated size of that array. Similarly, a get_field operation can be replaced by the value stored by a preceding put_field operation on the same field. Other optimizations include dead-code elimination, applied over repeated passes via mark and sweep: the live values in a Swift SSA graph are the control values and all inputs of live values, and all unmarked values are removed as dead code. Swift also has a pass for conditional constant propagation with simultaneous elimination of impossible control flow.
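The mark-and-sweep pass just described fits in a few lines; the SsaValue shape here is assumed for illustration.

import java.util.*;

class SsaValue {
    List<SsaValue> inputs = new ArrayList<>();
}

class DeadCodeElim {
    // mark: control values are live, as is every input of a live value;
    // sweep: whatever remains unmarked is dead code and is dropped
    static List<SsaValue> eliminate(List<SsaValue> all, List<SsaValue> controlValues) {
        Set<SsaValue> live = new HashSet<>();
        Deque<SsaValue> work = new ArrayDeque<>(controlValues);
        while (!work.isEmpty()) {
            SsaValue v = work.pop();
            if (live.add(v)) work.addAll(v.inputs); // newly marked: trace its inputs
        }
        List<SsaValue> kept = new ArrayList<>();
        for (SsaValue v : all)
            if (live.contains(v)) kept.add(v);
        return kept;
    }
}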

Friday, March 20, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing the loop peeling optimization, in which part or all of the first iteration is peeled into code preceding the loop. It involves a search phase that starts from the most highly nested loops and evaluates the faulting operations in each loop to determine the candidates. There are a few other requirements as well. The procedure of loop peeling itself is straightforward. The blocks are copied to form the peel, and the control-flow edges entering the loop header are redirected to the peel copy. Phi nodes are created in the new loop header, with edges joined from the peel copy to the original. If the addition of phi nodes cascades, the peeling is aborted. An alternative is to estimate the number of blocks to be added beforehand by running the phi-node placement algorithm.
We now discuss the pattern-based peephole optimizer. Swift contains several passes that do pattern-based transformations of the SSA graph. A pattern-matcher generator takes a set of graph-rewriting rules as input and produces a compiler pass that transforms the SSA graph according to those rules. Each rule consists of a pattern that can match a piece of an SSA graph and some Java code that is executed when the pattern matches. The language used to describe the patterns allows matching of node operations, auxiliary operations, and types, and allows nested matching on the inputs to a node. Matching against one particular operation, a list of operations, or any operation not in a list is allowed, and similar matching capabilities exist for types. Binding a variable to a particular value in a pattern, so that it can be used in the action part of the rule, is also permitted. The top-level value matched by the pattern is available in the action via the keyword self, and the action can mutate the self value or any of its inputs.
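By way of illustration, here is roughly what one generated rule might boil down to, written out by hand in Java: match an add whose second input is the constant zero, and replace the matched value (self in the rule language) with the first input. The Value accessors are invented; the real generator emits this matching code from the rules file.

interface Value {
    String op();
    Value input(int i);
    long constant();                // meaningful only when op() is "constant"
    void replaceWith(Value other);  // redirect all users of this value to 'other'
}

class AddZeroRule {
    // pattern: self = add(x, constant 0); action: replace self with x
    boolean apply(Value self) {
        if (!self.op().equals("add")) return false;
        Value x = self.input(0), c = self.input(1);
        if (!c.op().equals("constant") || c.constant() != 0) return false;
        self.replaceWith(x);        // the action part mutates the graph
        return true;
    }
}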
#codingexercise
double GetAlternateOddNumberRangeProductCubeRootEighthPower(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeProductCubeRootEighthPower();
}
}

#codingexercise
double GetAlternateOddNumberRangeProductCubeRootSixthPower(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeProductCubeRootSixthPower();
}

Thursday, March 19, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing the loop peeling optimization, whose idea of treating the checks of the initial iteration separately was borrowed from scientific programming. Loop peeling refers to peeling one or more iterations of a loop into the straight-line code preceding the loop. When Swift peels a loop, global CSE automatically eliminates the runtime checks remaining in the loop. The end result may look very similar to that of code motion for simple checks in loops.
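A source-level illustration of that end result may help (the compiler performs this on the CFG, not on source): the implicit null check of the array is loop-invariant, so peeling the first iteration leaves the copies inside the loop redundant for global CSE to remove.

class PeelingIllustration {
    // before peeling: each iteration implicitly null-checks 'a' before a[i]
    static int sumBefore(int[] a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];      // implicit null check repeats every iteration
        return s;
    }

    // after peeling, written out by hand: the first iteration runs in
    // straight-line code before the loop, so its null check dominates the
    // loop body and the copies inside the loop become redundant
    static int sumAfter(int[] a, int n) {
        int s = 0;
        if (n > 0) {
            s += a[0];      // peeled iteration performs the one null check
            for (int i = 1; i < n; i++)
                s += a[i];  // null check here is now provably redundant
        }
        return s;
    }
}

Note that only the loop-invariant null check is removed this way; the bounds check still depends on i and must be handled by other means.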
The loop peeling pass begins with a search phase that looks for loops that should be peeled. The search proceeds outwards over all loops, starting from the most highly nested ones. If the search encounters a loop-invariant faulting operation, that loop is considered a candidate for peeling. By this we mean that the operation must be part of each iteration; otherwise it might very well be an exit-condition check. If the set of blocks dominated by the loop header up through the faulting operation is large, then the peel is large and results in duplicated code, so there is also a threshold specifying the maximum number of blocks to peel. There are also a few other requirements, such as that a peel should not contain a nested loop.
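A sketch of this search phase, under assumed Loop, Op, and Block interfaces; the peel computation and the threshold value are placeholders for illustration, not Swift's actual code.

import java.util.*;

interface Block { }
interface Op {
    boolean partOfEveryIteration();   // dominates the loop's back edge
}
interface Loop {
    List<Loop> innerLoops();
    List<Op> faultingOps();           // null checks, bounds checks, ...
    Set<Block> peelFor(Op op);        // blocks from the header through op
    boolean containsNestedLoop(Set<Block> blocks);
}

class PeelSearch {
    static final int MAX_PEEL_BLOCKS = 8;  // illustrative threshold

    // most highly nested loops first, then the loop itself
    Set<Block> search(Loop loop) {
        for (Loop inner : loop.innerLoops()) {
            Set<Block> peel = search(inner);
            if (peel != null) return peel;
        }
        Set<Block> best = null;
        for (Op op : loop.faultingOps()) {
            if (!op.partOfEveryIteration()) continue;    // likely an exit check
            Set<Block> peel = loop.peelFor(op);
            if (peel.size() > MAX_PEEL_BLOCKS) continue; // too much duplication
            if (loop.containsNestedLoop(peel)) continue; // peel must be loop-free
            if (best == null || peel.size() > best.size())
                best = peel;   // largest useful peel under the limit
        }
        return best;
    }
}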
Although the requirements sound complex, the steps themselves are straightforward: we determine a set of blocks to peel and change the control flow to go through the peel first. When the blocks are copied from the original to form the peel, the values in the blocks are also copied. During this copying, each use of a phi node from the top of the peel is converted to the associated phi input for entry into the loop. The control-flow edges entering the loop header are then redirected to the peel copy. At this point some phi nodes in the original loop header may become redundant and are eliminated, and the block that is the successor of the peel in the original code becomes the new loop header. To complete the transformation, phi nodes are created in the new loop header for all the values that were copied, and control-flow edges are joined from the peel copy back to the original. If the addition of these phi nodes cascades into the addition of still more phi nodes, the loop peeling is aborted because the loop is too complex. It is also possible to estimate the number of blocks that would be added beforehand by re-running the phi-node placement algorithm.

Wednesday, March 18, 2015

Today we continue to read the WRL research report on the Swift Java compiler. We were discussing runtime check elimination, including array store checks. We now discuss loop peeling, an optimization that moves one or more of the initial iterations into straight-line code preceding the loop, which helps remove runtime checks associated with the loop. The problem we wish to solve is that loops often contain loop-invariant runtime checks. For example, there may be a required null check for an array that is first accessed inside the loop. The trouble with this check is that we want to move it out of the loop, but it is pinned and can throw an exception, and we don't want it to throw if the loop is never entered. Moreover, once the check has succeeded in one iteration, it is redundant and safe to remove from all subsequent iterations. Loop peeling separates out the first iteration, and the rest of the checks can then be removed automatically by global CSE. It may be interesting to note that loop peeling plus global CSE on simple loops and checks often leaves a result similar to that of code motion.

The loop peeling pass begins with a search phase that looks for loops that should be peeled. The search includes all loops, starting with the most highly nested loops first. A loop is a candidate if it contains a loop-invariant faulting operation that dominates, meaning it is part of each iteration. The set of blocks dominated by the loop header through the loop-invariant faulting operation becomes the loop peel. We don't want the peel to be too large, since that results in duplicated code, so the largest useful peel that does not exceed a maximum number of blocks is chosen.
#codingexercise
double GetAlternateOddNumberRangeProductCubeRootTenthPower(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeProductCubeRootTenthPower();
}