Tuesday, March 31, 2015

Today we continue reading the WRL research report on the Swift Java compiler. We were discussing register allocation and solving it by means of graph coloring. We will discuss the results of the study next. The Swift Java compiler was measured on an Alpha workstation with one 667 MHz processor, a 64 KB on-chip data cache, and a 4 MB board cache. The generated code was installed into a high-performance JVM. This was necessary so that the results could be properly evaluated under controlled conditions: only when the baseline is performant can the results be taken as representative of the variables we control. A poor choice of baseline may hide gains from some of the variables or skew the results because of variations in running time. In this case, the JVM chosen was already performing some form of CHA, which helps us evaluate the gains from the passes more appropriately. The heap size used was 100 MB. Although the hardware seems less powerful than recent processors, the configuration was decent at the time, and with the JVM baseline established, the trends could be expected to hold on a different choice of system. The tests were performed on a number of applications from a variety of domains and of varying program sizes. The initial set of results was taken with all optimizations on; then they were taken without class hierarchy analysis (CHA). This showed that the use of CHA greatly improves overall performance. The overall speedup of the Swift-generated code without CHA over the fast JVM is marginal because the JVM is already using some form of CHA to resolve method calls. The results were also compared for simple CHA versus full CHA, and it turned out that the former was only somewhat less performant than the latter, indicating that it is a useful strategy when dynamic loading is present.
Swift compilation could proceed at a rate of about 2000 lines of code per second with all optimizations on, except when escape analysis was enabled; escape analysis slowed down the compilation by about 20-40%.

Monday, March 30, 2015

Today we continue reading the WRL research report on the Swift Java compiler. We were discussing register allocation and solving it by means of graph coloring. To summarize, the steps involved were:
Insert copies
Precolor
Construct the bias graph
Construct the interference graph
Compute coloring order
Color values
If some values failed to be colored:
 - spill uncolored values to the stack
 - repeat from constructing the interference graph
Clean up

We saw how each of these steps matters in solving register allocation: specifically, how the copies help when a value can be in more than one register. We saw how precoloring helps with register assignments for method parameters and return values. The bias graph establishes edges between values that we want colored the same. The interference graph establishes edges between nodes which cannot be colored the same; in doing so, it encapsulates all the possible coloring assignments to the values. We saw how to apply a coloring heuristic where the hard nodes are colored first and the easy nodes last, with difficulty measured by the degree of the nodes in the interference graph. The nodes are then colored in the order computed. The bias graph is used to make an intelligent choice of a color from the set of legal colorings allowed by the interference graph. If the coloring does not succeed, we spill the values by inserting a spill value just after each definition and a restore value before each use, which makes the node easier to color on the next pass. Finally, when the coloring has succeeded, data flow is used to eliminate unnecessary copies.
We next look at code generation. Swift's code generation pass translates SSA operations into machine code. The operations remaining in the SSA graph at this point each correspond to zero or one Alpha instructions. Code generation involves computing the stack frame size, emitting the prolog code, emitting code for each block in the order chosen by the scheduling pass, emitting a branch when the successor is not the immediately following block, emitting the epilog code, and emitting auxiliary information including a list of relocation entries, associated constants, an exception table, and a byte code map. Branches that are still necessary are found, and the final code block targeted by each branch is determined.

Sunday, March 29, 2015

Today we continue reading the WRL research report on the Swift Java compiler. We were discussing register allocation and solving it by means of graph coloring, and we continue with the order of coloring. The bias graph is used to make intelligent choices of a color from the set of legal colorings allowed by the interference graph. An uncolored node is given the same color as another node only if the interim nodes between them can be colored the same. If the coloring does not succeed, then we spill values to the stack: the value corresponding to each node that was not colored is spilled by inserting a spill value just after its definition and a restore value before each use. This allows the original value and the newly added restore value to be in registers over shorter ranges, and thus they will hopefully be easier to color on the next pass.
A final cleanup pass is necessary after all the coloring succeeds, to remove copies that have the same source and destination and to remove unnecessary restore operations. This pass does a data flow computation to determine what value each register holds after each instruction. This helps with optimizations such as replacing an input value of an instruction with the oldest copy that is still in a register.
#codingexercise
Double GetAllNumberRangeProductCubeRootPowerSeven (Double [] A)
{
if (A == null) return 0;
return A.AllNumberRangeProductCubeRootPowerSeven();
}
#codingexercise
Double GetAllNumberRangeProductCubeRootPowerNine (Double [] A)
{
if (A == null) return 0;
return A.AllNumberRangeProductCubeRootPowerNine();
}

Saturday, March 28, 2015

Today we continue our study of the WRL research report on the Swift Java compiler. We were discussing register allocation. We mentioned the construction of the bias graph and the interference graph. Today we discuss the next step, which is the coloring order. We saw that the algorithm proceeds by coloring the hard nodes first and the easy nodes last. The algorithm repeatedly removes a node with the minimum remaining degree from the interference graph; since degrees change as nodes are removed, it always looks for the minimum remaining degree. The order of coloring is then the reverse of this removal order.
To color all the nodes in the order computed, we color them one by one by finding the set of possible colorings for that node. The colors of the adjacent nodes in the interference graph are then excluded from the set of possible colorings. Any color from this set is valid and if there is no color possible, then the uncolored values are spilled on the stack and the interference graph and coloring order are recomputed.
The bias graph is used to make an intelligent choice of a color from the set of legal colorings. If we represent the edges from the interference graph with solid lines and those from the bias graph with dotted lines, then to color a particular node, we do a breadth-first search of the bias graph. If we find a node that is already colored, we color the original node the same color as long as that color is allowed for the interim nodes: each interim node must be able to take that color if this node and the colored node are to match. If none of the nodes found have a usable color, then we do another BFS over the uncolored nodes in the bias graph. At each node encountered, we intersect the set of possible colors for the node we want to color with the set of colors allowed for the encountered uncolored node. If we are left with a non-empty set, a color is chosen from it for the node we want to color. This method tries to maximize the number of nodes in the bias graph connected to the node being colored that can eventually receive the same color.
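To make the search concrete, here is a minimal Java sketch of a biased color choice. The Node interface, its accessors, and the single combined search that carries the intersected color set along each path are assumptions for illustration; the report describes two separate breadth-first searches over the bias graph.

import java.util.*;

interface Node {
    List<Node> biasNeighbors();   // dotted edges: nodes we would like to color the same
    boolean isColored();
    int color();                  // valid only if isColored()
    Set<Integer> allowedColors(); // legal colors per the interference graph
}

class BiasedColoring {
    // Pick a color for 'start' from 'allowed' (assumed non-empty; an empty set
    // means the value must be spilled), preferring a color already used, or
    // still usable, by nodes reachable through the bias graph.
    static int chooseColor(Node start, Set<Integer> allowed) {
        Deque<Map.Entry<Node, Set<Integer>>> queue = new ArrayDeque<>();
        Set<Node> seen = new HashSet<>();
        queue.add(Map.entry(start, allowed));
        seen.add(start);
        while (!queue.isEmpty()) {
            var entry = queue.poll();
            Set<Integer> pathColors = entry.getValue(); // colors usable by every interim node so far
            for (Node n : entry.getKey().biasNeighbors()) {
                if (!seen.add(n)) continue;
                if (n.isColored()) {
                    if (pathColors.contains(n.color())) return n.color(); // bias match found
                } else {
                    Set<Integer> narrowed = new HashSet<>(pathColors);
                    narrowed.retainAll(n.allowedColors());
                    if (!narrowed.isEmpty()) queue.add(Map.entry(n, narrowed));
                }
            }
        }
        return allowed.iterator().next(); // no bias match; any legal color will do
    }
}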

Friday, March 27, 2015

Today we continue our study of the WRL research report on the Swift Java compiler. We were discussing register allocation. We saw that the first step in this algorithm was to insert copies and the second step was to precolor them. We now discuss the remaining steps. Next we construct the bias graph. This is an undirected graph that has values as nodes and edges between nodes which we want to color with the same color. The nodes that we want colored the same are the inputs and outputs of a copy; this lets the allocator undo some of the copy insertions from step 1. Next we construct the interference graph. The interference graph has a node for each value and edges between nodes that cannot be assigned the same color because their live ranges overlap. This is the step where we determine all the possible valid assignments of colors to values; hence, with this step, we convert the problem to a graph coloring problem. Graph coloring attempts to color the nodes such that no two nodes that are adjacent in the interference graph have the same color. The interference graph completely encodes the possible legal assignments of colors because all the restrictions are drawn. That said, graph coloring is NP-hard, so heuristics are involved.
In the next step, we find the coloring order of all the nodes. A coloring order is selected such that the most connected nodes in the interference graph are colored first. This is referred to as coloring the hard nodes first and then the easy nodes, where the difficulty corresponds to the degree of the nodes in the interference graph. The algorithm proceeds by repeatedly removing a node with the minimum degree from the interference graph. On the removal of a node, the corresponding edges are also deleted, so the degrees of the remaining nodes change; hence the algorithm always selects the node with the smallest remaining degree. The algorithm terminates when all the nodes have been removed. The order of coloring is the reverse order of removal, which ensures that the nodes with low degree are colored after the nodes with higher degree.
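A minimal sketch of computing this order, assuming the interference graph is given as a symmetric adjacency map where every neighbor also appears as a key (the representation is an assumption for illustration):

import java.util.*;

class ColoringOrder {
    // Repeatedly remove a node of minimum remaining degree; the coloring
    // order is the reverse of the removal order, so high-degree ("hard")
    // nodes are colored first.
    static List<Integer> compute(Map<Integer, Set<Integer>> interference) {
        Map<Integer, Set<Integer>> g = new HashMap<>();
        interference.forEach((n, adj) -> g.put(n, new HashSet<>(adj))); // work on a copy
        List<Integer> removal = new ArrayList<>();
        while (!g.isEmpty()) {
            int min = Collections.min(g.keySet(),
                    Comparator.comparingInt(n -> g.get(n).size()));
            removal.add(min);
            for (int neighbor : g.get(min)) g.get(neighbor).remove(min); // degrees shrink
            g.remove(min);
        }
        Collections.reverse(removal);
        return removal; // hardest (highest remaining degree) nodes first
    }
}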
The coloring of each of the nodes in the order computed is a separate step. Here we enumerate all the possible legal colorings of that node: for example, all the registers that could hold that value, excluding the colors of any neighboring colored nodes in the original interference graph. If a node cannot be colored, its value is spilled to the stack and the interference graph is reconstructed. The algorithm exits when there are no more values left to be colored.

Thursday, March 26, 2015

Today we continue our study of the WRL research report on the Swift Java compiler. We were discussing trace scheduling and trace layout algorithms; both are greedy algorithms. We next discuss register allocation. Swift uses a modified Briggs-style coloring allocator. Instead of using coalescing, Swift's allocator adds a special data structure, the bias graph, to direct coloring and limit the number of copies introduced. Register allocation proceeds by assigning each value a color which represents a particular register assignment. The problem is thereby converted to a graph coloring problem, for which coloring heuristics are available. The coloring problem is defined by the restrictions introduced.
Register allocation proceeds with the following algorithm:
1) Insert Copies
2) Precolor
3) Construct bias graph
4) Construct interference graph
5) Compute coloring order
6) Color values
7) If some values failed to be colored
a) Spill uncolored values to the stack
b) Goto step 4)
8) Clean up
We will discuss each of these steps. A coloring allocator assumes that each value is allocated to exactly one register for its lifetime. Copies are inserted when a value is required to be in more than one register, such as when a value needs to move from one register to another for a return value or a method call parameter. Copies are also required for a phi node, especially because the input values to a phi node may not be assigned to the same register as the phi node. In addition, Swift uses the LIMIT algorithm to split the live ranges of values around loops in which they are not referenced: the live range of a value is split with copies before and after the loop, so that a value unused within a loop does not occupy a register inside it.
The next phase is value precoloring. This is kind of an initialization step: the compiler determines which values need to be assigned to certain registers and fixes their color assignment. Values which have fixed register assignments include method arguments and return values.
The next stage involves creating the bias graph. This is the data structure introduced by Swift: an undirected graph that has values as nodes and edges between nodes which are to be colored the same. This data structure was introduced to undo as many copy operations from the first step as possible, by trying to color the input and output of a copy the same color.
#codingexercise
Double GetAllNumberRangeProductCubeRootPowerSix (Double [] A)
{
if (A == null) return 0;
return A.AllNumberRangeProductCubeRootPowerSix();
}

Wednesday, March 25, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing trace scheduling, and we saw that it involves a greedy algorithm. We will now continue with block layout and how Swift uses profile information to determine a good layout for the traces. Swift uses a simple version of Pettis and Hansen's code layout algorithm. This is also a greedy algorithm: it gradually merges blocks/traces into sequences, always merging the two sequences that have the heaviest-weight edge between an element of one and an element of the other. The end result of this algorithm is a single merged sequence, which is the desired layout.
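A rough sketch of this greedy merging might look as follows; the representation of traces as integer ids and the tail-to-head merge rule are simplifying assumptions for illustration, not Swift's actual implementation.

import java.util.*;

class BlockLayout {
    record Edge(int src, int dst, long weight) {}

    // Greedy Pettis-Hansen-style layout: every trace starts as its own sequence;
    // edges are taken heaviest-first, and two sequences are merged when the edge
    // runs from the tail of one to the head of the other.
    static List<Integer> layout(List<Integer> traces, List<Edge> edges) {
        Map<Integer, LinkedList<Integer>> seqOf = new HashMap<>();
        for (int t : traces) seqOf.put(t, new LinkedList<>(List.of(t)));
        edges.sort(Comparator.comparingLong(Edge::weight).reversed());
        for (Edge e : edges) {
            LinkedList<Integer> a = seqOf.get(e.src()), b = seqOf.get(e.dst());
            if (a == b) continue;                             // already in the same sequence
            if (a.getLast() != e.src() || b.getFirst() != e.dst()) continue; // not tail-to-head
            a.addAll(b);                                      // merge b onto the end of a
            for (int t : b) seqOf.put(t, a);
        }
        // collect the distinct remaining sequences into one final layout
        List<Integer> result = new ArrayList<>();
        Set<LinkedList<Integer>> emitted = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int t : traces) {
            LinkedList<Integer> s = seqOf.get(t);
            if (emitted.add(s)) result.addAll(s);
        }
        return result;
    }
}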
The changes that Swift makes ensure that a loop exit block will be placed at the end of its loop. Swift also modifies the dynamic or static profile information, such as by reducing the weight of the other outgoing edges of a block which has an edge that exits a loop. By keeping the exit at the end, Swift guarantees only one branch per loop iteration. Branches that exit in the middle are given lower priority, since these edges are already determined to be less important than the remaining edges in the trace.
#codingexercise
Double GetAllNumberRangeProductCubeRootPowerFour (Double [] A)
{
if (A == null) return 0;
return A.AllNumberRangeProductCubeRootPowerFour();
}

Tuesday, March 24, 2015

Today we continue to discuss the WRL research report on Swift Java compiler. We were discussing  Trace scheduling and block layout. We saw that this involves a greedy algorithm where the most frequently executed block that is not yet placed in the trace is taken as the first block in the new trace.
The trace is then extended upwards and downwards in the control flow graph, and the algorithm proceeds until there are no more blocks to place in a trace. The instruction scheduler operates on one trace at a time. It adds dependences to make sure that a value is scheduled early enough to dominate all its users that are not in the trace. The scheduler also includes a model of the Alpha 21164 and 21264 pipelines. This lets the scheduler decide when the result of an operation will be ready, based on the latency of the operation and the state of the pipelines. Given this dependence and latency information, the scheduler at each point chooses to schedule a value whose dependences have all been satisfied and whose inputs are ready or will be available at the earliest. If the chosen value is a control flow or exception-causing operation, the current basic block is ended. When a value is scheduled, the finite state automaton is updated to reflect its execution. Swift uses the profile information to determine the best layout for the traces.
#codingexercise
Double GetAllNumberRangeProductCubeRootSquares (Double [] A)
{
if (A == null) return 0;
return A.AllNumberRangeProductCubeRootSquares();
}



Monday, March 23, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing machine-dependent passes and optimizations, including sign-extension elimination, and started mentioning trace scheduling and block layout. The scheduling process has three main steps: 1) decomposing the CFG into traces, 2) scheduling the instructions within each trace, and 3) determining a good sequence for the layout of the traces. The most frequently executed block that is not yet placed in a trace is taken as the first block of a new trace. The algorithm then tries to extend the trace upwards and downwards in the CFG. If the candidate block has only a single predecessor P in the CFG and P is not yet in a trace, then P is added to the trace, and so on recursively with P's predecessor. The trace is then extended downwards by following the most frequently executed successor edge of B: if that edge from B goes to block S, S is not yet in a trace, and S has only a single predecessor, then S is added to the trace, and so on recursively with S's successor. The algorithm continues until all the blocks have been placed in a trace. The result is a set of traces that are extended basic blocks and that cover the CFG. Next, the instruction scheduler operates on one trace at a time by building the set of all dependences that determine where each instruction can be placed. Although the Swift IR already has many of these dependences, the scheduler adds explicit control dependences and antidependences. Control flow and exception-causing operations are still required to be at the end of a basic block. Special "out of trace" dependences are added to ensure that a value that is scheduled will dominate any of its users that are not in the trace.
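A sketch of the greedy trace formation step described above; the Block class and the use of successor block frequencies as a stand-in for edge frequencies are assumptions for illustration.

import java.util.*;

class TraceBuilder {
    static class Block {
        long frequency;                       // from profile or static estimate
        List<Block> predecessors = new ArrayList<>();
        List<Block> successors = new ArrayList<>();
        boolean placed;                       // already in some trace
    }

    // Greedy trace formation: seed each trace with the hottest unplaced block,
    // then grow it upwards through sole predecessors and downwards along the
    // most frequent successor edge, as described above.
    static List<List<Block>> buildTraces(List<Block> blocks) {
        List<List<Block>> traces = new ArrayList<>();
        List<Block> worklist = new ArrayList<>(blocks);
        worklist.sort(Comparator.comparingLong((Block b) -> b.frequency).reversed());
        for (Block seed : worklist) {
            if (seed.placed) continue;
            Deque<Block> trace = new ArrayDeque<>();
            trace.add(seed);
            seed.placed = true;
            // extend upwards: the predecessor must be unique and not yet placed
            Block top = seed;
            while (top.predecessors.size() == 1 && !top.predecessors.get(0).placed) {
                top = top.predecessors.get(0);
                top.placed = true;
                trace.addFirst(top);
            }
            // extend downwards: follow the hottest successor if it has a sole predecessor
            Block bottom = seed;
            while (true) {
                Block next = bottom.successors.stream()
                        .max(Comparator.comparingLong(b -> b.frequency)).orElse(null);
                if (next == null || next.placed || next.predecessors.size() != 1) break;
                next.placed = true;
                trace.add(next);
                bottom = next;
            }
            traces.add(new ArrayList<>(trace));
        }
        return traces;
    }
}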
#codingexercise
Double GetAlternateOddNumberRangeProductCubeRootSquares (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeProductCubeRootSquares();
}

Sunday, March 22, 2015

Today we continue to discuss the WRL research report on the Swift Java Compiler. We were discussing peephole optimizations. Now let us look at some machine-dependent processing phases. The pattern match generator kicks off the pass that converts the IR to machine-dependent code. Since it is the same generator we covered in the previous post, we look at the other machine-dependent passes: sign-extension elimination, instruction scheduling, register allocation, and code generation.
Sign extension is pervasive in Java code but not always required. For example, when downcasting from int to short, only the lower 16 bits of the cast value are used, so sign extension is not required there. To drive sign-extension elimination, one technique computes how many low-order bits each input of a value actually needs. This is determined through a backward walk on the SSA graph, which gives us the usages. Any operation encountered whose output is equal to one of its inputs on those low-order bits can be replaced with that input. The second technique computes the state of the high-order bits of each value. The state of a value includes the number of known bits and whether those bits are zero or sign-extension bits. If a value has its high 32 bits in a sign-extended state and its input is also in a sign-extended state, then a setxl is a no-op. In such cases, those redundancies can be removed.
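A small sketch of the first technique, computing how many low-order bits a value's users need; the Value interface and the operation names are hypothetical, chosen only to illustrate the backward walk.

import java.util.List;

class NeededBits {
    interface Value {
        List<Value> users();         // SSA makes all users directly reachable
        String op();                 // e.g. "store_short", "and" (hypothetical names)
        long constantMask();         // only meaningful when op() is "and" with a constant
    }

    // Simplified: a value needs only as many low-order bits as its most
    // demanding user consumes. A store of a short consumes 16 bits; an AND
    // with a constant mask consumes just the mask's width.
    static int neededLowBits(Value v) {
        int needed = 0;
        for (Value user : v.users()) {
            switch (user.op()) {
                case "store_short" -> needed = Math.max(needed, 16);
                case "and"         -> needed = Math.max(needed,
                        64 - Long.numberOfLeadingZeros(user.constantMask()));
                default            -> needed = 64;   // unknown user: assume all bits matter
            }
            if (needed == 64) break;
        }
        return needed;
        // If an operation's output equals one of its inputs on these low-order
        // bits (e.g. a sign extension), the operation can be replaced by that input.
    }
}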
We next discuss trace scheduling and block layout. Trace scheduling interleaves instructions across the blocks of a trace rather than scheduling each block in isolation, and Swift has a full instruction scheduler for this purpose, operating over traces of one or more blocks. The scheduling process involves the following three steps:
decomposing the control flow graph into traces
scheduling the instructions within each trace and
determining a good sequence for the layout of the traces.
#codingexercise
Double GetAlternateOddNumberRangeProductCubeRootFourthPower (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeProductCubeRootFourthPower();
}

Saturday, March 21, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler.
We were discussing the pattern-based peephole optimizer. The graph rewriting order is as follows: an SSA value is processed before all of its inputs, with an exception when there are cycles in the value graph. For each value, rules are applied in the order specified by the rules file until one matches and mutates the value; the rewriting then proceeds to the next value. Each pass may change values, so the passes are iterated over the entire SSA graph until there are no more changes. The ordering of rules is somewhat important in ensuring that optimizations are applied in the most effective order.
Many of the machine-independent peephole optimizations apply simple algebraic or boolean identities. For example, a load of the length of an array can be replaced by the allocated size of the array when that is known. Similarly, a get_field operation can be replaced by the value from a previous put_field operation. Other optimizations include dead code elimination, done via mark and sweep across repeated passes: the live values in a Swift SSA graph are the control values and all inputs of live values, and all unmarked values are removed as dead code. Swift also does a pass for conditional constant propagation and simultaneous elimination of impossible control flow.

Friday, March 20, 2015

Today we continue to discuss the WRL research report on the Swift Java Compiler. We were discussing the loop peeling method of optimization, in which part or all of the first iteration is peeled into code preceding the loop. It involves a search phase that starts from the most highly nested loops and evaluates the faulting operations in each loop to determine the candidates; there are a few other requirements as well. The procedure of loop peeling itself is straightforward. The blocks of the peel are copied, control flow edges entering the loop are redirected to the peel copy, and phi nodes are created in the new loop header or start, with edges joined from the peel copy to the original. If the addition of phi nodes cascades, the peeling is aborted. An alternative is to estimate the number of phi nodes to add beforehand by running the phi node placement algorithm.
We now discuss the pattern-based peephole optimizer. Swift contains several passes that do pattern-based transformations of the SSA graph. A pattern matcher generator takes a set of graph rewriting rules as input and produces a compiler pass which transforms the SSA graph according to those rules. Each rule consists of a pattern that can match a piece of an SSA graph, and some Java code which is executed when the pattern matches. The language used to describe the patterns allows matching of node operations, auxiliary operations, and types, and allows nested matching on the inputs to a node. Matching of an operation, a list of operations, a particular operation, or any operation not in a list is allowed. Similar matching capabilities exist for types. Binding of a variable to a particular value in a pattern, such that it can be used in the action part of the rule, is permitted. The top-level value matched by the pattern is available in the action via the keyword self, and the action can mutate the self value or any of its inputs.
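To illustrate, here is roughly what one generated rule might look like as plain Java, using a simple algebraic identity of the kind the machine-independent passes apply. The Value interface and the method names are assumptions for illustration, not Swift's actual generated code.

import java.util.List;

class PeepholeRules {
    interface Value {
        String op();                     // operation name, e.g. "add", "constant"
        List<Value> inputs();
        long intValue();                 // meaningful when op() is "constant"
        void replaceWith(Value other);   // mutate the graph: redirect all users
    }

    // One rule: the pattern part tests the operation and its inputs;
    // the action part mutates 'self', just as the rules-file actions do.
    static boolean addOfZero(Value self) {
        if (!self.op().equals("add")) return false;
        Value left = self.inputs().get(0), right = self.inputs().get(1);
        if (right.op().equals("constant") && right.intValue() == 0) {
            self.replaceWith(left);      // action: x + 0  ==>  x
            return true;                 // a rule matched; move on to the next value
        }
        return false;
    }
}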
#codingexercise
Double GetAlternateOddNumberRangeProductCubeRootEighthPower (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeProductCubeRootEighthPower();
}

#codingexercise
Double GetAlternateOddNumberRangeProductCubeRootSixthPower (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeProductCubeRootSixthPower();
}

Thursday, March 19, 2015

Today we continue to discuss the WRL research report on Swift Java Compiler. We were discussing the loop peeling method of optimization where the idea of treating initial iteration checks separately was borrowed from scientific programming. Loop peeling refers to the peeling of one or more iterations of the loop into the straight line code preceding the loop. When Swift peels the loop, Global CSE automatically eliminates the runtime checks still in the loop.  The end result may look very similar to that of code motion for simple checks in loops.
The loop peeling pass begins with a search phase that looks for loops that should be peeled. The search proceeds outwards through all loops, starting from the most highly nested loop. If the search encounters a loop-invariant faulting operation, that loop is considered a candidate for peeling. The operation must be executed in each iteration of the loop, otherwise it may just be a check on some conditional path. If the set of blocks dominated by the loop header and the faulting operation is large, then the peel is large and results in duplication of code; therefore, there is also a threshold, a maximum number of blocks to peel. There are a few other requirements as well, such as that a peel should not contain a nested loop.
Although the requirements sound complex, the steps are straightforward in the sense that we determine a set of blocks to peel and change the control flow to go through the peel first. When the blocks are copied from the original to form the peel, the values in the blocks are also copied. During this copying, a use of a phi node at the top of the loop is converted to the associated phi input for entering the loop. The control flow edges entering the loop header are then redirected to the peel copy. At this point some phi nodes in the original loop header may become redundant and are eliminated, and the block that is the successor of the peel copy becomes the new loop header or start. To complete the move, phi nodes are created in the new loop header for all the values that were copied, with control flow edges joined from the peel copy to the original. If the addition of these phi nodes cascades into the addition of more phi nodes, the loop peeling is aborted because the loop is too complex. It is also possible to estimate the number of phi nodes to add by re-running the phi node placement algorithm.

Wednesday, March 18, 2015

Today we continue to read the WRL research report on the Swift Java Compiler. We were discussing runtime check elimination, including array store checks. We now discuss loop peeling. Loop peeling is an optimization that peels one or more of the initial iterations of a loop into straight-line code preceding the loop. This helps in removing runtime checks associated with loops. The problem we wish to solve is that loops often contain loop-invariant runtime checks: for example, there may be a required null check for an array that is first accessed inside the loop. The trouble with this check is that we want to move it out of the loop, but it is pinned and can throw an exception, and we don't want it to throw if the loop body never executes. Moreover, if the check has succeeded in one iteration, then it is safe to remove it from subsequent iterations; it is redundant since it will succeed in all later iterations. Loop peeling separates out the first iteration, and the rest of the checks can then be removed automatically by global CSE. It may be interesting to note that loop peeling plus global CSE on simple loops and checks often leaves a result similar to that of code motion. The loop peeling pass begins with a search phase that looks for loops that should be peeled. The search includes all loops, starting with the most highly nested loops first. A loop is a candidate if it contains a loop-invariant faulting operation that dominates, meaning it is part of each iteration. The set of blocks dominated by the loop header and the loop-invariant faulting operation becomes the loop peel. We don't want the peel to be too large, otherwise it will result in duplication of code; the largest useful peel that does not exceed a maximum number of blocks is chosen.
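A source-level before/after sketch of the transformation (Swift performs this on the CFG, not on source code, so this is only an illustration):

class PeelExample {
    // Before peeling: the null check on 'a' implied by a[i] is loop invariant
    // but pinned inside the loop because it can throw.
    static int sumBefore(int[] a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];                   // implicit null check repeats every iteration
        return s;
    }

    // After peeling the first iteration: the check runs once in the peel, and
    // global CSE removes the now-redundant copies left inside the loop body.
    static int sumAfter(int[] a, int n) {
        int s = 0;
        if (0 < n) {
            s += a[0];                   // peeled first iteration: the null check happens here
            for (int i = 1; i < n; i++)
                s += a[i];               // null check provably redundant, eliminated
        }
        return s;
    }
}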
#codingexercise
Double GetAlternateOddNumberRangeProductCubeRootTenthPower (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeProductCubeRootTenthPower();
}

Tuesday, March 17, 2015

Today we continue to read the WRL research report on the Swift Java Compiler. We were discussing runtime check elimination. We said that Swift makes some simple determinations based on the users of a value, not by solving a system of hypotheses or theorems. These checks or type tests can be resolved based on the properties of the values and the control flow of the method. We saw this for both null checks and type tests, and the redundancies were removed. The bounds check was a bit more involved: Swift applies condition gathering recursively down to a fixed depth, applies transitive properties between three variables, and applies some simple algebraic rules, such as ones about the sum of two variables. This approach is effective because Swift's IR provides immediate access to all the users of a value v, and therefore makes it easy to find program expressions that help prove properties about v. In addition, Swift does a round of global CSE before doing the check removal, which lets Swift combine values equivalent to v and collect all their uses together.
We discuss array store checks as another runtime check for Swift to optimize. In Java, this check ensures that the object being stored into an array is compatible with the base type of the array. Swift attempts to eliminate array store checks via several methods. To eliminate any of these checks, Swift has to resolve the base type of the array. It will eliminate an array store check if the base type of the array is known and can hold the known type of the stored value. If the stored value is null or was loaded from another element of the same array, those checks are eliminated too. Swift also checks whether the class of the base type of the array has no subclasses, or whether that type is final; in either case the store check cannot fail and is eliminated.
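A small Java illustration of when the store check is and is not needed:

class StoreCheckExample {
    // The check is needed here: the true element type of dst is unknown, and
    // storing x could raise ArrayStoreException (e.g. an Integer into a String[]).
    static void store(Object[] dst, Object x) {
        dst[0] = x;
    }

    // No check is needed here: the base type String is final, so the array's
    // true element type must be exactly String, which can always hold s.
    static void storeString(String[] dst, String s) {
        dst[0] = s;
    }
}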
#codingexercise
Double GetAlternateOddNumberRangeSumCubeRootTenthPower (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeSumCubeRootTenthPower();
}

Monday, March 16, 2015

Today we continue to read the WRL research report on the Swift Java Compiler. We were discussing branch removal and the two methods that Swift uses. The first method removes branches for conditions that evaluate to 0 or 1 by removing a phi node, which also removes an edge from the control flow graph. The second method translates logical comparisons into bitwise comparisons. We now study runtime check elimination. Some checks or type tests can be resolved from the properties of values corroborated with the control flow of a method. For example, if a statement checks whether a value is non-null, then null checks can be eliminated in one branch of that if-statement. Generalizing this, Swift scans the current SSA graph for runtime checks that have not been eliminated. In the example above, if the successor block along one of the edges of the null check contains further null checks on the same value, then they can be eliminated. However, values that depend on the null check should not float above the successor block. To ensure this, Swift places a pin value and changes all users of the null check to use the pin value instead. Pin values do not generate any machine code, but they are pinned to their original block. These measures guarantee that the values that depend on the null check stay within the correct branch of the if-statement. The same discussion applies to cast checks.
For example, for a cast check of value v, Swift scans the users of v for instanceof checks on v that control an IF. Similar to the null checks, redundant IFs are eliminated by searching for a dominant IF with the same controlling condition.
For a bounds check, however, Swift does somewhat more work. To eliminate a bounds check whose index is a value v, Swift first scans the users of v to derive conditions on v. If it finds a comparison on v that controls an IF, then it may know that the comparison or its negation is true. On the other hand, if v is an induction variable of a loop that is monotonically increasing/decreasing, then v is guaranteed to be greater/less than its starting value. In the example of a for loop, the termination condition specifies that the index is always less than the termination value, which is usually an array length. Similarly, for a zero-based loop, the index has to be greater than or equal to zero, given that the index is monotonically increasing. Together these two conditions guarantee that the bounds check associated with accessing the array element at index i will always succeed, and so Swift will remove it.
The key idea behind these runtime check optimizations is that Swift does not try to solve a system of inequalities involving i in order to prove properties of i. Instead, it looks at all the users of the value representing i, and determines if any direct comparisons on i can be shown to be true.
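The canonical example, written out in Java:

class BoundsCheckExample {
    static int sum(int[] a) {
        int s = 0;
        // i starts at 0 and only increases, so i >= 0 always holds; the loop
        // test guarantees i < a.length on every entry to the body. Together
        // these two conditions prove the bounds check on a[i] always succeeds,
        // so it can be removed.
        for (int i = 0; i < a.length; i++)
            s += a[i];
        return s;
    }
}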
#codingexercise
Double GetAlternateOddNumberRangeSumCubeRootEigthPower (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeSumCubeRootEigthPower();
}

Sunday, March 15, 2015

Today we continue to read the WRL research report on the Swift Java Compiler. We saw that global common subexpression elimination and global code motion have significant optimization benefits, and we reviewed global code motion. We now look at branch removal. Swift has a phase that uses two methods to attempt to remove branches. The first method removes branches that are used to compute a value that is either 0 or 1, so that the value can be assigned directly. The JVM has no byte codes that produce a boolean value directly from a comparison, so a branch is involved; Swift tries to optimize it away. It simply searches through all the phi nodes in the method whose two inputs are the constants 0 and 1. If the phi corresponds to an if-then-else, then Swift can replace the uses of the phi node with the controlling boolean of the IF node. When the phi node is optimized away, the if-then-else control flow structure is also removed. Swift also tries to remove branches by converting logical and/or expressions to bitwise and/or expressions. For example, the code for a conditional that has two conditions joined by a logical 'and' requires two branches, since the second condition is not evaluated if the first one fails. However, if the conditions are fairly simple and have no side effects, then they can be combined with a bitwise AND followed by a single conditional branch. This reduction in the number of branches is important because extra branches can reduce the rate of instruction fetching and may even result in speculation down the wrong path.
This optimization is fairly easy to implement in the Swift IR. Swift simply looks through all the blocks in the CFG for a block that is controlled by an IF node and one of whose successors is also controlled by an IF node. If such a block is found, Swift checks the other criteria, namely no side effects: there must be no memory write operations, method calls, or other exception-causing operations. If these conditions are satisfied, then the comparisons can be combined and a branch eliminated.
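A Java-level illustration of the idea (Swift performs this on the IR, but the source-level analogy is faithful because the two forms below compute the same result when the conditions are side-effect free):

class BranchRemovalExample {
    // Two conditions joined by '&&' normally compile to two branches, because
    // the second test must not run if the first fails.
    static boolean inRangeTwoBranches(int x, int len) {
        return x >= 0 && x < len;
    }

    // Both tests are simple and side-effect free, so they can be combined
    // with a bitwise '&' and a single conditional branch.
    static boolean inRangeOneBranch(int x, int len) {
        return (x >= 0) & (x < len);
    }
}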

Saturday, March 14, 2015

Today we continue to read the WRL research report on the Swift Java Compiler. We saw that global common subexpression elimination and global code motion have significant optimization benefits. We now look at global code motion. The main purpose of global code motion is to move loop-independent values outside of loops and to ensure that all values are in blocks that dominate their users after global CSE. One such algorithm is Click's code motion algorithm, which proceeds via two passes over the SSA graph. The first pass finds the earliest possible placement of all values, requiring only that a value be placed in a position that dominates all its users. This requirement implies that the early placement of a value dominates its late placement. After the second pass determines the latest possible placement of each value, the algorithm scans the sequence of blocks from the early to the late placement. The value is then placed in the latest block in that sequence that is outside the innermost enclosing loops, i.e. lowest in the dominator tree while staying out of loops as much as possible. The result of this motion is that a value is moved out of loops as much as possible, and is otherwise put in the most control-dependent block. Just before the code motion, duplicate inputs to phi nodes are eliminated. The SSA graph for a method has a phi node in the final block that merges all the possible return values. To get rid of duplicate code sequences, the phi nodes are scanned for duplicate inputs. If one is found, and the only other phi node in the block is for global stores, the duplicates are eliminated by inserting an extra predecessor block that joins all the blocks corresponding to the duplicate inputs. This causes the register copies for the duplicate inputs to be merged into a single code sequence, and the duplicated value can now move down into the new block.
As an example, consider a method that takes a single parameter, does a few parameter checks initially, loops over the iterator of the parameter to check for a condition, and returns true or false. Per the discussion above, there is a phi node that merges all the possible return values. The generated code has sequences that load false as well as true and jump to the epilog; these duplicated sequences are eliminated by inserting the extra predecessor block.

Friday, March 13, 2015

Today we continue to read the WRL research report on the Swift Java Compiler. We saw that global common subexpression elimination and global code motion have significant optimization benefits. In addition, one modification was made to the partitioning algorithm to deal with values that cause exceptions. Exception-causing values cannot be moved from their original block, so code motion cannot be applied to ensure that the values dominate their uses after CSE is applied. If we take the case of two identical exception-causing values, the first value can only replace the second if the first value's block dominates the second value's block. Also, the second value is redundant only if the first value did not throw an exception; hence Swift requires that the second value's block be dominated by the non-exception successor of the first value's block.
These conditions are satisfied if the CSE algorithm is modified as follows. First, for a partition containing exception-causing values, a dominating value is kept as the first element of the partition. At the end of each run, if every value in the partition is appropriately dominated by that first exception-causing value, the partition is done; otherwise the partition is split into the values dominated by the first element and the rest, and the smaller of the two partitions is added to the worklist for another run. This is repeated until every partition has its first value dominating the rest.
Global CSE is effective in eliminating runtime checks automatically and in compressing the control flow graph. Both CSE and code motion are executed once after all the interprocedural analyses, during machine-independent processing. Swift does another round of them after the conversion to machine-dependent IR, in order to eliminate the common subexpressions that appear in the lower-level form.

Thursday, March 12, 2015

Today we continue to read the WRL research report on the Swift Java Compiler. We were discussing global common subexpression elimination and global code motion. We discussed how CSE is applied whenever two values are equivalent, even if neither of the two values dominates the other. Equivalence is determined with a worklist algorithm. The values are first split into initial partitions based solely on their operation fields. Each time a partition is taken off the worklist, the set of values that take one of the partition's members as a given input is built; any partition that this set splits is divided, and the smaller of its two halves is put back onto the worklist. This way each partition ends up containing only equivalent values.
Swift iterates through all the partitions and picks out the values from each partition. If a partition has more than one value, Swift picks out only the dominant value and throws away the other values.
CSE is very helpful in optimization. For example it removes the null checks against the parameters. In fact, CSE helps remove the same runtime check against the parameter from many places. In addition it removes the edges from the CFG.
#codingexercise
Double GetAlternateOddNumberRangeSumCubeRootSixthSquare (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeSumCubeRootSixthSquare();
}

Wednesday, March 11, 2015

Today we continue to read the WRL research report on the Swift Java Compiler. We were discussing method splitting and the criteria for the splitting point. We now look at global common subexpression elimination and global code motion. Swift applies CSE to any two values that are equivalent, no matter where they occur in the control flow graph: even if neither of the two values dominates the other, one value is replaced by the other. Global CSE is followed by a pass of global code motion, which places values in blocks so that all values dominate all their users.
Two values are considered equivalent if they have identical operations and equivalent inputs. During CSE, Swift computes equivalent values via a partitioning algorithm that splits all values into initial partitions based solely on their operation fields and puts the partitions on a worklist. Each time it takes a partition P off the worklist, it builds the set of values Si whose ith input is a member of P. If there is a partition Q which has some of its values in Si but not all, then Q is split into those values and the remainder, and the smaller of the two resulting partitions is added to the worklist. This way Swift builds up each partition to contain only equivalent values.
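Here is a simplified sketch of such a partition-refinement pass. The Value record, the id-based inputs, and the quadratic scan over all partitions are simplifications for illustration rather than Swift's implementation.

import java.util.*;

class GlobalCsePartition {
    record Value(int id, String op, List<Integer> inputs) {}

    // Start with partitions keyed on the operation alone, then repeatedly split
    // partitions whose members disagree on whether their i-th input lies in the
    // partition taken off the worklist; the smaller half goes back on the list.
    static Collection<List<Value>> partition(List<Value> values, int maxArity) {
        Map<String, List<Value>> byOp = new HashMap<>();
        for (Value v : values) byOp.computeIfAbsent(v.op(), k -> new ArrayList<>()).add(v);
        List<List<Value>> partitions = new ArrayList<>(byOp.values());
        Deque<List<Value>> worklist = new ArrayDeque<>(partitions);
        Map<Integer, List<Value>> partOf = new HashMap<>();
        for (List<Value> p : partitions) for (Value v : p) partOf.put(v.id(), p);
        while (!worklist.isEmpty()) {
            List<Value> p = worklist.poll();
            for (int i = 0; i < maxArity; i++) {
                final int idx = i;
                Set<Integer> si = new HashSet<>();      // S_i: values whose i-th input is in p
                for (Value v : values)
                    if (idx < v.inputs().size() && partOf.get(v.inputs().get(idx)) == p)
                        si.add(v.id());
                for (List<Value> q : new ArrayList<>(partitions)) {
                    List<Value> in = new ArrayList<>(), out = new ArrayList<>();
                    for (Value v : q) (si.contains(v.id()) ? in : out).add(v);
                    if (in.isEmpty() || out.isEmpty()) continue;   // q not split by S_i
                    partitions.remove(q);
                    partitions.add(in);
                    partitions.add(out);
                    for (Value v : in) partOf.put(v.id(), in);
                    for (Value v : out) partOf.put(v.id(), out);
                    worklist.add(in.size() <= out.size() ? in : out);  // smaller half
                }
            }
        }
        return partitions;   // each partition now holds only equivalent values
    }
}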
#codingexercise
Double GetAlternateOddNumberRangeSumCubeRootFourthSquare (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeSumCubeRootFourthSquare();
}

Tuesday, March 10, 2015

Today we continue to read the WRL research report about the Swift Java Compiler. We were looking into method splitting. We saw that small class methods, even though they have little code, are good candidates for inlining. However, they may include checks that can cause exceptions, and on those exception paths the method may not be worth inlining; such paths may also use extra register saves and restores. Therefore, the method is split into two versions: one for the happy code path and the other for the exceptional code path. Swift determines whether a method is splittable rather conservatively. Unless the method has a small amount of code, such as looking up or setting an encapsulated field or array member, the method is not considered for splitting. A method is splittable only if it has a single path to the return statement that calls no other methods, as well as multiple other paths that eventually throw exceptions. The path to the normal exit is then considered the common case. The information needed to split the method is computed and cached by the method analysis module. When Swift compiles a splittable method, it generates auxiliary methods containing the uncommon code paths and replaces those paths with calls to the auxiliary methods, modifying the IR of the original method. The method, now containing only the short common path, is then small enough to be inlined at its call sites.
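A sketch of what splitting might look like at the source level; the class, the method, and the auxiliary-method naming are hypothetical (Swift does this on the IR):

class Circle {
    private double[] radii;

    // Splittable method: one path to the return, all other paths throw.
    double radius(int i) {
        if (radii == null) throw new IllegalStateException("not initialized");
        return radii[i];
    }

    // After splitting: the common path stays small and inlinable, while the
    // cold throwing path moves to an auxiliary method.
    double radiusSplit(int i) {
        if (radii == null) radiusFail();      // rare path: call the auxiliary method
        return radii[i];
    }

    private void radiusFail() {
        throw new IllegalStateException("not initialized");
    }
}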

Monday, March 9, 2015

Today we will continue our discussion of the WRL research report about the Swift Java Compiler. We were looking into method resolution and inlining. When Swift is invoked in simple class hierarchy analysis mode, it will not use CHA to resolve methods for use by any interprocedural analysis, such as alias analysis or escape analysis. It will only use CHA for virtual call resolution in the methods being compiled, converting virtual calls to direct calls. In addition, if a call was resolved using CHA, then Swift won't do method inlining for that call; however, it will allow method inlining if the receiver of the call is an input argument of the containing method. With these restrictions, the JVM will find it easy to update code optimized using CHA when a class is dynamically loaded. Swift provides the JVM with a list of the classes which, if subclassed by a dynamically loaded class, would require a compiled method to be modified or invalidated. Along with each class, Swift indicates which new method would require a modification of the containing method and, for each of these, whether a particular direct call can be modified or whether the containing method's code needs to be invalidated or recompiled.
We next look at method splitting. This technique is used when the bulk of the executions of a method follow a small happy code path, while a small, rare percentage run some checks, fail, and cause either an error exit or an exception. In such cases, the method is split so that the usual code path can be inlined, enabling further optimizations. This is usually the case for methods of a class that return a field or an element of an array.
#codingexercise
Double GetAlternateOddNumberRangeSumSquareRootFourthSquare (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeSumSquareRootFourthSquare();
}

Sunday, March 8, 2015

Today we will continue to discuss the Swift Java Compiler. In the previous post we started looking at method resolution and inlining, and discussed how we could detect that most superclass constructors may be empty. If Swift is used in an environment where dynamic loading can occur, then the use of CHA (class hierarchy analysis) is limited to simplify correctness issues. In general, if a method is compiled with the information that a particular class has no subclasses, or that a particular method has only a single implementation, then the method's code must be modified if dynamic loading makes that information no longer true. However, the method may currently be executing, perhaps in a tight loop, and we may not be able to wait for it to finish. If the method has a virtual method call that was resolved to a direct call using CHA but was not inlined, then the call can be fixed even while the method is running: the JVM generates a stub, and the direct call is atomically redirected to perform the virtual call. The more difficult problem is reversing the effects of CHA when a resolved call was inlined. The resolved call only becomes incorrect if the receiver can contain an object of the dynamically loaded class. If the receiver is an argument of the containing method, then it existed prior to the dynamic load and cannot change; it cannot reference an object of the dynamically loaded class, and hence, for the duration of the current invocation, the containing method will be correct. The containing method can then be invalidated simply by creating a new version with the virtual call reinstated, and all future calls must use the new version. The JVM can further choose not to generate new compiled code, and instead ensure that future calls of the containing method revert to interpreting the byte code directly or using less-optimized JIT code.
#codingexercise
Double GetAlternateOddNumberRangeSumSquareRootSixthSquare (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeSumSquareRootSixthSquare();
}

Saturday, March 7, 2015

We discuss the Swift Java compiler further. We were looking into Swift's LocTypes and StoreOps in the dataflow analysis. Since Swift does a full analysis, it correctly determines when a particular LocType cannot be modified in a loop. The state information at each node is computed and then used to update the dependences: following the store resolution process, a load operation may have a changed store as its input. This makes the relaxed dependence information available to all later passes. For example, load operations that end up with identical store inputs can be combined into a single load operation. Other effects in subsequent passes include a load operation being scheduled earlier or moved out of a loop.
Swift's analysis is interprocedural, since it makes use of summaries of the effects of methods. As with other method properties, the write effects of a method are computed on demand by the method analysis module. This yields the list of fields and array element types (LocTypes) that the method, or the calls it makes, might modify. Unlike many other properties, the immediate effects of a byte code can be determined without building a CFG or SSA graph.
Since the locations are categorized into distinct field and array element types, Swift is using a type-based alias analysis scheme, with the field name of a location considered part of its type. That said, store resolution could easily be used with a more sophisticated alias analysis that manages different kinds of LocTypes and improves accuracy.
Let us now look at method resolution and inlining. Swift resolves virtual method calls using information from various kinds of analyses. If Swift is using class hierarchy analysis, CHA may indicate that there is only one possible implementation that could be referenced by a virtual call. Without CHA, the final keyword can indicate that there is only one implementation. Or type analysis may indicate that the receiver has a particular exact type and can therefore only invoke one specific method. Interface calls can be resolved similarly. Method inlining may be applied recursively as part of other interprocedural optimizations. This helps with such things as the removal of superclass constructors, which are often mostly empty.
#codingexercise
Double GetAlternateOddNumberRangeSumSquareRootEigthSquare (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeSumSquareRootEigthSquare();
}

Friday, March 6, 2015

Today we continue reading the WRL research report on the Swift Java Compiler. We were discussing alias analysis; we will see a few more ways to relax memory dependences. The basic Swift IR is retained. A process called store resolution is used to relax memory dependences by changing the global store inputs of some memory operations. In the loop example mentioned earlier, rather than taking the loop's phi store as their global store input, the loads can keep the store from before the loop as their input, and so they can be moved out of the loop. Store resolution does a forward dataflow computation to build up information about memory operations, as follows.
We look at the subgraph of the SSA graph where global stores are produced, and label the operations that produce them as StoreOps. We want to compute, at these nodes, information about sets of locations that are easy to classify. From the language we know that particular fields are of certain types and array elements are of particular types; these different sets of locations can be labeled LocTypes. At each node of the subgraph we want to find, for each LocType, the most recent StoreOp that might have modified the locations of that type.
From the dataflow, the state at each node maps each LocType to the most recent preceding StoreOp which might have modified locations of that LocType. The state has a default StoreOp that applies to most LocTypes, plus ordered pairs of (LocType, StoreOp) for any LocTypes that don't map to the default. Perhaps we can classify them to have more salient entries in the state.
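A minimal sketch of that state, assuming simple record types for LocType and StoreOp; the names and API are illustrative, not the report's.

import java.util.*;

class StoreResolutionState {
    record LocType(String name) {}       // e.g. a specific field or array element type
    record StoreOp(int id) {}            // an operation that produces a global store

    private StoreOp defaultStore;                       // applies to most LocTypes
    private final Map<LocType, StoreOp> overrides = new HashMap<>();

    // The most recent preceding StoreOp that might have modified this LocType.
    StoreOp lastStoreFor(LocType t) {
        return overrides.getOrDefault(t, defaultStore);
    }

    // A store known to write only location type 't' updates just that entry...
    void recordStoreTo(LocType t, StoreOp s) {
        overrides.put(t, s);
    }

    // ...while an operation with unknown effects clobbers everything: it
    // becomes the new default and all per-type entries are dropped.
    void recordUnknownStore(StoreOp s) {
        defaultStore = s;
        overrides.clear();
    }
}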
#codingexercise
Double GetAlternateOddNumberRangeSumSquareRootTenthSquare (Double [] A)
{
if (A == null) return 0;
return A.AlternateOddNumberRangeSumSquareRootTenthSquare();
}

Thursday, March 5, 2015

In the previous post, we were discussing alias analysis in the Swift Java compiler from the WRL research report. We recall that it attempts to relax memory dependences in the SSA graph: given two memory operations, alias analysis checks whether they can access the same location. If we take the example of a loop that contains two array store operations, there will be a phi node at the top of the loop that merges the initial global store entering the loop with the store at the end of each iteration. Values loaded inside the loop then cannot move outside the loop and must be reloaded in each iteration.
A memory operation that accesses a particular field, or array elements of a particular type, can only be affected by memory operations that access the same field or the same type of array element; stores to other kinds of locations cannot modify it. One way to represent dependences among memory operations with more precision would be to have many different types of global stores. There are several problems with this approach. First, it would create more edges and nodes in the SSA graph: extra phi nodes would be required for the many different kinds of global stores, and each method body would now require arg nodes for many different kinds of global stores. If the effects of a method call are unknown, the call would take every possible kind of global store as input and produce new versions as output. We already talked about synchronization nodes having different versions when unnecessary synchronizations are specified; we would create even more versions with different types of global stores. Method inlining also becomes complicated, in that it might require additional arg nodes to be created in the caller.

Wednesday, March 4, 2015

In the previous post, we discussed escape analysis as implemented in the Swift Java compiler and discussed in the corresponding WRL research report. Specifically, we looked at references that can escape from the current method or thread. This analysis enables optimizations such as allocating objects on the stack rather than the heap, and the removal of unnecessary synchronization. We found that the Swift compiler uses summaries of methods together with interprocedural analysis. It examines the usages of references to determine whether they are held globally or stored into another data structure, which would let them escape. With this analysis, Swift looks for values which are either objects directly allocated in the method or newly allocated, unaliased objects returned by method calls. It makes a conservative selection and comes up with a pruned list. For objects that can be stack allocated or whose synchronization is unnecessary, it substitutes the corresponding operations in the SSA graph.
We review alias analysis next. Here Swift attempts to relax memory dependences in the SSA graph. Two memory operations alias when they may access the same memory location. For example, if a loop contains two array store operations, then there is a phi node in the Swift IR at the top of the loop, and this merges the initial global store entering the loop with the store at the end of each iteration. Without better information, the values loaded inside the loop cannot move outside the loop and are therefore reloaded in each iteration.
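A concrete illustration of the loop case and of what the type-based scheme can and cannot relax:

class AliasExample {
    static int f;   // a static int field: a different LocType from int[] elements

    static void scale(int[] a, int[] b, int n) {
        for (int i = 0; i < n; i++) {
            // The store to a[i] writes an int array element; it cannot modify
            // the field f (a different LocType), so store resolution lets the
            // load of f keep its pre-loop store input and move out of the loop.
            // The load of b[i] is harder: a and b are the same LocType and may
            // alias, so it stays inside the loop unless a deeper analysis can
            // show they are different arrays.
            a[i] = b[i] * f;
        }
    }
}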

Tuesday, March 3, 2015

Today we will continue our discussion on WRL Research report on Swift Java compiler. We were discussing Escape Analysis. This was used to determine if a reference to an object escapes a thread or a particular method call. This has applications in determining whether the object can be allocated on the stack or on the heap. It can also be used to eliminate the cost of unnecessary synchronization.
The analysis is performed by determining whether an object is stored in a global variable or in another data structure such as a heap object. We can then analyze the data flow to determine if any value in the SSA graph escapes, using the summary information of called methods. An interprocedural dataflow analysis is performed on demand to determine whether a method may store or return its arguments, and whether it returns a new object that has not been stored; otherwise a conservative assumption is made. The simple analysis can also be extended to take advantage of information available from field analysis. We are particularly interested in fields that are encapsulated within another object such that they are never leaked: if the parent object does not escape, then the objects held in its contained fields do not escape either, and we can apply this recursively.
With this analysis, Swift looks for values which are either objects directly allocated in the method or newly allocated, unaliased objects returned by method calls. The list is then pruned to those that can conservatively be shown not to escape. There are also additional restrictions for objects that will be stack allocated, such as that each object must have an exact type and array lengths must be small. If an object can be stack allocated, then the corresponding stack allocation operation is added to the SSA graph and the existing allocation operation is removed. If the object was allocated by a called method, then another version of that method is generated which initializes the object on the stack. If Swift determines that a synchronization operation is unnecessary, it scans all the uses of the object before eliminating the synchronizations, and it may also create unsynchronized versions of the corresponding methods.

Monday, March 2, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing field analysis. We now review type propagation. This is useful for resolving some virtual method calls, especially when class hierarchy analysis is not being used. Swift assigns types to all values based on the available information in the byte code and the SSA graph. Some values have exact types: for example, when a class is directly instantiated, the type of the resulting value is exactly that class and not any subclass. Type propagation carries this information to all applicable values. Types are merged at control flow joins, so the type information follows the flow of the program; type propagation is therefore considered flow-sensitive. Given this, Swift can resolve a virtual method call if the receiver of the call has an exact type, without needing to look any further.
Now we will discuss how the exact types are determined. Exact types can be determined in several ways. First, exact types are known when an object or array is directly allocated. Second, Swift can examine a method and determine whether it returns an object with an exact type. Third, Swift can use field analysis to determine whether a load from a field of an object always returns an object with an exact type.
We now review escape analysis. Escape analysis is used to determine if a reference to an object escapes a thread. By escape, we mean that the reference can be accessed by another thread, or can still be accessed by the same thread after the current method call returns. This kind of analysis is useful for deciding whether an object can be allocated on the stack rather than the heap: if the reference to an object does not escape a particular method call, then the object can be allocated on the stack frame of that call. The analysis is also used to eliminate the cost of unnecessary synchronization: if an object does not escape a thread, then synchronization on that object is unnecessary.
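A small example of a non-escaping object and the two optimizations it enables:

class EscapeExample {
    // 'sb' is allocated here, never stored into a global or another object,
    // and never returned: it does not escape this method or this thread.
    static int digits(int x) {
        StringBuffer sb = new StringBuffer();   // StringBuffer's methods synchronize
        sb.append(x);
        return sb.length();
        // Because sb never escapes, the synchronization inside append() and
        // length() is unnecessary and can be removed, and sb itself is a
        // candidate for stack allocation instead of heap allocation.
    }
}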

Sunday, March 1, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were reviewing the class and method analysis section of the paper. We saw how the Swift compiler applies class hierarchy analysis to resolve virtual calls. Swift also maintains a variety of information about methods in a hash table, which can be used to resolve method calls. Type propagation is useful for resolving some virtual method calls, and the use of SSA form makes the type propagation flow-sensitive.
Exact types are determined in several ways.
First, exact types are known when an object or array is directly allocated.
Second, Swift can compute on demand whether a method returns an object with an exact type.
Third, Swift can do field analysis to determine if a load from a field of an object always returns an object with an exact type.
Next we review field analysis. This is an inexpensive interprocedural analysis that exploits field attributes such as access modifiers: because the modifiers limit where in the program a field can be accessed, the compiler can derive properties of the field by scanning only that code.
As an example, if there is a field called points in a class Plane and it is marked private, the compiler only needs to scan the instance methods of Plane to determine its properties: whenever points is non-null, it must reference an array with base type Point and a fixed size of three. Swift uses exact type information from field analysis to help resolve method calls, and null checks can be eliminated for fields that are known to be non-null. While Swift uses page protection to implement null checks without any extra code, eliminating null checks is still useful because it gives the compiler more flexibility in code motion. Lastly, a property of a field is computed on demand, only when required for a potential optimization.
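The example rendered as code; the class names follow the report's example, the method is illustrative.

class Point { double x, y, z; }

class Plane {
    // 'points' is private, so only code in Plane can read or write it; the
    // compiler needs to scan just this class to prove its properties.
    private Point[] points = new Point[3];

    // Assuming no method in Plane ever assigns null or a different array,
    // every load of 'points' yields a non-null Point[] of length three: no
    // null check is needed, the exact type is known, and the bound is 3.
    Point corner(int i) {
        return points[i];
    }
}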