Cluster computing

Saturday, March 14, 2015

Today we continue to read the WRL research report on the Swift Java Compiler. We saw that global common subexpression elimination (CSE) and global code motion have significant optimization benefits. We now look at global code motion in more detail. Its main purposes are to move loop-independent values out of loops and, after global CSE, to ensure that every value is in a block that dominates all of its users. Swift uses Click's code motion algorithm, which makes two passes over the SSA graph. The first pass finds the earliest possible placement of each value by requiring only that the value be placed in a block dominated by all of its inputs. The second pass finds the latest possible placement by requiring only that the value be placed in a block that dominates all of its users. Together these requirements imply that the early placement always dominates the late placement. After determining the late placement of a value, the algorithm scans the blocks on the dominator-tree path from the early placement to the late placement. It then places the value in the latest block (the one lowest in the dominator tree) that lies in the shallowest loop nest. As a result, each value is moved out of loops as far as possible and is otherwise placed in the most control-dependent block.
Just before code motion, duplicate inputs to phi nodes are eliminated. The SSA graph for a method has a phi node in the final block that merges all the possible return values, and several return paths may return the same value. To get rid of the resulting duplicate code sequences, the phi nodes are scanned for duplicate inputs. If duplicates are found and the only other phi node in the block is the one for global stores, the duplicates are eliminated by inserting an extra predecessor block that joins all the blocks corresponding to the duplicate inputs. The register copies for the duplicate inputs are then merged into a single code sequence, and the duplicated value can move down into the new block.
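To make the two passes concrete, here is a minimal Python sketch of Click-style early/late scheduling on a toy control-flow graph. It is not Swift's implementation, and several simplifications are our own assumptions: the dominator tree (`idom`) and loop depths are given as inputs rather than computed, the `Value` class and block names are hypothetical, pinned values (parameters, phis, stores, returns) are simply fixed to a block, and a use by a phi is not mapped to the corresponding predecessor block as the full algorithm would do.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # eq=False keeps identity hashing so values can be dict keys
class Value:
    name: str
    inputs: List["Value"] = field(default_factory=list)
    pinned: Optional[str] = None  # block fixed by control flow (params, phis, stores, returns)


def global_code_motion(values: List[Value], idom: Dict[str, str],
                       loop_depth: Dict[str, int], root: str) -> Dict[str, str]:
    """Return the chosen block for every unpinned value."""

    def dom_depth(b: str) -> int:
        # Number of idom steps from b up to the root of the dominator tree.
        d = 0
        while b != root:
            b, d = idom[b], d + 1
        return d

    def lca(a: str, b: str) -> str:
        # Lowest common ancestor of two blocks in the dominator tree.
        while dom_depth(a) > dom_depth(b):
            a = idom[a]
        while dom_depth(b) > dom_depth(a):
            b = idom[b]
        while a != b:
            a, b = idom[a], idom[b]
        return a

    users: Dict[Value, List[Value]] = {v: [] for v in values}
    for v in values:
        for inp in v.inputs:
            users[inp].append(v)

    early: Dict[Value, str] = {}
    late: Dict[Value, str] = {}

    def schedule_early(v: Value) -> str:
        # Pass 1: the deepest input block is the earliest block dominated by all input blocks.
        if v not in early:
            if v.pinned:
                early[v] = v.pinned
            else:
                blocks = [schedule_early(i) for i in v.inputs] or [root]
                early[v] = max(blocks, key=dom_depth)
        return early[v]

    def schedule_late(v: Value) -> str:
        # Pass 2: the LCA of all users' blocks is the latest block dominating every user.
        if v not in late:
            if v.pinned:
                late[v] = v.pinned
            else:
                block: Optional[str] = None
                for u in users[v]:
                    ub = schedule_late(u)
                    block = ub if block is None else lca(block, ub)
                # A value with no users is dead; leave it at its early block.
                late[v] = block if block is not None else schedule_early(v)
        return late[v]

    placement: Dict[str, str] = {}
    for v in values:
        if v.pinned:
            continue
        first, last = schedule_early(v), schedule_late(v)
        best = b = last
        # Walk up the dominator tree from the late block to the early block, keeping
        # the block with the smallest loop depth; ties keep the later block.
        while b != first:
            b = idom[b]
            if loop_depth[b] < loop_depth[best]:
                best = b
        placement[v.name] = best
    return placement


# Toy CFG: B0 (entry) -> B1 (loop header) -> B2 (loop body) -> back to B1; B1 -> B3 (exit).
idom = {"B1": "B0", "B2": "B1", "B3": "B1"}
loop_depth = {"B0": 0, "B1": 1, "B2": 1, "B3": 0}
a = Value("a", pinned="B0")             # parameter
b = Value("b", pinned="B0")             # parameter
i = Value("i", pinned="B1")             # loop phi (its inputs omitted for brevity)
t = Value("t", [a, b])                  # a * b: loop invariant
s = Value("s", [i, t])                  # i + t: depends on the loop phi
st = Value("store", [s], pinned="B2")   # store in the loop body
u = Value("u", [a])                     # a + 1: needed only after the loop
ret = Value("ret", [u], pinned="B3")    # return in the exit block
placement = global_code_motion([a, b, i, t, s, st, u, ret], idom, loop_depth, "B0")
print(placement)  # {'t': 'B0', 's': 'B2', 'u': 'B3'}
```

In this toy example, `t` is hoisted out of the loop into the entry block, `s` stays in the loop body because it depends on the loop phi, and `u` sinks into the exit block, which is the latest block on its path that is at loop depth 0.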
As an example, consider a method that takes a single parameter, performs a few checks on it first, and then loops over the parameter's iterator to test a condition, returning true or false.
Per the discussion above, the final block has a phi node that merges all the return values. The generated code contains several sequences that load false or true and jump to the epilog. Sequences that load the same return value are redundant, and they are eliminated by inserting the extra predecessor block.
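The duplicate-input elimination can be sketched the same way. This is a minimal illustration under our own assumptions, not the report's code: the final block holds exactly two phis, the return-value phi and the global-store phi, each given as a list aligned with the predecessor list; the function name, the `join_` block naming, and the handling of the store phi (a new phi in the join block when the incoming stores differ) are hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple


def merge_duplicate_phi_inputs(
    preds: List[str], ret_phi: List[str], store_phi: List[str]
) -> Tuple[List[str], List[str], List[str], Dict[str, List[str]]]:
    # Group predecessor positions by the value they feed into the return phi.
    groups: Dict[str, List[int]] = {}
    for k, val in enumerate(ret_phi):
        groups.setdefault(val, []).append(k)

    new_preds, new_ret, new_store = [], [], []
    joins: Dict[str, List[str]] = {}
    for val, ks in groups.items():
        if len(ks) == 1:
            # Unique input: keep the predecessor edge as it is.
            k = ks[0]
            new_preds.append(preds[k])
            new_ret.append(val)
            new_store.append(store_phi[k])
            continue
        # Duplicate input: route all of these predecessors through one new join block.
        join = "join_" + val
        joins[join] = [preds[k] for k in ks]
        stores = [store_phi[k] for k in ks]
        # The join block needs its own store phi only if the incoming stores differ.
        merged_store = stores[0] if len(set(stores)) == 1 else "phi(" + ", ".join(stores) + ")"
        new_preds.append(join)
        new_ret.append(val)
        new_store.append(merged_store)
    return new_preds, new_ret, new_store, joins


# Hypothetical final block of a method that returns false from two parameter
# checks and from the end of the loop, and true when the loop finds a match.
preds = ["null_check", "size_check", "found", "loop_end"]
ret_phi = ["false", "false", "true", "false"]
store_phi = ["m0", "m0", "m1", "m1"]
new_preds, new_ret, new_store, joins = merge_duplicate_phi_inputs(preds, ret_phi, store_phi)
print(new_preds, new_ret)   # ['join_false', 'found'] ['false', 'true']
print(joins)                # {'join_false': ['null_check', 'size_check', 'loop_end']}
```

After the transformation, the three predecessors that return false reach the final block through a single join block, so only one copy of the code that loads false is needed. The sketch also reorders the final block's predecessors by first occurrence of each return value, which is harmless here because the phi inputs are reordered with them.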
