Friday, February 27, 2015

We continue our discussion of the WRL research report on the Swift Java compiler. Having discussed building the IR, we now turn to the interprocedural analyses and optimizations. The Swift compiler has numerous such optimizations, each of which can be individually turned on or off. Customary analyses, including class hierarchy analysis, type propagation, type- and field-based analysis, and escape analysis, are applied, and in addition a new form of analysis called field analysis is used. Recall that when building the SSA graph we already came across common subexpression elimination: values that load a field of an object are candidates for CSE only if they access the same field of the same object and have the same global store input. This simple rule was applied to reduce the number of nodes created. Beyond the analyses, the optimizations include method resolution, method inlining, elimination of runtime checks, removal of unnecessary data dependence edges, stack allocation, and synchronization removal.
Several interprocedural optimizations are combined into a single pass and applied repeatedly, because one optimization can enable further optimization. For example, inlining a method may reveal the type of the receiver of another method call, which in turn may enable that call to be inlined as well.
We first review class and method analysis. Swift contains modules that compute and cache useful pieces of information about classes and methods. The basic information about classes is the class hierarchy. It is built only as necessary to help resolve method calls. However, the Swift compiler can optionally use class hierarchy analysis (CHA) in resolving method calls. With CHA, the compiler assumes it is aware of the entire set of classes, so when it encounters a virtual method call, the call must resolve to an implementation in one of the known classes or subclasses. If CHA is turned on, all known classes are loaded and a representation of the hierarchy is built; the compiler can then scan all the subclasses of any class. Without CHA, the compiler can only determine that a class has no subclasses when it is declared with the keyword final.
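The resolution step described above can be sketched in a few lines. The hierarchy representation below is a hypothetical illustration, not Swift's actual data structures: a virtual call is resolved when only one implementation is possible among the known classes.

```python
# Sketch of class hierarchy analysis (CHA) for virtual-call resolution.
# The class/method encoding here is an assumption for illustration only.

class Hierarchy:
    def __init__(self):
        self.subclasses = {}   # class -> list of direct subclasses
        self.methods = {}      # (class, method) -> declared flag

    def add_class(self, cls, superclass=None, declares=()):
        self.subclasses.setdefault(cls, [])
        if superclass is not None:
            self.subclasses.setdefault(superclass, []).append(cls)
        for m in declares:
            self.methods[(cls, m)] = True

    def all_subclasses(self, cls):
        # Transitively collect every known subclass.
        out, stack = [], list(self.subclasses.get(cls, []))
        while stack:
            c = stack.pop()
            out.append(c)
            stack.extend(self.subclasses.get(c, []))
        return out

    def resolve(self, static_type, method):
        # Every implementation a receiver of static_type could dispatch to.
        targets = {c for c in [static_type] + self.all_subclasses(static_type)
                   if (c, method) in self.methods}
        # With exactly one possible target, the virtual call is resolved
        # and becomes a candidate for a direct call or inlining.
        return targets.pop() if len(targets) == 1 else None
```

Without CHA, only a final class (or final method) gives the same guarantee, since no subclass can override the implementation.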
#codingexercise
Double GetAlternateEvenNumberRangeSumTenthPowerRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumTenthPowerRootSquare();
}

Thursday, February 26, 2015

We continue our discussion of the WRL research report on the Swift Java compiler. We were listing the various analyses and optimizations performed by the Swift compiler. We now go through each of these one by one. Let us first take a look at building the IR.
Like other compilers, Swift generates the IR from a method's bytecode. First the bytecode is scanned to determine the number of basic blocks and the edges between them. Then the variables that require phi nodes in each basic block are determined; this logic is called the phi node placement algorithm. The bytecodes of each basic block are then executed via abstract interpretation. This maintains an association between the JVM local variables and SSA values, which makes it possible to determine the appropriate inputs for new values and to build the SSA graph.
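The abstract interpretation step can be sketched with a toy interpreter for a few integer bytecodes. The instruction set and node encoding below are simplified assumptions, not Swift's actual IR: each local and stack slot holds an SSA value, and each arithmetic bytecode creates an SSA node wired to the right inputs.

```python
# Toy abstract interpretation of stack bytecode into SSA nodes.
# Bytecodes and node tuples are a simplified, hypothetical encoding.

def build_ssa(bytecodes, nlocals):
    locals_ = [("param", i) for i in range(nlocals)]  # incoming arguments
    stack, nodes = [], []
    for op, *args in bytecodes:
        if op == "iload":        # push the SSA value currently held by a local
            stack.append(locals_[args[0]])
        elif op == "istore":     # a store just rebinds the local, no node needed
            locals_[args[0]] = stack.pop()
        elif op == "iconst":
            stack.append(("const", args[0]))
        elif op == "iadd":       # create an SSA add node from the two stack tops
            b, a = stack.pop(), stack.pop()
            node = ("add", a, b)
            nodes.append(node)
            stack.append(node)
    return locals_, nodes
```

The point of the sketch is that no explicit variable names survive: the association between JVM locals and SSA values is maintained entirely by the interpreter state.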
Various simple optimizations are performed during the creation of the SSA graph itself. For example, the array length operation is replaced with the allocated size of the array, and a bounds check is eliminated if the index and the array length are constant in the current method. Profile information generated from earlier runs can also be used to annotate the edges of the CFG with execution frequencies, which helps code layout and the choice of traces by the trace scheduler. Without profile information, Swift has to estimate these values by detecting loops.
Swift checks and maintains the invariant that there are no critical edges in the CFG. A critical edge is an edge whose source block has multiple successors and whose destination block has multiple predecessors. If such edges are not eliminated, they hinder register allocation. Removing them also gives global code motion the greatest freedom in placing values so as to minimize unnecessary computation in the final code. A critical edge is removed by placing an empty block between its source and destination.
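Critical-edge splitting is mechanical enough to sketch directly. The CFG representation (a dict of successor lists) is an assumption for the sketch:

```python
# Split every critical edge by routing it through a fresh empty block.
# An edge b -> d is critical when b has more than one successor and
# d has more than one predecessor.

def split_critical_edges(succs):
    preds = {}
    for b, outs in succs.items():
        for d in outs:
            preds.setdefault(d, []).append(b)
    result = {b: list(outs) for b, outs in succs.items()}
    counter = 0
    for b, outs in succs.items():
        for i, d in enumerate(outs):
            if len(outs) > 1 and len(preds[d]) > 1:
                empty = f"split{counter}"; counter += 1
                result[b][i] = empty      # source now jumps to the new block
                result[empty] = [d]      # which falls through to the old dest
    return result
```

After splitting, any value that must move onto the edge (a spill, a copy, or code hoisted by global code motion) has a block of its own to live in.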
#codingexercise
Double GetAlternateEvenNumberRangeSumEighthPowerRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumEighthPowerRootSquare();
}

Wednesday, February 25, 2015

We will continue our reading of the WRL research report on the Swift Java compiler. We now look at the organization of Swift and its optimizations, specifically the sequence of passes that convert Java bytecode to Alpha machine code. The IR for a method is built from the bytecode and annotated with any available profile information. A variety of interprocedural optimizations, such as method inlining, are applied, followed by machine-independent optimizations. A tree-matching algorithm is then used to convert many operations to a lower and/or machine-dependent form and to do peephole optimizations. Further optimizations, mostly global CSE and code motion, are applied to the machine-dependent form. Finally, Alpha code is generated by a sequence of passes that do instruction scheduling, register allocation, and code generation. All of these passes operate on the representation of the method in Swift IR.
To summarize: the method bytecode is translated to IR and annotated with profile information; then, using information available from other methods and classes, interprocedural analyses and optimizations are carried out; machine-independent and machine-dependent optimizations follow, in that order, including global CSE and code motion; and finally instruction scheduling, register allocation, and code generation to Alpha code are performed. As we discuss each of these passes in detail below, it helps to enumerate the optimizations of each stage, so we know which pass, and at what stage, the concerned optimization belongs to. Interprocedural analyses include alias analysis, class hierarchy analysis, escape analysis, field analysis, and type propagation. Interprocedural optimizations include method resolution, method inlining, method splitting, object inlining, stack allocation, and synchronization removal.
Intraprocedural optimizations include bounds check removal, branch removal, conditional constant propagation, dead code elimination, global CSE, global code motion, loop peeling, null check removal, peephole optimizations, strength reduction, and type test elimination. Machine-dependent passes include conversion to machine-dependent operations, peephole optimizations, sign-extension elimination, trace scheduling, register allocation (with live range splitting and biased graph coloring with rematerialization), block layout, and code generation.

Tuesday, February 24, 2015

We will continue our reading of this WRL research report. SSA improves the effectiveness of peephole optimizations because the SSA form exposes all the users of a value, not just those within a limited window. The Swift compiler therefore looks for patterns and applies peephole optimizations where the peephole is the whole method; the values in a pattern can be spread anywhere in the control flow. As an example, the length property of an array can be replaced by the size argument of the allocation operation, no matter where the allocation and the length operations appear in the method. When the array is allocated at the beginning of the method and length operations are generated for bounds checks, this optimization is quite effective.
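The array-length example can be sketched over a toy SSA graph. The node encoding (a dict of id to operation tuples) is an assumption for illustration; the point is that the match works regardless of where the two values sit in the method:

```python
# Whole-method peephole sketch: wherever an arraylength value's input is
# an allocation with a known size input, replace the arraylength with
# that size. Node encoding is hypothetical, not Swift's actual IR.

def fold_array_length(nodes):
    # nodes: id -> (op, *inputs); inputs name other node ids
    out = {}
    for nid, (op, *ins) in nodes.items():
        if op == "arraylength" and nodes[ins[0]][0] == "new_array":
            size_input = nodes[ins[0]][1]     # size argument of the allocation
            out[nid] = ("copy", size_input)   # users now see the size directly
        else:
            out[nid] = (op, *ins)
    return out
```

Because SSA names the allocation value explicitly, no dataflow search within a window is needed: the pattern is visible directly from the use-def edge.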
Let us now look at what constraints the SSA form places on the Swift compiler. First and most importantly, the register allocator must be effective and general purpose: it should minimize copies and spills, because they can overshadow the gains from SSA. Similarly, the passes that do the final placement of values in blocks and the scheduling within blocks are quite important. The SSA form gives a lot of freedom in choosing this final schedule, so that freedom must be exploited fully.
Lastly, one limitation mentioned by some groups is that the SSA form is difficult to use for restructuring optimizations, since it may be difficult to add the required phi nodes correctly as code is restructured and new join points are created. However, the authors did not encounter any such difficulty in their optimizations, including many that modify the CFG, such as method inlining, method splitting, and conditional constant propagation. Loop peeling, loop unrolling, and loop interchange may be the only exceptions where it is difficult.
#codingexercise

Double GetAlternateEvenNumberRangeSumSixthPowerRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumSixthPowerRootSquare();
}

Monday, February 23, 2015

We will continue our reading of this WRL research report. We now discuss the results of the Swift IR. Among the several optimizations facilitated by Swift IR, the rule for determining whether values are equivalent, and can therefore be eliminated as a common subexpression, works for all operations. Identical operations with equivalent inputs are equivalent, because the global store inputs conservatively represent memory dependences, and all other true data dependences are explicit in the inputs to the values. Together these rules imply that common subexpression elimination (CSE) can be applied to all values, including memory reads. Values are candidates for CSE only if they access the same field and have the same global store input. This principle is applied even at the time of building the SSA graph, and these rules are applied over and over again in many passes of the compiler.
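The rule can be sketched as a small value-numbering pass. The tuple encoding of values is an assumption; the key idea from the text is that for memory reads the global store counts as an ordinary input, so two get_field values merge only when they read the same field of the same object under the same store:

```python
# Value-numbering sketch of the CSE rule: values with the same operation
# and equivalent inputs (including the global store input) are merged.
# The (op, *inputs) encoding is a hypothetical simplification.

def cse(values):
    table, replacement = {}, {}
    for vid, key in values:     # key = (op, *input_ids), in schedule order
        canon = tuple(replacement.get(i, i) for i in key[1:])
        k = (key[0],) + canon
        if k in table:
            replacement[vid] = table[k]   # reuse the earlier equivalent value
        else:
            table[k] = vid
    return replacement
```

Because no special case is needed for loads, the same few lines cover arithmetic, checks, and memory reads alike, which is what the report means by the rule working for all operations.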
The SSA graph makes all data flow and ordering dependences explicit. This is a tremendous advantage because optimizations now operate on a single data structure. Although anti-dependences are not explicitly represented and must be generated during the scheduling passes, the Swift IR pins various operations to their basic blocks, and all other values can be explicitly moved around, subject to scheduling constraints. Like CSE, scheduling is simple enough that several passes in Swift do a limited form of scheduling to see whether some other optimization would be useful.
Method inlining is applied by establishing dependences between values in the inlined method and values in the calling method; it simply produces a slightly larger SSA graph. All the code optimizations, such as CSE, code motion, and dead-code elimination, apply after method inlining without any artificial restrictions.
One of the findings from this study is that SSA improves the effectiveness of peephole optimizations.

#codingexercise
How can we differentiate whether a given email alias is that of an individual or a distribution list?

import ldap

def validateByLDAP(email, password, alias=None):
     if not alias:
         raise ValueError("Invalid email alias")
     ldap.set_option(ldap.OPT_X_TLS_REQUIRE_CERT, ldap.OPT_X_TLS_NEVER)
     # server URI is assumed; the original snippet omitted the initialize call
     ld = ldap.initialize("ldaps://ldap.abc.com")
     ld.set_option(ldap.OPT_REFERRALS, 0)
     ld.set_option(ldap.OPT_PROTOCOL_VERSION, 3)
     ld.set_option(ldap.OPT_X_TLS, ldap.OPT_X_TLS_DEMAND)
     ld.set_option(ldap.OPT_X_TLS_DEMAND, True)
     ld.set_option(ldap.OPT_DEBUG_LEVEL, 255)
     ld.simple_bind_s(email, password)
     base_dn = "cn=users,dc=abcnet,dc=global,dc=abc,dc=com"
     filter = "(samaccountname=" + alias + ")"
     results = ld.search_s(base_dn, ldap.SCOPE_SUBTREE, filter, ['sn'])
     print(repr(results))
     return results

This returns an empty result for an alias that does not belong to an individual; otherwise it returns an array containing the match found.

Sunday, February 22, 2015

Today we continue our discussion of method calls from the WRL research report on the Swift Java compiler. The Swift IR has operations to represent the different method invocations in Java, such as class method calls, instance method calls, and interface calls. Class method calls are like procedure calls. Values representing method calls take the method arguments as inputs, as well as the current global store. A null check on the receiver object is included before any instance method invocation.
Most method calls produce a program result as well as a new global store. In order to follow the rule that each value in the SSA graph produces a single result, method calls produce a tuple consisting of the method result and the new global store. The two components are individually projected as required, so values representing method calls are followed by one or two select nodes in the SSA graph.
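The tuple-and-select shape can be sketched directly. The node encoding below is a hypothetical illustration of the idea, not Swift's actual node classes:

```python
# Sketch of the single-result rule: a call produces one tuple value, and
# select nodes project out the program result and the new global store.
# Node tuples are a simplified assumption.

def make_call(graph, target, args, store):
    call = ("call", target, tuple(args), store)
    result = ("select_result", call)     # the method's return value
    new_store = ("select_store", call)   # the global store after the call
    graph.extend([call, result, new_store])
    return result, new_store
```

Downstream values then take either projection as an ordinary input: users of the return value depend on the result select, and later memory operations depend on the store select, which keeps every dependence explicit in the graph.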
If a method can cause an exception that is caught in the current method, the call actually produces a distinct global store for each normal and exception successor edge, since the state of memory may be different in either case. In the most common case, when no exception of the method call is caught, no extra store is required, since the exception exit will not access the global store.
The entry block of a method has an argument for the incoming global store and, in the case of an instance method, the receiver as the first argument. The compiler automatically uses the null-check argument when a null check of the receiver is required, so there is no machine code corresponding to this check.
The handling of the store and null-check arguments above is conducive to method inlining. To inline a call, the compiler builds the IR of the called method. The CFG and SSA graph of the called method are then inserted into the CFG and SSA graph of the calling method at the point of the call, with the arguments in the body of the inlined method replaced by the corresponding arguments to the method call. These arguments include the global store and the null check of the receiver object. This matching process establishes the correct dependences, so that all the values within the inlined SSA graph can now be moved anywhere allowed by the usual dependence rules.
If the inlined method has no write operations, it returns its original store input in its return operation, so the operations that were below the call simply use the store from above the call. Optimizations thus automatically take advantage of the fact that the method has no side effects.

#codingexercise
Double GetAlternateEvenNumberRangeSumFourthPowerRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumFourthPowerRootSquare();
}

Saturday, February 21, 2015

Today we continue our discussion of the WRL research report titled Swift Java compiler. To handle stores correctly, there are a number of other modifications to make. The return operation at the end of a method takes two inputs: the actual return value and the latest global store. This ensures that all stores are correctly kept alive, especially during dead code elimination passes. Similarly, a throw operation takes a value to throw and a global store as input, and a method call takes a global store as input and produces a new store as output.
To synchronize on an object, monitor_enter and monitor_exit operations are used, and these, like reads of volatile fields, are constrained by requiring that they produce a new store as output. In fact, the only operations that reuse the original store without producing a new one are those that store into an object or an array, e.g. put_field, put_static, and arr_store. This was done mainly because such an operation has a control dependence on the immediately preceding control operation, such as an 'if', that determines whether it is executed. Instead of representing these control dependences explicitly, the choice was made to pin these operations to their original block.
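The classification above can be summarized in a small sketch. The operation names follow the text, but the grouping below is my illustration, not Swift's actual tables:

```python
# Sketch of how store-related operations are treated, per the rules above.
# Memory writes reuse the incoming store and are pinned to their block;
# synchronization and volatile reads produce a new store; everything else
# floats freely under the usual dependence rules.

WRITES = {"put_field", "put_static", "arr_store"}
PRODUCES_NEW_STORE = {"call", "monitor_enter", "monitor_exit", "volatile_read"}

def classify(op):
    if op in WRITES:
        return "pinned, reuses incoming store"
    if op in PRODUCES_NEW_STORE:
        return "produces a new global store"
    return "free to be scheduled"
```

Pinning the writes is what makes threading a new store through them unnecessary: their position in the block already fixes their order relative to the controlling branch.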
Describing a full method would require many edges and nodes, so in the examples all the values are labeled and the inputs to each value are listed as named arguments. The exception path is not shown; instead, a block is shaded if it has an exception path. This is a convenience for readability.