Tuesday, February 24, 2015

We will continue our reading of this WRL research report. SSA improves the effectiveness of peephole optimizations. SSA form exposes all the users of a value, not just a limited window. The Swift compiler therefore looks for patterns and applies peephole optimizations where the peephole is the whole method. The values in a pattern can be spread anywhere in the control flow. As an example, the length property of an array can be replaced by the size argument of the allocation operation, no matter where the allocation and the length operations occur in the method. When the array is allocated at the beginning of the method and the length operations are generated for bound checks, this optimization is quite effective.
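The array-length example above can be sketched as a whole-method peephole over a toy SSA value graph. The node representation and operation names below are illustrative assumptions, not the Swift compiler's actual data structures; the point is only that the rewrite applies wherever the nodes sit, with no window restriction.

```python
# A minimal sketch of a whole-method peephole: every arr_length value whose
# input is a new_array value is replaced by the allocation's size input,
# wherever either value appears in the method.

class Value:
    def __init__(self, op, inputs=()):
        self.op = op              # operation kind, e.g. 'new_array', 'arr_length'
        self.inputs = list(inputs)

def replace_length_with_size(values):
    """Rewrite arr_length(new_array(n)) -> n across the whole method."""
    replacement = {}
    for v in values:
        if v.op == 'arr_length' and v.inputs[0].op == 'new_array':
            replacement[v] = v.inputs[0].inputs[0]   # the size argument
    for v in values:
        v.inputs = [replacement.get(i, i) for i in v.inputs]
    return [v for v in values if v not in replacement]

size = Value('constant')            # e.g. the constant 10
arr  = Value('new_array', [size])   # allocation near method entry
ln   = Value('arr_length', [arr])   # length read far away, e.g. for a bound check
chk  = Value('bound_check', [ln])
vals = replace_length_with_size([size, arr, ln, chk])
```

After the rewrite, the bound check consumes the allocation's size argument directly and the length value is gone, regardless of how far apart the allocation and the length read were in the control flow.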
Let us now look at what constraints the SSA form may put on the Swift compiler. First and most importantly, the register allocator must be quite effective and general purpose. It should minimize copies and spills, because they can overshadow the gains from SSA. Similarly, the passes that do the final placement of values in blocks and the scheduling within blocks are quite important. The SSA form gives a lot of freedom for this final schedule, so that freedom must be fully exploited.
Lastly, one limitation mentioned by some groups is that the SSA form is difficult to use for restructuring optimizations, since it may be difficult to add the required phi nodes correctly as code is restructured and new join points are created. However, the authors did not encounter any such difficulty in their optimizations, including many that modify the CFG, such as method inlining, method splitting, and conditional constant propagation. Loop peeling, loop unrolling, and loop interchange may be the only exceptions where it may be difficult.
#codingexercise

Double GetAlternateEvenNumberRangeSumSixthPowerRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumSixthPowerRootSquare();
}

Monday, February 23, 2015

We will continue our reading of this WRL research report. We now discuss the results of the Swift IR. Among the several optimizations facilitated by the Swift IR, the rule for determining whether values are equivalent, and can therefore be eliminated as a common subexpression, works for all operations. Identical operations with equivalent inputs are equivalent because the global store inputs conservatively represent the memory dependences, and all other true data dependences are explicit in the inputs to the values. Together these rules imply that common subexpression elimination (CSE) can be applied to all values, including memory reads: values are candidates for CSE only if they access the same field and have the same global store input. This principle is applied even while building the SSA graph, and these rules are applied over and over again in many passes of the compiler.
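The CSE rule above can be sketched as hash-consing on a key that includes the operation and all of its inputs, where a memory read's inputs include its global store. The tuple encoding and names below are illustrative assumptions, not Swift's actual implementation.

```python
# A hedged sketch of the equivalence rule: two values are equivalent when
# they have the same operation and equivalent inputs, and a read's inputs
# include the global store it depends on.

def cse(values):
    """values: list of (op, *input_indices) tuples in program order.
    Returns a map from a value's index to the earlier index that subsumes it."""
    table = {}
    subsumed = {}
    for i, v in enumerate(values):
        if v in table:
            subsumed[i] = table[v]   # same op, same inputs, same store
        else:
            table[v] = i
    return subsumed

# Two reads of the same field with the same global store input are CSE
# candidates; a read taking a different store input is kept separate.
vals = [('store0',), ('obj',),
        ('get_field_f', 1, 0),   # read obj.f with store0
        ('get_field_f', 1, 0),   # same field, same store -> eliminated
        ('put_field_f', 1, 0),   # conceptually produces a new store (index 4)
        ('get_field_f', 1, 4)]   # same field, different store -> kept
dups = cse(vals)
```

Only the second read with the original store input is subsumed; the read after the store survives because its global store input differs, which is exactly how the store input conservatively encodes memory dependences.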
The SSA graph makes all data flow and ordering dependences explicit. This is a tremendous advantage because optimizations now follow a single data structure. Anti-dependences are not explicitly represented; the Swift IR enforces them via explicit checks during the scheduling passes. In addition, the Swift IR pins various operations to their basic blocks. All other values are explicitly moved around subject to scheduling constraints. Like CSE, scheduling is simple enough that several passes in Swift do a limited form of scheduling to see if some other optimization would be useful.
Method inlining is applied by creating dependences between values in the inlined method and values in the calling method; it just produces a slightly larger SSA graph. All the code optimizations, such as CSE, code motion, and dead-code elimination, apply after method inlining without any artificial restrictions.
One of the findings from this study is that SSA improves the effectiveness of peephole optimizations.

#codingexercise
How can we determine whether a given email alias belongs to an individual or a distribution list?
import ldap

def validateByLDAP(email, password, alias=None, ldap_uri=None):
    # ldap_uri (the directory server address) is a hypothetical parameter;
    # the original snippet never initialized the connection.
    if not alias:
        raise ValueError("Invalid email alias")
    ldap.set_option(ldap.OPT_X_TLS_REQUIRE_CERT, ldap.OPT_X_TLS_NEVER)
    ld = ldap.initialize(ldap_uri)
    ld.set_option(ldap.OPT_REFERRALS, 0)
    ld.set_option(ldap.OPT_PROTOCOL_VERSION, 3)
    ld.set_option(ldap.OPT_X_TLS, ldap.OPT_X_TLS_DEMAND)
    ld.set_option(ldap.OPT_X_TLS_DEMAND, True)
    ld.set_option(ldap.OPT_DEBUG_LEVEL, 255)
    ld.simple_bind_s(email, password)
    base_dn = "cn=users,dc=abcnet,dc=global,dc=abc,dc=com"
    search_filter = "(samaccountname=" + alias + ")"
    results = ld.search_s(base_dn, ldap.SCOPE_SUBTREE, search_filter, ['sn'])
    print(repr(results))
    return results

This returns an empty result for an alias that does not belong to an individual; otherwise it returns an array with the match found.

Sunday, February 22, 2015

Today we continue our discussion of method calls from the WRL research report on the Swift Java compiler. The Swift IR has operations to represent the different method invocations in Java, such as class method calls, instance method calls, and interface calls. Class method calls are like procedural calls. Values representing method calls all take the method arguments as inputs, as well as the current global store. A null check on the receiver object is included before any instance method invocation.
Most method calls produce a program result as well as a new global store. In order to follow the rule that each value in the SSA graph produces a single result, method calls produce a tuple consisting of the method result and the new global store. The two components are individually projected as required, so values representing method calls are followed by one or two select values in the SSA graph.
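The tuple convention above can be sketched in a few lines: a call yields a (result, store) pair, and select values project each component so that every value still has a single output. The function names mirror the text; everything else is an illustrative assumption rather than the report's actual representation.

```python
# A small sketch of call values producing a tuple that select values project.

def make_call(func, args, store):
    """Model a call: it produces both a result and a new global store."""
    result = func(*args)
    new_store = ('store_after', store)     # the call's new global store
    return ('tuple', result, new_store)

def select(tup, index):
    """Project one component of a call's tuple, keeping one output per value."""
    return tup[1 + index]

t = make_call(lambda x: x + 1, (41,), 'store0')
value = select(t, 0)   # the method result
store = select(t, 1)   # the new global store
```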
If a method can cause an exception that is caught in the current method, the call actually produces a distinct global store for each normal and exception successor edge, since the state of memory may be different in either case. In the most common case, when no exception of the method call is caught, no extra store is required, since the exception exit will not access the global store.
The entry block of a method has arguments for the incoming global store and, in the case of an instance method, the receiver as the first argument. The compiler therefore automatically uses the null-check argument when a null check of the receiver is required, so there is no machine code corresponding to this check.
The handling of the store and null-check arguments above is conducive to method inlining. To inline a call, the compiler builds the IR of the called method. The CFG and the SSA graph of the called method are inserted into the CFG and the SSA graph of the calling method at the point of the call, with the arguments in the body of the inlined method replaced by the corresponding arguments to the method call. These arguments include the global store and the null check of the receiver object. This matching process establishes the correct dependences, so that all the values within the inlined SSA graph can now be moved anywhere allowed by the usual dependence rules.
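The matching step described above can be sketched as a substitution over the callee's graph: its parameter values (including the incoming global store and receiver) are replaced by the caller's actual arguments, after which the spliced values obey ordinary dependence rules. The dictionary encoding below is an illustrative assumption, not Swift's data structures.

```python
# A minimal sketch of inlining as parameter substitution over an SSA graph.

def inline(callee_values, param_map):
    """callee_values: {name: (op, *input_names)}.
    param_map: callee parameter name -> caller value name."""
    spliced = {}
    for name, (op, *ins) in callee_values.items():
        if op == 'param':
            continue   # parameters disappear; their uses are rewired below
        new_ins = [param_map.get(i, i) for i in ins]
        spliced[name] = (op, *new_ins)
    return spliced

# A callee that reads a field of its receiver using its incoming store.
callee = {'p_store': ('param',), 'p_recv': ('param',),
          'ld': ('get_field_f', 'p_recv', 'p_store')}
body = inline(callee, {'p_store': 'caller_store', 'p_recv': 'caller_obj'})
```

The inlined load now depends directly on the caller's store and receiver, so code motion, CSE, and dead-code elimination can treat it like any other value in the caller.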
If a method has no write operations, then the operations that were below the call will use the store from above the call, because the method returns the original store in its return operation. Optimizations therefore apply across such a call, given that the method has no side effects.

#codingexercise
Double GetAlternateEvenNumberRangeSumFourthPowerRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumFourthPowerRootSquare();
}

Saturday, February 21, 2015

Today we continue our discussion of the WRL research report titled Swift Java compiler. To handle stores correctly, there are a number of other modifications to perform. The return value at the end of the method takes two inputs, the actual return value and the latest global store. This ensures that all the stores are correctly kept alive, especially during dead-code elimination passes. Also, a throw value takes a value to throw and a global store as input. A method call takes a global store as input and produces a new store as output.
To synchronize an object, monitor_enter and monitor_exit operations were used. Similarly, reads of volatile fields can be restrained by requiring that they produce a new store as output. In fact, the only operations that reuse the original store and do not produce a new store are those that store into an object or an array, for example put_field, put_static, and arr_store. This was done mainly because there is a control dependence on the immediately preceding control operation, such as an 'if' operation, that determines whether the store is executed. Instead of representing these control dependences, a choice was made to pin these operations to their original block.
When describing a full method, there may be many edges and nodes to show. Instead, the report uses examples in which all the values are labeled and the inputs to each value are listed as named arguments. The exception path is not shown; a block is shaded if it has an exception path. This is a convenience for readability.

Thursday, February 19, 2015

Today we continue our discussion of the WRL research report titled Swift Java compiler. We saw the use of blocks with SSA graphs. We now look at the Swift type system. The type of each value in a graph is computed as the SSA graph is built from the bytecode. It is not always possible to recover the exact types in a Java program, but it is always possible to assign a consistent set of types to the values such that the effects of the method represented by the SSA graph are the same as those of the original method. For some operations, the actual value types further specify the operation. For each type T, Swift can keep track of these additional properties:
1) the value is known to be an object of exactly class T
2) the value is an array with a particular constant size
3) the value is known to be non-null
By incorporating these properties into the type system, we can describe the properties of any value in the SSA graph by its type. Moreover, there can now be properties for different levels of recursive types, such as arrays.
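The three extra properties above can be sketched as annotations attached to a base type. The dataclass below is an illustrative assumption, not the report's type system; it only shows how "exactly class T", "constant-size array", and "non-null" can ride along with a value's type.

```python
# A hedged sketch of a value type carrying the extra properties described
# in the text.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SwiftType:
    base: str                            # e.g. 'java.lang.String' or 'int[]'
    exact: bool = False                  # known to be exactly class `base`
    const_size: Optional[int] = None     # arrays: known constant length, if any
    non_null: bool = False               # known to be non-null

# An array freshly returned by an allocation of constant size 10:
t = SwiftType('int[]', exact=True, const_size=10, non_null=True)
```

A type like this is what lets later passes drop a null check (non_null), fold a length read (const_size), or devirtualize a call (exact) purely from the value's type.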
We now review representing and ordering memory operations. Local variables are strongly scoped within the current method. Java has strong typing and no pointers, hence these variables cannot be altered by another method in the same thread or by another thread. The SSA form is useful for representing these. Values in the SSA graph can be considered temporary variables stored in a large set of virtual registers. It is only near the end of the compilation process that the register allocator chooses which values are stored in actual registers and which are spilled to the stack.
On the other hand, reads or writes of global variables or locations allocated from the heap must be represented explicitly as memory operations, since there may be many hidden dependencies among these operations.
Regardless of whether data is local or global, the compiler may have to ensure that the original ordering among memory operations is maintained even though they may not be connected in the SSA graph. As an example, a store followed by a load may have no scheduling dependence if the store does not produce a result that the load reads, yet their ordering may still need to be preserved. Swift IR enforces this ordering by having the store actually produce a result, called a global store, which represents the current contents of global memory. Since store operations modify the contents, each conceptually executes a copy on write. Additionally, each load operation takes the global store as an extra input. These data dependences between memory operations allow many optimizations to generalize immediately to memory operations. The exception is that the SSA form does not include the required anti-dependences between load and store operations. The Swift compiler enforces these dependences via explicit checks during scheduling.
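The global-store device above can be sketched functionally: each store produces a new store value, each load takes the current store as an extra input, and that data edge alone orders a load after the store it may depend on. Plain dict copies stand in for "copy on write" here; all names are illustrative assumptions, not Swift's representation.

```python
# A minimal sketch of threading a global store through memory operations.

def put_field(store, obj, field, value):
    """A store: takes the current global store, produces a new one."""
    new_store = dict(store)              # conceptual copy on write
    new_store[(obj, field)] = value
    return new_store

def get_field(store, obj, field):
    """A load: takes the global store as an extra input."""
    return store[(obj, field)]

s0 = {}                                  # store at method entry
s1 = put_field(s0, 'o', 'f', 7)          # produces the new global store s1
x  = get_field(s1, 'o', 'f')             # ordered after the store via s1
```

Because the load names s1 as an input, no scheduler can hoist it above the put_field; the one thing this edge does not capture is the reverse (anti-) dependence, which the text says Swift checks for explicitly instead.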

#codingexercise
Double GetAlternateEvenNumberRangeSumSquareRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumSquareRootSquare();
}

We continue to look at the improvements introduced by the Swift compiler. The explicit checks that the Swift compiler adds ensure, for example, that a load operation which takes a global store S as input is not moved down past a store operation that modifies S. These constraints are enforced by checks rather than by representing anti-dependences explicitly, because anti-dependences would require a new kind of edge in the SSA graph; avoiding them reduces the number of edges in the graph.
If the current global store is different on two incoming edges of a block, then a phi node is inserted into the graph to merge the two global stores into a new current store. Similarly, a phi node of the global store inserted at the top of a loop ensures that memory operations do not incorrectly move out of the loop. The addition of phi nodes increases the number of nodes in the SSA graph, but not as drastically as the increase in the number of edges discussed earlier. Moreover, the benefit of adding a phi node is that memory operations are treated like all other values in the SSA graph.
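The merge rule above is small enough to sketch directly: a phi of the global store is needed only when the predecessors disagree on the current store. The tuple encoding of a phi below is an illustrative assumption.

```python
# A small sketch of merging global stores at a join point.

def merge_stores(incoming):
    """incoming: list of store ids, one per predecessor edge of the block."""
    if len(set(incoming)) == 1:
        return incoming[0]               # stores agree: no phi required
    return ('phi', tuple(incoming))      # new current store at the merge

same = merge_stores(['store1', 'store1'])
diff = merge_stores(['store1', 'store2'])
```

At a loop header, the phi's second input comes around the back edge, which is exactly what keeps a memory operation inside the loop from floating out: it depends on the phi, and the phi depends on the loop body's store.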

Sunday, February 15, 2015

Today we read from the WRL research report - the SWIFT java compiler.
Swift is an optimizing Java compiler for the Alpha architecture. Swift translates Java bytecodes to optimized Alpha code, and uses static single assignment form for its intermediate representation. The IR allows for all standard scalar optimizations. The Swift compiler also implements method resolution and inlining, interprocedural alias analysis, elimination of Java runtime checks, object inlining, stack allocation of objects, and synchronization removal. Swift is written completely in Java and installs its generated code into a high-performance JVM that provides all of the necessary run-time facilities.
The paper reports that the SSA form at the heart of the Swift design was beneficial to many optimizations, and there were hardly any that were difficult to implement in SSA. It also discusses properties of the intermediate representation, some optimization passes, and several overall performance results. The Swift design is quite effective in allowing efficient and general implementations of numerous interesting optimizations.
Java is known for its object-oriented programming and type-safe design. In addition, it has automatic memory management. Commercial Java systems do just-in-time compilation, which can perform only limited optimizations, but there are more expensive optimizations that can improve the relative efficiency of code generated from Java. Java presents many challenges for a compiler attempting to generate efficient code, because runtime checks and heap allocations introduce considerable overhead. Many routines in the standard Java library include synchronization operations, but these operations are unnecessary if the associated object is accessed by a single thread. Virtual method calls are quite expensive if they cannot be translated to a direct procedural call; moreover, they need to be resolved, and there can be exceptions at runtime. The possibility of these exceptions further constrains many optimizations.
#codingexercise
Double GetAlternateEvenNumberRangeSumCubeRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumCubeRootSquare();
}

We continue our discussion of the Swift Java compiler. We were discussing the challenges for a Java compiler, but Java also provides type safety. This simplifies much of the analysis: for example, local variables can be modified only by explicit assignment, and a field of an object cannot be modified by an assignment to a different field. The static single assignment graph also makes most or all of the dependences in a method explicit. Swift makes use of SSA and is written in Java. It installs into a high-performance JVM that provides most of the necessary run-time facilities.

First we look at the intermediate representation. A method is represented by a static single assignment graph, embedded in a control flow graph. The SSA graph consists of nodes called values that represent individual operations. Each value has several inputs from previous operations and a single output, so a node can be identified with its output. In addition, each node indicates the kind of operation that the value represents, as well as any extra information required for the result. There are about 55 basic operations, which include the usual arithmetic, bitwise logical, and comparison operators; operations that manipulate program flow, merge values, access the contents of arrays and objects, and allocate new objects and arrays from the heap; object-oriented computations such as instanceof checks, virtual method calls, and interface calls; and Java-specific operations such as the lcmp and fcmpl bytecodes.
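The value nodes described above can be sketched as a small class: each records its operation kind, its inputs, and any auxiliary information, and has a single output identified with the node itself. This is an illustrative assumption, not the report's actual structures.

```python
# A small sketch of SSA value nodes with one output per node.

import itertools

_ids = itertools.count()

class Value:
    def __init__(self, op, inputs=(), aux=None):
        self.id = next(_ids)       # identifies both the node and its output
        self.op = op               # operation kind, e.g. 'add', 'phi', 'new'
        self.inputs = list(inputs) # previous values feeding this operation
        self.aux = aux             # extra info, e.g. a constant or field name

a = Value('constant', aux=2)
b = Value('constant', aux=3)
c = Value('add', [a, b])           # c's single output is c itself
```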

The Swift IR breaks out the various runtime checks, such as null checks, bound checks, and cast checks. These operations cause a runtime exception if the check fails. Since the checks are distinct operations, the Swift compiler can apply optimizations to them, such as using common subexpression elimination on two null checks of the same array. Swift also has a value named init_ck, an explicit representation of the class initialization check that must precede some operations.

We will continue our discussion of this WRL research report. Swift eliminates redundant initialization checks during optimization. In addition, Swift's underlying JVM replaces an initialization check by a NOP after the first time it is encountered. The static single assignment form enables more of these checks to be optimized than would previously have been possible.

#codingexercise
Double GetAlternateEvenNumberRangeCubeRootSumCube(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeCubeRootSumCube();
}

Today we continue our discussion of this WRL research report. The Swift IR has about 100 machine-dependent operations, which can be mapped to the Alpha instruction set. Swift makes two passes: the first converts the machine-independent operations into machine-dependent ones, and the second does instruction scheduling, register allocation, and code generation, operating directly on the SSA graph.

The SSA graph represents the use-def chains for all variables in a method, since each value explicitly specifies the values used in computing its result. When building the SSA graph, the def-use chains are formed as well and kept up to date. Thus an optimization can, at any stage, directly access all the users of a particular value.
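Keeping def-use chains up to date alongside the use-def chains can be sketched in a few lines: constructing a value registers it as a user of each of its inputs. The class below is an illustrative assumption, not Swift's code.

```python
# A hedged sketch of maintaining use-def and def-use chains together.

class Value:
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = list(inputs)   # use-def chains: what this value reads
        self.users = []              # def-use chains: who reads this value
        for i in self.inputs:
            i.users.append(self)     # kept up to date as the graph is built

x = Value('constant')
y = Value('add', [x, x])             # uses x twice
z = Value('mul', [y, x])
```

With the users list maintained, an optimization that rewrites x can walk x.users directly instead of scanning the whole method, which is the access pattern the text describes.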

Each method is an SSA graph embedded within a control flow graph. Each value is located in a specific basic block of the CFG, although various optimizations may move values among blocks or even change the CFG. Each method's CFG has a single entry block, a single normal exit block, and a single exceptional exit. Parameter information is in the entry block, and the return value is in the exit block. Blocks end whenever an operation is reached that has two or more control exits. Extended basic blocks, although common in some Java compilers, are avoided because they would require special processing and can become quite complex.

#codingexercise
Double GetAlternateOddNumberRangeCubeRootSumCube(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeCubeRootSumCube();
}

Saturday, February 14, 2015

Today we continue reading from the WRL research report. This time we review the benchmark for opening and closing a particular file. The tests were performed with a flat file and a nested-path file. The UNIX derivatives were consistently faster than Sprite. The results also showed that in the remote case Sprite is faster than SunOS, but slower than Ultrix. In any case, this benchmark did not explain the previously discussed anomaly. Another benchmark involved create-delete of a temporary file: data is written to the file on create and read from the file prior to delete. Three different sizes of data were transferred: 0, 10 Kbytes, and 100 Kbytes. This benchmark highlighted a basic difference between Sprite and the UNIX derivatives. In Sprite, short-lived files can be created, used, and deleted without any data ever being written to disk; information goes to the disk only after it has lived at least 30 seconds. In UNIX and its derivatives, the file system appears to be much more closely tied to the disk. Even with no data written to the file, the UNIX derivatives create and delete the file on disk. This benchmark also explains the anomaly involving the poor performance of DS3100 Ultrix on the Andrew benchmark. The time for creating an empty file is 60% greater in DS3100-Ultrix-local than in 8800-Ultrix-local, and the time for a 100-Kbyte file in DS3100-Ultrix-Remote is 45 times as long as for DS3100-Sprite-Remote. This relatively poor performance may be attributed to slower disks, and possibly to the NFS writing policy, which requires new data to be written through to the disk when the file is closed. Note that DS3100-Ultrix-Remote achieves a write bandwidth of only about 30 Kbytes/sec.
Lastly, SunOS 4.0 showed surprising behavior. The time for a file with no data is 66 ms, but the time for a 10-Kbyte file is 830 ms. It appears that when the file size grows from 8 to 9 Kbytes, the time jumps into the 800-ms range.
This concludes our discussion on this WRL Research report.
#codingexercise
Double GetAlternateEvenNumberRangeSumSqrtCube(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumSqrtCube();
}