Cluster computing

Friday, March 6, 2015

Today we continue reading the WRL research report on Swift Java Compiler. We were discussing alias analysis We will see a few more ways to relax memory dependencies. The basic Swift IR is retained. A process called store resolution is used to relax memory dependences by changing the global store inputs of some memory operations. In the case of the loop example mentioned earlier, rather than changing the stores of A and B to be the global store inputs, we keep them the original store inputs and move them out of the loop. Store resolution does a forward dataflow computation to build up information about memory operations, as follows.
We look at the sub-graph of the SSA graph where global stores are produced. We label such operations that produce them as StoreOps. We want to compute information at these nodes about sets of locations that are easy to classify. With the help of the language, we know particular fields are of certain types and array elements are of particular types. These different sets of location can be labeled LocTypes. At each node of the subgraph we want to find for each LocType, the most recent StoreOp that might have modified the locations of that type
From the dataflow, the state at each node maps each LocType to the most recent preceding StoreOp which might have modified locations of that LocType. The state entry has a default StoreOp that applies to most LocTypes. and miscellaneous ordered pairs of (LocType, StoreOp) for any locTypes that don't map to the default. Perhaps we can classify them to have more salient entries in the state.
#codingexercise

Double GetAlternateOddNumberRangeSumSquareRootTenthSquare (Double [] A)

{

if (A == null) return 0;

Return A.AlternateOddNumberRangeSumSquareRootTenthSquare();

}

Thursday, March 5, 2015

In the previous post, we were discussing alias analysis in the Swift Java compiler from the WRL research report. We recall that it attempts to relax data dependencies in the SSA graph. If there are two memory locations, then alias analysis checks to see if they are the same. If we take the example of sloop, that contains two array store operations, then there will be a phi node at the top of the loop that merges the initial global store entering the loop with that at the end of each iteration. If the values don't move outside the loop, they should be reloaded in each iteration.
Memory operations that access the same field or same type of array element of a particular type can only be affected by memory operations that access the same field or same type of array element. global stores cannot modify the type. To represent dependences among memory operations with more precision is to have many different types of global stores. There are several problems with this approach. First, this would create more edges and nodes in the SSA graph Extra phi nodes will be required for many different kinds of global stores. Method body would now require arg nodes for many different kinds of global stores. If the effects of a method call are unknown, then it would take every possible kind of global store as input, and produce new versions as output. We already talked about synchronization nodes having different versions when there are unnecessary synchronizations specified. We would create even more versions with different types of global stores. Method inlining also becomes complicated in that it might also require that additional arg nodes be created from the caller.

Wednesday, March 4, 2015

In the previous post, we discussed escape analysis as implemented in the Java Swift compiler and discussed in the corresponding WRL research report. Specifically we looked at references that can escape from the current method or thread. This helps us do optimizations such as allocating the objects on the stack or on the heap and the removal of unnecessary synchronization. We found that the Swift compiler uses the summary of the method together with the inter procedural analysis. It finds the usages of the references to determine if the references are being held globally or used in an array that can let it escape. With this analysis, Swift looks for values which are either objects that are directly allocated in the method or are newly allocated unaliased objects returned by method calls. It makes a conservative selection and comes up with a pruned list. For methods that don't require synchronization, it adds the corresponding operations to the SSA graph.
We review Alias analysis next. Here Swift attempts to relax data dependencies in the SSA graph. By alias, we mean that two memory locations are not the same when they are accessed. For example, if a loop contains two array store operations then there is a phi node in the Swift IR at the top of the loop and this merges the initial global store entering the loop with that at the end of each iteration. The values of the variables loaded inside of the loop cannot move outside the loop. Therefore they are reloaded in each iteration.

Tuesday, March 3, 2015

Today we will continue our discussion on WRL Research report on Swift Java compiler. We were discussing Escape Analysis. This was used to determine if a reference to an object escapes a thread or a particular method call. This has applications in determining whether the object can be allocated on the stack or on the heap. It can also be used to eliminate the cost of unnecessary synchronization.
The analysis is performed by determining whether an object is stored in a global variable or in another data structure such as a heap object. We can then analyze the data flows to determine if any value in the SSA graph escapes. We can use the summary information of the method call effectively for this purpose. Then we can perform an inter procedural data flow analysis on demand to determine if a method may store or return its arguments particularly if it returns a new object that has not been stored. We take a conservative approach otherwise. We can also extend the simple analysis to take advantage of some information available from field analysis. We are particularly interested in fields that are encapsulated within another object such that they are never leaked. If the parent object does not escape then any of its contained fields do not escape as well. and we can apply this recursively.
With this analysis, Swift looks for values which are either objects objects that are directly allocated in the method or are newly allocated unaliased objects returned by method calls. The list is then pruned for what we can conservatively say as does not escape. There are also additional restrictions for objects that will be stack allocated such as each object must be a precise type and the array lengths must be small. If the object can be stack allocated, then the corresponding operation to be allocated on the stack is added to the SSA graph and the existing allocation operation is removed. If the object was allocated by a called method, then another version of that method is generated which initializes the object on the stack. If Swift determines that a synchronization operation is unnecessary then it scans all the uses of the object before eliminating them. It may also create unsynchronized versions of the corresponding methods.

Monday, March 2, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were discussing Field Analysis. We now review Type Propagation. This is useful for resolving some virtual method calls, especially when Class Hierarchy Analysis is not being used. Swift assigns types to all values based on available information in the byte code and the SSA graph. Some values have very specific types which means that if a base class is instantiated, the type field is marked with this class and not any subclasses. Type Propagation ensures that this correct type field is marked in all applicable values. The way propagation works is that the types are merged at control flow joins, so the type is propagated in the manner of the flow. Therefore type propagation is considered flow sensitive. Given this, Swift can resolve a virtual method call if the receiver of the call has an exact type and it doesn't need to look any further.
Now we will discuss how the exact types are determined. Exact types can be determined in several ways. First exact types may be known at the time of variable initialization or allocation. Second, Swift can take a look at the method and determine whether it returns an object with an exact return type. Third Swift can use field analysis to determine if the load from the field of an object always returns an object with the exact type.
We now review Escape analysis. Escape analysis is used to determine if a reference to an object escapes a thread. By escape, we mean that the reference of an object can be accessed another thread or it can still be accessed by the same thread from another method call. This kind of analysis comes in useful to determine if an object can be allocated on the stack, rather than the heap. If the reference t o an object does not escape a particular method call, then the object can be allocated on the stack frame of that call. This analysis is also used to eliminate cost of unnecessary synchronization. For example, if an object does not escape a thread, then the synchronization of the object is unnecessary.

Sunday, March 1, 2015

Today we continue to discuss the WRL research report on the Swift Java compiler. We were reviewing the class and method analysis section of the paper. We saw how the Swift compiler applies class-hierarchy analysis to resolve virtual calls. Swift also maintains a variety of information about methods in a hash table. It can be used to resolve method calls. Type propagation is useful for resolving some virtual method calls, The use of SSA form makes the makes the type propagation flow sensitive.
Exact types are determined in several ways.
First, exact types are known when an object or array is directly allocated.
Second, Swift can compute on demand whether a method returns an object with an exact type.
Third, Swift can do field analyses to determine if a load from a field of an object always returns an object with an exact type.
Next we review Field Analysis. This is an expensive inter-procedural analysis. It involves reading and storing field attributes such as access modifiers so that they can be honored during the program execution.
As an example, if there is a field called points and it is marked private, the compiler only needs to scan the instance methods in class Plane to determine its properties.If points is non-null, it must reference an array with base type Point and a fixed size of three. Swift uses exact type information from field analyses to help resolve method calls.Null checks can be eliminated for fields that are known to be non-null.Swift uses page protection to implement null-checks. While Swift uses page protection to implement null-checks without any extra code, eliminating null checks is still useful because it gives the compiler more flexibility in code motion. Lastly, a property of a field is computed on-demand only if required for a potential optimization.

Saturday, February 28, 2015

1 / 1

We review some of the tools and technologies in the stack used by DevOps Engineers. These include the following:

1) Linux - This operating system is favored both because of its composability and flavors as well as the support from vendors. Today Ubuntu, Debian and Linux are used in many opensource development. Virtual Machines ship with this operating system as base for almost all end users.
2) Apache – This is a web server that has made it easy to serve applications. For a long time, LAMP (or Linux Apache MySql Python ) was considered the base for most web development.
3) MySQL- is a database server. Although its been a staple, its popularity among dev ops engineer grew on integration with python.
4) ZooKeeper - This is a software from Apache meant to aid the configuration, synchronization and naming of large distributed systems.
5) HDFS - is a distributed file system meant to run on thousand of nodes using commodity servers. It tolerates hardware failures and is oriented for batch processing. It provides high throughtput and streaming data access.
6) HBase - is the NoSQL database typically used when hashing is more important than indexing.It's fault tolerant, flexible and supports near realtime lookups.
7) Storm is yet another Apache free and open source software. It is used for distributed realtime computations because it processes streams of data that can be unbounded.
8) Kafka This is also an Apache software and is used for publish-subscribe messaging. It can handle hundreds of megabytes of read and writes per second
9) Spark - is used for large scale data processing and is another software from Apache. It can combine SQL, streaming and complex analytics.
10) Oozie - is a workflow scheduler for Hadoop. It can manage Apache Hadoop jobs. It is essentially a Directed Acyclic Graph of jobs.
11) Cassandra is a database server and often used when deploying to datacenters. It also supports column indexes.
12) OpenTSDB is a scalable time series database that's very helpful to record events as they flow into the system and for searching later. It runs on Hadoop and HBase.
13) NginX is another web server and helpful for high traffic websites. It comes with HTTP server, HTTP proxy and a load balancer.
14) Etcd is similar to zookeeper in that it is used for configuring and service discovery. Its considered to have high availability.
15) Puppet is also a configuration management utility but is considered more popular for interoperability between Windows and Unix flavors.
16) Mesos enables abstraction of CPU, memory and storage and runs on different machines which can be used for resource management and scheduling with its APIs.
17) Marathon - is a cluster wide initialization and control system for services that are deployed in containers such as Docker.
18) Docker - provides container technologies so that applications can be build, shipped and run. Dockerized applications are considered to be portable.