Friday, June 7, 2013

The fast marching method solves boundary value problems that describe how a closed curve evolves as a function of time t, with a speed f(x) in the normal direction at a point x on the curve. The speed is given; the time it takes the contour to reach a point x is what the equation computes. Different points may have different speeds. The region grows outward from the seed, and the more grid points considered, the finer the contour. The fast marching method makes use of a stationary surface that gives the arrival information of the moving boundary. From the initial curve, build a little bit of the surface upwards, with each iteration always starting from the initial position. It is called fast because it uses a heap-based priority queue that can tell the proper data point to update and hence to build the surface. For each of the N points at any given height h, the next point is determined always in the order given by the priority queue. The surface grows with height h in a monotonically increasing order because previously evaluated grid points are not revisited. This holds because the speed is non-negative and because the heap guarantees that the grid point with the minimum distance is selected first out of the data points in the sweep of N points at any level. Hence the entire operation of determining the next contour takes O(N log N) time.
This is a numerical method because it views the curve as a set of discrete markers on a grid. The fast marching method has applications in image processing, such as region growing.
(Courtesy: Sethian @ Berkeley)
In the fast marching method, let us say there are N points on the zero level of the surface we track, in an N*N*N grid with unit spacing. The curve is pushed outward in unit distances. Then each step involves the following:
1) The point with the smallest distance is removed from the priority queue and its value is frozen, so that we never revisit it and the march is strictly outward.
2) Points are added to the priority queue to maintain the unit-thickness band.
3) The distances of the neighbors of the removed point are recomputed. The finite-difference update is a quadratic equation in the unknown arrival time.
Each point in the N*N*N grid is removed from the priority queue at most once.
(courtesy: Sean Mauch and David Breen from Caltech)
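A minimal sketch of these steps in Python, reduced to a 2D grid with a constant speed for readability; the function and variable names are illustrative and not taken from any of the cited authors' code.

import heapq
import math

def fast_march(grid_shape, seeds, speed=1.0, h=1.0):
    # Simplified 2D fast marching: computes arrival times T from seed points,
    # assuming a constant positive speed. Returns a 2D list of times.
    rows, cols = grid_shape
    INF = math.inf
    T = [[INF] * cols for _ in range(rows)]
    frozen = [[False] * cols for _ in range(rows)]
    heap = [(0.0, i, j) for (i, j) in seeds]
    for (i, j) in seeds:
        T[i][j] = 0.0
    heapq.heapify(heap)

    def recompute(i, j):
        # Step 3: upwind quadratic update from the frozen neighbors.
        def frozen_val(a, b):
            return T[a][b] if 0 <= a < rows and 0 <= b < cols and frozen[a][b] else INF
        ta = min(frozen_val(i - 1, j), frozen_val(i + 1, j))
        tb = min(frozen_val(i, j - 1), frozen_val(i, j + 1))
        if ta > tb:
            ta, tb = tb, ta
        rhs = h / speed
        if tb - ta >= rhs:
            return ta + rhs                      # only one axis contributes
        # Solve (t - ta)^2 + (t - tb)^2 = rhs^2 for the larger root t.
        return 0.5 * (ta + tb + math.sqrt(2 * rhs * rhs - (ta - tb) ** 2))

    while heap:
        t, i, j = heapq.heappop(heap)
        if frozen[i][j]:
            continue                             # stale heap entry
        frozen[i][j] = True                      # step 1: freeze the minimum
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and not frozen[ni][nj]:
                new_t = recompute(ni, nj)
                if new_t < T[ni][nj]:
                    T[ni][nj] = new_t
                    heapq.heappush(heap, (new_t, ni, nj))  # step 2: grow the band
    return T

For example, fast_march((50, 50), seeds=[(25, 25)]) grows the front outward from the centre of a 50x50 grid; every grid point enters and leaves the heap a bounded number of times, which is where the N log N bound comes from.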
There are several data structures used by such level set methods. For example:
1) narrow band: This data structure restricts contour propagation calculations to a thin band of data points adjacent to the boundary instead of the full three-dimensional volume. The data points in this narrow band are rebuilt for each step of the propagation, as mentioned but not detailed above.
2) sparse field: The sparse field level set method uses a set of linked lists instead of the priority queue over the narrow band. The linked lists are populated with the same active data points around the boundary as above.
3) sparse block: This divides the volume into small cubic blocks, each holding a set of active data points. A coarse grid then stores pointers only to those blocks that meet the zero-level narrow band.
4) octree: The octree level set method uses a tree of nested cubes, and the leaf nodes of this tree contain the distance values.
5) run-length encoding: This compresses the regions away from the narrow band down to their sign, but the compression is not applied to the narrow band itself. An additional acceleration lookup table based on the number of runs per cross section is also used.
(courtesy: Wikipedia)
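As a rough sketch of the narrow band idea only (the field layout and the bandwidth parameter are assumptions for illustration), the active set can be rebuilt each step by keeping just the grid points whose distance value lies within a band around the zero level:

def rebuild_narrow_band(phi, bandwidth=3.0):
    # phi maps grid coordinates (i, j) to signed distance values.
    # Keep only the points within `bandwidth` of the zero level; the
    # propagation step then updates these points and nothing else.
    return [point for point, d in phi.items() if abs(d) <= bandwidth]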
I came across an interesting problem today and I will mention it here verbatim and try to solve it. The question was: how do you delete a node from a singly linked list if the head is not given? So we only have the current node, and the node has to be freed. Furthermore, the singly linked list is not a circular list that we can traverse to find the head. Since we don't know the head and we can't traverse the list, we don't know the previous node so as to update its reference.
By deleting the node without knowing the previous node, we have broken the list and created dangling references. So one solution I suggested was that we treat the list as physically immutable and overlay a layer that presents the logical list by skipping over the nodes that are deleted. We pass through the deleted nodes during traversal without doing any operation on them. We can keep additional state per node, in a wrapper, that says whether it is deleted or not. Another approach is to scan the memory for all occurrences of a pointer to the current node and set them to null once the current node is freed, though this could break users of the structure that are not expecting a null pointer where they previously had a valid pointer. That may depend on how the list is used.
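A minimal sketch of the wrapper-with-a-deleted-flag idea (names and structure are illustrative):

class Node:
    # Wrapper node that carries a logical-deletion flag alongside the value.
    def __init__(self, value, next=None):
        self.value = value
        self.next = next
        self.deleted = False

def delete(node):
    # The node stays physically linked; it is only marked as deleted.
    node.deleted = True

def traverse(head):
    # Yield only the values of nodes that have not been logically deleted.
    node = head
    while node is not None:
        if not node.deleted:
            yield node.value
        node = node.next

Deletion needs nothing but the current node, which is exactly the constraint in the problem.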
 
I want to discuss a technical problem I came across today. I will change the problem domain and entities so that I can present the salient points. Let us consider a hypothetical web service with multiple data providers. These data providers are not isolated: they may provide data from their own source and/or they might get or update the data in other providers. The web service needs the existing data providers, and more can be added at any time. Adding or deleting data providers is seamless and does not hamper the web service operations. Data providers can fail, but this does not affect the overall operation since there is redundancy: the data owned by any node has at least three copies. The web service, however, can go online or offline and presents a single point of failure. We would like to discuss the replacement strategy for the web service, and particularly the test plan around it. For example, we would like to know whether there was any regression in any of the queries from the customer to the web service. How do we go about the testing?
The data comes from different sources and the web service maintains state in its own database. Typically the web server and the database server are provisioned on separate virtual machines, because the web server requires more CPU and the database server requires more memory and storage. However, in this case let us assume that we treat the web server and database together and that they are hosted on the same virtual machine. It is this virtual machine that we want to replace.
Initially we could treat the whole system as a black box and test it with different workloads on the web server. Most of these can be capture-and-replay workloads recorded from the previous web server. However, that is not sufficient in itself because the workloads may not detect all regressions. So we look at the changes, scope them to the layers and components, and then design specific tests around them. For example, these tests could be breadth and depth oriented on the data that they operate on. They could also cover edge cases; if the queries use strings, then try lower case, upper case, different Unicode characters, and long strings. We may also need performance and security tests. The service level agreement (SLA) could include metrics for performance and availability. Since the system has multiple points of failure, availability could be interpreted as a cumulative measure over the quorum of available resources. This could be a weighted mean since the web server is also involved. However, addition and deletion of data providers does not affect availability unless the count falls short of the minimum needed.
Changes to the system could also span layers, in which case we may have to test in isolation as well as end to end. For example, we can test one layer by checking against the data from the lower layers. In addition, we can test end to end to include different data providers.
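A minimal sketch of the capture-and-replay regression check described above; the workload file format and the base URLs are assumptions made for illustration, nothing about the real service is implied.

import json
import urllib.request

def replay_and_compare(captured_path, old_base_url, new_base_url):
    # Replay previously captured query paths against the old and the new
    # web service and report the queries whose responses differ.
    with open(captured_path) as f:
        queries = json.load(f)          # assumed: a JSON list of request paths
    regressions = []
    for path in queries:
        old = urllib.request.urlopen(old_base_url + path).read()
        new = urllib.request.urlopen(new_base_url + path).read()
        if old != new:
            regressions.append(path)
    return regressions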
 

Thursday, June 6, 2013

The fast marching method solves boundary value problems that describe how a closed curve evolves as a function of time t, with a speed f(x) in the normal direction at a point x on the curve. The speed is given; the time it takes the contour to reach a point x is what the equation computes. Different points may have different speeds. The region grows outward from the seed, and the more grid points considered, the finer the contour. The fast marching method makes use of a stationary surface that gives the arrival information of the moving boundary. From the initial curve, build a little bit of the surface upwards, with each iteration always starting from the initial position. It is called fast because it uses a heap-based priority queue that can tell the proper data point to update and hence to build the surface. For each of the N points at any given height h, the next point is determined always in the order given by the priority queue. The surface grows with height h in a monotonically increasing order because previously evaluated grid points are not revisited. This holds because the speed is non-negative and because the heap guarantees that the grid point with the minimum distance is selected first out of the data points in the sweep of N points at any level. Hence the entire operation of determining the next contour takes O(N log N) time.
This is a numerical method because it views the curve as a set of discrete markers on a grid. The fast marching method has applications in image processing, such as region growing.
(Courtesy: Sethian @ Berkeley)
We will revisit the fast marching method for its implementation and discuss clustering, but for now let us look at the other partial differential equation based methods in that category.
The partial differential equation (PDE) based methods solve PDEs with a numerical scheme. When we use a PDE for curve propagation, we define a cost function and use a minimization technique to find the next data point to consider.
There are three different PDE methods. The first one, which we discussed earlier, is the fast marching method. In it we consider the distance function to be always positive so that the curve expands outward from the seed. However, the distance function can be both positive and negative, and so a variation of the method that takes the sign of the distance function into consideration has been proposed; it is called the generalized fast marching method.
The other two methods are the parametric methods and the level set methods. The parametric methods parameterize the contour according to some sampling strategy and then evolve each element according to the surrounding data points and the internal terms. When the parameterization involves the sum of the internal and external energy associated with the current contour and we try to minimize it, we delineate an object outline; these outlines are called snakes. The snake calculation is fast and easy and therefore valuable in video processing as well, where the snake is tracked across the series of images in a video.
The level set method uses the contour as obtained from a height h on the stationary surface that gives arrival information; the height that gives the contour is called the zero level. As opposed to fast marching methods, the level set method can have distances that are positive or negative. The level set method does not use parameters; instead it generalizes the approach by going to a higher (third) dimension and working with the geometric properties of the evolving structure.
The PDE methods have to consider topology changes such as curve splitting or merging.
If the data points cross over each other, or if the shape breaks into two, then the curve propagation should still be able to continue.
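A small sketch of one explicit level set update, assuming a signed distance field phi on a 2D grid evolved under phi_t + F|grad phi| = 0 with a constant speed F; central differences are used here purely for readability, whereas a real scheme would use upwind differencing for stability.

def level_set_step(phi, F=1.0, dt=0.1, h=1.0):
    # One explicit Euler step of phi_t + F*|grad(phi)| = 0 on a 2D grid.
    # phi is a list of rows of signed distance values; returns the new field.
    rows, cols = len(phi), len(phi[0])
    new_phi = [row[:] for row in phi]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dx = (phi[i + 1][j] - phi[i - 1][j]) / (2 * h)
            dy = (phi[i][j + 1] - phi[i][j - 1]) / (2 * h)
            new_phi[i][j] = phi[i][j] - dt * F * (dx * dx + dy * dy) ** 0.5
    return new_phi

The contour itself is wherever phi changes sign, which is why splitting and merging come for free: the zero level of the higher-dimensional surface can change topology even though phi stays a single function.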
Intermediate code:
The grammar of an intermediate language (IL) treats a program as a sequence of instructions. The instructions are:
1) a label or marker for jump statements
2) an assignment to a variable
3) a unary operator applied to an atomic expression with the result stored in a variable.
4) a binary operator applied to a variable and an atomic expression with the result stored in a variable
5) a transfer from a memory to a variable
6) a transfer from a variable to memory
7) a jump to a label
8) a conditional selection between two labels
9) a function call (note that procedure calls are treated the same as function calls, and arguments and results are variables)
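As a small illustration of what such a flat sequence looks like (the tuple encoding and names here are assumptions for this post, not Mogensen's concrete syntax), the statement x := a + b * c lowers to two of the binary-operation instructions above; the extra temporaries that a straightforward translation would also introduce for a, b and c are omitted for readability:

# Each IL instruction is a tuple: (kind, operator, destination, operands...).
code = [
    ("binop", "*", "t1", "b", "c"),   # t1 := b * c
    ("binop", "+", "x", "a", "t1"),   # x  := a + t1
]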
The compiler tries to do as much work during compile time as possible rather than doing the same work at runtime. In general, work done once at compile time does not have to be repeated at every execution.
Now, getting back to the conversion to the intermediate language, let us take a look at how to translate expressions. The main challenge here is that the expression is structured as a tree while the intermediate language is flat: the result of every operation is stored in a variable, and every non-constant argument is stored in another variable. A translation function is used to describe the overall technique. This function returns the code as a synthesized attribute. It translates variables and functions to the names that correspond to them in the intermediate language. The symbol tables are passed as inherited attributes to the translation function. In addition, attributes are used to decide where to store the values evaluated from sub-expressions. There are two ways to do this:
1) the location of the values of a sub-expression can be passed up as a synthesized attribute to a parent expression which can decide where to place its own value
2) the parent expression can decide where it wants to find the values of its sub-expressions and pass this information down as inherited attributes.
Method 1 doesn't have to generate code when the sub-expression is a variable, as it can simply return the name of the variable that holds the value. However, this only works under the assumption that the variable isn't updated before the value is used by the parent expression. If the expression has function calls, those may also have side effects. Method 2 doesn't have this problem, since the value is created immediately before the assignment is executed. In addition, this method can generate code that calculates the expression result directly into the desired variable, without using temporary variables.
An inherited attribute, say 'place', is the IL variable used to store the result of the expression.
The translation function uses a switch case construct for the type of expression and evaluates this way:
1) if the expression is a constant, its value is stored in place
2) if the expression is a variable, the equivalent IL variable is found in the vtable and then copied to place
3) if it is a unary operation, a new IL variable is generated to hold the value of the argument of the operation, the argument is translated using the newly generated variable as its place, and then an IL operation assigns the result to the inherited place. The ++ operator concatenates the two lists of instructions.
4) if it is a binary operation, two new IL variables are generated to hold the values of the arguments, the arguments are translated, and finally a binary operation in the intermediate language assigns the final result to the inherited place.
5) if it is a function call, the arguments are translated first, then a function call is generated with the result assigned to the inherited place. The name of the function is looked up in the ftable for the corresponding name in the IL. (text from Mogensen)
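A compact sketch of such a translation function, following the case analysis above; the expression encoding, new_var, and the vtable/ftable formats are assumptions for illustration, since Mogensen presents this in his own notation.

counter = 0

def new_var():
    # Generate a fresh IL variable name.
    global counter
    counter += 1
    return "t%d" % counter

def trans_exp(exp, vtable, ftable, place):
    # Translate an expression tree into a flat list of IL instructions
    # whose result ends up in `place` (the inherited attribute of method 2).
    kind = exp[0]
    if kind == "const":                      # case 1: constant
        return [("assign", place, exp[1])]
    if kind == "var":                        # case 2: variable
        return [("assign", place, vtable[exp[1]])]
    if kind == "unop":                       # case 3: unary operation
        arg_place = new_var()
        code = trans_exp(exp[2], vtable, ftable, arg_place)
        return code + [("unop", exp[1], place, arg_place)]
    if kind == "binop":                      # case 4: binary operation
        left, right = new_var(), new_var()
        code = trans_exp(exp[2], vtable, ftable, left)
        code = code + trans_exp(exp[3], vtable, ftable, right)
        return code + [("binop", exp[1], place, left, right)]
    if kind == "call":                       # case 5: function call
        code, arg_places = [], []
        for arg in exp[2]:
            p = new_var()
            code = code + trans_exp(arg, vtable, ftable, p)
            arg_places.append(p)
        return code + [("call", ftable[exp[1]], place, arg_places)]
    raise ValueError("unknown expression kind: " + str(kind))

For example, calling trans_exp on the tree for a + b * c with place "x" returns a flat instruction list ending in ("binop", "+", "x", ...), with list concatenation playing the role of the ++ operator.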

In the gcc compiler, you can specify a -march parameter that indicates the instruction set the compiler can use and on the other hand you can specify -mtune=cpu-type, to indicate the processor for which the code is optimized.

intermediate code

Compilers generate intermediate code before translating to machine code. This breaks down the compiler into smaller jobs and has the following advantages:
1) Machine code varies from one machine architecture to another, but only one translation from the high-level language to intermediate code is needed; only the backend, the translation from intermediate code to machine code, has to be written for each machine.
2) Programs written in several high-level languages can share the same translation from intermediate code to machine code.
3) The intermediate language need not be converted to machine code; it can instead be interpreted by an external program. If the interpreter for the intermediate language is written in a language for which there are existing compilers for the target machines, then there are even more advantages:
1)  no actual backends i.e. translation to machine code needs to be written for new machines
2)  a compiled program can be redistributed for all machines in a single intermediate form
3) the intermediate form may be more compact than machine code.
The intermediate language should make it easy to translate from the high level language to the intermediate language and from the intermediate language to machine code. Furthermore, the intermediate language should enable optimizations.
Intermediate languages are also assessed based on their "graininess" i.e. whether an operation corresponds to a large amount of work or a small amount of work.  The intermediate language operations can be coarse grained for easy interpretation and translated into a sequence of machine code instructions. On the other hand, the operations can be fine grained such that several intermediate code instructions can be combined into one machine code instruction.
(text from Mogensen)

Wednesday, June 5, 2013

type checking a program

The previous post talked about treating the program as a list of functions. The program is type correct when all the functions are type correct. The functions are type checked using a symbol table where all the functions are bound to their types. This is done with two passes over the list of functions: one to build the symbol table and the other to check the function definitions using the table. With an additional helper function to get the pair of a function's declared name and type, which consists of the types of the arguments and the type of the result, we can build the symbol table and then check the function definitions.
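A small sketch of the two-pass structure; the shape of a function declaration and the helper names are assumptions, and the body check is left as a stub since expression checking was covered separately.

def get_function_type(fun):
    # Helper: return (name, (argument types, result type)) for a declaration.
    name, arg_types, result_type, body = fun
    return name, (arg_types, result_type)

def check_program(functions):
    # Pass 1: bind every function name to its declared type.
    ftable = dict(get_function_type(f) for f in functions)
    # Pass 2: check each function definition against that symbol table.
    for name, arg_types, result_type, body in functions:
        check_function(name, arg_types, result_type, body, ftable)

def check_function(name, arg_types, result_type, body, ftable):
    # Stub: a real checker builds a variable symbol table from the arguments
    # and checks that the body expression has the declared result type.
    pass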
Assignments, data structures and overloading can also be type checked. When a variable is given a value by an assignment, its type can be checked. Whether the variable had a previously assigned value, or whether the assigned value is never used, are additional checks that can be performed, but they are not based on the structural information that type checking depends on.
To check data structures, such as structs or a value that may be of several different types, the type checker uses a data structure similar to the one used by the abstract syntax trees of declarations. When the structured data is modified by operations that have well-defined types for arguments and results, they can be type checked just like functions.
When methods are overloaded, the same name is used for several different operations. They can all be tried one after the other, and the expected types of the arguments and result can be used to pick the matching one.
When a function is defined with a polymorphic type, the type checker can insert the actual types at every use of the generic / polymorphic function. This differs from type checking overloaded methods in that the function can be instantiated with any type, not just one from a fixed list of declared alternatives.
If the program uses implicit types, a type inference algorithm gathers information about the uses of functions and variables with which it infers the types.
Thus we have seen type checking for expressions, function declarations and programs. Type checking enforces static type correctness and adds to the checks for the correctness of a program. This kind of checking is often not context free, and hence it is done conceptually after lexing and parsing, though it could be interleaved with the syntax analysis.
Also, the translator may exploit or depend on type information, in which case type checking may be combined with the actual translation.
Some other constructs of a language also add to type safety. For example, virtual functions, templates and static_cast all improve type checking.
(text from Mogensen)