Cluster computing

Tuesday, September 1, 2020

Object and array inlining

Objects and array inlining are made on a field by field basis at run time. The analysis overhead has to be small and the speedup has to be big so that the optimization is worth it. A heuristic approach to determining this criterion is not sufficient because it is local to the array in nature. The example discussed earlier that uses the access count and a threshold can determine a selection of candidates. But only with a statistic that places this in context of the global total of all the fields in all loaded classes does the criterion become clearer. A count from the global puts into perspective the usefulness of the repetitions of the eliminating field loads by focusing on where they matter more rather than routinely performing each and every one of them until the ones that matter don’t get the optimization.

There are approaches different from Statistics as well which include neurons and a load embedding layer that can perform even better than statistics using neural net and classifiers since each field is isolated. However, the effort to bootstrap, load and accumulate the passes even when done in batches may amount to prohibitive usage of resources of the runtime. Dynamic grouping on the other hand can be an alternative approach to determine the criterion by evaluating groups not with classifiers but with regressors.

This leads us to a brief overview of related work. A static compiler for a variation of C++ was extended earlier to perform automatic object inlining. Its algorithm clones the code of methods that accesses optimized objects. That approach is different from the above mentioned in that all objects of a class are inlined when it meets a criterion. Also, that algorithm moves the object headers and pointers to inlined objects which reduces the object size. It can also convert an array of references to an array of object values. The approach described here is from the paper on Automatic Array Inlining in Java Virtual Machines by Wimmer et al and this approach can inline dynamic arrays

Monday, August 31, 2020

Object and array inlining

The tradeoff between the benefits of inlining or not is demonstrated by the length of the time interval between the overwrite of the fields and the garbage collection which restores the optimized field order. If the interval is too long, the benefits fade away when a revert to non-inlined field access performs better as before. If the interval is short, then the restoration of optimized field order for subsequent access saves at least one instruction to load each time.

The garbage collector is based on aging and compaction of heap allocations. The younger generation is collected frequently since they are not used on a prolonged basis. The stop-and-copy collector copies young objects between two alternating spaces and it increments a field for age in each activity. When the age exceeds a threshold, the object is promoted to the older generation. If the objects don’t linger long, they don’t make it to the next generation and this works in the favor of the inlining. This is true for all new arrays and objects but in order for it to be true for parent arrays, the parent must also be in the younger generation. These collocated objects can be restored collectively in the young generation if there were a way to keep them all there and for this purpose the factors that determine the aging and the threshold can be tweaked in favor of inlining.

The check for array bounds can be eliminated in favor of optimized access if it can be guaranteed that all accesses based on the index of the array are safe. For example, if the check is in a loop and the loop is invariant, the check can be completely omitted. Similarly, if the array field accesses are known which explains all changes to the array, then the bounds check can be eliminated say outside the loop but it is just the opposite. The bounds check is used by the program to make modifications to the array in which case an optimized access is not practical.

Since the savings in the optimized code has been referred to as the number of instructions avoided, the performance of array inlining is directly proportional to the number of field loads eliminated. Against a comparison of baseline, array inlining, object inlining and dynamic array inlining, a count of the load of references against the same distribution in each case clearly explains the differences in performance between them.

Sunday, August 30, 2020

Object and array inlining

When arrays are used in data structures that take variable number of arguments, they are often allocated with a fixed length and then resized to suit growing demands. The resizing involves allocation of a larger array, copying all elements from existing to old and then discarding the old one. This example is similar to the one used in the implementation of ArrayList.

One of the approaches to inlining array fields that can change involves the steps to collocate the child array with initial size together with its parent, the step to allow overwriting of the fields with the references to a new array although this step undoes the offset calculation using the earlier array and finally the garbage collection step that restores the optimized field order. Between the second and the third step which could be arbitrary amount of time, no optimized access is possible which calls for an additional check before an element of an inlined array is accessed.

The inlining saves one instruction at a time which is the load of the inlined field. If additional instructions are involved, then the benefit dissipates. This calls for combining additional checks with array bounds check that precedes every array access

The array length is cloned. The dynamic array inlining does not allow the optimization of direct access to array length such as with the generation of the arraylength byte code. The test for the result of the optimized access is 0 requires a compare and branch instruction which is more expensive than the normal field access. Overwriting the array length destroys the array because bounds check for all subsequent access will fail. If all the accesses were via inlined array fields, then a global data flow analysis can determine if all the accesses are possible before the dynamic array inlining is initiated.

Instead of overwriting the array length that is also accessed by array bounds check, a copy is maintained that can be overwritten only by the optimized machine code

Saturday, August 29, 2020

Object and array inlining

The bytecodes that load and store elements of reference arrays have no static operands which makes it difficult to know whether elements of modified array have been inlined or not. If a field is modified after it has been initialized, the object inlining system knows that the field is not inlineable. The putField bytecode contains the type which determines which objects are being modified. The aastore bytecode is not typed therefore the same bytecodes are emitted for methods that contain typed and non-typed array parameters which prohibits the inlining of array elements.

Only a global data flow analysis can determine all the contexts in which the methods are called and then determining whether the type in the parameter is the same as the candidate for inlining. Reflection and lazy loading might complicate this but the majority of the usages is suitable to global data flow. Special cases of arrays as inlining parents could be handled without a global analysis.

Friday, August 28, 2020

Object and array inlining

Fixed Array inlining if all objects of a parent class reference arrays with the same fixed length, then the field access for the array fields can be eliminated because their offset is known. In addition, subsequent access can also be optimized by eliminating the array bounds check. There are two additional instructions required over object field access. The first one checks the index against the length and the second one branches to an out of line code block that throws an exception. Instead of three loads - only one load is necessary with the two eliminations.

Variable array inlining: The objects of a parent class reference arrays with different but fixed lengths. The parent can have only one variable array inlining child because the inlining offset of a subsequent child cannot be known with the variable length of the first. In this case the two memory loads are necessary.

Dynamic array inlining: When an array field is assigned multiple times, one more check needs to be included for the field access because it is not safe to eliminate it. However, unlike object field, it is possible to detect if an array field has been changed at runtime.

The above section describes the types of array inlining for a single array and it is also possible to inline the parent array when one is nested within the other. The parent contains references to children arrays but when inlined the access to the child element is at an offset at start + zero-based-index x size + offset –of-field because the children are contiguous.

The array access bytecodes require a global data flow. The bytecodes are executed using an operand stack. Only constants are used with bytecodes which includes numbers of local variables, numeric constants and even field names. The bytecodes for loading and storing fields include a symbolic reference to the accessed field and the containing class This helps to determine which fields of the classes are changed at runtime.

Thursday, August 27, 2020

Object and array inlining continued

Object and array inlining work best when the hot fields are detected. The Just-in-time compiler helps with this by inserting read barriers that increment field access counters per field and class. And the garbage collector pitches in by moving objects that are linked by hot fields into groups such that the parent and its children are consecutive.

The inlining works when the following two conditions are met. First, the parent and child objects must be allocated together and the field store that places the reference of the parent in the child must happen immediately afterwards so that it remains true for the lifetime of the data structures that are collocated this way. Second, the field stores must not overwrite the field with a new value.

The leveraging of compiler and garbage collector for inlining is common to both object and array inlining bu there is also a difference. When arrays are used as inlining children, the size of arrays is not clear as a compile-time constant. The Java bytecodes for accessing array elements have no static type information. The preconditions can be made similar to that of object inlining if the following three instructions are combined into a single instruction for collocation- the object allocation of parent, the array allocation of the child, and the field store of the array field.

There are three types of inlining: fixed array inlining, variable array inlining, and dynamic array inlining.
Fixed array inlining is one where the array fields can be handled the same way as object fields because the length is constant.
Variable array inlining is one where the array fields have different but fixed lengths.
Dynamic array inlining is one where the array field is assigned multiple times.

Wednesday, August 26, 2020

Introspection store discussion addendum

The set of policies that can determine the injection of fields into the introspection stream can be maintained in a module just like the analytical application itself. The only difference is that the Flink application is executed higher up the stack in the analysis wing of the platform while the execution of policies is handled within the stream store. The stream store can host a lightweight runtime for the evaluation of rules that can be written in program order.

Another alternative is for the rules to be simpler and involve simple recognizable predicates for boolean evaluation in strict program order. In such cases, the need for an expression tree is obviated. A set of nested if conditions can be flattened into several sequential single-level if conditions after they include repeated conditions from the outer levels. This is easier to pipeline once the statements are singular and isolated from the other statements and they occur in a declaration.

The reconfiguration of the stream store to annotate or inject fields or events into the stream in the introspection store should be done at the startup or without the requiring the store to go offline visibly. If the store does need to shutdown prior to reconfiguration, the shutdown and startup can be done on a rolling basis through the nodes of the cluster hosting the store so that the client demands can be continued to be met.

Once all the nodes of the cluster have gone through a rolling restart, the new policies can then kick in. This will involve the events to be grouped or tagged differently. This works even for immutable events by way of introducing delimiter events that bound the set of immutable events implying that they are grouped together. Thus, events with delimiters can form groups and these groups can represent hierarchical levels.

There are ways to provide different delimiters so that a level could mean something different from another while these different delimiters perform the same way. The idea is that a delimiter can have any key/value or any number of them and they can be administrator specified via policies.

The use of delimiters does not stop with policies. Administrator can choose to set a delimiter without policy evaluation or the system components can use their own delimiters. The inlining of delimiters provides a way to use the same stream while the annotations in a dedicated stream that use the sequence number of the events in the original stream require their own maintenance. The dedicated stream also requires adjustments to its events when the original stream is truncated so that the beginning and end events change and the sequence numbers also change.