Cluster computing: August 2020

Monday, August 31, 2020

Object and array inlining

The tradeoff between the benefits of inlining or not is demonstrated by the length of the time interval between the overwrite of the fields and the garbage collection which restores the optimized field order. If the interval is too long, the benefits fade away when a revert to non-inlined field access performs better as before. If the interval is short, then the restoration of optimized field order for subsequent access saves at least one instruction to load each time.

The garbage collector is based on aging and compaction of heap allocations. The younger generation is collected frequently since they are not used on a prolonged basis. The stop-and-copy collector copies young objects between two alternating spaces and it increments a field for age in each activity. When the age exceeds a threshold, the object is promoted to the older generation. If the objects don’t linger long, they don’t make it to the next generation and this works in the favor of the inlining. This is true for all new arrays and objects but in order for it to be true for parent arrays, the parent must also be in the younger generation. These collocated objects can be restored collectively in the young generation if there were a way to keep them all there and for this purpose the factors that determine the aging and the threshold can be tweaked in favor of inlining.

The check for array bounds can be eliminated in favor of optimized access if it can be guaranteed that all accesses based on the index of the array are safe. For example, if the check is in a loop and the loop is invariant, the check can be completely omitted. Similarly, if the array field accesses are known which explains all changes to the array, then the bounds check can be eliminated say outside the loop but it is just the opposite. The bounds check is used by the program to make modifications to the array in which case an optimized access is not practical.

Since the savings in the optimized code has been referred to as the number of instructions avoided, the performance of array inlining is directly proportional to the number of field loads eliminated. Against a comparison of baseline, array inlining, object inlining and dynamic array inlining, a count of the load of references against the same distribution in each case clearly explains the differences in performance between them.

Sunday, August 30, 2020

Object and array inlining

When arrays are used in data structures that take variable number of arguments, they are often allocated with a fixed length and then resized to suit growing demands. The resizing involves allocation of a larger array, copying all elements from existing to old and then discarding the old one. This example is similar to the one used in the implementation of ArrayList.

One of the approaches to inlining array fields that can change involves the steps to collocate the child array with initial size together with its parent, the step to allow overwriting of the fields with the references to a new array although this step undoes the offset calculation using the earlier array and finally the garbage collection step that restores the optimized field order. Between the second and the third step which could be arbitrary amount of time, no optimized access is possible which calls for an additional check before an element of an inlined array is accessed.

The inlining saves one instruction at a time which is the load of the inlined field. If additional instructions are involved, then the benefit dissipates. This calls for combining additional checks with array bounds check that precedes every array access

The array length is cloned. The dynamic array inlining does not allow the optimization of direct access to array length such as with the generation of the arraylength byte code. The test for the result of the optimized access is 0 requires a compare and branch instruction which is more expensive than the normal field access. Overwriting the array length destroys the array because bounds check for all subsequent access will fail. If all the accesses were via inlined array fields, then a global data flow analysis can determine if all the accesses are possible before the dynamic array inlining is initiated.

Instead of overwriting the array length that is also accessed by array bounds check, a copy is maintained that can be overwritten only by the optimized machine code

Saturday, August 29, 2020

Object and array inlining

The bytecodes that load and store elements of reference arrays have no static operands which makes it difficult to know whether elements of modified array have been inlined or not. If a field is modified after it has been initialized, the object inlining system knows that the field is not inlineable. The putField bytecode contains the type which determines which objects are being modified. The aastore bytecode is not typed therefore the same bytecodes are emitted for methods that contain typed and non-typed array parameters which prohibits the inlining of array elements.

Only a global data flow analysis can determine all the contexts in which the methods are called and then determining whether the type in the parameter is the same as the candidate for inlining. Reflection and lazy loading might complicate this but the majority of the usages is suitable to global data flow. Special cases of arrays as inlining parents could be handled without a global analysis.

Friday, August 28, 2020

Object and array inlining

Fixed Array inlining if all objects of a parent class reference arrays with the same fixed length, then the field access for the array fields can be eliminated because their offset is known. In addition, subsequent access can also be optimized by eliminating the array bounds check. There are two additional instructions required over object field access. The first one checks the index against the length and the second one branches to an out of line code block that throws an exception. Instead of three loads - only one load is necessary with the two eliminations.

Variable array inlining: The objects of a parent class reference arrays with different but fixed lengths. The parent can have only one variable array inlining child because the inlining offset of a subsequent child cannot be known with the variable length of the first. In this case the two memory loads are necessary.

Dynamic array inlining: When an array field is assigned multiple times, one more check needs to be included for the field access because it is not safe to eliminate it. However, unlike object field, it is possible to detect if an array field has been changed at runtime.

The above section describes the types of array inlining for a single array and it is also possible to inline the parent array when one is nested within the other. The parent contains references to children arrays but when inlined the access to the child element is at an offset at start + zero-based-index x size + offset –of-field because the children are contiguous.

The array access bytecodes require a global data flow. The bytecodes are executed using an operand stack. Only constants are used with bytecodes which includes numbers of local variables, numeric constants and even field names. The bytecodes for loading and storing fields include a symbolic reference to the accessed field and the containing class This helps to determine which fields of the classes are changed at runtime.

Thursday, August 27, 2020

Object and array inlining continued

Object and array inlining work best when the hot fields are detected. The Just-in-time compiler helps with this by inserting read barriers that increment field access counters per field and class. And the garbage collector pitches in by moving objects that are linked by hot fields into groups such that the parent and its children are consecutive.

The inlining works when the following two conditions are met. First, the parent and child objects must be allocated together and the field store that places the reference of the parent in the child must happen immediately afterwards so that it remains true for the lifetime of the data structures that are collocated this way. Second, the field stores must not overwrite the field with a new value.

The leveraging of compiler and garbage collector for inlining is common to both object and array inlining bu there is also a difference. When arrays are used as inlining children, the size of arrays is not clear as a compile-time constant. The Java bytecodes for accessing array elements have no static type information. The preconditions can be made similar to that of object inlining if the following three instructions are combined into a single instruction for collocation- the object allocation of parent, the array allocation of the child, and the field store of the array field.

There are three types of inlining: fixed array inlining, variable array inlining, and dynamic array inlining.
Fixed array inlining is one where the array fields can be handled the same way as object fields because the length is constant.
Variable array inlining is one where the array fields have different but fixed lengths.
Dynamic array inlining is one where the array field is assigned multiple times.

Wednesday, August 26, 2020

Introspection store discussion addendum

The set of policies that can determine the injection of fields into the introspection stream can be maintained in a module just like the analytical application itself. The only difference is that the Flink application is executed higher up the stack in the analysis wing of the platform while the execution of policies is handled within the stream store. The stream store can host a lightweight runtime for the evaluation of rules that can be written in program order.

Another alternative is for the rules to be simpler and involve simple recognizable predicates for boolean evaluation in strict program order. In such cases, the need for an expression tree is obviated. A set of nested if conditions can be flattened into several sequential single-level if conditions after they include repeated conditions from the outer levels. This is easier to pipeline once the statements are singular and isolated from the other statements and they occur in a declaration.

The reconfiguration of the stream store to annotate or inject fields or events into the stream in the introspection store should be done at the startup or without the requiring the store to go offline visibly. If the store does need to shutdown prior to reconfiguration, the shutdown and startup can be done on a rolling basis through the nodes of the cluster hosting the store so that the client demands can be continued to be met.

Once all the nodes of the cluster have gone through a rolling restart, the new policies can then kick in. This will involve the events to be grouped or tagged differently. This works even for immutable events by way of introducing delimiter events that bound the set of immutable events implying that they are grouped together. Thus, events with delimiters can form groups and these groups can represent hierarchical levels.

There are ways to provide different delimiters so that a level could mean something different from another while these different delimiters perform the same way. The idea is that a delimiter can have any key/value or any number of them and they can be administrator specified via policies.

The use of delimiters does not stop with policies. Administrator can choose to set a delimiter without policy evaluation or the system components can use their own delimiters. The inlining of delimiters provides a way to use the same stream while the annotations in a dedicated stream that use the sequence number of the events in the original stream require their own maintenance. The dedicated stream also requires adjustments to its events when the original stream is truncated so that the beginning and end events change and the sequence numbers also change.

Tuesday, August 25, 2020

Object and array inlining continued

Objects and array inlining operate on a group of objects and arrays in the heap that are in a parent-child relationship. These nest levels allow the reference of a child to be directly in the parent even if a parent has multiple children. There can be a hierarchy of such levels but the difference is only between object fields and array fields.

The bytecode is the same between array and object fields but the inlining is different between the two. The size of the object is known beforehand. The size of an array is not known until allocation. Hot fields are worth more for optimizing. The Just-in-time compiler is leveraged for this purpose which inserts read barriers that increment field access counters per field and class.

The improvement can be seen this way. Let us say Parent and Child are two nested objects. Parent has a member c that points to a child and the child has a member f. Then the access p.c.f would cause the following instructions traditionally:

eax: p

mov ebx, [eax+8]

mov ecx, [ebx+8]

ecx: p.c.f

Instead, the optimized machine code is now:

eax: p

mov ecx, [eax + 24]

ecx: p.c.f

The object and array inlining are different. When arrays are used as inlining children, the size of arrays is not clear as a compile time constant. The Java bytecodes for accessing array elements have no static type information. The preconditions can be made similar to that of object inlining if the following three instructions are combined into a single instruction for collocation- the object allocation of parent, the array allocation of the child, and the field store of the array field.

There are three types of inlining: fixed array inlining, variable array inlining, and dynamic array inlining

Fixed array inlining is one where the array fields can be handled the same way as object fields because the length is constant.

Variable array inlining is one where the array fields have different but fixed lengths.

Dynamic array inlining is one where the array field is assigned multiple times.

Monday, August 24, 2020

Enhancements to introspection events.

The introspection stream in a stream store system gathers runtime information in the form of events from all the system components that produce it. The system itself becomes a reader and writer of events. All the events are raw data from the components that may or may not have any aggregation performed over time. These events from the introspection store is helpful for the system diagnostics when call home events are raised. The convenience of logs and metrics to be searched via grep or influxdb sql can also be emulated with stream store events. A stock Flink Application that reads all the events in the introspection store can be included in the utility belt of the troubleshooter. All introspection events may be wrapped in a standard manner so that they appear the same across all publishers within the stream store. The Flink Application may also be savvy to include not just one query but any number of ready-made queries that would provide useful query results to the troubleshooter. The type and format of contents in an introspection store is already known before-hand. This would make the queries also predictable and the library of queries for introspection store can also be kept up to date as and when new publishers are added.

There are still a few enhancements that can be made.

First, the events themselves can have augmented fields that work with the data from the system. These fields can be automatically injected at the time of write such that they are controlled by policies such as time of day or day of week independent of the system. Just like ‘IPSEC’ rules can be authored based on a number of factors. This automatic injection of fields in placeholders can be done independent of the data published from the system components. An administrator then has the ability to color code certain events based on his policies which can useful for inclusion in the query later.

Second, the queries can be improved if there are field extractors that can process the events of the stream to come up with a set of key-value attributes that can be found in some or all of the events in the introspection store. Consider the injection of arbitrary key values as enhancements to the introspection events by a third party aside from the publishers of the introspection events. These arbitrary key-values can be parsed for field extractions that would provide additional leverage in introspection queries.

Both injection of fields in introspection events and the extraction of fields for introspection queries are additional enhancements that do not disturb the operations of the system and allow for customizations from administrators and policy-based monitoring software.

Further reading: https://1drv.ms/w/s!Ashlm-Nw-wnWvBx94AZEWepNG5P5?e=IWNHMb

Sunday, August 23, 2020

Object and Array Inlining continued

eax: p

mov ebx, [eax+8]

mov ecx, [ebx+8]

ecx: p.c.f

Instead, the optimized machine code is now:

eax: p

mov ecx, [eax + 24]

ecx: p.c.f