Cluster computing

Sunday, April 26, 2020

The reconfiguration of the longevity tool
The longevity tool requires a task configuration, which is usually provided at the start of the execution with the help of a ConfigMap that mounts the configuration on a read-only file share. The tool runs for a long time and, in the process, may encounter intermittent exceptions. Some exceptions can be ignored without requiring the tool to stop and reset the counters gathered from the run.
This called for addressing the tool's limitations on several fronts. First, retries were added to the readers so that they could get past some of the exceptions that otherwise brought the tool down. The readers were chosen because they were independent and launched with a reader group configuration. The writers also encountered exceptions, but these could be addressed after the retries for the readers, since the events were random data and the readers had to perform validations that the writers did not.
Among the validations, byte-level validation was important because it ensured that the data written was read back exactly the same. The readers showed zero malformed-event exceptions in regular runs and only a rare few when connections were abruptly closed. The most common exception encountered was SegmentTruncatedException. The number of exceptions dropped when the tool's connectors to the Pravega store were upgraded.
The tool also improved immensely when diagnostic logging and spot bug fixes were added. These reduced the exceptions, but the retries were still needed to indicate to the tester that the store was functional and that the tool's subsequent requests went through. The retry logic was expanded to accept, from the user, the number of retries (numRetries) and the delay between retries (delayMillis). These parameters could be read from the task configuration at the start of the tool.
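A minimal sketch of such a retry wrapper, assuming the tool simply re-invokes a failed read. The name withRetries is hypothetical, and it catches every Exception for brevity; a real reader would likely narrow the catch to the exceptions it treats as transient.

```kotlin
// Hypothetical helper: runs `block`, retrying up to numRetries more times
// and sleeping delayMillis between attempts. Rethrows the last failure.
fun <T> withRetries(numRetries: Int, delayMillis: Long, block: () -> T): T {
    require(numRetries >= 0) { "numRetries must not be negative" }
    require(delayMillis >= 0) { "delayMillis must not be negative" }
    var lastError: Exception? = null
    for (attempt in 0..numRetries) {
        try {
            return block()
        } catch (e: Exception) {
            lastError = e
            // Only wait if another attempt will follow.
            if (attempt < numRetries) Thread.sleep(delayMillis)
        }
    }
    throw lastError!!
}

fun main() {
    var calls = 0
    // Simulated read that fails twice before succeeding.
    val result = withRetries(numRetries = 3, delayMillis = 10) {
        calls++
        if (calls < 3) throw IllegalStateException("transient failure $calls")
        "read ok"
    }
    println("$result after $calls attempts")
}
```

Reading numRetries and delayMillis from the task configuration, rather than hard-coding them, is what lets the tester tune how patient the readers are without rebuilding the tool.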
Next, restart logic had to be added so that the readers could resume from the last position rather than from the beginning of the stream. This was solved with the help of checkpoints, which were used to reset the reader group so that readers may come and go while progress continues from the last position. The checkpoints were added to the readers, but it was also necessary to give the readers a configurable parameter indicating that a reader was restartable. This parameter was also added to the task configuration.
The longevity tool runs in Docker containers, so there was no easy way to specify the restart as an argument to the tool after it was launched. A pair of APIs was added to restart the readers and writers. This gave the tester the ability to get past failures of the tool by bringing down and reviving the writers and readers.
Checkpoints meant that there were StreamCut positions that could now be used to reduce the segment range the readers need to work on. The range is specified as a head and tail pair, where the tail points to the current boundary for reading the next event, but the head can be adjusted so that it is not at the start of the stream. Since the segment ranges are logged, the tester can associate them with a point in time from which the tool could be resumed. This was added as a segment number and last position pair in the task configuration. Changing the configuration during execution was a limitation of the tool. This was relaxed with the help of an API that accepts an entirely new test configuration and kicks off the restart with the new parameters.
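A rough sketch of how the task configuration and the reconfiguration API could fit together, assuming a Properties-style file mounted from the ConfigMap. The key names, TaskConfig, ConfigHolder and the restart callback are all hypothetical; the actual configuration format of the tool is not described here.

```kotlin
import java.util.Properties
import java.util.concurrent.atomic.AtomicReference

// Hypothetical shape of the task configuration; key names are illustrative only.
data class TaskConfig(
    val numRetries: Int,
    val delayMillis: Long,
    val restartable: Boolean,
    val startSegment: Long?,   // segment number to resume from, if any
    val startPosition: Long?   // last position within that segment, if any
) {
    companion object {
        fun from(props: Properties) = TaskConfig(
            numRetries = props.getProperty("numRetries", "3").toInt(),
            delayMillis = props.getProperty("delayMillis", "1000").toLong(),
            restartable = props.getProperty("restartable", "false").toBoolean(),
            startSegment = props.getProperty("startSegment")?.toLong(),
            startPosition = props.getProperty("startPosition")?.toLong()
        )
    }
}

// Holds the active configuration. A hypothetical reconfigure endpoint would
// swap in the new configuration and then restart readers and writers with it.
class ConfigHolder(initial: TaskConfig) {
    private val current = AtomicReference(initial)

    fun get(): TaskConfig = current.get()

    fun reconfigure(next: TaskConfig, restart: (TaskConfig) -> Unit) {
        current.set(next)
        restart(next)
    }
}

fun main() {
    val props = Properties().apply {
        setProperty("restartable", "true")
        setProperty("startSegment", "7")
        setProperty("startPosition", "1024")
    }
    val holder = ConfigHolder(TaskConfig.from(props))
    // A reconfigure request arrives with a changed retry count.
    holder.reconfigure(holder.get().copy(numRetries = 5)) { cfg ->
        println("restarting readers and writers with $cfg")
    }
}
```

Keeping the configuration immutable and swapping it atomically means a restart always sees one consistent set of parameters, never a half-updated mix.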

Saturday, April 25, 2020

Kotlin versus Java continued...

Along with collections, the Kotlin standard library comes with sequences. Sequences are like iterables, but they are not processed eagerly. Instead, they are processed lazily, only when the result of the chain is required. Sequences perform all the operations on one item at a time, while an iterable completes each step for the whole collection before proceeding to the next step.
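A small sketch that records the order of the filter and map calls, once on a List and once on the same data viewed as a Sequence through asSequence(). The function names and sample words are illustrative.

```kotlin
// Eager: filter runs over every element before map starts.
fun eagerTrace(words: List<String>): List<String> {
    val log = mutableListOf<String>()
    words.filter { log.add("filter($it)"); it.length > 3 }
        .map { log.add("map($it)"); it.length }
        .first()
    return log
}

// Lazy: each element passes through the whole chain, and first() stops early.
fun lazyTrace(words: List<String>): List<String> {
    val log = mutableListOf<String>()
    words.asSequence()
        .filter { log.add("filter($it)"); it.length > 3 }
        .map { log.add("map($it)"); it.length }
        .first()
    return log
}

fun main() {
    val words = listOf("the", "quick", "brown", "fox")
    println(eagerTrace(words))
    println(lazyTrace(words))
}
```

With these words, the list version logs four filter calls followed by two map calls, while the sequence version logs only filter(the), filter(quick) and map(quick), because first() is satisfied after the second element.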

The Kotlin standard library offers two types of operations on collections: member functions and extension functions. Member functions define operations that are essential for a collection type, such as retrieving its size or checking whether it contains an element. Implementations of collection interfaces must implement these member functions. Extension functions provide filtering, transforming, ordering and other collection processing operations.
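A short sketch contrasting the two kinds of operations. The extension middleOrNull is hypothetical and only shows that our own extensions are called the same way as the standard ones.

```kotlin
// Hypothetical extension: returns the middle element, or null if the list is empty.
// It is declared outside List, yet it is called like a member.
fun <T> List<T>.middleOrNull(): T? = if (isEmpty()) null else this[size / 2]

fun main() {
    val values = listOf(4, 8, 15, 16, 23)
    println(values.size)                // member property of Collection
    println(values.contains(15))        // member function of Collection
    println(values.filter { it > 10 })  // standard library extension function
    println(values.middleOrNull())      // our own extension function
}
```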

Transformations can be seen with examples involving mapping, which applies a lambda function to each element in turn. The order of the results follows the order of the elements. Zipping is another transformation, which builds pairs from the elements at the same positions in both collections. Extra elements in either collection beyond the common positions are ignored. Flattening, which gives flat access to the elements of nested collections through flatten() and flatMap(), is also a transformation.
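The three transformations can be sketched with the standard map, zip, flatten and flatMap functions; the sample data is illustrative.

```kotlin
// Mapping: applies the lambda to each element, keeping the original order.
val tripled = listOf(1, 2, 3).map { it * 3 }

// Zipping: pairs up elements at the same positions; "grey" has no partner and is dropped.
val colors = listOf("red", "brown", "grey")
val animals = listOf("fox", "bear")
val pairs = colors zip animals

// Flattening: flatten() joins nested lists; flatMap() maps each element to a list, then flattens.
val flat = listOf(listOf(1, 2), listOf(3), listOf(4, 5)).flatten()
val tokens = listOf("a,b", "c").flatMap { it.split(",") }

fun main() {
    println(tripled)  // [3, 6, 9]
    println(pairs)    // [(red, fox), (brown, bear)]
    println(flat)     // [1, 2, 3, 4, 5]
    println(tokens)   // [a, b, c]
}
```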

Kotlin is still behind C#, with its language-integrated query (LINQ), in bringing SQL into the language, but the Flink and Spark libraries make up for it. There is an ORM framework available natively for Kotlin called Exposed, which has two layers of database access: a typesafe SQL-wrapping domain-specific language (DSL) and lightweight data access objects (DAO).

Friday, April 24, 2020

Java vs Kotlin continued:

The Kotlin language avoids checked exceptions. This makes it easier to write code without special handlers.

Ranges and progressions are easy to express in Kotlin. For example, an ascending range can be written as 1..4 and a descending progression as 4 downTo 1. Ranges are closed, so they include both end values, and they are defined for comparable types that have an order. The .. operator is the operator form of the rangeTo() function, so 1..4 is the same as 1.rangeTo(4).
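A few ranges and progressions side by side; the version strings are illustrative and only show that any Comparable type, here String with its lexicographic order, supports ranges.

```kotlin
val ascending = (1..4).toList()         // closed range: both ends included
val descending = (4 downTo 1).toList()  // a descending progression
val stepped = (1..10 step 3).toList()   // a progression with a step
val sameRange = 1.rangeTo(4) == 1..4    // `..` is the operator form of rangeTo()

// Any Comparable type gets ranges; Strings compare lexicographically.
val versions = "1.0".."2.0"
val inside = "1.5" in versions

fun main() {
    println(ascending)   // [1, 2, 3, 4]
    println(descending)  // [4, 3, 2, 1]
    println(stepped)     // [1, 4, 7, 10]
    println(sameRange)   // true
    println(inside)      // true
}
```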

Collections are another example of Kotlin improvements. Kotlin collections allow us to manipulate a number of objects independently of the exact type of the objects stored in them. Objects in a collection are called elements or items. The collections represented by Set, Map and List continue to hold the same relevance in Kotlin. The interfaces and related functions to access them are located in the kotlin.collections package.

Kotlin collection types come with a standard pair of interfaces: 1) a read-only interface and 2) a mutable interface, where the latter extends the former with write operations. All write operations modify the same mutable collection object, so the reference doesn't change.

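The read-only collection types are covariant, so a collection of a derived type can be used where a collection of its base type is expected; for example, a List<Rectangle> can be passed where a List<Shape> is expected. Mutable collections are not covariant because type safety could not be enforced: treating a list of rectangles as a MutableList<Shape> would allow adding a circle to it.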
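A sketch of both points, covariance of the read-only List and a read-only view sharing the same mutable object. The classes Shape, Rectangle and Circle are illustrative.

```kotlin
open class Shape(val name: String)
class Rectangle : Shape("rectangle")
class Circle : Shape("circle")

// Accepts the read-only List<Shape>.
fun names(shapes: List<Shape>): List<String> = shapes.map { it.name }

fun main() {
    val rectangles: List<Rectangle> = listOf(Rectangle(), Rectangle())
    // List<out E> is covariant, so a List<Rectangle> is accepted as a List<Shape>.
    println(names(rectangles))

    val mutable: MutableList<Rectangle> = mutableListOf(Rectangle())
    // val shapes: MutableList<Shape> = mutable  // does not compile: MutableList<E> is invariant,
    // otherwise shapes.add(Circle()) would put a Circle into a list of rectangles.

    // A read-only view of the same object: writes through `mutable` are visible here,
    // and the reference itself never changes.
    val view: List<Rectangle> = mutable
    mutable.add(Rectangle())
    println(view.size)  // 2
}
```

Note that a read-only view is not an immutable copy: it only removes the write operations from that particular reference.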

Collection<T> is the top-level interface of the read-only collection hierarchy and includes retrieving the size, checking item membership and other operations. All collections are iterable, since Collection<T> extends Iterable<T>.

Thursday, April 23, 2020

We discuss a data export tool for Kubernetes:
Data Export Tool:
When applications are hosted on Kubernetes, they often persist their state using persistent volumes. The data stored on these volumes is available across application restarts. The StorageClass that provides storage for these persistent volumes is external to the pods and containers in which the application runs. When the tier 2 storage is NFS, the persistent volumes appear as mounted file systems, which are usable with all standard shell tools, including those for backup and export such as duplicity. The backups usually exist alongside the source as another persistent volume, which can then be exposed to users via curl requests. Therefore, there is a two-part separation: one part involves an extract-transform-load (ETL) between a source and a destination, and the other relays the prepared data to the customer.
Both steps can involve arbitrary amounts of data and prolonged processing. In the Kubernetes world, with the arbitrary lifetime of pods and containers, this kind of processing becomes prone to failures. It is this special consideration that sets the application logic apart from traditional data export techniques. The ETL may be written in Java, but a Kubernetes Job will need to be specified in the operator code base so that the jobs can be launched on user demand and survive all the interruptions and movements possible in the Kubernetes control plane.
Kubernetes Jobs run to completion. A Job creates one or more pods and, as the pods complete successfully, tracks the completions. The Job owns its pods, so the pods are cleaned up when the Job is deleted. The Job manifest requires the apiVersion, kind, metadata and spec fields, and the spec in turn requires a pod template; the selector field is optional. Jobs may be non-parallel, parallel with a fixed completion count, or parallel with a work queue, all of which are suitable for a multi-part export of data.
In this way, data export from the Kubernetes data plane can be kept on demand and associated with a corresponding K8s resource, custom or standard, for visibility in the control plane.
An alternative technique is to enable a multipart download REST API that exposes the file system or S3 storage directly. This pattern keeps the data transfer out of the Kubernetes control plane and exposes it only internally, where it is then used from the user interface.
The benefits of this technique are that the actions are tied to the user interface's authentication and that all actions are on demand. The trade-off is that the user interface has to relay the API call to another pod, and this relay does not hold up for long downloads, which are likely to be interrupted.
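A minimal sketch of the server side of such a multipart download, assuming the export has already been written to a file on a mounted volume. It splits the file into inclusive byte ranges, with the same semantics as an HTTP Range header such as bytes=0-1023, so a client can fetch the parts separately and re-request only a part that was interrupted. Part, planParts, readPart and the part size are hypothetical, and the HTTP layer itself is left out.

```kotlin
import java.io.File
import java.io.RandomAccessFile

// Inclusive byte range, like "Range: bytes=start-end".
data class Part(val index: Int, val start: Long, val end: Long)

// Splits a file of totalBytes into consecutive parts of at most partSize bytes.
fun planParts(totalBytes: Long, partSize: Long): List<Part> {
    require(partSize > 0) { "partSize must be positive" }
    val parts = mutableListOf<Part>()
    var start = 0L
    var index = 0
    while (start < totalBytes) {
        val end = minOf(start + partSize, totalBytes) - 1
        parts += Part(index++, start, end)
        start = end + 1
    }
    return parts
}

// Reads one part; a download handler would stream these bytes in its response.
fun readPart(file: File, part: Part): ByteArray =
    RandomAccessFile(file, "r").use { raf ->
        val buffer = ByteArray((part.end - part.start + 1).toInt())
        raf.seek(part.start)
        raf.readFully(buffer)
        buffer
    }

fun main() {
    val file = File.createTempFile("export", ".bin").apply {
        writeText("hello world")
        deleteOnExit()
    }
    val parts = planParts(file.length(), partSize = 4)
    // Reassemble the parts to show that together they cover the whole file.
    val copy = parts.map { readPart(file, it) }.reduce { acc, bytes -> acc + bytes }
    println(parts)
    println(String(copy, Charsets.UTF_8))  // hello world
}
```

Because each part is independent, an interrupted transfer costs at most one part rather than the whole download.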
Regardless of how the data is prepared for streaming to the client behind an API call, it is better not to require relays in the data transfer. The API call is useful for requesting the prepared data on demand, and the implementation can scale to as many requests as necessary.

Wednesday, April 22, 2020

Kotlin vs Java reviewed:

Both Kotlin and Java are statically typed languages. Kotlin is newer, with its official release in 2016, as opposed to Java's official release in 1995. Kotlin can also be compiled to JavaScript, which Java does not support out of the box. Kotlin requires a plugin and can work with an existing Java stack.

Kotlin offers a number of advantages over Java. It is terser and more readable. It overcomes Java's limitations with null references by controlling nullability through the type system.
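A short sketch of nullability in the type system; the function names are illustrative.

```kotlin
// String? admits null; the safe call ?. and the Elvis operator ?: handle it.
fun lengthOrZero(s: String?): Int = s?.length ?: 0

// After the null check, the compiler smart-casts s to the non-null String.
fun describeNullable(s: String?): String =
    if (s == null) "no value" else "value of length ${s.length}"

fun main() {
    val name: String = "Kotlin"
    // val broken: String = null   // does not compile: String is non-nullable
    val maybe: String? = null
    println(lengthOrZero(name))      // 6
    println(lengthOrZero(maybe))     // 0
    println(describeNullable(maybe)) // no value
}
```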

Kotlin is designed for Java interoperability and enables smooth calls to Java methods and properties by following conventions that cut down code; for example, Java getters and setters can be accessed with property syntax. Since Java objects can be null, all objects originating from Java are treated as platform types, for which the null-safety guarantees are the same as in Java. Nullability annotations on the Java side help provide nullability information, including for type parameters.
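A sketch of both interop conventions using only JDK classes. The file path and the property key are hypothetical values chosen for illustration.

```kotlin
import java.io.File

// System.getProperty is a Java method, so its result has the platform type String!.
// Declaring the return type as non-null String makes Kotlin check it at runtime,
// so this fails if the property is absent.
fun readRequired(key: String): String = System.getProperty(key)

fun main() {
    // Java getters such as File.getName() are accessed with property syntax.
    println(File("logs/longevity.log").name)  // longevity.log

    // Declaring the variable as String? restores Kotlin's compile-time null checks.
    val missing: String? = System.getProperty("hypothetical.missing.key")
    println(missing ?: "not set")
}
```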

Arrays in Kotlin are invariant, which prevents assigning an array of one type, such as Array<String>, to a variable of another array type, such as Array<Any>. Primitive type arrays such as IntArray are maintained without boxing overhead.
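A sketch of invariance and primitive arrays; joinAll is an illustrative name, and the out projection follows the use-site variance described in the Kotlin documentation.

```kotlin
// IntArray compiles to int[] on the JVM, so its elements are not boxed.
val squares = IntArray(4) { it * it }

// Array<String> compiles to String[].
val strings: Array<String> = arrayOf("a", "b")

// val anys: Array<Any> = strings   // does not compile: Array<T> is invariant

// A use-site projection (out) lets a function read from an array of any subtype of Any.
fun joinAll(items: Array<out Any>): String = items.joinToString()

fun main() {
    println(squares.joinToString())  // 0, 1, 4, 9
    println(joinAll(strings))        // a, b
}
```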

Kotlin uses a family of function types with a special notation, such as (Int, Int) -> Int, that corresponds to the signatures of functions, that is, their parameters and return values. There is also support for suspending functions, whose types carry the suspend modifier, as in suspend () -> Unit.
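A few function types in use; the names are illustrative. Suspending function types are left out because calling them needs a coroutine builder.

```kotlin
// (Int, Int) -> Int: two Int parameters, Int result.
val add: (Int, Int) -> Int = { a, b -> a + b }

// String.(Int) -> String: a function type with a receiver.
val repeatTimes: String.(Int) -> String = { n -> this.repeat(n) }

// A higher-order function whose parameter has a function type.
fun applyTwice(x: Int, f: (Int) -> Int): Int = f(f(x))

fun main() {
    println(add(2, 3))                 // 5
    println("ab".repeatTimes(2))       // abab
    println(applyTwice(3) { it * 2 })  // 12
}
```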

Kotlin supports Single Abstract Method (SAM) conversions for interfaces with a single abstract method. Kotlin function literals can be automatically converted into implementations of Java interfaces with a single non-default method. This can be used to create instances of SAM interfaces.
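Two SAM conversions against JDK interfaces: a SAM constructor for java.util.Comparator, and a lambda passed where the Thread constructor expects a Runnable. The helper names are illustrative.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// SAM constructor: the lambda becomes an implementation of java.util.Comparator.
val byLength = Comparator<String> { a, b -> a.length - b.length }

fun sortByLength(words: List<String>): List<String> = words.sortedWith(byLength)

// The lambda is converted to the Runnable that Thread's constructor expects.
fun countOnWorkerThread(): Int {
    val counter = AtomicInteger()
    val worker = Thread { counter.incrementAndGet() }
    worker.start()
    worker.join()
    return counter.get()
}

fun main() {
    println(sortByLength(listOf("ccc", "a", "bb")))  // [a, bb, ccc]
    println(countOnWorkerThread())                   // 1
}
```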

Kotlin does not support checked exceptions. Many believe that checked exceptions lead to decreased productivity with no significant improvement to code quality.
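A sketch of calling a Java API that declares a checked exception. The missing file name is hypothetical and assumed not to exist.

```kotlin
import java.io.FileNotFoundException
import java.io.FileReader

// In Java, the FileReader(String) constructor declares `throws FileNotFoundException`.
// Kotlin forces us neither to catch it here nor to declare it.
fun openReader(path: String): FileReader = FileReader(path)

// Handling is still possible where it is meaningful.
fun canOpen(path: String): Boolean =
    try {
        openReader(path).use { true }
    } catch (e: FileNotFoundException) {
        false
    }

fun main() {
    println(canOpen("hypothetical-missing-file.txt"))  // false if the file does not exist
}
```

When Java callers do need to know about an exception, a Kotlin function can declare it with the @Throws annotation, for example @Throws(IOException::class).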

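The compiler can infer the function types of variables. A value of a function type can be invoked with the invoke operator, either as f.invoke(x) or simply as f(x). Inline functions provide flexible control flow: they avoid the overhead of allocating lambdas and allow a return inside a lambda to return from the enclosing function.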
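A sketch of inference, invoke and a non-local return through the inline forEach function; the names are illustrative.

```kotlin
// The compiler infers the type (Int) -> Int from the annotated lambda parameter.
val square = { x: Int -> x * x }

// forEach is an inline function, so `return` inside its lambda returns
// from firstNegative itself (a non-local return).
fun firstNegative(values: List<Int>): Int? {
    values.forEach { if (it < 0) return it }
    return null
}

fun main() {
    println(square.invoke(4))                  // 16, explicit invoke
    println(square(4))                         // 16, the same call through the invoke convention
    println(firstNegative(listOf(3, -1, -5)))  // -1
}
```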

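Kotlin also provides the ‘is’ and ‘as’ operators for type checks and casts. The ‘is’ operator checks whether an object conforms to a given type, and after a successful check the compiler smart-casts the object to that type. The ‘as’ operator, an infix operator, is used for unsafe casts, which throw an exception when the cast fails; its safe counterpart ‘as?’ returns null instead.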
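A sketch of the three operators; the function names are illustrative.

```kotlin
// `is` checks the type, and the compiler smart-casts x in each branch.
fun describeType(x: Any): String = when (x) {
    is String -> "String of length ${x.length}"
    is Int -> "Int, next is ${x + 1}"
    else -> "something else"
}

// Unsafe cast: throws ClassCastException if x is not a String.
fun unsafeAsString(x: Any): String = x as String

// Safe cast: returns null instead of throwing.
fun safeAsString(x: Any): String? = x as? String

fun main() {
    println(describeType("abc"))  // String of length 3
    println(describeType(41))     // Int, next is 42
    println(safeAsString(41))     // null
}
```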

The Kotlin language has plenty of new syntax that parallels other newer programming languages. For example, we can use the var and val keywords, where var is used for mutable properties and val is used for read-only properties. Getters and setters are provided by default.
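A sketch of val and var properties with their generated accessors. ReaderStats is a hypothetical class loosely themed on the longevity tool.

```kotlin
class ReaderStats(val readerId: String) {  // val: read-only, only a getter is generated
    var eventsRead: Long = 0                // var: mutable, getter and setter are generated
        private set                         // keeps the default setter but restricts its visibility

    fun record(count: Long) {
        eventsRead += count
    }
}

fun main() {
    val stats = ReaderStats("reader-1")
    stats.record(10)
    stats.record(5)
    // stats.eventsRead = 0   // does not compile outside the class: the setter is private
    println("${stats.readerId}: ${stats.eventsRead}")  // reader-1: 15
}
```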

Kotlin allows implementations to be delegated via the delegation pattern, which replaces implementation inheritance with zero boilerplate code. A derived class can implement an interface by delegating all of its public members to a specified object. The derived class can still override individual members independently.

Type inference for variable and property types is automatic. New symbols, methods, keywords and constants make it very easy to declare and use variables.

A slight modification of a class does not require a new subclass. Instead, we can use object expressions and object declarations.

Kotlin allows writing a companion object inside a class so that its members can be accessed using only the class name as a qualifier. A companion object is useful when we need a function that can be called without instantiating the class but has access to the internals of the class, such as a factory method.

Classes that are used exclusively to hold data are called data classes and are declared with the data keyword.
The compiler automatically creates methods such as equals(), hashCode(), toString() and copy() from the properties declared in the primary constructor. These classes come with a few restrictions: they cannot be abstract, open, sealed or inner. Some earlier restrictions were relaxed in Kotlin 1.1, which allowed data classes to extend other classes.

The standard library provides Pair and Triple as data classes.
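A sketch of the generated members and of Pair and Triple; the Reading class and sample values are illustrative.

```kotlin
data class Reading(val sensor: String, val value: Double)

fun main() {
    val r1 = Reading("t1", 21.5)
    println(r1)                          // Reading(sensor=t1, value=21.5)
    println(r1 == Reading("t1", 21.5))   // true: equals() compares constructor properties
    val r2 = r1.copy(value = 22.0)       // copy with one property changed
    val (sensor, value) = r2             // destructuring via generated componentN() functions
    println("$sensor -> $value")         // t1 -> 22.0

    val pair: Pair<String, Int> = "numRetries" to 3
    val triple = Triple("segment", 7L, 1024L)
    println(pair)                        // (numRetries, 3)
    println(triple)                      // (segment, 7, 1024)
}
```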

Kotlin is perhaps the first to provide a clean separation between read-only and mutable collections. The read-only interface provides access to the elements of the collection. The mutable interface extends the read-only interface with write access. This makes it clearer which collections are only meant to be read, for example for reporting, and do not interfere with the operations ongoing on the existing collection.

Ranges and progressions are easy to express in Kotlin. For example, an ascending range can be written as 1..4 and a descending progression as 4 downTo 1. Ranges are closed, so they include both end values, and they are defined for comparable types that have an order. The .. operator is the operator form of the rangeTo() function, so 1..4 is the same as 1.rangeTo(4).

Kotlin is, therefore, more than a notation change from Java. It packs features that were not seen earlier in Java.

Tuesday, April 21, 2020

Kotlin vs Java continued...

Classes that are used exclusively to hold data are called data classes and are declared with the data keyword.
The compiler automatically creates methods such as equals(), hashCode(), toString() and copy() from the properties declared in the primary constructor. These classes come with a few restrictions: they cannot be abstract, open, sealed or inner. Some earlier restrictions were relaxed in Kotlin 1.1, which allowed data classes to extend other classes.

The standard library provides Pair and Triple as data classes.

Kotlin is perhaps the first to provide a clean separation between read-only and mutable collections. The read-only interface provides access to the elements of the collection. The mutable interface extends the read-only interface with write access. This makes it clearer which collections are only meant to be read, for example for reporting, and do not interfere with the operations ongoing on the existing collection.

Kotlin is, therefore, more than a notation change from Java. It packs features that were not seen earlier in Java.

The delegation pattern is a technique that allows a class Derived to implement an interface Base by delegating all of its public members to a specified object. Since that object implements the interface, all of the public methods are available on it, and the derived object merely forwards the calls to it. The delegation pattern has first-class support in the Kotlin language.

The class Derived can also override any delegated member, on a method-by-method basis, so Derived is not bound to the delegate's behavior. However, the delegate object has no visibility of the overridden members: its own methods still call its own implementations. This behavior also applies to properties overridden in the Derived class.
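A sketch modeled on the pattern in the Kotlin documentation; Base, BaseImpl and Derived follow the names used above, and the member names are illustrative.

```kotlin
interface Base {
    val message: String
    fun show(): String
    fun showMessage(): String
}

class BaseImpl(private val x: Int) : Base {
    override val message = "BaseImpl: x = $x"
    override fun show(): String = message
    override fun showMessage(): String = "message: $message"
}

// Every Base member not overridden here is forwarded to `b`.
class Derived(b: Base) : Base by b {
    override val message = "Message of Derived"
    override fun showMessage(): String = "Derived says: $message"
}

fun main() {
    val d = Derived(BaseImpl(10))
    // show() is delegated; BaseImpl.show() reads BaseImpl's own message,
    // because the delegate cannot see Derived's override.
    println(d.show())         // BaseImpl: x = 10
    println(d.showMessage())  // Derived says: Message of Derived
}
```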

Monday, April 20, 2020

Kotlin allows implementations to be delegated via the delegation pattern, which replaces implementation inheritance with zero boilerplate code. A derived class can implement an interface by delegating all of its public members to a specified object. The derived class can still override individual members independently.
Type inference for variable and property types is automatic. New symbols, methods, keywords and constants make it very easy to declare and use variables.
A slight modification of a class does not require a new subclass. Instead, we can use object expressions and object declarations. Object expressions create an object of an anonymous class, usually derived from some type or types, and override the methods of that type or types. Object declarations are used for singletons, and the declaration is much simpler than in other languages: it uses the object keyword followed by the name and the implementation.
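A sketch of both forms; Greeter and Registry are illustrative names.

```kotlin
interface Greeter {
    fun greet(name: String): String
}

// Object expression: an anonymous class implementing Greeter, created immediately.
val politeGreeter = object : Greeter {
    override fun greet(name: String) = "Good day, $name"
}

// Object declaration: a singleton, initialized lazily on first access.
object Registry {
    private val entries = mutableListOf<String>()
    fun register(name: String) {
        entries += name
    }
    fun count(): Int = entries.size
}

fun main() {
    println(politeGreeter.greet("Ann"))  // Good day, Ann
    Registry.register("reader-1")
    println(Registry.count())            // 1
}
```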
Kotlin allows writing a companion object inside a class so that its members can be accessed using only the class name as a qualifier. A companion object is useful when we need a function that can be called without instantiating the class but has access to the internals of the class, such as a factory method. It uses the object declaration syntax with the companion keyword. Its members look like static methods, but they are really members of a real object, and the companion object can implement interfaces. A companion object is initialized when the corresponding class is loaded (resolved), matching the semantics of a Java static initializer, whereas other object declarations are initialized lazily on first access and object expressions are initialized immediately where they are used.
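A sketch of a companion object acting as a factory that implements an interface. Factory, Endpoint and the default port are hypothetical.

```kotlin
interface Factory<T> {
    fun create(raw: String): T
}

class Endpoint private constructor(val host: String, val port: Int) {
    // Called as Endpoint.create(...), but the companion is a real object:
    // it implements Factory and can reach the private constructor.
    companion object : Factory<Endpoint> {
        const val DEFAULT_PORT = 8080

        override fun create(raw: String): Endpoint {
            val parts = raw.split(":")
            val port = if (parts.size > 1) parts[1].toInt() else DEFAULT_PORT
            return Endpoint(parts[0], port)
        }
    }
}

fun main() {
    val e = Endpoint.create("db:5432")
    println("${e.host} ${e.port}")       // db 5432
    // The class name used as a value refers to its companion object.
    val factory: Factory<Endpoint> = Endpoint
    println(factory.create("localhost").port)  // 8080
}
```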
Classes that are used exclusively to hold data are called data classes and are declared with the data keyword.
The compiler automatically creates methods such as equals(), hashCode(), toString() and copy() from the properties declared in the primary constructor. These classes come with a few restrictions: they cannot be abstract, open, sealed or inner. Some earlier restrictions were relaxed in Kotlin 1.1, which allowed data classes to extend other classes.
The standard library provides Pair and Triple as data classes.
Kotlin is perhaps the first to provide a clean separation between read-only and mutable collections. The read-only interface provides access to the elements of the collection. The mutable interface extends the read-only interface with write access. This makes it clearer which collections are only meant to be read, for example for reporting, and do not interfere with the operations ongoing on the existing collection.
Kotlin is, therefore, more than a notation change from Java. It packs features that were not seen earlier in Java.