Saturday, April 18, 2020

Java versus Kotlin:
Kotlin brings a ton of new features over Java such as lambda expressions, extension functions, smart casts, string templates, primary constructors, first-class delegation, type inference, singletons, range expressions, operator overloading, companion objects and coroutines.
Lambda expressions behave just like functions. Kotlin functions are first class, which allows them to be passed as parameters. A function that receives such parameters is a higher-order function. A lambda expression is a function literal, that is, a function that is not declared but passed immediately as an expression, and an anonymous function is a function literal with no name. A function type can also be instantiated from an existing declaration with a callable reference.
The compiler can infer the function types of variables. A value of a function type can be called with the invoke operator or simply with parentheses. Inline functions avoid the allocation and call overhead that lambdas would otherwise incur.
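A minimal sketch of these pieces together; the function names applyTwice and shout are made up for illustration:
inline fun applyTwice(x: String, f: (String) -> String): String = f(f(x))
fun shout(s: String): String = s.toUpperCase()
fun main() {
    val trimmed: (String) -> String = { it.trim() }             // lambda with an explicit function type
    println(applyTwice(" hi ", trimmed))                        // higher-order function call
    println(applyTwice("hi", ::shout))                          // callable reference
    println(trimmed.invoke("  explicit invoke  "))              // the invoke operator
    val anon = fun(s: String): String { return s.reversed() }   // anonymous function
    println(anon("kotlin"))
}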
Together, lambda expressions and inline functions provide highly performant control structures. Next, even a class can be extended without having to inherit from it or use a decorator. This is done via extensions. An extension function is declared with a receiver type prefix, and inside the function ‘this’ refers to the receiver object. Extension functions are dispatched statically.
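For instance, a hypothetical extension that adds a method to String without inheriting from it:
fun String.initials(): String = this.split(" ").mapNotNull { it.firstOrNull()?.toUpperCase() }.joinToString("")
fun main() {
    println("java virtual machine".initials())   // prints JVM; inside the extension, 'this' is the receiver string
}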
Kotlin also provides the ‘is’ and ‘as’ operators for type checks and casts. The former checks whether an object conforms to a given type. The explicit ‘as’ cast is often unnecessary because the compiler tracks ‘is’ checks and inserts the casts automatically, which is what makes the casts smart. The ‘as’ operator performs an unsafe cast that throws on failure, while ‘as?’ is the safe variant that returns null instead.
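A short sketch of type checks and smart casts:
fun describe(x: Any): String {
    if (x is String) {
        return "a string of length ${x.length}"           // smart cast: no explicit 'as' needed after the 'is' check
    }
    val n = x as? Int ?: return "neither String nor Int"  // safe cast 'as?' returns null instead of throwing
    return "an Int whose double is ${n * 2}"              // an unsafe cast, x as Int, would throw on failure
}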
Type safety for generics is enforced at compile time with Kotlin, while at runtime instances of generic types hold no information about their type arguments because of type erasure. The compiler therefore prohibits ‘is’ checks against type arguments that cannot be verified at runtime.
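For example, an ‘is’ check against an erased type argument is rejected, while a star projection or a reified type parameter is allowed; countOfType is a made-up name:
inline fun <reified T> countOfType(items: List<Any>): Int = items.count { it is T }
fun main() {
    val mixed: List<Any> = listOf("a", 1, "b", 2.0)
    // mixed is List<String>                  // does not compile: the type argument is erased at runtime
    println(mixed is List<*>)                 // star projection is allowed
    println(countOfType<String>(mixed))       // prints 2 because reified keeps the type at the call site
}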
String templates are another useful feature of Kotlin. A string literal may contain template expressions, pieces of code that begin with a dollar sign and are evaluated and concatenated into the string.
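For example:
fun main() {
    val name = "Kotlin"
    println("Hello, $name")                          // simple template
    println("The name has ${name.length} letters")   // any expression can appear inside ${ }
}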

Friday, April 17, 2020

Java versus Kotlin:
Both Kotlin and Java are statically typed languages. Kotlin is the newer of the two, with its first official release in 2016 as opposed to Java's in 1995. Kotlin targets the JVM and can also be compiled to JavaScript. It requires a compiler plugin and works with the existing Java stack.
Kotlin offers a number of advantages over Java. It is definitely terser and more readable. It addresses Java's problem of null references by bringing nullability under the control of the type system. NullPointerExceptions can be eliminated for the most part, with some exceptions such as explicit throws or the ‘!!’ operator and data inconsistencies around initialization or Java interop. Kotlin provides a safe call operator, denoted by ‘?.’, that accesses a member of an instance only when the instance is not null.
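A small sketch of the safe call operator together with the elvis operator for a default value:
fun emailLength(email: String?): Int = email?.length ?: 0   // returns 0 when email is null
fun main() {
    println(emailLength("user@example.com"))   // 16
    println(emailLength(null))                  // 0, no NullPointerException
}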
Kotlin is designed for Java interoperability and enables smooth calls to all methods and properties by following conventions that cut down code. Since Java objects can be null, all objects originating from Java are treated as platform types and their safety guarantees are the same as in Java. Annotations help with providing nullability information for types and type parameters.
Kotlin arrays are invariant, which prevents assigning an array of one type to a variable holding an array of its supertype and thereby avoids a possible runtime failure. Arrays of primitive types, such as IntArray, are maintained without boxing overhead.
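For example:
fun main() {
    val strings: Array<String> = arrayOf("a", "b")
    // val objects: Array<Any> = strings        // does not compile: Kotlin arrays are invariant
    val counts: IntArray = intArrayOf(1, 2, 3)  // backed by a primitive int[], no boxing
    println(strings.joinToString() + " / " + counts.sum())
}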
It uses a family of function types with a special notation corresponding to the signatures of the functions, that is, their parameters and return values, such as (A, B) -> C. The notation also supports a receiver type, as in A.(B) -> C, where the function is invoked on a receiver object, and suspending functions marked with the suspend modifier. Kotlin supports Single Abstract Method (SAM) conversions for interfaces with a single abstract method. Kotlin function literals can be automatically converted into implementations of Java interfaces with a single non-default method, which can be used to create instances of SAM interfaces.
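A brief sketch of function types, a receiver type and a SAM conversion; appendTo is a made-up name:
val concat: (String, String) -> String = { a, b -> a + b }          // (A, B) -> C notation
val appendTo: StringBuilder.(String) -> Unit = { this.append(it) }  // receiver type; a suspending variant would be written suspend (String) -> Unit
fun main() {
    println(concat("a", "b"))
    val sb = StringBuilder("log: ")
    sb.appendTo("event")                         // invoked on the receiver
    println(sb)
    val r = Runnable { println("running") }      // SAM conversion: the lambda implements java.lang.Runnable
    Thread(r).start()
}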
Kotlin does not support checked exceptions. Many believe that checked exceptions lead to decreased productivity with no significant improvement to code quality. In fact, some call it an outright mistake.
The above comparison makes it equally easy to enumerate what Java has that Kotlin does not. These include checked exceptions, primitive types that are not classes, static members, non-private fields, wildcard types and the ternary operator.
Kotlin brings a ton of new features over Java such as lambda expressions, extension functions, smart casts, string templates, primary constructors, first-class delegation, type inference, singletons, range expressions, operator overloading, companion objects and coroutines.
A sample implementation is available here: https://1drv.ms/w/s!Ashlm-Nw-wnWrwRgdOFj3KLA0XSi

Thursday, April 16, 2020

Events are preferably generated once even if they go to different destinations. This is the principle behind the appender technique used in many software development projects. The technique is popularly applied to logging, where the corresponding library is called log4j. Software components write to the log once regardless of where the logging entries are displayed or stored. An appender is simply a serializer of events with the flexibility to send them to different destinations simultaneously. These entries are usually sent to the console, a file, or both. With the popularity of web accessible blobs and continuously appended streams, we have new log4j destinations.
The implementation of the custom appender involves extending the well-known logback AppenderBase class and overriding its start, stop and append methods, where append takes the data to be written to the target. The start and stop methods initialize the writer to the stream store and perform proper cleanup when the appender is stopped. In terms of a data structure, this is the equivalent of initializing the data structure and adding entries to it. The latter is the actual logic of handling an event. When the log4j2 AbstractAppender is used instead, the appender class is annotated as a @Plugin and its factory method with the @PluginFactory annotation.
The appender is usually a runtime dependency or, if necessary, a compile-time dependency, such as when certain properties can be set on the appender only via code rather than declaration. The bean for the appender describes all the properties required for the start() method to succeed. For example, it defines the stream name, the scope name and the controller URI. These parameters alone are sufficient to instantiate the appender.
Finally, the append method of the appender invokes the write event on the writer. This involves making a write call to the stream store. Sophisticated implementations can allow log filters and caches to be implemented with this appender. It can also allow asynchronous processing of the entries. The base class for the appender implementation was chosen as AppenderBase, but it could extend other derived logback appender classes as appropriate, including the ones that help with asynchronous processing.

Tuesday, April 14, 2020

This is a code sample for the suggestion made in the article above about writing log4j appenders for a stream store:
import java.net.URI;
import java.util.Optional;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.classic.spi.IThrowableProxy;
import ch.qos.logback.core.AppenderBase;
import ch.qos.logback.core.LogbackException;

import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;
import io.pravega.client.stream.impl.JavaSerializer;

public class StreamAppender extends AppenderBase<ILoggingEvent> {
    private static final String COMPLETION_EXCEPTION_NAME = CompletionException.class.getSimpleName();
    private static final String EXECUTION_EXCEPTION_NAME = ExecutionException.class.getSimpleName();
    private final static String CONTROLLER_URI = "CONTROLLER_URI";
    private final static String SCOPE_NAME = "SCOPE_NAME";
    private final static String STREAM_NAME = "STREAM_NAME";

    public String scope;
    public String streamName;
    public URI controllerURI;
    private StreamManager streamManager;
    private EventStreamClientFactory clientFactory;
    private EventStreamWriter<String> writer;

    public StreamAppender(String scope, String streamName, URI controllerURI) {
        this.scope = scope;
        this.streamName = streamName;
        this.controllerURI = controllerURI;
    }

    @Override
    public void start() {
        final String scope = getEnv(SCOPE_NAME);
        final String streamName = getEnv(STREAM_NAME);
        final String uriString = getEnv(CONTROLLER_URI);
        final URI controllerURI = URI.create(uriString);

        this.scope = scope;
        this.streamName = streamName;
        this.controllerURI = controllerURI;
        init();
        super.start();
    }

    @Override
    public void stop() {
        if (writer != null) writer.close();
        if (clientFactory != null) clientFactory.close();
        if (streamManager != null) streamManager.close();
        super.stop();
    }

    private static String getEnv(String variable) {
        Optional<String> value = Optional.ofNullable(System.getenv(variable));
        return value.orElseThrow( () -> new IllegalStateException(String.format("Missing env variable %s", variable)));
    }

    private void init() {
        streamManager = StreamManager.create(controllerURI); // assign the field so that stop() can close it

        StreamConfiguration streamConfig = StreamConfiguration.builder()
            .scalingPolicy(ScalingPolicy.fixed(1))
            .build();
        streamManager.createStream(scope, streamName, streamConfig);
        clientFactory = EventStreamClientFactory.withScope(scope, ClientConfig.builder().controllerURI(controllerURI).build());
        writer = clientFactory.createEventWriter(streamName,
                 new JavaSerializer<String>(),
                 EventWriterConfig.builder().build());
    }

    //region Appender Implementation

    @Override
    public String getName() {
        return "Stream Appender";
    }

    @Override
    public void append(ILoggingEvent event) throws LogbackException {
        if (event.getLevel() == Level.ERROR) {
            recordEvent("error", event);
        } else if (event.getLevel() == Level.WARN) {
            recordEvent("warn", event);
        }
    }

    private void recordEvent(String level, ILoggingEvent event) {
        // Unwrap CompletionException/ExecutionException wrappers to reach the root cause;
        // the unwrapped proxy could also be serialized alongside the message.
        IThrowableProxy p = event.getThrowableProxy();
        while (shouldUnwrap(p)) {
            p = p.getCause();
        }
        if (writer != null) {
            writer.writeEvent(level, event.getMessage());
        }
    }

    private boolean shouldUnwrap(IThrowableProxy p) {
        return p != null
                && p.getCause() != null
                && (p.getClassName().endsWith(COMPLETION_EXCEPTION_NAME) || p.getClassName().endsWith(EXECUTION_EXCEPTION_NAME));

    }

    //endregion

}

The sample above initializes the stream writer when the appender starts rather than opening the stream for every event; the same initialization could also be done once on the instantiation of the bean.

Monday, April 13, 2020

Data traffic generators:
Storage products require a few tools to test how the products behave under load and duress. These tools require varying types of load to be generated for read and write. Standard random string generators can be used to create such data to store in files, blobs or streams given a specific size of content to be generated.  The tool has to decide what kind of payload aka events to generate and employ different algorithms to come up with such load.
These algorithms can be enumerated as:
1) Constant size data traffic: The reads and writes are of uniform size and they are generated in burst mode where a number of packets follow in quick succession filling the pipeline between the source and destination.
2) Hybrid size data traffic: The constant size events are still generated but there are more than one constant size generators for different sizes and the events generated from different constant size generators are serialized to fill the data pipeline between the source and the destination. The different size generators can be predetermined for t-shirt size classification.
3) Constant size with latency: There is a delay introduced between events so that the data does not arrive at predictable times. The delays need not all be uniform and can be for random duration. While 1) allows spatial distribution of data, 3) allows temporal distribution of data.
4) Hybrid size with latency: There is a delay introduced between events from different generators as it fills the pipeline leading to both the size and the delay to vary randomly simulating the real-world case for data traffic. While 2) allows spatial distribution of data, 4) allows temporal distribution of data.
The distribution of size or delay can use a normal distribution, which makes the middle values of the range occur somewhat more frequently than the outliers, and a comfortable range can be picked for both the size and the delay to vary. Each event generator implements its strategy, and the generators can be switched independently by the writer so that different loads are generated. The tool may run forever, which means it does not need to stop unless interrupted.
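A sketch of such generator strategies in Kotlin; the class names and sizes are illustrative, and the sample linked below is the fuller implementation:
import java.util.Random

interface EventGenerator { fun next(): String }

// constant size data traffic
class ConstantSizeGenerator(private val size: Int) : EventGenerator {
    private val random = Random()
    override fun next(): String = (1..size).map { 'a' + random.nextInt(26) }.joinToString("")
}

// constant size with latency: a normally distributed delay around meanDelayMillis
class DelayedGenerator(private val inner: EventGenerator, private val meanDelayMillis: Double) : EventGenerator {
    private val random = Random()
    override fun next(): String {
        val delay = (meanDelayMillis + random.nextGaussian() * meanDelayMillis / 4).coerceAtLeast(0.0)
        Thread.sleep(delay.toLong())
        return inner.next()
    }
}

// hybrid size: round-robin over generators of different constant sizes
class HybridGenerator(private val generators: List<EventGenerator>) : EventGenerator {
    private var index = 0
    override fun next(): String = generators[index++ % generators.size].next()
}

fun main() {
    val generator = DelayedGenerator(HybridGenerator(listOf(ConstantSizeGenerator(64), ConstantSizeGenerator(1024))), 50.0)
    repeat(4) { println(generator.next().length) }
}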
The marketplace already has quite a few tools for this kind of load generation, referred to as packet generators, T-Rex for data traffic, or torture tools for driving certain file system protocols. Most of these tools, however, do not have an independent or offloaded load generator and are tied to the tool and purpose they are applied to, limiting their usage or portability to other applications.
One of the biggest advantages of separating event generation into its own library is that it can be used in conjunction with a log appender so that the target can vary at runtime. The target can be the console if the data merely needs to appear on the screen without any persistence, or it can be a file, blob or stream. The appender also allows events to be written simultaneously to different targets, leading to a directory of different sized files or a bucket full of different sized objects and so on. This allows other tools to work in tandem with the event generators as upstream and downstream systems. For example, duplicity may take the generated events as input for subsequent data transfer from a source to a destination.
Sample code for event generator is included here: https://github.com/ravibeta/JavaSamples/tree/master/EventGenerator

Sunday, April 12, 2020

Writing a unified Log4j appender to Blobs, Files and Streams:
Events are preferably generated once even if they go to different destinations. This is the principle behind the appender technique used in many software development projects. The technique is popularly applied to logging, where the corresponding library is called log4j. Software components write to the log once regardless of where the logging entries are displayed or stored. An appender is simply a serializer of events with the flexibility to send them to different destinations simultaneously. These entries are usually sent to the console, a file, or both. With the popularity of web accessible blobs and continuously appended streams, we have new log4j destinations.
This article explains how to write an appender for blobs, files and streams at the same time:
The first step is the setup. This involves specifying a configuration each for blob, stream and file. Each configuration is an appender and a logger. The set of configurations appear as a collection. The appender describes the destination, its name, target and pattern/prefix to be used. It can also include a burst filter that limits the rate. The Logger determines the verbosity and characteristics of the log generation. Specifying this configuration each for blob, file and stream makes up the setup step.
The second step is the implementation of the custom appender, which we may need to pass in to the application usually as a runtime dependency or, if necessary, as a compile-time dependency, such as when certain properties might be set on the appender only via code rather than declaration. The custom appender extends the well-known log4j AbstractAppender class and implements the method to register itself as an appender along with the method that takes the data to be appended to the target. In terms of a data structure, this is the equivalent of initializing the data structure and adding entries to it. The latter is the actual logic of handling an event. Usually this appender is annotated as a @Plugin and the registration method is annotated with the @PluginFactory annotation.
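A minimal Kotlin sketch of such a plugin appender; the class name and the sink methods are placeholders for the blob, file and stream targets described above:
import java.io.Serializable
import org.apache.logging.log4j.core.Filter
import org.apache.logging.log4j.core.Layout
import org.apache.logging.log4j.core.LogEvent
import org.apache.logging.log4j.core.appender.AbstractAppender
import org.apache.logging.log4j.core.config.Property
import org.apache.logging.log4j.core.config.plugins.Plugin
import org.apache.logging.log4j.core.config.plugins.PluginAttribute
import org.apache.logging.log4j.core.config.plugins.PluginElement
import org.apache.logging.log4j.core.config.plugins.PluginFactory

@Plugin(name = "UnifiedAppender", category = "Core", elementType = "appender", printObject = true)
class UnifiedAppender private constructor(
    name: String,
    filter: Filter?,
    layout: Layout<out Serializable>?
) : AbstractAppender(name, filter, layout, true, Property.EMPTY_ARRAY) {

    override fun append(event: LogEvent) {
        val line = event.message.formattedMessage
        // hypothetical sinks; each forwards the same entry to its own destination
        writeToFile(line)
        writeToBlob(line)
        writeToStream(line)
    }

    private fun writeToFile(line: String) { /* append to a local file */ }
    private fun writeToBlob(line: String) { /* append to a web accessible blob */ }
    private fun writeToStream(line: String) { /* write an event to the stream store */ }

    companion object {
        @JvmStatic
        @PluginFactory
        fun createAppender(
            @PluginAttribute("name") name: String,
            @PluginElement("Filter") filter: Filter?,
            @PluginElement("Layout") layout: Layout<out Serializable>?
        ): UnifiedAppender = UnifiedAppender(name, filter, layout)
    }
}
The application's configuration would then refer to this appender by the name given in the @Plugin annotation, as described in the next step.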
With these two steps, the appender is ready to be used with an application that wants to log to blob, file and stream all at the same time. This application will refer to the appender in its configuration by the same name that the plugin annotation was defined with. The logger defines the logging level to be used with this appender.
The above describes only the basic implementation. The appender can be made production ready by improving reliability with error handling. It can override the default error handler, which reports errors for the appender such as when it is used with a logging level lower than that specified in the configuration.
Finally, the appender should follow the architectural standard set by the Application Block paradigm so that the implementation never interferes with the functionality of the applications generating the events.
Events can easily be generated with: https://github.com/ravibeta/JavaSamples/tree/master/EventGenerator

Saturday, April 11, 2020

Considerations in testing software deployed to Minikube 
Products that are hosted on Minikube are expected to work the same as if they were deployed on any other Kubernetes container orchestration framework. This remains the case when the Minikube deployments are large. When they are small, they are more prone to failures. Additional validations are required for small deployments since they are not at par with a fully deployed instance on any cluster. 
  1. The Minikube hosting feature is just like any other hosting platform such as AWS. It adds another dimension to existing test cases that involve a change of host on which the product runs. 
  2. The product may support both install and upgrade. The new test cases will usually target install but they can be expanded to include upgrade as well. 
  3. The product upgrade paths may vary depending on whether it is a patch or a major version upgrade. Both paths will usually be supported on Minikube. This remains true for both small and large deployments of Minikube. 
  4. Minikube’s support is usually host facing as opposed to the external world. It requires a docker registry that is local or reachable from the pods to be able to pull the images. The use of a local docker registry with or without TLS is an important consideration. 
  5. The small deployments of Minikube should target a lower number of replicas and containers. The specification of cpu, memory and disk for the whole host does not necessarily lower the size of the various clusters used in this feature. A minimal-dev-values file is provided as guidance for lowering the number of replicas and containers and their consumption. 
  6. Access to the cluster should be tested both from the user interface and with kubectl commands. If the product is used to create namespaces for the users, testing may include one or two projects on Minikube because this is the typical case. A large number of namespaces is neither required nor supported on small deployments due to resource constraints. 
  7. Error messages in the user interface will not be tweaked since they are the same for the product across deployments regardless of size or flavor. A few negative test cases targeting error messages could be helpful. 
  8. Security configuration of the Minikube installation is lower priority since the instance is already completely owned. Still, it might be better to test with the pre-install script provided in the scripts directory. Options for Pravega security, such as setting TLS on the Pravega controller, can also be exercised. 
  9. Small Minikube deployments are expected to have frequent restarts due to low resources. They should not number into the hundreds. A test case that allows the Minikube deployment to run for some time will be helpful in this case. 
  10. If the product deployed on Minikube hosts user code, then that code should be tweaked to utilize resources within reasonable limits for what is available to the product. 
These are some of the considerations that make deployment validations different on Minikube.