Friday, August 14, 2020

Migration to Java 11 continued

The following section discusses a few APIs in JDK 11, their benefits, and the preparatory work involved.

The Arrays API offers a fast and powerful set of array utilities. All the methods for manipulating arrays, including sorting and searching, are available as before. The sorting algorithm used for primitive arrays is Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley and Joshua Bloch. This algorithm has the same O(n log n) complexity as the rest of the quicksort family but is faster than traditional single-pivot quicksort across a broader range of workloads.
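As a small, self-contained illustration of that unchanged API surface (the array contents here are arbitrary):

import java.util.Arrays;

public class ArraysApiExample {
    public static void main(String[] args) {
        int[] values = {42, 7, 19, 3, 25};

        Arrays.sort(values);                          // primitive arrays use dual-pivot quicksort internally
        int index = Arrays.binarySearch(values, 19);  // searching requires a sorted array

        System.out.println(Arrays.toString(values) + ", index of 19 = " + index);
    }
}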

The traditional algorithm uses a single pivot: all elements less than the pivot are moved before it, all elements greater than the pivot are moved after it, and the two sub-arrays are then recursively sorted.

With the dual-pivot variation there are two pivots, P1 and P2, chosen as, say, the first and the last element. The pivots must be ordered so that P1 <= P2; otherwise they are swapped. The range between the pivots is broken into non-overlapping sub-ranges delimited by indices L, K and G, which move progressively between the far left and the far right:

The sub-range from left+1 to L-1 holds elements less than P1.

The sub-range from L to K-1 holds elements >= P1 and <= P2.

The sub-range from K to G holds the remaining, not-yet-examined elements.

The sub-range from G+1 to right-1 holds elements greater than P2.

In this way the third sub-range of not-yet-examined elements is shrunk: each element is compared with the two pivots and placed into one of the other three sub-ranges. L, K and G advance as the third sub-range shrinks.

The first pivot element is then swapped with the last element of the first sub-range above.

The second pivot element is then swapped with the first element of the last sub-range above.

The steps are repeated recursively for each sub-range.

The key thing to note here is that for relatively large arrays the asymptotic complexity stays essentially the same, but everyday code mostly sorts small arrays. The authors therefore chose a threshold on array length, originally 27, below which insertion sort is preferred over all other sorting methods. In JDK 8, this threshold was set to 47. The partitioning described above is sketched below.
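The following is a minimal sketch, in plain Java, of the dual-pivot partitioning and the insertion-sort cutoff described above. The class name, method names and threshold value are illustrative; the JDK's internal implementation adds many more optimizations.

import java.util.Arrays;
import java.util.Random;

public class DualPivotSketch {

    // Illustrative cutoff: small ranges fall back to insertion sort.
    static final int INSERTION_THRESHOLD = 27;

    static void sort(int[] a, int left, int right) {
        if (right - left < INSERTION_THRESHOLD) {
            insertionSort(a, left, right);
            return;
        }
        // Choose the first and last elements as pivots and order them so P1 <= P2.
        if (a[left] > a[right]) swap(a, left, right);
        int p1 = a[left], p2 = a[right];

        int l = left + 1;   // a[left+1 .. l-1]  < P1
        int k = left + 1;   // a[l .. k-1]       >= P1 and <= P2
        int g = right - 1;  // a[g+1 .. right-1] > P2; a[k .. g] not yet examined

        while (k <= g) {
            if (a[k] < p1) {
                swap(a, k, l);
                l++; k++;
            } else if (a[k] > p2) {
                swap(a, k, g);
                g--;            // the element swapped in from position g is examined next
            } else {
                k++;
            }
        }
        // Move the pivots into their final positions bordering the middle sub-range.
        l--; g++;
        swap(a, left, l);
        swap(a, right, g);

        sort(a, left, l - 1);
        sort(a, l + 1, g - 1);
        sort(a, g + 1, right);
    }

    static void insertionSort(int[] a, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int v = a[i];
            int j = i - 1;
            while (j >= left && a[j] > v) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(100, 0, 1000).toArray();
        sort(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data));
    }
}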



Thursday, August 13, 2020

Migration to Java 11 continued

Unlike a framework upgrade, the application does not have to be rewritten when the JDK is upgraded. Several comparisons can still be made between the two kinds of transition, best described by how the APIs change in each case. The set of deprecated APIs is usually larger for an application that is tightly bound to a framework, along with a larger number of breaking changes. The SDK imposes no such requirements and usually has few breaking changes in its API. On the other hand, publishing multiple jars is a somewhat more convenient way to maintain backward compatibility.

The file structure and layout of the code do not have to change with an upgrade of the JDK, but the same cannot be said for the migration or upgrade of a framework.

Finally, build tools help a lot with the preparation for a JDK upgrade. Gradle has a number of plugins, which are determined at the outset, well before the code is compiled.

Gradle can optimize the loading and reuse of plugin classes, allow different plugins to use different versions of plugin classes, and give editors detailed information about the potential properties and values in the buildscript.

The plugins can also be resolved as part of the build rather than requiring pre-installation. It might look like the traditional apply() method in Gradle is no different from the plugins block, since both serve to list plugins and their versions, but the latter is more recent and comes with more rigorous checks, constraints and restrictions. If we want to avoid those restrictions, we can make use of the buildscript block instead. Gradle also has support for multi-project builds, so the build.gradle is composable across different projects.

Wednesday, August 12, 2020

Migration to Java 11

 


The following section outlines the transition steps to take for different kinds of applications, given the explanation of tools and features in the preceding posts.

Jar-based applications – These applications can be invoked directly from the command line. They handle web requests by including the communication stack within the package, something that previously used to be handled by an application server. Such applications are built with frameworks such as Spring Boot, Dropwizard, Micronaut, MicroProfile and others and are packaged into jar files (a minimal sketch appears after this list of application types).

Spring Cloud microservices – Here the application consists of a suite of small services, each running in its own process and usually communicating over HTTP. Each service is isolated and dedicated to a single business capability. Each service may ship as its own jar file, sometimes packaged as a fat jar.

Web applications – These run inside a servlet container. Some use the servlet APIs directly, while frameworks such as Apache Struts, Spring MVC and JavaServer Faces hide them. These applications are packaged as WAR files.

Enterprise applications – These applications were once referred to as J2EE applications and are now called Jakarta EE applications. They are packaged as EAR files, or sometimes as WAR files. They run on Java EE-compliant servers such as WebLogic, WebSphere, WildFly, GlassFish, Payara and others. If such applications are written to use only standard Java EE features, they can migrate from one compliant server to another.

Batch / scheduled jobs – These are jobs that are usually run from crontabs. They are packaged into archives with the jar extension and run with schedulers such as Quartz and Spring Batch. They are usually isolated from the application logic and packaging described in the types mentioned earlier (a Quartz-based sketch also follows this list).
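As a minimal sketch of a jar-based application, assuming Spring Boot: the embedded web server ships inside the jar, so the application can be started with java -jar from the command line. The class name and endpoint are illustrative.

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Packaged as a fat jar, this serves HTTP requests without an external application server.
@SpringBootApplication
@RestController
public class DemoApplication {

    @GetMapping("/health")
    public String health() {
        return "ok";
    }

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}

And a sketch of a scheduled job, assuming Quartz; the job name and cron expression are illustrative (Quartz cron expressions include a seconds field).

import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class NightlyCleanupJob implements Job {

    @Override
    public void execute(JobExecutionContext context) {
        // The batch work goes here, isolated from the main application logic.
        System.out.println("running nightly cleanup");
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

        JobDetail job = JobBuilder.newJob(NightlyCleanupJob.class)
                .withIdentity("nightly-cleanup")
                .build();

        // Run at 02:00 every day.
        Trigger trigger = TriggerBuilder.newTrigger()
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
                .build();

        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}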

All applications described above can run directly on a virtual machine, and some can be hosted in containers. The mode of deployment, and the extent to which features of the host or infrastructure can be leveraged to support them, varies considerably with that choice. Applications can become slimmer on a better-provisioned host.

The runtime resource usage and needs of the above applications also vary from host to host and by application type. Average and peak request rates may vary over time, but sufficient resources must be planned for the applications.


Tuesday, August 11, 2020

Support for small to large footprint introspection database and query

 Introspection analytics 

A Flink job might be dedicated to performing periodic analysis of introspection data or to collecting information from sensors. The job can also consolidate data from other sources that are internal to the stream store and hidden from users.

Batching and statistics are some of the tasks with which the analytics job can help. Simple aggregate queries per time window for sum(), min() and max() can produce more meaningful events to be persisted in the stream store.

The Flink job may have network connectivity to read events from external data stores, and these could include events published by sensors. Usually those events make their way to the stream store regardless of whether an introspection store or analytics job exists in the current version of the system. In some cases it is helpful for the analytical jobs to glance at the backlog and the rate of accumulation in the stream store, as well as the overall throughput and latency for all the components taken together. Calculating and persisting such diagnostic events is helpful for trend analysis and investigations later.

The use of a Flink job dedicated to the introspection store immensely improves the querying capability. Almost all aspects of querying outlined in Stream Processing with Apache Flink from O’Reilly Media can be used for this purpose, as the sketch below suggests.
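A minimal sketch of such an analytics job, assuming the Flink DataStream API; the component names and the in-line source are placeholders for a real connector to the stream store.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class IntrospectionAnalyticsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // (component, latencyMillis) pairs; in practice these would come from a stream-store source.
        DataStream<Tuple2<String, Long>> events = env.fromElements(
                Tuple2.of("segment-store", 12L),
                Tuple2.of("controller", 7L),
                Tuple2.of("segment-store", 20L));

        // Maximum latency per component over one-minute processing-time windows.
        DataStream<Tuple2<String, Long>> maxLatency = events
                .keyBy(value -> value.f0)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                .max(1);

        // In a real job this would be written back to a dedicated introspection stream.
        maxLatency.print();

        env.execute("introspection-analytics");
    }
}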

Distributed Collection agents

As with any store, not just the introspection store, data can arrive from different sources with the store as the destination. Collection agents for each type of source make it convenient to transfer the data to the store.

The collection agents do not themselves need to be monitored. The data they send can be lossy, but it should arrive at the store. The store is considered a singleton local instance. It may not even be in the same system as the one it serves. The rest of the store may be global and shared, but the data transfer from a collection agent does not have to go directly to the global shared storage. If it helps to have the introspection store serve as the common local destination for all collection agents, the introspection store can be kept outside the global storage. In either case the streams are managed centrally by the stream store, and the storage referred to is tier 2 persistence.

Distributed Destinations:

Depending on the mode of deployment, the collection agents can be either lean or bulky. In the latter case, they come configured with their own storage so that all the events are batched under the resource restrictions of the site where the agent is deployed. Those batched events can then be periodically pushed to the introspection store. This is rather useful when certain components of the system do not even share the same cluster or host on which the stream store is deployed. The collection agents are usually as close to the data source as possible, so the design to keep them going regardless of whether the rest of the system is reachable is prudent, given that certain sites might even be dark. Under such circumstances, the ability to propagate events collected remotely for introspection of data collection agents will be very helpful for administrators to use as and when they like. A minimal sketch of such an agent follows.
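A minimal sketch of a bulky collection agent of this kind: events are buffered locally and pushed on a schedule, so collection continues even when the store is temporarily unreachable. IntrospectionClient, the buffer size and the flush interval are hypothetical and not part of any particular stream-store SDK.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CollectionAgent {

    // Hypothetical sink, e.g. a writer to an introspection stream.
    interface IntrospectionClient {
        void push(List<String> batch) throws Exception;
    }

    private final LinkedBlockingQueue<String> buffer = new LinkedBlockingQueue<>(10_000);
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final IntrospectionClient client;

    public CollectionAgent(IntrospectionClient client) {
        this.client = client;
    }

    // Called by the local data source; drops events when the buffer is full (lossy by design).
    public void record(String event) {
        buffer.offer(event);
    }

    public void start() {
        scheduler.scheduleAtFixedRate(this::flush, 30, 30, TimeUnit.SECONDS);
    }

    private void flush() {
        List<String> batch = new ArrayList<>();
        buffer.drainTo(batch);
        if (batch.isEmpty()) {
            return;
        }
        try {
            client.push(batch);
        } catch (Exception e) {
            // The store may be unreachable (for example, a dark site); keep collecting and retry later.
        }
    }
}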


Monday, August 10, 2020

Performance improvements with new methods in Java 11

 

Java 11 is an upgrade over Java 8, and the only direction for software depending on JDK 8 is forward to 11. New methods have been added and existing APIs modified in ways that improve startup, performance and memory usage.

This article talks about the high-level changes between 8 and 11 and elaborates on the transition from 8 to 11.

1) Java 11 supports modules. A module is a collection of classes, interfaces and resources. This reduces the footprint of the application and improves the customization of the runtime. Class loading is improved. The dependencies the developer describes for building the application become better encapsulated, more secure and easier to maintain (a sketch of a module declaration follows this list).

2) It includes better memory management and a low-overhead heap profiler. Developers familiar with Java Mission Control for viewing memory usage will find this an improvement over what was available in 8.

3) Java 11 has a unified logging system and a flight recorder that gathers data from a running Java application. The data can then be analyzed using Java Mission Control (a sketch of a custom flight-recorder event follows this list).

4) Garbage collection has four options – serial, parallel, garbage-first (G1) and epsilon – with the third being the default garbage collector in Java 11. The default is usually good enough for most applications; however, mission-critical applications find the advanced options very useful.
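For item 1, a module declaration might look like the following; the module and package names are made up for illustration.

// module-info.java
module com.example.orders {
    requires java.sql;               // depend only on the platform modules actually needed
    exports com.example.orders.api;  // only this package is visible to other modules
}

A trimmed runtime image containing just the required modules can then be assembled with jlink, which is where much of the footprint reduction comes from.

For item 3, the jdk.jfr API that ships with Java 11 lets an application define and emit its own flight-recorder events. The event name and field below are illustrative.

import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

@Name("com.example.OrderProcessed")
@Label("Order Processed")
class OrderProcessedEvent extends Event {
    @Label("Order Id")
    long orderId;
}

class OrderService {
    void process(long orderId) {
        OrderProcessedEvent event = new OrderProcessedEvent();
        event.orderId = orderId;
        event.begin();
        // ... the work being measured ...
        event.end();
        event.commit();   // recorded only while a flight recording is active
    }
}

A recording can be started with -XX:StartFlightRecording and the resulting file opened in Java Mission Control.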

When transitioning from Java 8 to Java 11, an application encounters a few deprecated APIs, changes to class loaders, and changes to garbage collection. Yet an application that compiles and runs successfully on Java 8 can usually run on the Java 11 runtime without modification. Leveraging the new methods and features in Java 11 only improves execution, performance and security.

A variety of tools serve to inspect the code before and after this transition. The jdeprscan tool looks for the use of deprecated or removed APIs. The jdeps tool analyzes the dependencies of the Java application; its --jdk-internals option reports usage of internal APIs that are subject to change with the transition, and the tool suggests recommended replacements for commonly used JDK internal APIs. The Java compiler itself reports quite a bit when its lint warnings (-Xlint) are enabled. As with the compiler, jdeps and jdeprscan can only report warnings based on compilation; runtime dependencies resolved through reflection are not covered. The --add-exports, --add-opens and --add-reads options can be used to expose encapsulated packages and add module dependencies where needed. The best thing about Java 11 for the packaging of code is its support for multi-release jars. A multi-release jar is one that can support both Java 8 and Java 11. The "Multi-Release: true" attribute in the jar manifest (set, for example, from the jar section of the build script) is usually sufficient for this purpose, leading to versioned directories of the form "META-INF/versions/N".

Most tools, and particularly the Java compiler, have a lot of options that tweak their behavior. The Java compiler is especially dependent on these options to suit the needs of building the application. The JaCoLine tool helps detect problems with the command-line options.




Saturday, August 8, 2020

TLS certificate error

If you encounter the following exception stack trace, take the steps described in this post:

java.util.concurrent.CompletionException: io.pravega.shared.protocol.netty.ConnectionFailedException: java.security.cert.CertificateException: No certificate data found
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Caused by: io.pravega.shared.protocol.netty.ConnectionFailedException: java.security.cert.CertificateException: No certificate data found
at io.pravega.client.connection.impl.TcpClientConnection.createClientSocket(TcpClientConnection.java:261)
at io.pravega.client.connection.impl.TcpClientConnection.lambda$connect$1(TcpClientConnection.java:191)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
... 7 common frames omitted
Caused by: java.security.cert.CertificateException: No certificate data found
at sun.security.provider.X509Factory.parseX509orPKCS7Cert(X509Factory.java:456)
at sun.security.provider.X509Factory.engineGenerateCertificates(X509Factory.java:356)
at java.security.cert.CertificateFactory.generateCertificates(CertificateFactory.java:462)
at io.pravega.common.util.CertificateUtils.extractCerts(CertificateUtils.java:52)
at io.pravega.common.util.CertificateUtils.extractCerts(CertificateUtils.java:45)
at io.pravega.common.util.CertificateUtils.createTrustStore(CertificateUtils.java:92)
at io.pravega.client.connection.impl.TcpClientConnection.createFromCert(TcpClientConnection.java:211)
at io.pravega.client.connection.impl.TcpClientConnection.createClientSocket(TcpClientConnection.java:229)
... 9 common frames omitted

Generate a private RSA key
 

openssl genrsa -out diagserverCA.key 2048 

Create an X.509 certificate

openssl req -x509 -new -nodes -key diagserverCA.key \
    -sha256 -days 1024 -out diagserverCA.pem

Create a PKCS12 keystore from private key and public certificate. 

openssl pkcs12 -export -name server-cert \
    -in diagserverCA.pem -inkey diagserverCA.key \
    -out serverkeystore.p12

Convert PKCS12 keystore into a JKS keystore 

keytool -importkeystore -destkeystore server.keystore \
    -srckeystore serverkeystore.p12 -srcstoretype pkcs12 \
    -alias server-cert

Import a client's certificate to the server's trust store. 

keytool -import -alias client-cert \
    -file diagclientCA.pem -keystore server.truststore

Import a server's certificate to the server's trust store. 

keytool -import -alias server-cert \
    -file diagserverCA.pem -keystore server.truststore


Note that the certificates are imported in PEM format and not in other formats.