Monday, January 20, 2020

Jenkins automation to read upstream job information is made possible with the Jenkins remote API. For example:

    // Iterate over the upstream jobs and the image each one produces
    for (String jobName : jobNames.keySet()) {
        def imageName = jobNames.get(jobName)
        // Fetch the build information JSON for the job from the Jenkins remote API
        def buildJson = ["wget", "-qO-", "${uri}"].execute().text
        // Extract the value of the "description" field without a JSON parser
        def marker = '"description":"'
        def start = buildJson.indexOf(marker)
        def end = buildJson.indexOf('"', start + marker.length())
        def remoteBuildVersion = ""
        println("${start}:${end}")
        if (start != -1 && end != -1 && end > start) {
            remoteBuildVersion = buildJson.substring(start + marker.length(), end)
        }
        // Compare the remote build version against the version recorded in the manifest
        manifestFileLines.each { line ->
            if (line.contains("${imageName}:")) {
                def localBuildVersion = line.split(":")[1].trim()
                if (remoteBuildVersion != localBuildVersion) {
                    def pattern = remoteBuildVersion
                    def replacement = localBuildVersion
                    println("replacing ${pattern} with ${replacement} in ${imageName}")
                    // Back up each file before replacing the old version with the new one
                    filesToBeModified.eachFileRecurse({ file ->
                        def fileText = file.text
                        def backupFile = file.path + ".bak"
                        writeFile(file: backupFile, text: fileText)
                        fileText = fileText.replaceAll(pattern, replacement)
                        writeFile(file: file.path, text: fileText)
                    })
                } else {
                    println("${jobName} build version ${remoteBuildVersion} matches ${localBuildVersion}")
                }
            }
        }
    }

Note that the above script is meant for a Jenkinsfile and avoids Groovy constructs that the script security sandbox blocks, even though Groovy is otherwise usable in a Jenkinsfile. Without an administrator approving the offending method signature, such constructs produce errors like: “Scripts not permitted to use staticMethod org.codehaus.groovy.runtime.DefaultGroovyMethods execute java.util.List. Administrators can decide whether to approve or reject this signature.
org.jenkinsci.plugins.scriptsecurity.sandbox.RejectedAccessException: Scripts not permitted to use staticMethod org.codehaus.groovy.runtime.DefaultGroovyMethods execute java.util.List”

Sunday, January 19, 2020

There are many ways to use configuration annotations in a Spring Java application. The @Configuration annotation is not the same as a bean. A POJO defined with @ConfigurationProperties can be imported into an @Configuration class.
For example, the proper annotations to use with Kubernetes secrets in a Spring Java application are:
@Configuration
@Primary
@EnableWebSecurity
@EnableConfigurationProperties(KubernetesSecrets.class)
public class WebConfig extends WebSecurityConfigurerAdapter {
    private static final Logger LOG = LoggerFactory.getLogger(WebConfig.class);

    // Injected through the constructor; its values come from external configuration
    private final KubernetesSecrets secrets;

    public WebConfig(KubernetesSecrets secrets) {
        this.secrets = secrets;
    }
}

Here the KubernetesSecrets object is a @ConfigurationProperties holder which gets its values externally.
When a new instance of KubernetesSecrets is created in a method of this class, that method carries an @Bean annotation. The same annotation does not apply to the member variable and constructor above. A sketch of what such a properties holder could look like is shown below.
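The KubernetesSecrets class itself is not shown in this post. A minimal sketch of what such a @ConfigurationProperties holder could look like, assuming the secrets surface as username and password values under a hypothetical k8s.secrets prefix:

    import org.springframework.boot.context.properties.ConfigurationProperties;

    // Hypothetical properties holder; the prefix and fields are assumptions for illustration.
    // Values are bound from external sources (environment variables, mounted secret files,
    // or application properties) rather than being hard-coded.
    @ConfigurationProperties(prefix = "k8s.secrets")
    public class KubernetesSecrets {

        private String username;
        private String password;

        public String getUsername() { return username; }
        public void setUsername(String username) { this.username = username; }

        public String getPassword() { return password; }
        public void setPassword(String password) { this.password = password; }
    }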

The differences between @Configuration, @ConfigurationProperties and @Bean can seem unclear when they are used merely to refer to an external source. @ConfigurationProperties is only a way of saying that the properties of that class will have their values defined externally. @Configuration is placed on a class that uses those configuration properties.

Please note that we don't use @Autowired on the member variable or the constructor above. If the dependency were declared as an @Bean, that would have been appropriate.

A @Configuration class is essential for application context initialization, which in turn helps the @SpringBootApplication with its own initialization.

@Bean is used to declare a single bean explicitly. Spring does this automatically when @Component, @Service, or @Repository is used, because those classes are picked up by classpath scanning.
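As an illustration, a minimal sketch of an explicit, single-bean declaration inside a @Configuration class, reusing the KubernetesSecrets sketch above (the SecretReader type and its constructor are hypothetical, introduced only for this example):

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    // Hypothetical collaborator used only to illustrate an explicit bean declaration.
    class SecretReader {
        private final String username;
        SecretReader(String username) { this.username = username; }
        String username() { return username; }
    }

    @Configuration
    public class ServiceConfig {

        // One @Bean method declares one bean. A class annotated with @Component,
        // @Service, or @Repository would instead be registered automatically by
        // classpath scanning.
        @Bean
        public SecretReader secretReader(KubernetesSecrets secrets) {
            return new SecretReader(secrets.getUsername());
        }
    }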

Saturday, January 18, 2020

Communication protocols between independent programs – a comparison of gRPC versus REST.
The popularity of web protocols has increased over the last decade because they help connect heterogeneous applications and services that can be hosted anywhere. Two popular protocols are gRPC and REST. We will use their abbreviations as is and compare them as follows:
REST
This is a way of requesting resources from the remote end via standard verbs such as GET, PUT, and so on.
The advantages are:
Requires only HTTP/1.1, which is universally supported
Supports subscription mechanisms with REST hooks
Comes with widely accepted tool and browser support
Offers a well-defined path to development of the service that provides this communication
Supports discovery of resource identifiers with subsequent request-response interactions.
Lends itself to software development kits, so that more than one language can be supported for consuming these communication interfaces.
The disadvantages are:
Is considered chatty because there are a number of requests and responses
Is considered heavy because the payload is usually large.
Is considered inflexible at times because versioning is costly
gRPC:
This is a way of asking the remote end to run routines rather than requesting resources from it. Routines take the place of verbs and resources, and some treat this communication as a refinement of RPC and SOAP, protocols that are now considered legacy.
The advantages are:
Supports high-speed communication because it is lightweight and does not require traversing the stack all the way up and down the networking layers.
The messages are sent as “Protocol Buffers”, which are known for packing and unpacking data efficiently
It works over the newer HTTP/2
Best for traffic from devices (IoT)
The disadvantages are:
Requires the client to write code
Does not have native browser support
Both REST and gRPC support secure transport-layer communication, which keeps the exchange between the two parties private. When corporations make a significant investment in the development of one, it tends to become the choice for their development teams. However, supporting both communication protocols only widens the audience; they do not have to be mutually exclusive given enough resources and time, and together they broaden the customer base.
Sample implementation: https://github.com/ravibeta/pravega
https://travis-ci.com/ravibeta/pravega
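As a rough client-side illustration of the difference, here is a minimal sketch in Java, assuming a hypothetical events service: the REST call fetches a resource representation over HTTP/1.1, while the gRPC call invokes a routine on a stub generated from a .proto definition. The host, port, EventsServiceGrpc stub, and GetEventRequest message below are assumptions for illustration, not part of the sample implementation linked above.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    import io.grpc.ManagedChannel;
    import io.grpc.ManagedChannelBuilder;

    public class ClientComparison {

        // REST: request a resource by identifier using a standard verb (GET) over HTTP/1.1.
        static String restGetEvent(String id) throws Exception {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL("http://localhost:8080/events/" + id).openConnection();
            conn.setRequestMethod("GET");
            try (BufferedReader reader =
                         new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                StringBuilder body = new StringBuilder();
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line);
                }
                return body.toString();
            }
        }

        // gRPC: invoke a routine on a stub generated from a hypothetical events.proto;
        // EventsServiceGrpc and GetEventRequest stand in for that generated code, and the
        // request and response travel as compact protocol buffer messages over HTTP/2.
        static String grpcGetEvent(String id) {
            ManagedChannel channel =
                    ManagedChannelBuilder.forAddress("localhost", 50051).usePlaintext().build();
            try {
                EventsServiceGrpc.EventsServiceBlockingStub stub = EventsServiceGrpc.newBlockingStub(channel);
                return stub.getEvent(GetEventRequest.newBuilder().setId(id).build()).getDescription();
            } finally {
                channel.shutdownNow();
            }
        }
    }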

Friday, January 17, 2020


  1. Ideas for a graceful shutdown of an application hosted on the Kubernetes orchestration framework (K8s), continued...

    8. Eighth, there are special capabilities with a StatefulSet, which include the following:

    a. They can be used to create replicas when pods are being deployed. The pods are created sequentially in order from 0 to N-1. When the pods are deleted, they are terminated in the reverse order.

    b. They can be used to perform ordered and graceful scaling. All of the predecessors are ensured to be ready and running prior to scaling.

    c. Before a pod is terminated, all of its successors must be completely shut down.

    The above set of guarantees is referred to as the “OrderedReady” pod management.

    There is also parallel pod management, which does not chain the pods.

    StatefulSets can also be used to perform rolling updates. This is one case where healthy pods may be terminated: Kubernetes slowly terminates old pods while spinning up new ones. If a node is drained, Kubernetes terminates all the pods on that node. If a node runs out of resources, pods may be terminated to free some of them. While we discussed SIGTERM and the preStop hook, we have not discussed an appropriate limit for terminationGracePeriodSeconds on the pod spec. This is typically set to 30 or 60 seconds, but it merely has to be greater than the time taken to run all the chained handlers for the termination messages.

    Please note the use of “lifecycle: command:” scripts in postStart and preStop. These should ideally not use “/bin/sh -c” because the shell does not pass signals on to its child process. It is preferable to use either dumb-init or an actual executable that handles the ^C (termination) event; a sketch of such a handler appears after this list.

  2. When a software product comprises multiple independent applications, each application may get a message from the infrastructure. The application then handles the message as appropriate regardless of who sent it. However, applications also tend to have coordinators in a cluster-based deployment model. In such a case, the coordinator might know a better way to gracefully shut down the application. For example, “./bin/flink stop” is a better way to shut down a long-running analytical application. This gives the coordinator the chance to relay any additional commands along with the shutdown, and the application the chance to piggyback a suitable response to the coordinator. The infrastructure message then takes the form of communication in the layer above, which the participating applications and coordinator know best how to handle. The distributed model is especially beneficial for a graceful shutdown because different roles in the cluster can now share the preparation chores for the shutdown, whether specific to that application or global. In such cases, the cleanup also provides an opportunity to save state for better and more efficient post-shutdown activities.

  3. Finally, a software product can choose to alleviate inefficiencies in the distribution of termination messages by providing a one-publisher, one-subscriber model. The publisher will inevitably be the infrastructure while the subscriber will be the component of the product. The termination message is always an interrupt and is most efficiently routed to a single destination that can guarantee a graceful shutdown. Efficiency in this case is not so much about the cost of communication as about increasing reliability and data safety during the graceful shutdown procedure by doing the minimal necessary work.
These are some of the techniques used for a graceful shutdown of an application hosted on the Kubernetes orchestration framework.
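As mentioned in the preStop discussion above, the container's entrypoint should handle the termination signal itself rather than rely on “/bin/sh -c”. A minimal sketch in Java, assuming the application only needs to stop accepting work and flush its state on SIGTERM (the Worker class and its methods are hypothetical placeholders):

    public class GracefulMain {

        // Hypothetical long-running worker; drainAndFlush() is assumed to complete
        // well within the pod's terminationGracePeriodSeconds.
        static class Worker {
            private volatile boolean running = true;

            void run() throws InterruptedException {
                while (running) {
                    Thread.sleep(100); // placeholder for real work
                }
            }

            void drainAndFlush() {
                running = false;
                // stop accepting new work, finish in-flight work, persist state
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Worker worker = new Worker();

            // The JVM runs shutdown hooks when it receives SIGTERM (sent by Kubernetes
            // after the preStop hook), so cleanup happens before the process exits.
            Runtime.getRuntime().addShutdownHook(new Thread(worker::drainAndFlush));

            worker.run();
        }
    }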
  
#Apache Flink: splitting events into windows
                .window(EventTimeSessionWindows.withGap(Time.milliseconds(1))) 
                .allowedLateness(Time.milliseconds(1)) 

The above is used to separate the events into session windows: a window closes after a gap of one millisecond in event time, and events arriving up to one millisecond late are still accepted.
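For context, a minimal sketch of where these two calls sit in a Flink job, assuming a keyed stream of (key, count) tuples with event-time timestamps already assigned; the element values and the one-millisecond gap are illustrative only:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class SessionWindowJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

            // Illustrative elements of the form (key, count); the count doubles as the event timestamp here.
            DataStream<Tuple2<String, Long>> events = env
                    .fromElements(Tuple2.of("a", 1L), Tuple2.of("a", 2L), Tuple2.of("b", 10L))
                    .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<String, Long>>() {
                        @Override
                        public long extractAscendingTimestamp(Tuple2<String, Long> element) {
                            return element.f1;
                        }
                    });

            events
                    .keyBy(0)                                                      // group by the key field
                    .window(EventTimeSessionWindows.withGap(Time.milliseconds(1))) // close a session after a 1 ms gap
                    .allowedLateness(Time.milliseconds(1))                         // still accept events up to 1 ms late
                    .sum(1)                                                        // aggregate counts per session
                    .print();

            env.execute("session window sketch");
        }
    }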