Tuesday, October 1, 2019

This is a continuation of the previous post to enumerate funny software engineering practices:

Build a product on a base riddled with as many vulnerabilities as Swiss cheese has holes.

Build a product on a base that takes many dependencies and breaks when any one of them does.

Build a product as a platform over all participating vendor technologies to improve the customer experience, but take the blame when the defects actually originate from the vendors.

Build a product as a platform with plugins, where the customer always sees a mix of code rather than a homogeneous, consistent product.

Build a product on open source only to incur way more cost than anticipated.

Build a product with heavy investment in the user interface and see the target date inevitably slip.

Build a product driven by a campaign and incur technical debt in the struggle to keep the architecture sound.

Build a product that causes data loss or unavailability on updates and upgrades.

Build a product that serves a million customers but earns a poor reputation the moment a hacker finds a vulnerability.

Build a product that takes man-months to ship out the door, only to be told that the features are not baked enough.


Monday, September 30, 2019

This is a continuation of the previous post to enumerate funny software engineering practices:

Build a product with a revision that makes partners' integration investments a total loss.

Build a product without compatibility features, forcing significant re-investment in downstream systems.

Build a product without automation capabilities and watch customers try to reinvent the wheel.

Build a product with the focus on development, leaving testing on the other side of the wall.

Build a product that forces consumers to adopt a technology stack or a cloud vendor so that the subscriptions accrue income for the maker.

Build a product that behaves erratically on handheld devices.

Build a product that does not customize the user experience or carry over the user's profile.

Build a product that makes the user repeatedly sign in on the same device.

Build a product that makes users jump through a login screen for each and every site they visit.

Build a product that breaks on different browsers or clients, with no workarounds to restore functionality.

Sunday, September 29, 2019

This is a continuation of the previous post to enumerate funny software engineering practices:
  1. Build a product that requires different handlers in different roles, driving up costs
  2. Build a product that propagates pain and frustration through partners and the ecosystem as they grapple with integration
  3. Build a product that requires significant coding for automation by end users, driving down adoption
  4. Build a product that generates more support calls to do even the ordinary
  5. Build a product that generates so much sprawl in disk and network usage that the infrastructure needs a 24x7 watch
  6. Build a product that cannot be monitored because it is a black box
  7. Build a product that says one thing but does another
  8. Build a product that requires administrator intervention to authorize each and every end-user activity
  9. Build a product whose overwhelming size makes code churn and paying down technical debt daunting
  10. Build a product so big that changes in one component take hours to integrate with other components
  11. Build a product so big that it involves hierarchical permission requests before corresponding changes can be made in another component
  12. Build a product with significant developer productivity but at the cost of business and market requirements.
  13. Build a product to satisfy the technical managers but fail product management.
  14. Build a product with little or no documentation on those hidden features that only the maker knows
  15. Build a product that causes an impedance mismatch between the maker and the buyer in terms of what the product is expected to do
  16. Build a product that breaks existing usage and compatibility in the next revision.

Saturday, September 28, 2019

Humour behind software engineering practices
This is an outline for a picture book, possibly with caricatures, to illustrate frequently encountered humour in the software engineering industry:
1) Build a Chimera because different parts evolve differently
2) Build a car with the hood first rather than the engine
3) Build a product with dysfunctional parts or with annoying and incredibly funny messages
4) Build a product with clunky parts that break down on the customer site
5) Build a product that forces you to twist your arm as you work with it
6) Build a product that makes you want to buy another from the same maker
7) Build a product that is highly priced to justify the cost of development and have the enterprise foot the bill
8) Build a product that features emerging trends and showcases quirky development rather than the use cases
9) Build a product that requires more manuals to be written but never read
10) Build a product that requires frequent updates with patches for security
11) Build a product that causes outages from frequent updates
12) Build a product that snoops on user activity or collects and sends information to the maker
13) Build a product that won't play well with others, to the detriment of the customer buying the software
14) Build a product that says so much in legal language that it amounts to extortion
15) Build a product that requires the customer to pay for servicing.
16) Build a product that splashes across the newspapers when a security flaw is found.
17) Build a product that finds a niche, continues to grow by assimilating other products, and incurs huge debt towards blending the mix
18) Build a product with hard-to-find defects or those that involve significant costs and pain
19) Build a product with ambitious goals and then retrofit short-term realizations
20) Build a product with a poorly defined specification that causes immense code churn between releases, sending ripples to the customer
21) Build a product with an over-designed specification that is obsolete by the time it hits the shelf.
22) Build a product that fails to comply with regional and global requirements and enforcements, causing trouble for customers working with each other
23) Build a product that creates a stack and ecosystem dedicated to its own growth, with significant impairment for partners and competitors to collaborate
24) Build a product that impairs interoperability to drive business
Software engineers frequently share images that illustrate their ideas, feelings, and mechanisms during their discussions while building a product. Some of these images resonate across the industry, but no effort has yet been undertaken to unify them under a comic theme and present them as a comic strip. Surprisingly, there is no dearth of such comic books and illustrations for parenting and kids, yet none for software engineering.

Friday, September 27, 2019

Performance Tuning considerations in Stream queries:
Query languages and query operators have made writing business logic extremely easy and independent of the data source. This suffices for the most part, but there are a few cases where the status quo is just not enough. Enter real-time processing needs and high-priority queries, and the size of the data, the complexity of the computation, and the latency of the response all begin to become concerns.
Databases have had a long and cherished history of encountering and mitigating slow query execution. However, relational databases pose a significantly different domain of considerations from NoSQL storage, primarily because their layered and interconnected data calls for scale-up rather than scale-out technologies. Each has its own performance tuning considerations.
Stream storage is no different; it too suffers performance issues with disparate queries ranging from small to big. The compounded effect of append-only data and of streams that must be evaluated window by window makes iteration difficult. The processing of the streams is also exported out of the storage, which incurs significant round trips back and forth.
The Apache stack has significantly improved the offerings in stream processing. Apache Kafka and Flink are both capable of stateful processing. They can persist state to allow the processing to pick up where it left off. The state also helps with fault tolerance; its persistence protects against failures, including data loss. The consistency of the state can also be independently validated with the checkpointing mechanism available from Flink, which can persist the local state to a remote store. Stream processing applications often take their incoming events from an event log, which stores and distributes event streams written to a durable append-only log on Tier-2 storage, where they remain sequential by time. Flink can recover a stateful streaming application by restoring its state from a previous checkpoint and adjusting the read position on the event log to match that checkpoint. Stateful stream processing is therefore suited not only to fault tolerance but also to reentrant processing, with improved robustness from the ability to make corrections. It has become the norm for event-driven applications, data pipeline applications, and data analytics applications.
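As a minimal sketch of that setup in Flink's DataStream API (the ten-second interval, the job name, and the HDFS checkpoint path are assumptions for illustration):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedJob {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // snapshot all operator state every 10 seconds (interval is an assumption)
        env.enableCheckpointing(10000);
        // exactly-once state consistency across recoveries
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        // persist the local state to a remote store so a restarted job picks up where it left off
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
        // ... sources, operators and sinks would be defined here ...
        env.execute("checkpointed-stream-job");
    }
}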
Persisting streams from intermediate executions helps with reusability and improves the pipelining of operations, so that the query operators stay small and can be executed by independent actors. Given the equivalent of lambda processing on persisted streams, pipelining can significantly improve performance over earlier designs where the logic was monolithic and proved slow as it progressed from window to window. There is no firm rule of thumb, but fine-grained operators have proven effective since they can be studied independently.
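For instance, a chain of small, single-purpose operators over a keyed, windowed stream might look like the fragment below (the Event type and its isValid() and getKey() accessors are hypothetical):

// assuming a DataStream<Event> named events is already defined
DataStream<Tuple2<String, Long>> perMinuteCounts = events
        .filter(e -> e.isValid())                      // small operator: drop malformed events
        .map(e -> new Tuple2<>(e.getKey(), 1L))        // small operator: project to (key, 1)
        .returns(Types.TUPLE(Types.STRING, Types.LONG))
        .keyBy(0)                                      // partition so operators run in parallel
        .timeWindow(Time.minutes(1))                   // evaluate the stream window by window
        .sum(1);                                       // small operator: aggregate per window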
Streams that materialize the intermediary results also help determine what goes into each stage of the pipeline. Watermarks and savepoints are similarly helpful. This kind of persistence proves to be a win-win for parallelization as well as for subsequent processing, whereas disk access used to be costly in dedicated systems. There is no limit to the number of operators, or to their scale, that can be applied to streams, so proper planning mitigates the effort needed to choreograph a bulky search operation.
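By way of example, a watermark that tolerates a bounded amount of out-of-order arrival can be assigned as below (the Event type and its getTimestamp() accessor are again hypothetical, and the five-second bound is an assumption):

DataStream<Event> withWatermarks = events.assignTimestampsAndWatermarks(
        // emit watermarks that lag the maximum seen event time by 5 seconds
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(Event event) {
                return event.getTimestamp(); // event time in epoch milliseconds
            }
        });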
These are some of the considerations for the performance improvement of stream processing.

Thursday, September 26, 2019

Since Event is a core Kubernetes resource much the same as the others, it not only carries metadata just like other resources but can also be created, updated, and deleted via the APIs:
func recordEvent(sink EventSink, event *v1.Event, patch []byte, updateExistingEvent bool, eventCorrelator *EventCorrelator) bool { 
: 
newEvent, err = sink.Create(event) 
: 
} 
Events conform to append-only stream storage due to their sequential nature. Events are also processed in windows, making a stream processor such as Flink extremely suitable for them. Stream processors benefit from stream storage, and such storage can be overlaid on any Tier-2 storage. In particular, object storage, unlike file storage, is very useful for this purpose since the data also becomes web-accessible to other analysis stacks.

As compute, network, and storage overlap to expand the possibilities on each frontier at cloud scale, message passing has become a ubiquitous functionality. While libraries like protocol buffers and solutions like RabbitMQ are becoming popular, flows and their queues can be given native support in unstructured storage. Messages are also time-stamped and can be treated as events.
Although stream storage is best for events, any time-series database could also work. However, such databases are not web-accessible unless they sit in an object store. Their storage needs are not very different from those of applications that require object storage to store and access data. However, as object storage makes inroads into vectorized execution, data transfers become increasingly fragmented and continuous. At this juncture it is important to facilitate data transfer between objects and events, and it is in this space that events and the object store find mutual suitability. Search, browse, and query operations are facilitated in a web service backed by a web-accessible store.
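As a trivial illustration of that web accessibility, an event log persisted as an object can be read with nothing more than an HTTP GET; the store URL below is hypothetical:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class ReadEventLog {
    public static void main(String[] args) throws Exception {
        // hypothetical web-accessible object holding an append-only event log
        URL url = new URL("https://objectstore.example.com/events/2019-09-26.log");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
            reader.lines().forEach(System.out::println); // each line is one event
        }
    }
}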

Querying an event stream is easier with Flink than with one's own web service enumerating the DataSet:
package org.apache.flink.examples.java.eventcount;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.util.Collector;

public class EventCount {

    public static void main(String[] args) throws Exception {
        final ParameterTool params = ParameterTool.fromArgs(args);
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().setGlobalJobParameters(params);

        // one event per line of the input file
        DataSet<String> text = env.readTextFile(params.get("input"));

        // count events grouped by the hour they occurred in
        DataSet<Tuple2<String, Integer>> counts =
                text.flatMap(new Tokenizer())
                    .groupBy(0)
                    .sum(1);
        counts.print();
    }

    public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // assumption: the hour-of-day is the first whitespace-separated field of the event line
            String hours = value.split("\\s+")[0];
            out.collect(new Tuple2<>(hours, 1));
        }
    }
}
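The job above can then be packaged and submitted with the usual flink run invocation, passing --input to point at the event log file.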

Wednesday, September 25, 2019

Logs may also, optionally, be converted to events, which can then be forwarded to an event gateway. Products like Badger DB are able to retain the events for a certain duration.

Components will then be able to raise events like so:
c.recorder.Event(component, corev1.EventTypeNormal, successReason, successMessage)

An audit event can be similarly raised as shown below:
        ev := &auditinternal.Event{
                RequestReceivedTimestamp: metav1.NewMicroTime(time.Now()),
                Verb:                     attribs.GetVerb(),
                RequestURI:               req.URL.RequestURI(),
                UserAgent:                maybeTruncateUserAgent(req),
                Level:                    level,
        }
With additional data populated for resource requests:
        if attribs.IsResourceRequest() {
                ev.ObjectRef = &auditinternal.ObjectReference{
                        Namespace:   attribs.GetNamespace(),
                        Name:        attribs.GetName(),
                        Resource:    attribs.GetResource(),
                        Subresource: attribs.GetSubresource(),
                        APIGroup:    attribs.GetAPIGroup(),
                        APIVersion:  attribs.GetAPIVersion(),
                }
        }
