Saturday, December 7, 2019

Sample code signing with Microsoft tools:
Step 1. Install the Microsoft Windows SDK appropriate for the desktop Windows version.
Step 2. Check that the following two files exist:
C:\Program Files (x86)\Windows Kits\10\bin\10.0.18362.0\x64\makecert.exe
C:\Program Files (x86)\Windows Kits\10\bin\10.0.18362.0\x64\pvk2pfx.exe
Step 3. MakeCert /n "CN=My Company Name, O=My Company, C=US" /r /h 0 /eku "1.3.6.1.5.5.7.3.3,1.3.6.1.4.1.311.10.3.13" /e 12/30/2025 /sv \EMCCerts\testcert.out \EMCCerts\testcert.cer
Succeeded
Step 4. Pvk2Pfx /pvk \EMCCerts\testcert.out  /pi <password> /spc \EMCCerts\testcert.cer  /pfx \EMCCerts\testcertNew.pfx /po <password>
Step 5. Certutil -addStore TrustedPeople \EMCCerts\testcert.cer
TrustedPeople "Trusted People"
Signature matches Public Key
Certificate "DellEMC Streaming Data Platform" added to store.
CertUtil: -addstore command completed successfully.
Step 6. signtool.exe sign /fd SHA256 /a /f \EMCCerts\testcertNew.pfx /p <password> \my-file-to-sign.zip
Step 7. makeappx.exe pack /f \unsigned\Appx.map /p \Signed\my-file-to-sign.zip
Microsoft (R) MakeAppx Tool
Copyright (C) 2013 Microsoft.  All rights reserved.

The path (/p) parameter is: "\\?\Signed\my-file-to-sign.appx"
The mapping file (/f) parameter is: "unsigned\Appx.map"
Reading mapping file "unsigned\Appx.map"
Packing 3 file(s) listed in "unsigned\Appx.map" (mapping file) to "\\?\Signed\my-file-to-sign.zip" (output file name).
Memory limit defaulting to 8529401856 bytes.
Using "unsigned\AppxManifest.xml" as the manifest for the package.
Processing "unsigned\my-file-to-sign.zip" as a payload file.  Its path in the package will be "my-file-to-sign.zip".
Processing "unsigned\AppTile.png" as a payload file.  Its path in the package will be "AppTile.png".
Package creation succeeded.

Sample code signing with the gpg tool:
gpg --output doc.sig --sign doc
You need a passphrase to unlock the private key for
User: "Alice (Judge) <alice@cyb.org>"
1024-bit DSA key, ID BB7576AC, created 1999-06-04
Enter passphrase:

Friday, December 6, 2019

Signing files using a private key
Signing is the process by which a digital signature is created from the contents of a file. The signature allows a recipient to verify that there was no tampering with the contents of the file. Signing does not require encrypting the file contents in order to generate the signature. In some cases, a detached signature may be stored as a separate file; in others, the digital signature is included along with the set of files as an archive.
There are other processes that can help with checking the integrity of files, such as hashing the files or generating a checksum. Signing differs from those methods in that it uses a private-public key pair to compute the digital signature. The private key is used to sign a file while the public key is used to verify the signature. The public key can be published with the signature, or it can be made available in ways that are well known to the recipients of the signed files.
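A minimal sketch of this asymmetry, using the JDK's java.security API with a freshly generated RSA key pair (a real deployment would load its keys from a keystore or certificate rather than generate them on the fly):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

public class SignAndVerify {
    public static void main(String[] args) throws Exception {
        byte[] contents = Files.readAllBytes(Paths.get(args[0])); // the file to sign

        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair keyPair = generator.generateKeyPair();

        // Sign with the private key.
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(keyPair.getPrivate());
        signer.update(contents);
        byte[] detachedSignature = signer.sign(); // could be written out as a separate .sig file

        // Verify with the public key.
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(keyPair.getPublic());
        verifier.update(contents);
        System.out.println("signature valid: " + verifier.verify(detachedSignature));
    }
}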
The signing process can use any of a number of cryptographic algorithms. The stronger the algorithm and key, the stronger the signature and the lower the chance that tampering goes undetected. The tooling for signing varies across operating systems.
Linux hosts often use the ‘gpg’ tool to sign and verify files. This tool can also generate the key pair with which to sign the files. The resulting signature is in the OpenPGP (Pretty Good Privacy) format and is stored as a file with the .asc extension. Publishing the public key along with the detached signature is a common practice for many distributions of code.
Microsoft provides separate tools for creating the key and certificate pair and for generating the signature. Its signing and packaging tools wrap the payload into an altogether new file; the signature is part of that new file, which can function just like an archive. The same algorithms and key strengths can be used with these tools as with gpg.
The use of certificates is popular because they can be shared easily without exposing the private key. The recipient uses the certificate to verify the signature.
These are some of the means for signing, and its use is widespread across use cases such as code commits, blockchain, and digital rights management.

Thursday, December 5, 2019

We were discussing Flink applications and the use of a stream store such as Pravega.

It is not appropriate to encapsulate a Flink connector within the HTTP request handler for data ingestion at the store. This API is far more generic than the upstream software used to send the data, because the consumer of this REST API could be the user interface, a language-specific SDK, or shell scripts that make curl requests. It is better for the REST API implementation to accept the raw message directly, along with the destination and authorization.
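A minimal sketch of such a handler, using the JDK's built-in com.sun.net.httpserver package; the /ingest path, the X-Destination-Stream header, and the forwarding step are illustrative assumptions rather than any actual product API:

import com.sun.net.httpserver.HttpServer;

import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class IngestEndpoint {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/ingest", exchange -> {
            String destination = exchange.getRequestHeaders().getFirst("X-Destination-Stream"); // assumed header
            String authorization = exchange.getRequestHeaders().getFirst("Authorization");
            byte[] rawMessage = exchange.getRequestBody().readAllBytes();
            // Validate the authorization and forward rawMessage to the destination stream here.
            byte[] response = "accepted".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(202, response.length);
            exchange.getResponseBody().write(response);
            exchange.close();
        });
        server.start();
    }
}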
The policy for read operations should be such that they can be independent and scalable. A read-only policy suffices for this purpose; the rest can be read-write.
The separation of read-write from read-only also allows the two paths to be treated differently. For example, it is possible to replace the technology for the read-only path separately from the technology for the read-write path, or to swap the read-only technology from one option to another for improvements on that side alone.
An example of analysis on a read-only path is the extraction of exception stack traces from logs:

The utility of Flink applications is unparalleled, as illustrated by a stack-trace hasher example that collects the top exceptions encountered from log archives:
// Requires org.apache.flink.api.common.functions.FlatMapFunction,
// org.apache.flink.api.java.tuple.Tuple2 and org.apache.flink.util.Collector.
private static class ExtractStackTraceHasher implements FlatMapFunction<String, Tuple2<String, Integer>> {
    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
        StringTokenizer tokenizer = new StringTokenizer(value);
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();
            if (word.contains("Exception:")) {
                int start = value.indexOf(word); // word begins with a unique timestamp
                int end = value.indexOf(word.substring(8), start + 8); // the next timestamp marks the end of the trace
                if (start != -1 && end != -1 && end > start) {
                    String exceptionString = value.substring(start + 11, end); // skip the timestamp
                    // A Throwable cannot be reconstructed from text, so the extracted trace is hashed
                    // directly; net.logstash.logback.stacktrace.StackHasher could be used instead
                    // if the original Throwable were available.
                    String hexHash = Integer.toHexString(exceptionString.hashCode());
                    out.collect(new Tuple2<>(hexHash, 1)); // (stack-trace hash, count) pairs for keyBy/sum
                }
            }
        }
    }
}

The results from the above iteration can be combined for each iteration over the files in the archive location.
// Requires org.apache.flink.api.common.functions.ReduceFunction, java.util.HashMap and java.util.Map.
public class TopStackTraceMerger implements ReduceFunction<Map<String, Integer>> {
  @Override
  public Map<String, Integer> reduce(Map<String, Integer> set1, Map<String, Integer> set2) {
    Map<String, Integer> merged = new HashMap<>(set1);
    set2.forEach((trace, count) -> merged.merge(trace, count, Integer::sum)); // add up counts per stack-trace hash
    return merged;
  }
}

Wednesday, December 4, 2019

We were discussing Flink applications and the use of a stream store such as Pravega.
It is not appropriate to encapsulate a Flink connector within the HTTP request handler for data ingestion at the store. This API is far more generic than the upstream software used to send the data, because the consumer of this REST API could be the user interface, a language-specific SDK, or shell scripts that make curl requests. It is better for the REST API implementation to accept the raw message directly, along with the destination and authorization.
There are two factors I want to discuss further when comparing the analytical applications:
First, the use of Pravega as a stream store should not mandate the use of a Flink application with the Flink connector. Data can be written to and read from the Pravega store directly (a minimal write sketch follows at the end of this entry). However, a Flink application is generally used for transformations and queries, both of which benefit from running in cluster mode so that they can scale to large data sets.
Second, the data path is critical, so it is not necessary to combine the collection and the analysis together.
Although most analysis works well only when certain collection happens, it is not necessary to make the collection heavy in terms of its processing. The collection is a data path and can almost always be streamlined. The analysis can happen as the collection happens because this is stream processing. However, the analysis does not have to happen within the collection. It can execute separately, and as long as there is a queue of collected events, analysis can begin with stream processing even for varying rates of ingestion and analysis, because they are write and read paths respectively that are best kept separate.
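A minimal sketch of writing to the Pravega store directly, assuming the Pravega Java client, a controller at tcp://localhost:9090, and an existing scope "examples" with a stream "raw-events" (all of these names are placeholders):

import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.impl.UTF8StringSerializer;

import java.net.URI;

public class DirectPravegaWrite {
    public static void main(String[] args) throws Exception {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")) // assumed controller endpoint
                .build();
        try (EventStreamClientFactory factory = EventStreamClientFactory.withScope("examples", config);
             EventStreamWriter<String> writer = factory.createEventWriter(
                     "raw-events", new UTF8StringSerializer(), EventWriterConfig.builder().build())) {
            // The event lands in the stream directly; no Flink application or connector is involved.
            writer.writeEvent("routing-key", "sample raw message").join();
        }
    }
}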

Tuesday, December 3, 2019

The utility of Flink applications is unparalleled, as illustrated by a stack-trace hasher example that collects the top exceptions encountered from log archives:
// Requires org.apache.flink.api.common.functions.FlatMapFunction,
// org.apache.flink.api.java.tuple.Tuple2 and org.apache.flink.util.Collector.
private static class ExtractStackTraceHasher implements FlatMapFunction<String, Tuple2<String, Integer>> {
    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
        StringTokenizer tokenizer = new StringTokenizer(value);
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();
            if (word.contains("Exception:")) {
                int start = value.indexOf(word); // word begins with a unique timestamp
                int end = value.indexOf(word.substring(8), start + 8); // the next timestamp marks the end of the trace
                if (start != -1 && end != -1 && end > start) {
                    String exceptionString = value.substring(start + 11, end); // skip the timestamp
                    // A Throwable cannot be reconstructed from text, so the extracted trace is hashed
                    // directly; net.logstash.logback.stacktrace.StackHasher could be used instead
                    // if the original Throwable were available.
                    String hexHash = Integer.toHexString(exceptionString.hashCode());
                    out.collect(new Tuple2<>(hexHash, 1)); // (stack-trace hash, count) pairs for keyBy/sum
                }
            }
        }
    }
}

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
for (String path : pathsInExtractedLogArchive(archiveLocation)) {
    env.readTextFile(path)
            .flatMap(new ExtractStackTraceHasher())
            .keyBy(0)
            .sum(1)
            .print();
}
env.execute("top-stack-traces");

Monday, December 2, 2019

Patenting Introspection:
A software product may have many innovations. Patents protect innovations from infringement and give the inventor or software maker an assertion of rights so that someone else cannot claim ownership of the novelty. Patents can also help protect the maker from losing its competitive edge to someone else copying the idea. Patents are therefore desirable and can be used to secure mechanisms, processes and any novelty that was introduced into the product.
Introspection is the way in which the software maker uses the features that were developed for the consumers of the product for themselves, so that they can expand the capabilities to provide even more assistance and usability to the user. In some sense, this is automation of workflows combined with the specific use of the product as a pseudo end-user. This automation is also called ‘dogfooding’ because it relates specifically to the maker utilizing the product itself. The idea of putting oneself in the customer’s shoes to improve automation is not new in itself. When the product has many layers internally, a component in one layer may reach a higher layer that is visible to another standalone component in the same layer, so that interaction may occur between otherwise isolated components. This is typical for layered communication. However, the term ‘dogfooding’ is generally applied to the use of features available from the boundary of the product shared with external customers.
Consider a storage product which the customers use to store their data. If the software maker for the storage product decided to use the same product for storing data in isolated containers that are reserved for internal use by the maker, then this becomes a good example of trying out the product just like the customers would. This dogfooding is specifically for the software maker to store internal operational data from the product which may come in useful for troubleshooting and production support later. Since the data is stored locally to the instance deployed by the customer and remains internal, it is convenient for the maker to gather history that holds meaning only with that deployment.
The introspection automation is certainly not restricted to the maker. A customer may choose to do so as well. However, an out-of-box automation for the said purpose, can be done once by the maker for the benefit of each and every customer. This removes some burden from the customer while allowing them to focus more on storing data that is more relevant to their business rather than the operations of the product. A competitor to the product or a third-party business may choose to redo the same introspection with similar automation in order to gain a slice of the revenue. If a patent had been issued for the introspection, it would certainly benefit the product maker in this case.
A patent can only be given when certain conditions are met. We will go over these conditions for their applicability to the purpose of introspection.
First, the patent invention must show an element of novelty which is not already available in the field. This body of existing knowledge is called “prior art”. If the product has not been released, introspection becomes part of v1 and prevents the gap where a competitor can grab a patent between the releases of the product by the maker.
Second, the invention must involve an “inventive step” or “non-obvious” step where a person having ordinary skill in the relevant technical field cannot do the same. This is sufficiently met by introspection because the way internal operational data is stored and read is best known to the maker since all the components of the product are visible to the maker and best known to their staff.
Third, the invention must be capable of industrial application such that it is not merely a theoretical phenomenon but one that is useful in practice. Fortunately, all product support personnel will vouch for the utility of records that can assist with troubleshooting and support of the product in mission critical deployments.
Finally, there is an ambiguous criterion that the innovation must be “patentable” under the law of a specific country. In many countries, certain theories, creative works, models or discoveries are generally not patentable. Fortunately, when it comes to software development, history and tradition serve well just like in any other industry.
Thus, all the conditions of the application for patent protection of innovation can be met. Further, when the invention is disclosed in a manner that is sufficiently clear and complete to enable it to be replicated by a person with an ordinary level of skill in the relevant technical field, it improves the acceptance and popularity of the product.

Sunday, December 1, 2019

We were discussing Flink APIs.
The source for events may generate and maintain state for the events generated. This generator can even restore state from these snapshots. Restoring state simply means using the last known timestamp from the snapshot; all events subsequent to that timestamp are then processed. Each event has a timestamp that can be extracted with Flink’s AscendingTimestampExtractor, and the snapshot is merely a timestamp. This allows all events to be processed from the last snapshot.
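A minimal sketch of this, assuming the events are plain epoch-millisecond Long values and the snapshot holds nothing more than the last processed timestamp:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;

public class ResumeFromSnapshot {
    public static void main(String[] args) throws Exception {
        final long lastSnapshotTimestamp = 1575158400000L; // assumed value read from the snapshot
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> events = env.fromElements(1575158300000L, 1575158500000L, 1575158600000L)
                .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Long>() {
                    @Override
                    public long extractAscendingTimestamp(Long element) {
                        return element; // each event carries its own epoch-millisecond timestamp
                    }
                })
                .filter(timestamp -> timestamp > lastSnapshotTimestamp); // replay only events after the snapshot
        events.print();
        env.execute("resume-from-snapshot");
    }
}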
A source may implement additional behavior, such as capping the maximum number of events per second. The rate is sampled periodically, and when the number of events in a period exceeds the cap, the producer sleeps for the duration remaining between the current time and the expiry of the wait period.
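A minimal sketch of such throttling, kept outside of any particular source API and using a plain Consumer as a stand-in for the emit call (the names here are placeholders):

import java.util.function.Consumer;

/** A sketch of a producer that caps the number of events emitted per one-second period. */
class ThrottledProducer {
    private final int maxEventsPerSecond;

    ThrottledProducer(int maxEventsPerSecond) {
        this.maxEventsPerSecond = maxEventsPerSecond;
    }

    void run(Iterable<String> events, Consumer<String> emit) throws InterruptedException {
        long periodStart = System.currentTimeMillis();
        int emittedInPeriod = 0;
        for (String event : events) {
            emit.accept(event);
            emittedInPeriod++;
            long elapsed = System.currentTimeMillis() - periodStart;
            if (elapsed >= 1000) {
                // A new sampling period has begun; reset the counter.
                periodStart = System.currentTimeMillis();
                emittedInPeriod = 0;
            } else if (emittedInPeriod >= maxEventsPerSecond) {
                // The cap was reached within this period: sleep out the remainder before continuing.
                Thread.sleep(1000 - elapsed);
                periodStart = System.currentTimeMillis();
                emittedInPeriod = 0;
            }
        }
    }
}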
It should be noted that writing streams via connectors is facilitated by the store. However, this is not the only convention for sending data to a store. For example, well-known protocols like S3 are widely recognized and apply to stream stores just as much as they do to object stores.
By the same argument, data transfer can also occur over proprietary REST-based APIs and not just the industry-standard S3 APIs. Simple HTTP requests to post data to the store are another way to allow applications to send data. This is also the method by which popular technology stacks such as InfluxDB, Telegraf, and Chronograf collect and transmit metrics data. Whether dedicated agents relay the data or the store itself accumulates the data directly over the wire, these are options to widen the audience for the store.
Making it easy for an audience that does not have to write code to send data is beneficial not only to the store but also to the people who support and maintain production-level data stores, because it gives them an easy way to do a dry run rather than go through development cycles. The popularity of the store also increases with the customer base.
Technically, it is not appropriate to encapsulate a Flink connector within the HTTP request handler for data ingestion at the store. This API is far more generic than the upstream software used to send the data, because the consumer of this REST API could be the user interface, a language-specific SDK, or shell scripts that make curl requests. It is better for the REST API implementation to accept the raw message directly, along with the destination and authorization.
Implementation is upcoming in the Pravega fork tree at https://github.com/ravibeta/pravega