Friday, November 6, 2020

Network engineering continued

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

Messages may contain just the hash of the data and the challenge, while the responses may contain just the proof. This keeps the exchange meaningful while bounding the message size.
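As a sketch of this idea (the names and the hashing choice are illustrative, not from the original exchange), the prover can answer a random challenge with a hash computed over data it already holds, so the data itself never travels on the wire:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

// Minimal sketch: the verifier sends a random challenge; the prover answers
// with hash(data || challenge). The data itself never leaves the prover.
public class ChallengeProof {
    static String sha256Hex(byte[] input) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(md.digest(input));
    }

    // Prover side: proof over the locally held data plus the challenge.
    static String prove(byte[] data, byte[] challenge) throws Exception {
        byte[] combined = new byte[data.length + challenge.length];
        System.arraycopy(data, 0, combined, 0, data.length);
        System.arraycopy(challenge, 0, combined, data.length, challenge.length);
        return sha256Hex(combined);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "sensitive payload".getBytes(StandardCharsets.UTF_8);
        byte[] challenge = "nonce-42".getBytes(StandardCharsets.UTF_8);
        // The verifier recomputes the same proof from its own copy of the data.
        System.out.println(prove(data, challenge).equals(prove(data, challenge)));
    }
}
```

Because each challenge is fresh, a prover cannot replay an old proof, which is what makes the small message size safe.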

Contracts are signed and stored by both parties. Contracts are just as valuable as the sensitive data. Without a contract in place, no exchange is possible.

Parties interested in forming a contract use a topic-based publisher-subscriber model backed by a Bloom filter, a probabilistic structure that tests whether an element is a member of a set.
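A minimal Bloom filter sketch illustrates the membership test mentioned above; the bit-array size and hash choices here are illustrative, and real implementations use more, better-distributed hash functions:

```java
import java.util.BitSet;

// Minimal topic-based Bloom filter sketch: topics are added to the filter and
// membership tests may yield false positives but never false negatives.
public class TopicFilter {
    private final BitSet bits = new BitSet(1024);

    // Two cheap hash positions per element; real implementations use more.
    private int h1(String s) { return Math.floorMod(s.hashCode(), 1024); }
    private int h2(String s) { return Math.floorMod(s.hashCode() * 31 + 17, 1024); }

    public void add(String topic) {
        bits.set(h1(topic));
        bits.set(h2(topic));
    }

    public boolean mightContain(String topic) {
        return bits.get(h1(topic)) && bits.get(h2(topic));
    }

    public static void main(String[] args) {
        TopicFilter f = new TopicFilter();
        f.add("contracts");
        System.out.println(f.mightContain("contracts")); // always true once added
        // Absent topics are usually reported absent (false positives possible).
        System.out.println(f.mightContain("unrelated"));
    }
}
```

The appeal for pub-sub matching is that a subscriber's interests compress into a fixed-size bit array, regardless of how many topics it follows.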

Emerging trends like blockchain have no precedent for storage standards. A blockchain is a continuously growing list of records, called blocks, which are linked and secured using cryptography. Since it is resistant to tampering, it serves as an open distributed ledger to record transactions between two parties.
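The tamper resistance comes from hash linking: each block records the hash of its predecessor, so altering any earlier record breaks every subsequent link. A minimal sketch, with illustrative names:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

// Sketch of the hash linking that makes a block list tamper-evident.
public class HashChain {
    record Block(String prevHash, String data, String hash) {}

    static String sha256(String s) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(md.digest(s.getBytes(StandardCharsets.UTF_8)));
    }

    static List<Block> chain(List<String> records) throws Exception {
        List<Block> blocks = new ArrayList<>();
        String prev = "0".repeat(64); // genesis predecessor
        for (String r : records) {
            String h = sha256(prev + r);
            blocks.add(new Block(prev, r, h));
            prev = h;
        }
        return blocks;
    }

    // Recompute every hash; any tampered record breaks the links after it.
    static boolean verify(List<Block> blocks) throws Exception {
        String prev = "0".repeat(64);
        for (Block b : blocks) {
            if (!b.prevHash().equals(prev) || !b.hash().equals(sha256(prev + b.data()))) return false;
            prev = b.hash();
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        List<Block> c = chain(List.of("tx1", "tx2", "tx3"));
        System.out.println(verify(c)); // true for an untampered chain
    }
}
```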

Blockchain is used for a variety of purposes. Civic, for instance, enables its users to log in with their fingerprints. It uses a variety of smart contracts, an indigenous utility token, and new software applications. A Merkle tree is used for attestation.
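A Merkle tree used for attestation can be sketched as follows; this is a minimal version assuming SHA-256 and duplicate-last-hash padding for odd levels, and real implementations differ in details:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

// Sketch of a Merkle root: leaves are hashed, then adjacent pairs are hashed
// together level by level until a single root attests to all leaves.
public class MerkleRoot {
    static String sha256(String s) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(md.digest(s.getBytes(StandardCharsets.UTF_8)));
    }

    static String root(List<String> leaves) throws Exception {
        List<String> level = new ArrayList<>();
        for (String leaf : leaves) level.add(sha256(leaf));
        while (level.size() > 1) {
            List<String> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                // Duplicate the last hash when the level has an odd count.
                String right = (i + 1 < level.size()) ? level.get(i + 1) : level.get(i);
                next.add(sha256(level.get(i) + right));
            }
            level = next;
        }
        return level.get(0);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(root(List.of("attest-1", "attest-2", "attest-3")));
    }
}
```

Changing any one leaf changes the root, which is why a single root hash can attest to an arbitrarily large set of records.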

#codingexercise:

Given an array of non-negative integers, you are initially positioned at the first index of the array. 

Each element in the array represents your maximum jump length at that position. 

Determine if you can reach the last index. 

 

public static boolean isReachable(int[] A) {
    int n = A.length;
    if (n <= 1) return true;
    // dp[m] is true when index m can be reached from index 0
    boolean[] dp = new boolean[n];
    dp[0] = true;
    for (int m = 1; m < n; m++) {
        for (int j = 1; j <= m; j++) {
            // index m is reachable from m-j if m-j is itself reachable
            // and its jump length A[m-j] covers the distance j
            if (dp[m - j] && A[m - j] >= j) {
                dp[m] = true;
                break;
            }
        }
    }
    return dp[n - 1];
}

 


Thursday, November 5, 2020

Network engineering continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html 

  1. Distributed hash tables have gained widespread popularity for distributing load over a network. There are some well-known players in this technology; Kademlia, for instance, proves many of the theorems for DHTs. 


  1. Messages are routed through low-latency paths and use parallel asynchronous queries. Message queuing and its protocols are an excellent communication mechanism for a distributed network. 


  1. The integrity of the data is verified with the help of Merkle trees and proofs in such a distributed framework. Others use key-based encryption. 


  1. Partial audits reduce overhead in compute and storage. For example, shards of data may be hashed many times to generate the leaves of a Merkle tree, and the proof may be compact even when the tree is not. Partial audits, therefore, save on compute and storage. 


  1. Data does not always need to be presented in its entirety or always from the source of truth. The data may remain completely under the control of the owner while a challenge-proof suffices. 
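The challenge-proof idea in the items above can be sketched as a Merkle proof check: an auditor holding only the root verifies one shard from a compact sibling path, without seeing the rest of the data. The names and proof encoding here are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.List;

// Sketch: verify a single leaf against a known Merkle root using a proof
// path of (siblingHash, siblingIsLeft) pairs. Only O(log n) hashes travel.
public class MerkleProof {
    record Step(String siblingHash, boolean siblingIsLeft) {}

    static String sha256(String s) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(md.digest(s.getBytes(StandardCharsets.UTF_8)));
    }

    static boolean verify(String leaf, List<Step> proof, String expectedRoot) throws Exception {
        String h = sha256(leaf);
        for (Step s : proof) {
            h = s.siblingIsLeft() ? sha256(s.siblingHash() + h) : sha256(h + s.siblingHash());
        }
        return h.equals(expectedRoot);
    }

    public static void main(String[] args) throws Exception {
        // Two-leaf tree: root = H(H(a) + H(b)); prove membership of "a".
        String root = sha256(sha256("a") + sha256("b"));
        List<Step> proof = List.of(new Step(sha256("b"), false));
        System.out.println(verify("a", proof, root)); // true
    }
}
```

This is what makes a partial audit cheap: the tree may be large, but the auditor only hashes one path from leaf to root.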


Wednesday, November 4, 2020

Network engineering continued

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html 

The total cost of ownership encompasses the cost of operations and is usually not reflected in new instances of networking services. It applies to products that have been in use for a while and are becoming a liability.

A higher layer that manages and understands abstractions, provides namespace and data operation ordering and protocol handling is present in many networking services. 

A lower layer, comprising the distribution and replication of data without requiring any knowledge of the abstractions or protocols maintained by the higher layer, is similarly present in many networking products.

Similarly, the combination of the layers described above is almost always separated from the front-end layer that interprets and services user requests.

Working with streams is slightly different from working with fixed-size data. A stream is an ordered set of references to segments, and the extents are generally immutable.

If a shared resource can be represented as a pie, there are two ways to enhance its usage. First, make the pie bigger and allocate the same fractions of the increment. Second, dynamically modify the fractions so that at least some of the consumers can be guaranteed a minimum share of the resource.
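The two options can be sketched numerically (all values and names illustrative):

```java
// Sketch of the two ways to enhance usage of a shared resource "pie":
// grow the pie while keeping fractions, or reweight fractions dynamically
// with a guaranteed minimum share for each consumer.
public class ResourcePie {
    // Option 1: the same fractions applied to a larger total.
    static double[] grow(double[] fractions, double newTotal) {
        double[] shares = new double[fractions.length];
        for (int i = 0; i < fractions.length; i++) shares[i] = fractions[i] * newTotal;
        return shares;
    }

    // Option 2: reweight by demand, but never below a guaranteed floor.
    // (A full implementation would also rebalance so shares sum to total.)
    static double[] reweight(double[] demand, double total, double floor) {
        double demandSum = 0;
        for (double d : demand) demandSum += d;
        double[] shares = new double[demand.length];
        for (int i = 0; i < demand.length; i++) {
            shares[i] = Math.max(floor, total * demand[i] / demandSum);
        }
        return shares;
    }

    public static void main(String[] args) {
        double[] grown = grow(new double[]{0.5, 0.3, 0.2}, 200);
        System.out.println(grown[0]); // 100.0 (still 50%, of a bigger pie)
        double[] dyn = reweight(new double[]{8, 1, 1}, 100, 5);
        System.out.println(dyn[1]);   // never below the guaranteed floor of 5
    }
}
```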


Tuesday, November 3, 2020

Network engineering continued

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html 

Strong consistency is an aspect of the data, not the operations. A simple copy-on-write mechanism with versioning is sufficient for all accesses to be seen by all parallel processes in their sequential order.
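A minimal sketch of copy-on-write with versions, assuming an append-only version list (names illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of copy-on-write with versions: every write appends a new immutable
// version, so a reader that pins a version index sees a stable snapshot even
// while writers continue, and the version numbers give one sequential order.
public class VersionedValue<T> {
    private final List<T> versions = new ArrayList<>();

    public synchronized int write(T value) {
        versions.add(value);           // copy-on-write: old versions untouched
        return versions.size() - 1;    // version number defines the order
    }

    public synchronized T read(int version) {
        return versions.get(version);  // snapshot read at a pinned version
    }

    public synchronized int latest() {
        return versions.size() - 1;
    }

    public static void main(String[] args) {
        VersionedValue<String> v = new VersionedValue<>();
        v.write("initial");
        int snapshot = v.latest();     // a reader pins this version
        v.write("updated");            // a writer adds a new version meanwhile
        System.out.println(v.read(snapshot)); // still "initial"
    }
}
```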

Multi-tenancy is a choice for the workload, not for the infrastructure. If a workload requires its own instances of the networking product, it divides the resources rather than making the most of them through shared tenancy. Unless there is a significant performance boost for a particular workload, the cost does not justify dedicated instances of storage products.

Along with tenancy, namespaces can also be local or global. Global namespaces tend to be longer and less user-friendly. On the other hand, global namespaces can enforce consistency.

The cost of storage is sometimes vague because it does not necessarily encompass all operational costs for all the units, since the scope and purpose change with the storage product. The cost is not a standard, but we can get comparable values when we take the sum of the costs and divide it by the number of units for a unit price.

Cost is always a scalar value and is usually calculated by fixing the parameters of the system. Different studies may use the same parameters yet report widely different results. Therefore, it is not good practice to compare studies unless they evaluate systems under the same conditions.


Monday, November 2, 2020

Network engineering continued ...

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html 


    1. A garbage collector will find it easier to collect aged data by levels. If there are generations in the pages to be reclaimed, the work for the garbage collector is split so that application operations can continue to perform well. 


    1. Similarly, aggregated and large pages are easier for the garbage collector to collect than multiple spatially and temporally spread-out pages. If the pages can be bundled or allocated in clusters, the garbage collector can free them all at once when they are marked. 


    1. Among the customizations for the garbage collector, it is helpful to see which garbage collector is being worked the most. The garbage collector closer to the application has far more leverage over allocations and deallocations than something downstream. 


    1. The SSD can be treated as a pool of fast storage that is common to all processes. Since it is pluggable and external to the hard drives, it can be used dynamically as long as there is any availability. 


    1. In this sense it is very similar to an L3 cache; however, it is not meant for dynamic partitioning or for balancing access speed, power consumption, and storage capacity. It is not as fast as a cache, but it is more flexible than conventional storage and plays a vital role in managing inter-process communication. This is a simplified form of storage. 


    1. SSDs can make use of different storage media, including flash. The two most common flash types are NOR and NAND. NOR was the first of the two to be developed. It is very fast for reads but not as fast for writes, so it is used most often where code will be written once and read frequently. NAND is faster for writes and takes up significantly less space than NOR, which also makes it less expensive. Most flash used in SSDs is the NAND variety. 
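The generational idea in the first two items above can be sketched as pages grouped by generation and reclaimed a whole cluster at a time; the structure and names here are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: pages grouped by generation; reclaiming an aged generation frees
// its whole cluster in one pass rather than chasing scattered pages.
public class GenerationalPages {
    private final Map<Integer, List<String>> byGeneration = new HashMap<>();

    public void allocate(String page, int generation) {
        byGeneration.computeIfAbsent(generation, g -> new ArrayList<>()).add(page);
    }

    // Collect every page at or above the given generation in one sweep.
    public int collectOlderThan(int generation) {
        int freed = 0;
        var it = byGeneration.entrySet().iterator();
        while (it.hasNext()) {
            var e = it.next();
            if (e.getKey() >= generation) {
                freed += e.getValue().size();
                it.remove();
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        GenerationalPages gp = new GenerationalPages();
        gp.allocate("p1", 0);
        gp.allocate("p2", 2);
        gp.allocate("p3", 2);
        System.out.println(gp.collectOlderThan(1)); // frees the aged cluster: 2
    }
}
```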

Sunday, November 1, 2020

The outline of an automated monetization of recognition rewards

Problem Statement: Employees in an organization work from home and rely on applications like Slack to remain connected to the workplace. They use emojis to recognize each other’s hard work. Employers would like to leverage this peer recognition to be automatically translated to monetary rewards for the recipients. This should not require any extra action or change in the habit of the recipient in their daily routine.  

Solution: Badges and emojis are staple forms of communication language between team members of an organization. They are immediate, informal, and peer-reviewed forms of recognition. Up until now, any mechanism of translating peer recognition to rewards used to involve a laborious process disrupting both the sender and the receiver's routine which tends to elevate the barrier to rewards and their receipt. The following mechanism introduces inline automation to translate the recognition events to reward points which are eventually cut out to gift cards and sent to the recipients.  

This automation includes a sender and a receiver side. The sender introduces a bot user to all the monitored Slack channels. This bot user subscribes to an event loop using the Events API available from Slack and follows the event-driven sequence. A user posts an emoji in recognition of another user, and an event of type ‘reaction_added’ is generated with attributes such as ‘item_user’, the recipient of the recognition. The bot user receives the event and responds within three seconds. The event loop guarantees to post the event with grace, security, respect, and retry. The bot user handles each event notification with a lightweight implementation that treats every event as having equal value in reward points. Each event notification is translated into a reward-point creation request against a reward-point accumulation service that maintains a ledger of owners and reward points.
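A hedged sketch of the sender-side handling: the event type ‘reaction_added’ and attribute ‘item_user’ come from the description above, while the ledger class, method names, and the per-day policy value are illustrative assumptions, not the actual RewardPoints implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the sender side: each qualifying 'reaction_added'
// event credits one reward point to the recipient ('item_user') in a ledger.
public class RewardLedger {
    private static final int MAX_PER_SENDER_PER_DAY = 5; // assumed policy knob
    private final Map<String, Integer> pointsByUser = new ConcurrentHashMap<>();
    private final Map<String, Integer> dailyBySender = new ConcurrentHashMap<>();

    // Returns true if the event was credited; each event is worth one point.
    public boolean handleEvent(String type, String sender, String itemUser) {
        if (!"reaction_added".equals(type) || sender.equals(itemUser)) return false;
        int sent = dailyBySender.merge(sender, 1, Integer::sum);
        if (sent > MAX_PER_SENDER_PER_DAY) return false; // anti-abuse policy
        pointsByUser.merge(itemUser, 1, Integer::sum);
        return true;
    }

    public int points(String user) {
        return pointsByUser.getOrDefault(user, 0);
    }

    public static void main(String[] args) {
        RewardLedger ledger = new RewardLedger();
        ledger.handleEvent("reaction_added", "alice", "bob");
        ledger.handleEvent("reaction_added", "alice", "bob");
        System.out.println(ledger.points("bob")); // 2
    }
}
```

Keeping the handler this light is what lets the bot acknowledge the Slack event within the three-second window; the heavier work lives in the accumulation service.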

The automation on the receiver side targets the redeeming of the reward points accumulated by a user. Each distinct recipient receives an aggregation of the reward points accumulated periodically. When the reward points exceed a threshold, an eGift card from a major online retailer is issued using the eGifter API. The code generated for the eGift card from the redeemed reward points is mailed to the email address retrieved from Slack for that recipient. Since the codes are dynamic and the emails are delivered to the recipient’s inbox, neither the sender nor the receiver takes any additional action to receive reward points.
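The receiver-side threshold redemption can be sketched similarly; the threshold value and the code format are illustrative placeholders, with the generated UUID standing in for a call to a gift-card provider's API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch of the receiver side: when a user's accumulated points
// cross a threshold, redeem them for a gift-card code and carry the remainder.
public class Redemption {
    private static final int THRESHOLD = 100; // assumed points per gift card
    private final Map<String, Integer> balances = new HashMap<>();

    public void accumulate(String user, int points) {
        balances.merge(user, points, Integer::sum);
    }

    // Returns a gift-card code if the threshold was reached, else null.
    public String redeemIfEligible(String user) {
        int balance = balances.getOrDefault(user, 0);
        if (balance < THRESHOLD) return null;
        balances.put(user, balance - THRESHOLD); // carry over the remainder
        return "GIFT-" + UUID.randomUUID();      // placeholder for provider API
    }

    public static void main(String[] args) {
        Redemption r = new Redemption();
        r.accumulate("bob", 120);
        String code = r.redeemIfEligible("bob");
        System.out.println(code != null); // true: threshold of 100 reached
    }
}
```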

Since the event metadata gives enough visibility into the policies for qualifying an event for reward-point creation, such as the maximum rewards possible from one user to another in a day, reward-point generation can be controlled and kept free from abuse.

The implementation for this monetization of an automated peer recognition system is outlined here: https://github.com/ravibeta/RewardPoints  

Further reading on scaling up these services to arbitrary usage and size of an organization is included here