Tuesday, October 20, 2020

Network engineering continued ...

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html 


  1. While container platforms for Platform-as-a-Service (PaaS) have enabled software to be deployed without regard to the host and to be moved frequently from one host to another, the end user's adoption of a PaaS platform depends on the production readiness of the applications and services. The push for PaaS adoption has made little or no difference to the use and proliferation of virtual machines by individual users.


  2. The cloud services provider can package services such as additional storage, a regular backup schedule, a patching schedule, system management, security hardening, and billing at the time of request for each asset. However, such services depend on the cloud where they are requested. In a private cloud, much of the service is provided in-house, adding to the costs even if the inventory is free.


  3. The use of a virtual machine image as a storage artifact highlights the role of large files in storage and networking. Images are usually saved on a datastore in the datacenter, but nothing prevents the end user from owning the machine, taking periodic backups of the VM image, and uploading them with tools like duplicity. These files can then be stashed in storage products like object storage. S3's support for multipart upload eases the handling of such large files (see the sketch after this list).


  4. The use of large files helps test most of the bookkeeping associated with logic that depends on the size of the artifact. While performance optimizations remove redundant operations in different layers to streamline a use case, the unoptimized code path is better tested with large files, which can be generated as shown below.
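
As a minimal sketch of the two points above, the Java snippet below creates a sparse test file of a requested size and computes fixed-size part boundaries of the kind a multipart upload would need. The file name, sizes, and the absence of any actual upload call are illustrative assumptions, not a real S3 client.

import java.io.IOException;
import java.io.RandomAccessFile;

class LargeFileParts {

    // Create a sparse file of the requested size for exercising size-dependent bookkeeping.
    static void createTestFile(String path, long sizeInBytes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            raf.setLength(sizeInBytes); // sets the logical size without writing data
        }
    }

    // Compute (offset, length) pairs for fixed-size parts, as a multipart upload would use.
    static long[][] partBoundaries(long totalSize, long partSize) {
        int parts = (int) ((totalSize + partSize - 1) / partSize);
        long[][] boundaries = new long[parts][2];
        for (int i = 0; i < parts; i++) {
            long offset = i * partSize;
            boundaries[i][0] = offset;
            boundaries[i][1] = Math.min(partSize, totalSize - offset);
        }
        return boundaries;
    }

    public static void main(String[] args) throws IOException {
        long fiveGB = 5L * 1024 * 1024 * 1024;          // illustrative artifact size
        createTestFile("vm-image-backup.img", fiveGB);  // hypothetical file name
        long[][] parts = partBoundaries(fiveGB, 100L * 1024 * 1024); // 100 MB parts
        System.out.println("parts to upload: " + parts.length);
    }
}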

Monday, October 19, 2020

Network engineering continued

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html 


Sometimes it is helpful to spread decisions across multiple tiers. For example, with admission control, the tier that handles the connections and dispatches processes may choose to keep the number of client connections below a threshold. At the same time, the inner system layer might determine whether execution is postponed, begins with fewer resources, or begins without restraint.

The decision on resources can come from the cost involved in the query plan. These costs might include the disk devices that the query will access, the number of random and sequential I/Os per device, the estimated CPU load of the query, the number of key-values to process, and the memory footprint of the query's data structures.
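
A minimal sketch of how the dispatcher tier and the inner layer might split this decision. The thresholds, weights, and field names below are arbitrary assumptions for illustration, not taken from any particular planner.

class QueryCost {
    int randomIOs, sequentialIOs, keyValues;
    double cpuSeconds, memoryBytes;

    // Weighted sum of the factors listed above; the constants are illustrative only.
    double estimate() {
        return randomIOs * 10.0 + sequentialIOs * 1.0 + cpuSeconds * 5.0
                + keyValues * 0.001 + memoryBytes / (64 * 1024 * 1024.0);
    }
}

enum Admission { REJECT_AT_DISPATCHER, POSTPONE, RUN_WITH_FEWER_RESOURCES, RUN_UNRESTRAINED }

class AdmissionControl {
    static final int MAX_CONNECTIONS = 1000;   // outer tier: connection threshold (assumed)
    static final double HIGH_COST = 500.0;     // inner tier thresholds (assumed)
    static final double MEDIUM_COST = 100.0;

    static Admission decide(int activeConnections, QueryCost cost) {
        if (activeConnections >= MAX_CONNECTIONS) return Admission.REJECT_AT_DISPATCHER;
        double c = cost.estimate();
        if (c > HIGH_COST) return Admission.POSTPONE;
        if (c > MEDIUM_COST) return Admission.RUN_WITH_FEWER_RESOURCES;
        return Admission.RUN_UNRESTRAINED;
    }
}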

With a shared-nothing architecture, there is no sharing at the hardware resource level. In such cases, multiple instances of the storage product may be installed, or a cluster-mode deployment may be used. Each system in the cluster stores only a portion of the data, and requests are sent to other members for their data. This facilitates the horizontal partitioning of data.

When the data is partitioned by placing different collections, rather than different ranges of the same collection, on the participating nodes, it is referred to as vertical partitioning. There are use cases for this where the data forms groups and a group might not require partitioning.
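
A minimal routing sketch for both schemes, assuming a hash-bucket placement for horizontal partitioning and a collection-to-node map for vertical partitioning; both are illustrative choices rather than any product's scheme.

import java.util.Map;

class PartitionRouter {
    private final int nodeCount;                          // participating nodes in the cluster
    private final Map<String, Integer> collectionToNode;  // vertical: whole collections pinned to nodes

    PartitionRouter(int nodeCount, Map<String, Integer> collectionToNode) {
        this.nodeCount = nodeCount;
        this.collectionToNode = collectionToNode;
    }

    // Horizontal partitioning: different ranges (here, hash buckets) of the same collection go to different nodes.
    int nodeForKey(String key) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Vertical partitioning: a collection is kept whole and assigned to one node.
    int nodeForCollection(String collection) {
        return collectionToNode.getOrDefault(collection, 0);
    }
}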

A shared-nothing system must mitigate partial failures, a term used to describe the condition when one or more of the participating nodes goes down. In such cases the mitigation may be one of the following: 1) bring down all of the nodes when any one fails, which makes the system equivalent to a shared-memory one, 2) use "data skipping", where queries are allowed to execute on any node that is up and the data on the failed nodes is skipped, or 3) use as much redundancy as necessary to allow queries access to all the data regardless of any unavailability.
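
A minimal sketch of the data-skipping option, where a scatter-gather query simply omits nodes that are down; the node interface and its health check are assumed for illustration.

import java.util.ArrayList;
import java.util.List;

interface DataNode {
    boolean isUp();                       // assumed health check
    List<String> query(String predicate); // assumed per-node query API
}

class DataSkippingExecutor {
    // Runs the query on every node that is up; the data on failed nodes is skipped.
    static List<String> execute(List<DataNode> nodes, String predicate) {
        List<String> results = new ArrayList<>();
        for (DataNode node : nodes) {
            if (!node.isUp()) continue;   // skip the failed node's data
            results.addAll(node.query(predicate));
        }
        return results;
    }
}
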
#codingexercise

// Rotate the array right by k positions by shifting right one position, k times.
public static void rotate(int[] a, int k) {
    for (int i = 0; i < k; i++) {
        shiftright(a);
    }
}

// Shift every element one position to the right, wrapping the last element to the front.
public static void shiftright(int[] a) {
    if (a == null || a.length == 0) return;
    int temp = a[a.length - 1];
    for (int i = a.length - 2; i >= 0; i--) {
        a[i + 1] = a[i];
    }
    a[0] = temp;
}


Sunday, October 18, 2020

Network engineering continued ...

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html  

      1. Clusters treat nodes and disks as commodities, making no differentiation as capacity is improved or nodes are added. They are tolerant of nodes going down and view the disk array as network-attached storage. If they could improve resource management with storage classes, where groups of disks are treated differently based on power management and I/O scheduling, they would provide far better quality-of-service levels to workloads.

      2. While there can be coordination between the controller nodes and the data nodes in a cluster, an individual disk or group of disks in a node does not have a dedicated disk worker to schedule its I/O, since storage has always progressed toward higher and higher disk capacity. When disks become cheap enough to add in large numbers and earmark for specific purposes, the dispatcher-and-execution-worker model can be re-evaluated.

      3. The process-per-disk-worker model is still in use today. It was used by early DBMS implementations. The I/O scheduling manages the time-sharing of the disk workers, and the operating system offers protection. This model has been helpful to debuggers and memory checkers.

      4. The process-pool-per-disk-worker model alleviates the need to fork and tear down processes; every process in the pool is capable of executing any of the reads and writes from any of the clients. The pool size is generally finite if not fixed. This model has all of the advantages of the process-per-disk-worker model above, with the added possibility of differentiated processes in the pool and quotas for them (see the worker-pool sketch after this list).

      5. When compute and storage are consolidated, they have to be treated as commodities and scalability is achieved only with the help of scale-out. On the other hand, they are inherently different. Therefore, nodes dedicated to computation may be separated from nodes dedicated to storage. This lets them scale and load-balance independently.


      6. #codingexercise: https://ideone.com/DBjnkH
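
A minimal worker-pool sketch, using a fixed pool of threads in place of processes; the pool size and the request shapes are assumptions for illustration, not any product's interface.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DiskWorkerPool {
    // A fixed pool of workers, each able to serve any client's read or write request.
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    void submitRead(String client, long offset, int length) {
        pool.submit(() -> System.out.println("read for " + client + " at " + offset + " len " + length));
    }

    void submitWrite(String client, long offset, byte[] data) {
        pool.submit(() -> System.out.println("write for " + client + " at " + offset + " len " + data.length));
    }

    void shutdown() {
        pool.shutdown(); // accept no new requests; let queued ones drain
    }
}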

Saturday, October 17, 2020

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html  


The consensus protocol used among the distributed components of a storage system is required to be fault-tolerant. However, the choice of consensus protocol may vary from system to system.


Message passing between agents in a distributed environment is required. Any kind of protocol can suffice for this. Some systems like to use open-source components for this purpose while others build on message queuing.


Every networking server prepares for fault tolerance. Since faults can occur in any domain, temporarily or permanently, each component determines which activities to perform and how to work around what is not available.

Fault domains are groups that cover known faults in isolation. Yet some faults may occur in combination. It is best to give names to patterns of faults so that they can be included in the design of components.


Data-driven computing has required changes in networking products. While online transactional activities were previously read-write intensive and synchronous, today most processing, including orders and payments, is done asynchronously on data-driven frameworks, usually employing a message queue. Networking products do better with improved caching for this kind of processing.
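
A minimal sketch of queue-driven asynchronous processing, with an in-memory BlockingQueue standing in for a real message broker and the order identifier assumed for illustration.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncOrderProcessing {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Producer: the front end enqueues the order and returns immediately.
    void submitOrder(String orderId) throws InterruptedException {
        queue.put(orderId);
    }

    // Consumer: a background worker drains the queue and handles payments asynchronously.
    void startWorker() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String orderId = queue.take();
                    System.out.println("processing payment for " + orderId);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }
}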


#codingexercise: https://ideone.com/DBjnkH 

Friday, October 16, 2020

Network engineering continued ...

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html  

    1. P2P can be structured or unstructured. 

      1. Messages are helpful to enforce consistency as nodes come up or go down. For example, a gossip protocol may be used for this purpose and it involves propagating updates via message exchanges. 

      2. Message exchanges can include state or operation transfers. Both involve the use of vector clocks. 


      1. A utility token issued for authorization need not be opaque. It can be made up of three parts: one that represents the user, another that represents the provider, and an irrefutable part that is like a stamp of authority. The section from the provider may also include the scope of the resources authorized.


      1. The attestation mechanism itself might vary. It might include, for example, a Merkle tree where each node of the tree represents an element of Personally Identifiable Information (PII) along with its hash and the hash of the hashes of its child nodes. The root hash then becomes the fingerprint of the data being attested (see the Merkle-root sketch after this list).


      1. An immutable record whose integrity is checked and agreed upon on an ongoing basis provides a venerable source of truth.


      1. A rewarding service increases appeal and usage by customers. This is what makes Blockchain popular. 

      1. Many clusters are used as failover clusters and not as performance or scalability clusters. This is primarily because a server is designed for scale-up versus scale-out. This puts an emphasis on the judicious choice of technology in a design.

      1. Some servers use shared storage and do not go offline. Most products have embraced network-attached storage. Similar considerations apply when a database server is hosted in a container and the database is on a shared volume.

      1. Clustering does not save space or effort for backup or maintenance. It does not scale out reads for the database. Moreover, it does not give 100% uptime for a database.

      2. #codingexercise: https://ideone.com/DBjnkH 
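
A minimal Merkle-root sketch over a list of PII elements, as described above; the choice of SHA-256 and the duplication of the last node at odd-sized levels are assumptions for illustration.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

class MerkleAttestation {
    static byte[] sha256(byte[] data) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(data);
    }

    // Builds the root hash from leaf hashes of the individual PII elements.
    static byte[] merkleRoot(List<String> piiElements) throws Exception {
        if (piiElements.isEmpty()) throw new IllegalArgumentException("no elements to attest");
        List<byte[]> level = new ArrayList<>();
        for (String element : piiElements) {
            level.add(sha256(element.getBytes(StandardCharsets.UTF_8)));
        }
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left; // duplicate last if odd
                byte[] combined = new byte[left.length + right.length];
                System.arraycopy(left, 0, combined, 0, left.length);
                System.arraycopy(right, 0, combined, left.length, right.length);
                next.add(sha256(combined)); // each parent holds the hash of its children's hashes
            }
            level = next;
        }
        return level.get(0); // the fingerprint of the attested data
    }
}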

Thursday, October 15, 2020

Network engineering continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html  

    1. P2P can be structured or unstructured. 

      1. Messages are helpful to enforce consistency as nodes come up or go down. For example, a gossip protocol may be used for this purpose and it involves propagating updates via message exchanges. 


      1. Message exchanges can include state or operation transfers. Both involve the use of vector clocks. 


      1. In the case of the state transfer model, each replica maintains a state version tree that contains all the conflicting updates. When the client sends its vector clock, the replicas check whether the client's state precedes any of their current versions and discard it accordingly. When a replica receives updates from other replicas via gossip, it merges the version trees (a minimal vector-clock comparison appears after this list).


      1. In the case of the operation transfer model, each replica has to first apply all operations corresponding to the cause before those corresponding to the effect. This is necessary to keep the operations in the same sequence on all replicas and is achieved by adding another entry in the vector clock, a V-state, that represents the time of the last updated state. In order that this causal order is maintained, each replica buffers the update operation until it can be applied to the local state. A tuple of two timestamps, one from the client's view and another from the replica's local view, is associated with every submitted operation.


      1. Since operations are in different stages of processing on different replicas, a replica will not discard the state or operations it has completed until it sees that the vector clocks from all others have preceded it.
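
A minimal vector-clock comparison sketch; the map-of-counters representation and the ordering names are assumptions for illustration.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Long> counters = new HashMap<>();

    // Record a local event on the given replica.
    void tick(String replicaId) {
        counters.merge(replicaId, 1L, Long::sum);
    }

    // Compare two clocks to decide whether one state precedes the other or they conflict.
    static Order compare(VectorClock a, VectorClock b) {
        boolean aBehind = false, bBehind = false;
        Set<String> ids = new HashSet<>(a.counters.keySet());
        ids.addAll(b.counters.keySet());
        for (String id : ids) {
            long av = a.counters.getOrDefault(id, 0L);
            long bv = b.counters.getOrDefault(id, 0L);
            if (av < bv) aBehind = true;
            if (bv < av) bBehind = true;
        }
        if (aBehind && bBehind) return Order.CONCURRENT; // conflicting updates to reconcile
        if (aBehind) return Order.BEFORE;                // a precedes b and can be discarded or merged
        if (bBehind) return Order.AFTER;
        return Order.EQUAL;
    }
}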