Cluster computing

Friday, November 27, 2020

Network engineering continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

Sales and support teams need to be armed with the latest information and stay current on every feature customers ask about. They also need those features to work exactly as they say they will.

The focus on business value is not confined to the people at the boundary with customers. It comes from deep within product development and engineering.

A network setup is highly reusable and can be saved for all subsequent uses. This works well for data transfers that do not change over time.

When the data changes frequently, it helps to organize it so that the parts that don’t change sit on one side and the parts that do sit on the other. Results can then be computed incrementally, revisiting only the changing side.
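A minimal Python sketch of this split, assuming the result of interest is a simple sum. The unchanging values are aggregated once and cached, and only the frequently changing values are re-aggregated on each query. The class name `PartitionedSum` is hypothetical.

```python
class PartitionedSum:
    """Hypothetical aggregate that separates static data from changing data."""

    def __init__(self, static_values):
        # The static side is summed once and never revisited.
        self._static_total = sum(static_values)
        # The changing side is kept separately, keyed by item id.
        self._dynamic = {}

    def update(self, key, value):
        # Updates only ever touch the changing side.
        self._dynamic[key] = value

    def total(self):
        # Reuse the cached static total; re-aggregate only the small dynamic side.
        return self._static_total + sum(self._dynamic.values())


agg = PartitionedSum(static_values=[10, 20, 30])
agg.update("a", 5)
agg.update("b", 7)
agg.update("a", 1)  # overwrite an existing entry on the changing side
```

The more of the data that lands on the static side, the less work each query does.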

Data inevitably contains recurring patterns, which we can call sequences. While most data might be stored in a general-purpose B-tree, sequences call for more specialized structures such as the radix tree, which makes insertion and lookup of sequences easier because shared prefixes are stored only once.
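A minimal radix tree (compressed trie) sketch in Python, assuming sequences are represented as strings. Each edge carries a string label, no two edges from the same node share a first character, and an edge is split when a new key diverges partway along it. The names `RadixTree`, `insert` and `contains` are illustrative, not from any particular library.

```python
def _common_prefix_len(a, b):
    # Length of the longest shared prefix of two strings.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class RadixNode:
    def __init__(self):
        self.children = {}  # edge label (str) -> RadixNode
        self.terminal = False  # True if a stored key ends at this node


class RadixTree:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, key):
        node = self.root
        while key:
            for label, child in list(node.children.items()):
                common = _common_prefix_len(label, key)
                if common == 0:
                    continue
                if common < len(label):
                    # Split the edge so the shared prefix is stored once.
                    mid = RadixNode()
                    mid.children[label[common:]] = child
                    del node.children[label]
                    node.children[label[:common]] = mid
                    child = mid
                node, key = child, key[common:]
                break
            else:
                # No edge shares a prefix: hang the remainder off this node.
                leaf = RadixNode()
                leaf.terminal = True
                node.children[key] = leaf
                return
        node.terminal = True

    def contains(self, key):
        node = self.root
        while key:
            for label, child in node.children.items():
                if key.startswith(label):
                    node, key = child, key[len(label):]
                    break
            else:
                return False
        return node.terminal


tree = RadixTree()
for seq in ["network", "netflix", "net"]:
    tree.insert(seq)
```

After these inserts, the root has a single edge `"net"`, so the common prefix of all three sequences is stored once.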

Sequences are stored more efficiently if they are kept sorted. Sorting gives them a canonical order, and it also lets lookups use binary search.
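A small Python sketch, assuming sequences are comparable values such as strings. The collection is kept sorted and deduplicated with the standard `bisect` module, so two stores that receive the same sequences in different orders end up identical (canonical), and lookups are binary searches. The class name `SortedSequences` is hypothetical.

```python
import bisect


class SortedSequences:
    """Hypothetical store that keeps sequences sorted and deduplicated."""

    def __init__(self):
        self._items = []

    def add(self, seq):
        # Binary search for the insertion point; skip duplicates.
        i = bisect.bisect_left(self._items, seq)
        if i == len(self._items) or self._items[i] != seq:
            self._items.insert(i, seq)

    def contains(self, seq):
        # O(log n) lookup thanks to the sorted order.
        i = bisect.bisect_left(self._items, seq)
        return i < len(self._items) and self._items[i] == seq

    def items(self):
        return list(self._items)


s1, s2 = SortedSequences(), SortedSequences()
for seq in ["tcp", "udp", "arp", "tcp"]:
    s1.add(seq)
for seq in ["arp", "tcp", "udp"]:
    s2.add(seq)
```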

The number of sequences might become very large. In that case, it may be better not to keep them all in the same tree and instead spread them across several trees as shards, with an index that tells which shard holds a given sequence.
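A sketch of range sharding in Python, assuming each shard is a sorted list (standing in for a tree) and the index is a sorted list of the lowest key each shard covers. A binary search over the index picks the shard, and a second binary search finds the key inside it. The boundaries and class name are hypothetical choices for illustration.

```python
import bisect


class ShardedSequences:
    """Hypothetical range-sharded store with a boundary index."""

    def __init__(self, boundaries):
        # Shard i holds keys >= boundaries[i] and < boundaries[i + 1];
        # shard 0 also absorbs any key below boundaries[0].
        self.index = sorted(boundaries)
        self.shards = [[] for _ in self.index]

    def _shard_for(self, key):
        # Binary search the index for the last boundary <= key.
        return max(bisect.bisect_right(self.index, key) - 1, 0)

    def add(self, key):
        shard = self.shards[self._shard_for(key)]
        j = bisect.bisect_left(shard, key)
        if j == len(shard) or shard[j] != key:
            shard.insert(j, key)

    def contains(self, key):
        shard = self.shards[self._shard_for(key)]
        j = bisect.bisect_left(shard, key)
        return j < len(shard) and shard[j] == key


store = ShardedSequences(boundaries=["", "h", "p"])
for key in ["zebra", "apple", "kiwi", "hello"]:
    store.add(key)
```

Only one shard is searched per lookup, so each shard stays small even as the total number of sequences grows.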

Thursday, November 26, 2020

Network engineering continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

1. The statistics may be stored in a container belonging to the system. Since system containers are treated differently from user containers, they will need to be serviced separately.
2. Both applications and the system publish statistics, and each may need to be the source of truth for its own data.
3. When the same container is replicated in different zones, there is a notion of local and remote. Only one of them is designated as primary; the remote copy is usually secondary.
4. The number of applications using the same network is usually not a concern. What makes a network special is its ability to serve them with different qualities of service.
5. When an application wants to change its workload on the network, architects tend to prefer swapping the networking product for something more suitable. However, a performance engineer can often avoid the swap with optimizations that leverage the existing product. It is always good practice to try this first.
6. System architecture accommodates changing business needs at every level, from the smallest components to the overall topology. However, it is rather centralized, and standing up another instance of the topology with customizations can relieve the urgency while leaving ample time for consolidation.
7. The adoption of networking products also depends on the developer community. Many products, such as software-defined stacks and container orchestration frameworks, have gained wide acceptance by winning over developers.
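One common way to give applications sharing a network different qualities of service is a token bucket per application. The Python sketch below assumes two hypothetical service tiers, `gold` and `bronze`, with illustrative rates. Time is passed in explicitly so the example is deterministic; a real limiter would read a monotonic clock such as `time.monotonic()`.

```python
class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now, cost=1):
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical tiers: same network, different service levels.
buckets = {
    "gold": TokenBucket(rate=10, capacity=5),
    "bronze": TokenBucket(rate=1, capacity=1),
}


def admit(app, now):
    return buckets[app].allow(now)
```

Adding another application only means adding another bucket, which is why the number of applications matters less than how each one is served.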

Wednesday, November 25, 2020

Network engineering continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

Bookkeeping is essential for both the scheduler and the allocator, not only to keep track of grants but also for analysis and diagnostics.
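A minimal Python sketch of an allocator that records every grant in a ledger, assuming resources are counted as integer units. The ledger serves both purposes mentioned above: tracking which grants are outstanding, and answering diagnostic questions such as "who holds what". `Allocator`, `Grant` and `outstanding_by_owner` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Grant:
    grant_id: int
    owner: str
    amount: int
    released: bool = False


class Allocator:
    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0
        self.ledger = []  # every grant ever made, kept for diagnostics
        self._next_id = 1

    def grant(self, owner, amount):
        if self.in_use + amount > self.capacity:
            raise RuntimeError(f"insufficient capacity for {owner}")
        g = Grant(self._next_id, owner, amount)
        self._next_id += 1
        self.in_use += amount
        self.ledger.append(g)
        return g

    def release(self, g):
        # Releasing twice is a no-op, so the books stay balanced.
        if not g.released:
            g.released = True
            self.in_use -= g.amount

    def outstanding_by_owner(self):
        totals = {}
        for g in self.ledger:
            if not g.released:
                totals[g.owner] = totals.get(g.owner, 0) + g.amount
        return totals


alloc = Allocator(capacity=10)
g1 = alloc.grant("svc-a", 4)
g2 = alloc.grant("svc-b", 3)
alloc.release(g1)
g3 = alloc.grant("svc-a", 2)
```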

A scheduler and an allocator can each have its own manager, which separates the concerns of management from the work itself.

The more general-purpose the scheduler and allocator become, the easier they are to reuse in different components. Commodity implementations win hands down over specialized ones because they scale.

Requests for remote resources are expected to take longer than local operations. If they incur timeouts, the quantum grants may need to be extended.

A timeout must expand to include the timeouts of nested operations.
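A sketch in Python of how an outer timeout can be sized to cover its nested operations, assuming nested operations run one after another (for operations that run in parallel, `max` would replace `sum`). The `Operation` type, its fields and the example budgets are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    name: str
    own_timeout: float  # seconds budgeted for this operation's own work
    nested: list = field(default_factory=list)  # sub-operations, run sequentially


def effective_timeout(op):
    """Own budget plus the effective timeouts of all nested operations."""
    return op.own_timeout + sum(effective_timeout(child) for child in op.nested)


lookup = Operation("lookup", 0.5)
remote_read = Operation("remote-read", 2.0)  # remote work gets a larger budget
request = Operation("request", 1.0, [lookup, remote_read])
```

Here the outer request must allow 3.5 seconds; giving it only its own 1.0 second would make it time out while a nested remote read is still within its budget.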

Event notification schemes can help handle these timeouts at the appropriate scope.

A recovery state machine can help with global event handling during outages and recovery.
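A minimal recovery state machine in Python. The states (`HEALTHY`, `OUTAGE`, `RECOVERING`) and event names are hypothetical; the point is that a single transition table decides how every global event is handled, and events that make no sense in the current state are ignored instead of corrupting it.

```python
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"
    OUTAGE = "outage"
    RECOVERING = "recovering"


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (State.HEALTHY, "outage_detected"): State.OUTAGE,
    (State.OUTAGE, "failover_started"): State.RECOVERING,
    (State.RECOVERING, "recovery_complete"): State.HEALTHY,
    (State.RECOVERING, "outage_detected"): State.OUTAGE,
}


class RecoveryStateMachine:
    def __init__(self):
        self.state = State.HEALTHY
        self.history = [self.state]

    def handle(self, event):
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            # Event is not valid in this state; leave the state unchanged.
            return self.state
        self.state = next_state
        self.history.append(next_state)
        return next_state


machine = RecoveryStateMachine()
```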

The number of steps needed to recover from an outage can be reduced by dropping scoped containers in favor of standby containers.

Tuesday, November 24, 2020

Network Engineering Continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

The choice between a faster processor, larger storage, or both is flexible if the dollar cost is the same. In such cases, the strategy can be sequential, streaming, or batched. Once a strategy is in place, however, the total cost of ownership (TCO) rises significantly when business needs change.

From supercomputers to large-scale clusters, the size of compute, storage, and network can vary widely. The need to own or manage such capability drops significantly once it is commoditized and outsourced.

Some tasks are high priority and are usually fewer in number than the general class of tasks. If they arrive uncontrolled, they can impose a significant cost. Most networking products try to control the upstream workload they are designed for. For example, it is advantageous when high-priority tasks can be clearly distinguished from the rest.
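A Python sketch of one way to keep high-priority tasks from arriving uncontrolled, assuming tasks are already labeled by priority. High-priority tasks are served first, but only a bounded number may wait; beyond that, `submit` returns `False` so the upstream producer has to back off. The class name and the bound are hypothetical.

```python
from collections import deque


class TwoClassQueue:
    """High-priority tasks go first, but their queue is bounded."""

    def __init__(self, max_high):
        self.max_high = max_high
        self.high = deque()
        self.normal = deque()

    def submit(self, task, high_priority=False):
        if high_priority:
            if len(self.high) >= self.max_high:
                return False  # push back on the upstream producer
            self.high.append(task)
        else:
            self.normal.append(task)
        return True

    def next_task(self):
        if self.high:
            return self.high.popleft()
        if self.normal:
            return self.normal.popleft()
        return None


q = TwoClassQueue(max_high=2)
```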

Scheduling policies vary from scheduler to scheduler. Usually, a simple policy scales much better than a complicated one. For example, if every task holds a share of a pie representing the scheduler's capacity, it is simpler to expand the pie than to readjust the slices dynamically to accommodate new tasks.
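A Python sketch of the pie analogy, assuming proportional-share scheduling where each task receives `capacity * weight / total_weight`. When a task joins, growing capacity in step with total weight (expanding the pie) leaves existing slices unchanged, while keeping capacity fixed forces every existing slice to shrink. `UNIT` and the weights are hypothetical.

```python
def allocate(weights, capacity):
    """Proportional share: each task gets capacity * weight / total weight."""
    total = sum(weights.values())
    return {task: capacity * w / total for task, w in weights.items()}


UNIT = 10  # hypothetical capacity units per unit of weight

weights = {"a": 1, "b": 3}
before = allocate(weights, capacity=UNIT * sum(weights.values()))

weights["c"] = 2
# Expand the pie: capacity grows with total weight, existing slices stay put.
after_expand = allocate(weights, capacity=UNIT * sum(weights.values()))
# Re-slice: capacity stays at the old size, so every existing slice shrinks.
after_reslice = allocate(weights, capacity=UNIT * 4)
```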

The weights associated with tasks are set statically and then used in computations that determine how the tasks are scheduled. Usage can be measured in quanta of time, and a task that takes more than its expected share is called a quantum thief. A scheduler uses tallying to find a quantum thief and make it yield to other tasks.
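A Python sketch of tallying, assuming each task's expected share of the quanta used so far is proportional to its static weight. A task whose tally exceeds its share by more than a tolerance is flagged as a quantum thief, and picking the next task by lowest tally-to-weight ratio makes a thief yield naturally. Function names, the tolerance and the sample numbers are hypothetical.

```python
def find_quantum_thieves(weights, tally, tolerance=1.0):
    """Tasks whose used quanta exceed their weighted share by more than tolerance."""
    total_used = sum(tally.values())
    total_weight = sum(weights.values())
    thieves = []
    for task, w in weights.items():
        expected = total_used * w / total_weight
        if tally.get(task, 0) > expected + tolerance:
            thieves.append(task)
    return sorted(thieves)


def pick_next(weights, tally):
    """Run the task that has used the least relative to its weight; ties by name."""
    return min(weights, key=lambda t: (tally.get(t, 0) / weights[t], t))


weights = {"a": 1, "b": 1, "c": 2}  # set statically
tally = {"a": 6, "b": 2, "c": 4}    # quanta consumed so far
```

With these numbers, task `a` has used 6 quanta against an expected 3, so it is flagged and is not picked next.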
