Cluster computing

Friday, March 29, 2019

Today we continue discussing best practices from storage engineering:

639) If there are multiple registrations that need to be kept in sync, they get harder to maintain. It is easier if the lists can be combined or if there is a one-to-one mapping between the lists.

640) Failed tasks may require new tasks to be added, in which case it is better to track the failed tasks separately from the tasks that are otherwise new.

641) When the tasks are constantly replenished, it is helpful to keep track of which tasks are in and which are out.

642) The tasks that are out are candidates for cleanup.

643) The tasks that are in are either existing or new. They are mutually exclusive so it is easy to tell the new ones from the old.

644) The tasks that are new will need things set up for them to execute. This involves initialization so that they can be included in the list.
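The in/out and new/existing bookkeeping above can be sketched with set differences. This is only an illustration under assumptions: the task list is a dict keyed by task id, and the names reconcile, refresh, setup and cleanup are hypothetical helpers, not part of any particular scheduler.

```python
def reconcile(registered, observed):
    """Split task ids into new, existing, and out groups using set differences."""
    registered = set(registered)
    observed = set(observed)
    new = observed - registered        # in, but not yet initialized
    existing = observed & registered   # in, already set up
    out = registered - observed        # no longer present: cleanup candidates
    return new, existing, out


def refresh(task_list, observed, setup, cleanup):
    """Bring task_list (a dict of id -> state) in line with the observed ids."""
    new, existing, out = reconcile(task_list, observed)
    for task_id in sorted(out):
        cleanup(task_list.pop(task_id))      # out tasks are cleaned up
    for task_id in sorted(new):
        task_list[task_id] = setup(task_id)  # initialize before listing
    return new, existing, out
```

Because new and existing are computed from disjoint set operations, a task id can never land in both groups.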

645) The tasks that run long need to indicate progress in some way so that the scheduler knows that the task is still active and not stuck.
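One common way to indicate progress is a heartbeat that the task refreshes as it works. The sketch below assumes a hypothetical TrackedTask class and a scheduler that polls is_stuck with a timeout of its choosing; the clock is injectable only so that the behavior is easy to check.

```python
import time


class TrackedTask:
    """A long-running task that reports progress to a scheduler (hypothetical)."""

    def __init__(self, name, clock=time.monotonic):
        self.name = name
        self._clock = clock
        self.progress = 0
        self.last_report = clock()

    def report(self, progress):
        """Called by the task itself whenever it completes a unit of work."""
        self.progress = progress
        self.last_report = self._clock()

    def is_stuck(self, timeout):
        """True when no progress report arrived within `timeout` seconds."""
        return self._clock() - self.last_report > timeout
```

The scheduler only needs the time of the last report, so the task does not have to expose how far it still has to go.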

646) When the tasks have to sort the results, the sorting order might change as the listing changes. It is helpful to refresh the listing before sorting.

647) If the listing is large, it is not easy to refresh without adding to the overall query time. In such cases, it helps to have a progressive listing, where the changes are made at one end of the listing while the other end remains as is. As entries are added to the tail, the stats from the unchanged portion can be reused, so only the new entries need to be processed.
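A progressive listing can be sketched as an append-only list with running stats. The class name and the two stats kept here (count and total size) are assumptions chosen for illustration; the point is that a refresh only examines the tail added since the previous refresh.

```python
class ProgressiveListing:
    """Append-only listing that keeps running stats instead of rescanning."""

    def __init__(self):
        self.entries = []
        self.count = 0        # stats covering entries[:_summarized]
        self.total_size = 0
        self._summarized = 0  # index of the first entry not yet in the stats

    def append(self, name, size):
        self.entries.append((name, size))  # changes only touch the tail

    def refresh(self):
        """Fold only the new tail entries into the existing stats."""
        tail = self.entries[self._summarized:]
        self.count += len(tail)
        self.total_size += sum(size for _, size in tail)
        self._summarized = len(self.entries)
        return len(tail)  # number of entries examined by this refresh
```

The cost of a refresh is proportional to the number of new entries rather than to the size of the whole listing.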

Thursday, March 28, 2019

Today we continue discussing best practices from storage engineering:

633) The state of an object is authoritative. If it weren’t the source of truth, the entries themselves could not be relied on without validation logic across entries. There is no problem with performing validations, but doing them over and over again introduces delays, and they can be avoided altogether with clean state.

634) The states are also representative and unique. The entries are not supposed to be in two or more states at once. It is true that a bitmask can be used to denote conjunctive status, but a forward-only, discrete, singular state is preferable.
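A forward-only singular state can be sketched with an ordered enumeration. The lifecycle names below are hypothetical, and this sketch assumes that skipping ahead is allowed while moving backward is not.

```python
from enum import IntEnum


class State(IntEnum):
    # Hypothetical lifecycle; the ordering encodes the only allowed direction.
    CREATED = 1
    ACTIVE = 2
    SEALED = 3
    DELETED = 4


class Entry:
    def __init__(self, name):
        self.name = name
        self.state = State.CREATED  # exactly one state at any time

    def advance(self, new_state):
        """Move to a later state; reject moves backward or to the same state."""
        if new_state <= self.state:
            raise ValueError(
                f"cannot move from {self.state.name} to {new_state.name}"
            )
        self.state = new_state
```

Since the entry holds a single enum value rather than a bitmask, it cannot be in two states at once, and a reader never has to validate combinations.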

635) The attributes in an entry are often added on a case by case basis since it is expedient to add a new attribute without affecting others. However, the accessors of the entry should not proliferate the attributes. If the normalization of the attribute can serve more than one accessor, it will provide consistency across accesses.

636) Background tasks may be run or canceled, and frequently they need to be canceled. If they don’t do proper cleanup, they can leave their results in a bad state. A proper shutdown helps release the resources.

637) The list of background tasks may need to include and exclude the tasks as they appear or disappear. This is in addition to start and stop on each task. If the start and registration are combined, the stop and deregistration must also be combined.
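Combining registration with start, and deregistration with stop, can be sketched as a small registry. TaskRegistry and DummyTask are hypothetical names used only for illustration; a real task would do actual work in start and release resources in stop.

```python
class TaskRegistry:
    """Keeps the list of background tasks in step with their lifecycle."""

    def __init__(self):
        self._tasks = {}

    def start(self, name, task):
        """Register and start in one step."""
        if name in self._tasks:
            raise KeyError(f"task {name!r} is already registered")
        task.start()
        self._tasks[name] = task  # listed only after a successful start

    def stop(self, name):
        """Deregister and stop in one step, mirroring start()."""
        task = self._tasks.pop(name)
        task.stop()

    def names(self):
        return sorted(self._tasks)


class DummyTask:
    """Stand-in task used only for illustration."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False
```

With the two operations paired this way, the list never contains a task that was stopped, and a running task is never missing from the list.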

638) As tasks appear and disappear, it is sometimes too tedious to perform all the chores for each task. In such cases, we merely compute the difference to find the new tasks and add them to the list. This avoids the cleanup for each job as it leaves. A large-scale global shutdown may suffice later.

Wednesday, March 27, 2019

Today we continue discussing best practices from storage engineering:

631) Listing entry values are particularly interesting. In addition to the type of attributes in an entry, we can take advantage of the range of values that these attributes can take. For example, we can reserve boundary values and extremely tiny values that will not be encountered in the real world, at least in the majority of cases.

632) When the values describe the size of an associated object, the size itself can be arbitrary, and it is not always possible to rule out a size for a user object no matter how unlikely it seems. However, when used together with other attributes such as status, these values become usable as representatives of some object state that is otherwise not easily found.

Tuesday, March 26, 2019

Today we continue discussing best practices from storage engineering:

626) Listings are great to use when they are in a single location. However, they are often scoped to a parent container. If the parent containers are distributed, there tend to be multiple listings. In such cases, the effort is repeated for each listing.

627) When the listings are separated by locations, the results from the search may be fewer than the expected total if only one of the locations is searched. This has often been encountered in deployments.

628) The listings do not need to be aggregated across locations in all cases. Sometimes, only the location is relevant and the listing and the search can be scoped to it.
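The difference between a scoped search and an aggregated search can be sketched as follows. The layout of listings as a dict from location name to entries is an assumption for illustration, as is the function name search.

```python
def search(listings, predicate, location=None):
    """Search one location when given, otherwise aggregate across all of them.

    `listings` maps a location name to the list of entries stored there.
    """
    if location is not None:
        scopes = [location]
    else:
        scopes = sorted(listings)  # stable order across locations
    results = []
    for scope in scopes:
        results.extend(
            (scope, e) for e in listings.get(scope, []) if predicate(e)
        )
    return results
```

Searching only "east" in a two-location deployment returns fewer results than the expected total, which is exactly the surprise described in 627; passing no location aggregates across all of them.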

629) Iterating the listings has proved banal in most cases for both the system and the user. Consequently, either an identifier is used to go directly to the entry in the listing or a listing is reserved so that only that listing is accessed.

630) The listing can be cleaned up as well. There is no need to keep it growing with outdated entries and then archive it by age. The cleaning can happen in the background so that list iterations skip over the entries that appear as removed or do not see them at all.
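One way to let iterations skip removed entries while the cleanup happens later is to mark entries as removed and reclaim them in a separate pass. The sketch below is single-threaded and uses hypothetical names; a real background cleaner would also need to coordinate with concurrent readers.

```python
class Listing:
    """Listing whose removed entries are hidden first and reclaimed later."""

    def __init__(self):
        self._entries = {}     # key -> value, in insertion order
        self._removed = set()  # keys marked as removed

    def add(self, key, value):
        self._entries[key] = value
        self._removed.discard(key)

    def remove(self, key):
        if key in self._entries:
            self._removed.add(key)  # cheap: only mark, do not rewrite

    def __iter__(self):
        """Iteration skips entries that appear as removed."""
        return (k for k in self._entries if k not in self._removed)

    def compact(self):
        """Background step: physically drop the marked entries."""
        for key in self._removed:
            del self._entries[key]
        reclaimed = len(self._removed)
        self._removed.clear()
        return reclaimed
```

Readers see the same entries before and after compact runs, so the cleanup can be scheduled whenever it is convenient.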

Monday, March 25, 2019

Today we continue discussing best practices from storage engineering:

620) When a new key is added, it may not impact existing keys but it does affect the overall space consumption of the listing depending on the size and number.

621) The keys can have as many fields as necessary. However, the lookups are faster when there are only a few keys to compare.

622) Key comparison can be partial or full. Partial keys are useful to match duplicates. The number of keys that share the same subkeys can be many. This form of comparison is very helpful to group entries.

623) Grouping of entries also helps with entries that span groups based on subkeys. These work across groups.

624) The number of entries may run to a large order but the prefix could be more inclusive of subkeys to narrow the search. This makes it efficient to run on these listings.

625) The number of entries also doesn’t matter to the number of keys in each entry as long as the prefix uses a small set of subkeys.
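Partial key comparison and prefix-based narrowing can be sketched with tuple keys kept in sorted order. The assumption here is that each key is a tuple of subkeys such as (tenant, bucket, object); the function names are hypothetical.

```python
from bisect import bisect_left
from itertools import groupby


def prefix_range(sorted_keys, prefix):
    """Return the keys whose leading subkeys equal `prefix`.

    `sorted_keys` is a sorted list of tuples; `prefix` is a shorter tuple.
    """
    n = len(prefix)
    start = bisect_left(sorted_keys, prefix)  # first key >= prefix
    end = start
    while end < len(sorted_keys) and sorted_keys[end][:n] == prefix:
        end += 1
    return sorted_keys[start:end]


def group_by_subkeys(sorted_keys, n):
    """Group entries that share their first `n` subkeys (a partial comparison)."""
    return {p: list(g) for p, g in groupby(sorted_keys, key=lambda k: k[:n])}
```

A longer prefix narrows the matching range further, and the binary search finds the start of that range without scanning the entries before it.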

#codingexercise
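Find paths in a matrix
// Counts the paths from (0, 0) to (x, y) when each step moves one cell
// along x, along y, or along both.
int GetPaths(int x, int y)
{
    // Along the first row or column there is only one path.
    if (x <= 0 || y <= 0)
        return 1;

    return GetPaths(x - 1, y) +
           GetPaths(x - 1, y - 1) +
           GetPaths(x, y - 1); // for the three possible directions
}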
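
The plain recursion above recomputes the same cells many times. A memoized variant, sketched here in Python rather than the language of the exercise, keeps the same recurrence and base case and caches each (x, y) result. The name get_paths is only a transliteration for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_paths(x, y):
    """Count paths to cell (x, y) using the same three-way recurrence."""
    if x <= 0 or y <= 0:
        return 1  # a single path along the first row or column
    return (
        get_paths(x - 1, y)
        + get_paths(x - 1, y - 1)
        + get_paths(x, y - 1)
    )
```

For example, get_paths(2, 2) evaluates to 13. Each cell is now computed once, although very large inputs would still be limited by the recursion depth.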

Saturday, March 23, 2019

Today we continue discussing best practices from storage engineering:

611) Background tasks may sometimes need to catch up with the current activities. In order to accommodate the delay, they may either be run upfront so that changes to be processed are incremental or they can increase in number to divide up the work.

612) The results from the background tasks mentioned above might also take a long time to accumulate. They can be made available as they appear or batched.

613) The load balancer works very well to enable background tasks to catch up: it avoids overloading a single task and distributes the online activities so that the background task has a light load.

614) The number of background tasks or their type should not affect online activities. However, systems have been known to be impacted when the tasks consume memory or delay garbage collection.

615) There is no specific mitigation for one or more background tasks that take plenty of shared resources, but generally they are written to be fault tolerant so that they can pick up from where they left off.
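Picking up from where a task left off is often done with a checkpoint. In the sketch below the checkpoint is a plain dict standing in for durable storage, which is an assumption, and run_with_checkpoint is a hypothetical helper.

```python
def run_with_checkpoint(items, process, checkpoint):
    """Process `items` in order, resuming after the last recorded position.

    `checkpoint` is a dict standing in for durable storage (an assumption).
    """
    start = checkpoint.get("next_index", 0)
    for index in range(start, len(items)):
        process(items[index])
        checkpoint["next_index"] = index + 1  # record progress after each item
    return len(items) - start  # items handled in this run
```

If the task fails between processing an item and recording it, that item is processed again on the next run, so the processing step should be safe to repeat.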

Friday, March 22, 2019

Today we continue discussing best practices from storage engineering:

606) We use data structures to keep the information we want to access in a convenient form. When this is persisted, it mitigates faults in the processing. However, each such artifact brings in additional chores and maintenance. On the other hand, it is cheaper to execute the logic, and the logic can be versioned. Therefore, when there is a trade-off between compute and storage for numerous small and cheap artifacts, it is better to generate them dynamically.

607) The above has far-reaching impact when there are a number of layers involved and a cost incurred in the lower layer bubbles up to the top layer.

608) Compute tends to be distributed in nature while storage tends to be local. They can be mutually exclusive in this regard.

609) Compute oriented processing can scale up or out while storage has to scale out.

610) Compute oriented processing can get priority but storage tends to remain in a class.
