Wednesday, April 3, 2019

Today we continue discussing best practices from storage engineering:

667) The features a product uses from the operator SDK depend on the automation built into the product's container-specific operator. If the product wants to limit its reliance on the container framework, it can take advantage of just the minimum set of features.

668) One of these features is metering, which works the same way as in the public cloud. Container platforms are still catching up with the public cloud on this feature.

669) Operators can be written in a variety of languages depending on the SDK; however, in many cases a bare-bones application without heavy interpreters or compilers is preferred. Go is commonly used for this purpose, particularly in DevOps.

670) There is no special performance or security requirement on the containerization framework beyond what the application already expects from the host, because this layer is internal and not visible to the user.

671) Operators that perform updates typically scale down the nodes, perform the update, and then scale back up; a minimal sketch of this sequence follows at the end of this list. There is no restriction on reusing logic between operators.

672) Operators generally work one at a time on the same cluster. This prevents states from being mixed and allows each reconciliation between the declared state and the deployment to happen sequentially.

673) Operators do not retain any information between invocations. Anything that needs to be persisted has to be part of the state.

674) There is no limit to the number of operators run during a deployment, but it is preferable to run them sequentially, one after the other. The more granular the operators, the easier they are to maintain.

675) The diagnosability of operators improves when each operator is kept small and transparent.
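
To make points 671 through 673 concrete, here is a minimal sketch in Go of the scale-down, update, scale-up sequence. It is written against a hypothetical Cluster interface rather than any real SDK client, so the names Cluster, Replicas, ScaleTo, SetImage, and fakeCluster are assumptions for illustration; note that reconcileUpdate keeps no state between invocations, in line with point 673.

package main

import (
	"errors"
	"fmt"
)

// Cluster is a hypothetical abstraction over the deployed storage nodes.
// A real operator would talk to the container platform's API instead.
type Cluster interface {
	Replicas() (int, error)
	ScaleTo(n int) error
	SetImage(image string) error
}

// reconcileUpdate performs the update sequence: scale down, apply the new
// image, scale back up. It keeps no state between invocations; everything
// it needs comes from the cluster itself and the desired image passed in.
func reconcileUpdate(c Cluster, desiredImage string) error {
	current, err := c.Replicas()
	if err != nil {
		return fmt.Errorf("read replicas: %v", err)
	}
	if current == 0 {
		return errors.New("nothing to update: no replicas")
	}
	if err := c.ScaleTo(0); err != nil {
		return fmt.Errorf("scale down: %v", err)
	}
	if err := c.SetImage(desiredImage); err != nil {
		return fmt.Errorf("set image: %v", err)
	}
	if err := c.ScaleTo(current); err != nil {
		return fmt.Errorf("scale up: %v", err)
	}
	return nil
}

// fakeCluster is an in-memory stand-in used only to exercise the sketch.
type fakeCluster struct {
	replicas int
	image    string
}

func (f *fakeCluster) Replicas() (int, error)    { return f.replicas, nil }
func (f *fakeCluster) ScaleTo(n int) error       { f.replicas = n; return nil }
func (f *fakeCluster) SetImage(img string) error { f.image = img; return nil }

func main() {
	c := &fakeCluster{replicas: 3, image: "storage:v1"}
	if err := reconcileUpdate(c, "storage:v2"); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Printf("cluster now runs %s with %d replicas\n", c.image, c.replicas)
}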



Tuesday, April 2, 2019

Today we continue discussing best practices from storage engineering:

661) The operator used to build and deploy the storage server can take care of all the human-oriented administrative tasks such as upgrades, scaling, and backups.

662) The logic of these administrative tasks remains the same regardless of the version, size, or nature of the deployment.

663) The parameters for these tasks are best described in the declarative syntax of the so-called custom resource that these deployment operators act on (see the sketch at the end of this list).

664) The number and type of tasks vary from application to application and can become sophisticated and customized. There is no restriction on this kind of automation.

665) The container image, once built, has to be registered in a hub so that it is made available everywhere.

666) There are many features of the container framework that the storage product can leverage, and some of them are available via the SDK. However, container technologies continue to evolve, following the example of the Platform-as-a-Service layer and the public cloud.
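
As an illustration of the declarative syntax mentioned in point 663, here is a hypothetical custom-resource spec expressed as Go types in the style the operator SDK generates. The type and field names (StorageServer, Version, Replicas, BackupSchedule) are assumptions, not the actual schema of any product; the operator reads Spec as the intent and writes Status back after reconciling.

package main

import (
	"encoding/json"
	"fmt"
)

// StorageServerSpec is a hypothetical custom-resource spec. The declarative
// fields carry the parameters for administrative tasks such as upgrade,
// scaling, and backup.
type StorageServerSpec struct {
	Version        string `json:"version"`        // release to deploy or upgrade to
	Replicas       int    `json:"replicas"`       // desired node count
	BackupSchedule string `json:"backupSchedule"` // cron-style schedule for backups
}

// StorageServer mirrors the shape of a custom resource: a declarative Spec
// written by the user and a Status written back by the operator.
type StorageServer struct {
	Spec   StorageServerSpec `json:"spec"`
	Status struct {
		ReadyReplicas int    `json:"readyReplicas"`
		Phase         string `json:"phase"`
	} `json:"status"`
}

func main() {
	cr := StorageServer{Spec: StorageServerSpec{Version: "2.1.0", Replicas: 3, BackupSchedule: "0 2 * * *"}}
	out, _ := json.MarshalIndent(cr, "", "  ")
	fmt.Println(string(out)) // the declarative form the operator consumes
}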

Monday, April 1, 2019

Today we continue discussing best practices from storage engineering:

656) Different containerization technologies have proprietary formats and semantics for the storage application to use. Of these, the most critical aspect for the storage server has been networking, because the nodes often scale out and they need to be declared to the containers for proper connectivity. NAS storage is not going away as long as hard drives continue to serve as nearline storage.

657) These same containerization technologies allow the storage services to be deployed as images. This makes it possible to run the same image across a variety of hosts without having to recompile and rebuild the package.

658) Such an image needs to be built with declared dependencies for the storage service, often called custom resources in containerization parlance, and with logic to reconcile the state of the deployment with the intended declarations (see the sketch at the end of this list). This process is often aided by the SDK from the containerization technology.

659) The container SDK can also help generate scaffolding code for the storage service, used to build and deploy the service as an image. This code is called an operator.

660) There is usually one operator per product for building and deploying the storage server as a container image. The flavors of the product and its releases can be detailed in the definitions for the same operator.
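
A bare-bones sketch of the reconciliation logic from point 658, independent of any real SDK: compare what the custom resource declares with what is observed and decide the action that converges them. The Desired and Observed types are placeholders for the richer generated types.

package main

import "fmt"

// Desired is what the custom resource declares; Observed is what is running.
// Both are placeholders for the richer types a real SDK would generate.
type Desired struct{ Replicas int }
type Observed struct{ Replicas int }

// reconcile compares the declared and observed state and returns the action
// needed to converge them. A real operator would issue API calls instead of
// returning a string.
func reconcile(want Desired, have Observed) string {
	switch {
	case have.Replicas < want.Replicas:
		return fmt.Sprintf("scale up by %d", want.Replicas-have.Replicas)
	case have.Replicas > want.Replicas:
		return fmt.Sprintf("scale down by %d", have.Replicas-want.Replicas)
	default:
		return "in sync, nothing to do"
	}
}

func main() {
	fmt.Println(reconcile(Desired{Replicas: 5}, Observed{Replicas: 3}))
}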

Sunday, March 31, 2019

Today we continue discussing best practices from storage engineering:

649) If the entries vary widely and affect the overall results at a high rate, it is easier to absorb the changes on the compute side while allowing the storage of the listings to be progressive. This way, tasks can communicate the changes to their respective sort orders to the scheduler, which can then adjust the overall sort order.

650) If the listing is a stream, processing it works the same way as a cursor over a database, adjusting the rankings gathered so far as each entry is encountered (a sketch follows at the end of this list).

651) Stream processing is facilitated by compute packages from Microsoft and Apache, for example. These kinds of packages highlight the stream-processing techniques that can be applied to streams from a variety of storage.

652) Both the query and the algorithm, be it mining or machine learning, can be externalized. This can work effectively across storage just as much as it applies to specific data.

653) The algorithms vary widely in their duration and convergence, even for the same data. There is usually no specific rule to follow when comparing algorithms in the same category.

654) The usual technique in the above case is to use a strategy pattern that interchanges algorithms and evaluates them on a trial-and-error basis (a sketch of this pattern also follows at the end of this list).

655) Storage services can take advantage of containerization just like other applications. Provisioning the service over containers while allowing the nodes to remain part of the cluster is both scalable and portable.
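
For point 650, a small sketch of cursor-style stream processing that keeps a running top-k ranking as entries arrive, instead of re-sorting the whole listing; the Entry type and the channel standing in for the stream are assumptions for illustration.

package main

import (
	"fmt"
	"sort"
)

// Entry is a placeholder for a listed item with a sortable score.
type Entry struct {
	Name  string
	Score float64
}

// topK consumes entries one at a time, the way a cursor walks a result set,
// and keeps only the k best seen so far instead of re-sorting the full listing.
func topK(stream <-chan Entry, k int) []Entry {
	best := make([]Entry, 0, k+1)
	for e := range stream {
		best = append(best, e)
		sort.Slice(best, func(i, j int) bool { return best[i].Score > best[j].Score })
		if len(best) > k {
			best = best[:k] // drop the lowest-ranked entry
		}
	}
	return best
}

func main() {
	stream := make(chan Entry)
	go func() {
		defer close(stream)
		for _, e := range []Entry{{"a", 0.4}, {"b", 0.9}, {"c", 0.1}, {"d", 0.7}} {
			stream <- e
		}
	}()
	for _, e := range topK(stream, 2) {
		fmt.Println(e.Name, e.Score)
	}
}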

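And for point 654, a minimal sketch of the strategy pattern: algorithms behind a common interface can be swapped and compared on the same data by trial and error. The Ranker interface and the two toy strategies are illustrative stand-ins for real mining or machine-learning algorithms.

package main

import "fmt"

// Ranker is the strategy interface: any algorithm that scores a data set.
type Ranker interface {
	Name() string
	Score(data []float64) float64
}

// meanRanker and maxRanker are toy strategies standing in for real algorithms.
type meanRanker struct{}

func (meanRanker) Name() string { return "mean" }
func (meanRanker) Score(data []float64) float64 {
	sum := 0.0
	for _, v := range data {
		sum += v
	}
	return sum / float64(len(data))
}

type maxRanker struct{}

func (maxRanker) Name() string { return "max" }
func (maxRanker) Score(data []float64) float64 {
	best := data[0]
	for _, v := range data[1:] {
		if v > best {
			best = v
		}
	}
	return best
}

func main() {
	data := []float64{0.2, 0.8, 0.5}
	// Strategies are interchangeable, so they can be tried one after another
	// and compared on the same data.
	for _, r := range []Ranker{meanRanker{}, maxRanker{}} {
		fmt.Printf("%s: %.2f\n", r.Name(), r.Score(data))
	}
}
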
Saturday, March 30, 2019

Today we continue discussing best practices from storage engineering:

648) If the listing is distributed, it helps to run a map-reduce over the listing, as sketched below.
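
A tiny sketch of point 648: a map step run per partition of the listing, with goroutines standing in for distributed workers, followed by a reduce step that merges the partial results. The count-by-name example and the in-memory partitions are made up for illustration.

package main

import (
	"fmt"
	"sync"
)

// mapPhase counts entries per key within one partition of the listing.
func mapPhase(partition []string) map[string]int {
	counts := map[string]int{}
	for _, name := range partition {
		counts[name]++
	}
	return counts
}

// reducePhase merges the per-partition counts into one result.
func reducePhase(parts []map[string]int) map[string]int {
	total := map[string]int{}
	for _, p := range parts {
		for k, v := range p {
			total[k] += v
		}
	}
	return total
}

func main() {
	// Each slice stands in for one node's portion of a distributed listing.
	partitions := [][]string{{"log", "dat", "log"}, {"dat", "dat"}, {"log"}}
	results := make([]map[string]int, len(partitions))
	var wg sync.WaitGroup
	for i, p := range partitions {
		wg.Add(1)
		go func(i int, p []string) {
			defer wg.Done()
			results[i] = mapPhase(p) // map step runs per partition, in parallel
		}(i, p)
	}
	wg.Wait()
	fmt.Println(reducePhase(results)) // map[dat:3 log:4]
}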

Friday, March 29, 2019

Today we continue discussing best practices from storage engineering:

639) If there are multiple registrations that need to be kept in sync, they become harder to maintain. It is easier if the lists can be combined or if there is a one-to-one mapping between them.

640) Failed tasks may require new tasks to be added, in which case it is better to identify the failed tasks separately from the otherwise new tasks.

641) When the tasks are constantly replenished, it is helpful to keep track of which ones are in versus out.

642) The tasks that are out are candidates for cleanup.

643) The tasks that are in are either existing or new. These categories are mutually exclusive, so it is easy to tell the new ones from the old (see the task-diff sketch at the end of this list).

644) The tasks that are new will need to be set up before they can execute. This involves initialization so that they can be included in the list.

645) Tasks that run long need to indicate progress in some way so that the scheduler knows the task is still active and not stuck.

646) When the tasks have to sort the results, the sorting order might change as the listing changes. It is helpful to refresh the listing before sorting.

647) If the listing is large, it is not easy to refresh it without incurring a cost on the overall query time. In such cases, it helps to have a progressive listing, where changes are made to one end of the listing while the other end remains as is. As entries are added at the tail, the stats from the unchanged portion can be reused for the new entries, as sketched below.
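
A small sketch of the progressive listing from point 647: new entries are appended only at the tail, so an aggregate computed over the unchanged head can be cached and reused. The running sum stands in for whatever statistic the query actually needs.

package main

import "fmt"

// ProgressiveListing appends only at the tail, so aggregates computed over
// the unchanged head can be cached and reused instead of re-scanning.
type ProgressiveListing struct {
	entries   []int
	cached    int // running sum over entries[0:cachedLen]
	cachedLen int
}

// Append adds entries at the tail; the cached stat stays valid for the head.
func (p *ProgressiveListing) Append(vals ...int) {
	p.entries = append(p.entries, vals...)
}

// Sum reuses the cached aggregate for the head and only scans the new tail.
func (p *ProgressiveListing) Sum() int {
	for _, v := range p.entries[p.cachedLen:] {
		p.cached += v
	}
	p.cachedLen = len(p.entries)
	return p.cached
}

func main() {
	var l ProgressiveListing
	l.Append(3, 5, 7)
	fmt.Println(l.Sum()) // 15, scans three entries
	l.Append(10)
	fmt.Println(l.Sum()) // 25, scans only the one new entry
}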

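For points 641 through 643, a minimal sketch of the bookkeeping that tells new tasks from existing ones and flags the ones that are out as cleanup candidates; the string task IDs are purely illustrative.

package main

import "fmt"

// diffTasks compares the previously registered tasks with the current
// listing and classifies them: new tasks need setup, departed tasks are
// candidates for cleanup, and the rest already exist.
func diffTasks(previous, current []string) (added, removed, existing []string) {
	prev := map[string]bool{}
	for _, id := range previous {
		prev[id] = true
	}
	curr := map[string]bool{}
	for _, id := range current {
		curr[id] = true
		if prev[id] {
			existing = append(existing, id)
		} else {
			added = append(added, id)
		}
	}
	for _, id := range previous {
		if !curr[id] {
			removed = append(removed, id)
		}
	}
	return added, removed, existing
}

func main() {
	added, removed, existing := diffTasks(
		[]string{"backup", "scrub", "rebalance"},
		[]string{"scrub", "rebalance", "upgrade"},
	)
	fmt.Println("new:", added, "out:", removed, "existing:", existing)
}
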
Thursday, March 28, 2019

Today we continue discussing best practices from storage engineering:

633) The state of an object is authoritative. If it were not the source of truth, the entries themselves could not be relied on without validation logic across entries. There is no problem performing validations, but doing them over and over again introduces delays that can be avoided altogether with clean state.

634) The states are also representative and unique. The entries are not supposed to be in two or more states at once. It is true that a bitmask can be used to denote conjunctive status, but a forward-only, discrete, singular state is preferable.

635) The attributes in an entry are often added on a case-by-case basis, since it is expedient to add a new attribute without affecting others. However, the accessors of the entry should not proliferate attributes. If normalizing an attribute can serve more than one accessor, it provides consistency across accesses.

636) Background tasks may be run or canceled. Frequently these tasks need to be canceled, and if they do not do proper cleanup, they can leave their results in a bad state. A shutdown step helps release the resources properly (see the sketch at the end of this list).

637) The list of background tasks may need to include and exclude tasks as they appear or disappear. This is in addition to starting and stopping each task. If start and registration are combined, then stop and deregistration must also be combined.

638) As tasks appear and disappear, it is sometimes too tedious to perform all the chores for each task. In such cases, we merely compute the difference for the new tasks and add them to the list. This avoids the cleanup on each job as it leaves; a large-scale global shutdown may suffice later.
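
A short sketch for point 636, using Go's context for cancellation and a deferred cleanup so that a canceled background task never leaves partial results behind; the ticker loop is a placeholder for real work.

package main

import (
	"context"
	"fmt"
	"time"
)

// runTask does periodic work until it is canceled; the deferred cleanup
// runs on every exit path so a cancellation never leaves partial results behind.
func runTask(ctx context.Context, name string) {
	defer fmt.Println(name, "cleaned up") // release resources on any exit
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			fmt.Println(name, "canceled:", ctx.Err())
			return
		case <-ticker.C:
			fmt.Println(name, "made progress") // placeholder for real work
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		runTask(ctx, "backup")
		close(done)
	}()
	time.Sleep(120 * time.Millisecond)
	cancel() // a global shutdown would cancel every task's context the same way
	<-done
}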
