Tuesday, January 15, 2019

Today we continue discussing best practices from storage engineering:

318) Ingestion engines play a part in the larger data processing pipeline whose output users search with the help of a search engine. The stored data has to be searchable, so the ingestion engine also annotates the data, classifies the content, detects its language and assigns tags. The search engine crawls the data and expands its links. The results are stored back as blobs, and these blobs then become publicly searchable. The workflows over these artifacts may be implemented with queues, and the overall timing of the tasks may be tightened so that the results show up in end users' search results within a reasonable time.
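A minimal sketch of such a queue-based ingestion workflow follows. The stage functions (`annotate`, `classify_content`, `detect_language`, `tag`), their toy heuristics, the `Document` type and the dict used as a blob store are all hypothetical stand-ins; a real deployment would use distributed queues and proper classifiers rather than a single-threaded drain loop.

```python
from dataclasses import dataclass, field
from queue import Queue


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def annotate(doc):
    # Hypothetical annotation: record the word count.
    doc.metadata["word_count"] = len(doc.text.split())
    return doc


def classify_content(doc):
    # Hypothetical rule: documents mentioning "error" are treated as logs.
    doc.metadata["category"] = "log" if "error" in doc.text.lower() else "general"
    return doc


def detect_language(doc):
    # Placeholder heuristic: ASCII-only text is labeled "en".
    doc.metadata["language"] = "en" if doc.text.isascii() else "unknown"
    return doc


def tag(doc):
    # Hypothetical tags: distinct lowercase words longer than four characters.
    doc.metadata["tags"] = sorted({w.lower() for w in doc.text.split() if len(w) > 4})
    return doc


def run_pipeline(docs, stages, blob_store):
    # One queue per stage boundary; each stage drains its input into the next queue.
    queues = [Queue() for _ in range(len(stages) + 1)]
    for d in docs:
        queues[0].put(d)
    for i, stage in enumerate(stages):
        while not queues[i].empty():
            queues[i + 1].put(stage(queues[i].get()))
    # The last queue holds enriched documents, stored back as blobs keyed by id.
    while not queues[-1].empty():
        doc = queues[-1].get()
        blob_store[doc.doc_id] = {"text": doc.text, **doc.metadata}
    return blob_store
```

For example, `run_pipeline([Document("a1", "Disk error on volume seven")], [annotate, classify_content, detect_language, tag], {})` yields a blob for `a1` categorized as `"log"` with tags `["error", "seven", "volume"]`. Keeping each stage behind its own queue is what lets the timing of individual tasks be tuned independently.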

319) The increase in data size after annotation and preparation for the search engine is usually less than double the size of the original data.

320) Strong consistency is an aspect of the data, not of the operations. A simple copy-on-write mechanism with versioning is sufficient for all parallel processes to see all accesses in their sequential order.
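To make the copy-on-write idea concrete, here is a minimal in-memory sketch under simplifying assumptions: the `VersionedStore` class is hypothetical, a single lock assigns versions in a total order, and each write copies the whole map (real systems copy only the changed pages or blocks). Readers hold immutable snapshots, so every process sees the same sequence of versions.

```python
import threading
from types import MappingProxyType


class VersionedStore:
    """Hypothetical copy-on-write key-value store: each write publishes a new version."""

    def __init__(self):
        self._lock = threading.Lock()
        self._versions = [MappingProxyType({})]  # version 0 is the empty map

    def write(self, key, value):
        with self._lock:
            new = dict(self._versions[-1])  # copy the latest version ...
            new[key] = value                # ... and modify only the copy
            self._versions.append(MappingProxyType(new))
            return len(self._versions) - 1  # sequential version number of this write

    def snapshot(self, version=None):
        # Published versions are read-only views and never change afterwards.
        with self._lock:
            if version is None:
                version = len(self._versions) - 1
            return version, self._versions[version]
```

A reader that took a snapshot at version 1 keeps seeing version 1 even after later writes, while any process asking for version 2 sees exactly the same contents as every other process.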

321) Multi-tenancy is a choice for the workload, not for the infrastructure. If a storage product requires multiple instances of itself, it divides the resources instead of making the most of them through shared tenancy. Unless a particular workload gains a significant performance boost, the cost does not justify giving workloads their own instances of the storage product.
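One way to see why dedicated instances divide resources is to compare the capacity each arrangement must provision. The sketch below assumes hypothetical hourly demand figures: dedicated instances are each sized for their own workload's peak (sum of peaks), while a shared instance only has to cover the peak of the combined demand.

```python
def dedicated_capacity(demands):
    # Each workload gets its own instance sized for its own peak.
    return sum(max(series) for series in demands.values())


def shared_capacity(demands):
    # One shared instance only has to cover the peak of the combined demand.
    combined = [sum(step) for step in zip(*demands.values())]
    return max(combined)


demands = {
    "workload_a": [10, 40, 10],  # hypothetical demand per hour
    "workload_b": [40, 10, 10],
}
```

Here `dedicated_capacity(demands)` is 80 while `shared_capacity(demands)` is 50. The saving comes from the peaks falling at different times; if the workloads peak together, the two numbers converge, which is when a dedicated instance might be justified.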

322) Along with tenancy, namespaces can also be local or global. Names in a global namespace tend to be longer and less user-friendly. On the other hand, global namespaces can enforce consistency.
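The trade-off can be sketched with a hypothetical registry that turns tenant-local names into global ones. The `region/tenant/name` scheme and the `GlobalNamespace` class are assumptions for illustration: the prefixes make names longer, but a single registry guarantees that one global name refers to one thing everywhere.

```python
class GlobalNamespace:
    """Hypothetical registry that qualifies tenant-local names into global ones."""

    def __init__(self, region):
        self.region = region
        self._owners = {}  # global name -> owning tenant

    def register(self, tenant, local_name):
        # The global name is longer because it carries region and tenant prefixes.
        global_name = f"{self.region}/{tenant}/{local_name}"
        # A single registry rejects duplicates, so each global name is unambiguous.
        if global_name in self._owners:
            raise ValueError(f"name already registered: {global_name}")
        self._owners[global_name] = tenant
        return global_name

    def owner(self, global_name):
        return self._owners[global_name]
```

Two tenants can both call their bucket `logs` locally; globally they become `us-east/acme/logs` and `us-east/globex/logs`, and a second registration of either name is rejected.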

323) The cost of storage is sometimes vague because it does not necessarily encompass all operational costs for all units, since the scope and purpose of the storage product change. The cost is not a standard, but we can get comparable values by taking the sum of the costs and dividing it by the number of units to obtain a unit price.
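The unit-price calculation is simple arithmetic; the sketch below uses hypothetical cost components and an assumed 800 TB deployment. It also shows why the figure is vague: leaving one component out of scope changes the unit price substantially.

```python
def unit_price(costs, units):
    """Sum all cost components and divide by the number of units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return sum(costs.values()) / units


# Hypothetical yearly costs for an 800 TB deployment.
costs = {"hardware": 120_000, "power": 30_000, "staff": 90_000}
full = unit_price(costs, 800)  # 300.0 per TB
partial = unit_price({k: v for k, v in costs.items() if k != "staff"}, 800)  # 187.5 per TB
```

Two unit prices are only comparable when they are computed over the same set of cost components.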

324) Cost is always a scalar value and is usually calculated by fixing the parameters of the system. Different studies may use the same parameters and still arrive at widely different results. Therefore, it is not good practice to compare studies unless the systems they evaluate perform comparably.

325) The total cost of ownership (TCO) encompasses operational costs and is usually not reflected in the price of new instances of the storage product. It is applied to products that have been in use for a while and are becoming a liability.
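A minimal sketch of how TCO grows over the life of a product, assuming a hypothetical purchase price and yearly operating costs. The `liability_year` criterion (the first year in which cumulative operating cost exceeds the purchase price) is an illustrative assumption, not a standard definition.

```python
def total_cost_of_ownership(acquisition, yearly_operations):
    """Acquisition cost plus every year of operating cost accumulated so far."""
    return acquisition + sum(yearly_operations)


def liability_year(acquisition, yearly_operations):
    """Hypothetical criterion: first year cumulative operations exceed the purchase price."""
    cumulative = 0
    for year, cost in enumerate(yearly_operations, start=1):
        cumulative += cost
        if cumulative > acquisition:
            return year
    return None


ops = [30_000, 35_000, 45_000]  # hypothetical operating costs for years 1-3
```

For a new instance bought at 100,000, the TCO is just the purchase price; after three years of `ops` it is 210,000, and by the criterion above the product turns into a liability in year 3.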
