Sunday, September 30, 2018

Today we continue discussing text summarization techniques. We came up with the following steps:
def gen_proposals(proposals, least_squares_estimates):
    # a proposal is origin, length, breadth written as say top-left and bottom-right corner of a bounding box
    # given many known topic vectors, the classifier helps detect the best match.
    # the bounding box is adjusted to maximize the intersection over union of this topic.
    # text is flowing so we can assume bounding boxes of sentences
    # fix origin and choose fixed step sizes to determine the adherence to the regression
    # repeat for different selections of origins.
    pass

def get_iou_topic(keywords, topic):
    return sum_of_square_distances(keywords, topic)

def gen_proposals_alternative_without_classes(proposals, threshold):
    # cluster all keywords in a bounding box
    # use the threshold to determine high goodness of fit to one or two clusters
    # use the goodness of fit to scatter plot bounding boxes and their linear regression
    # use the linear regression to score and select the best bounding boxes.
    # select bounding boxes with diversity to overall document cluster
    # use the selected bounding boxes to generate a summary
    pass
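A minimal runnable sketch of the alternative above, assuming keywords are already embedded as 2-D vectors and taking goodness of fit as the negative mean squared distance to the cluster centroid (the helpers `centroid`, `goodness_of_fit`, `select_boxes` and the sample data are illustrative, not part of the method itself):

```python
# Score bounding boxes of text by how tightly their keyword vectors cluster,
# then keep the best-fitting boxes for the summary (illustrative sketch).

def centroid(points):
    # component-wise mean of the keyword vectors in one bounding box
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def goodness_of_fit(points):
    # negative mean squared distance to the centroid: higher means tighter
    c = centroid(points)
    return -sum(sum((x - cx) ** 2 for x, cx in zip(p, c)) for p in points) / len(points)

def select_boxes(boxes, threshold, top_k=2):
    # keep boxes whose fit clears the threshold, best-fitting first
    scored = [(goodness_of_fit(pts), box) for box, pts in boxes.items()]
    return [box for fit, box in sorted(scored, reverse=True) if fit >= threshold][:top_k]

boxes = {
    "sentence-1": [(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)],   # tight keyword cluster
    "sentence-2": [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)],   # scattered keywords
}
print(select_boxes(boxes, threshold=-1.0))   # ['sentence-1']
```

A tighter keyword cluster scores closer to zero, so the cohesive box clears the threshold while the scattered one does not.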
       
The keyword selection is based on a softmax classifier and operates the same regardless of the size of the input from bounding boxes. Simultaneously, the linear regressor proposes different bounding boxes.
We have stemming, keyword selection as common helpers for the above method. In addition to classification, we measure goodness of fit. We also keep a scatter plot of the bounding boxes and the goodness of fit. We separate out the strategy for the selection of bounding boxes in a separate method. Finally, we determine the summary as the top discrete bounding boxes in a separate method.
Topics unlike objects have an uncanny ability to be represented by one or more keywords. Just like we cluster similar topics in a thesaurus the bounding boxes need to compare only with the thesaurus for matches.
We are not looking to reduce the topics to words or classify the whole bag of words. What we are trying to do is find coherent clusters by determining the size of the bounding box and the general association of that cluster to domain topics. Therefore, we use well-known topic vectors from a domain instead of collocation-based feature maps; we train these as topic vectors and use them within the bounding boxes to determine their affiliations.
The ability to discern a domain is similar to discerning latent semantics using collocation. The latter was based on pointwise mutual information: it reduced topics to keywords for data points and used collocation data for training the softmax classifier that relied on the PMI. Here we use feature maps that are based on the associated cluster.
Keywords form different topics when the same keywords are used with different emphasis. We can measure the emphasis only by the cluster. Clusters are dynamic but we can record the cohesiveness of a cluster and its size with the goodness of fit measure. As we record different goodness of fit, we have a way of representing not just keywords but also the association with topics. We use the vector for the keyword and we use the cluster for the topic. An analogy is a complex number. For example, we have a real part and we have a complex part. The real-world domain is untouched by the complex part but the two together enables translations that are easy to visualize.  
The metric to save cluster information discovered in the text leads us to topic vectors where the feature maps are different from collocation-based data. This is an operation over metadata but has to be layered with the collocation data.

Saturday, September 29, 2018


This article is in continuation of a previous post. We were referring to the design of message queues using object storage. Most message queues scale by virtue of the number of nodes in a cluster-based deployment. Object Storage is accessible over S3 APIs to each of these nodes. The namespaces and buckets are organized according to the queues so that the messages may be looked up directly based on the object storage conventions. Since the storage takes care of all ingestion related concerns, the nodes merely have to utilize the S3 APIs to get and put the messages. In addition, we brought up the availability of indigenous queues to be used as a background processor in case the data does need to be sent deep into the object storage. This has at least two advantages. First, it is flexible for each queue to determine what it needs to do with the object. Second, the scheduled saving of all messages into the object storage works well for the latter because it is a continuous feed with very little read access.
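As a sketch of that naming convention, a plain dict can stand in for the S3-compatible bucket; the `namespace/queue/sequence` key layout and the class name are assumptions for illustration, not an actual implementation:

```python
# A queue node that maps queue operations onto object-store get/put calls.
# The dict `store` stands in for an S3-compatible bucket (sketch only).

class ObjectBackedQueue:
    def __init__(self, store, namespace, queue):
        self.store = store          # bucket: key -> message bytes
        self.prefix = f"{namespace}/{queue}/"
        self.next_seq = 0
        self.next_read = 0

    def put(self, message):
        # one object per message; the key encodes queue membership and order
        self.store[f"{self.prefix}{self.next_seq:08d}"] = message
        self.next_seq += 1

    def get(self):
        # look the message up directly by the object-storage naming convention
        key = f"{self.prefix}{self.next_read:08d}"
        self.next_read += 1
        return self.store.pop(key)

store = {}
q = ObjectBackedQueue(store, "tenant-a", "orders")
q.put(b"msg-1")
q.put(b"msg-2")
print(q.get())   # b'msg-1'
```

Because the key carries the namespace, the queue name, and the sequence number, any node holding only the S3 client can locate a message without coordination.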
This prompted us to separate this particular solution in its own layer which we called the cache layer so that the queues may work with the cache or with the object storage as required. The propagation of objects from cache to storage may proceed in the background. There are no mandates for the queues related to the cache to serve user workloads. They are entirely internal and specific to the system.  Therefore the schedule and their operation can be set as per the system configuration.
The queues, on the other hand, have to implement one of the protocols such as AMQP, STOMP, and so on. Also, customers are likely to use the queues in one of the following ways, each of which implies a different layout for the same instance and cluster size.
  1. The queues may be mirrored across multiple nodes – This means we can use a cluster
  2. The queues may be chained where one feeds into the other – This means we can use federation
  3. The queues may be arbitrary depending on application needs – This means we build our own aka  the shovel work
Consequently, the queue layer can be designed independent of the cache and the object storage. While Queue services are available in the cloud and so are the one-stop-shop cloud databases, this kind of stack holds a lot of promise in the on-premise market.
While the implementation of the queue layer is open, we can call out what it should not be. The queues should not be implemented as micro-services. That defeats the purpose of the message broker as a shared platform to alleviate the dependencies that the micro-services have in the first place. Also, the queues should not be collapsed into the database or the object storage unless there is a runtime to process the messages and the programmability to store and execute logic. Between these two extremes, the queue layer can be fashioned as an API gateway, a switching fabric, or anything that can handle retries, poison queues, dead letters and journaling. Transactional semantics are not the concern here since we are relying on versioning. Finally, the queues can use existing products such as ZeroMQ or RabbitMQ if they allow customizations for on-premise deployment of this stack.
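A minimal sketch of the retry and dead-letter handling mentioned above, with an in-memory deque standing in for the queue (the `max_retries` policy and the names are illustrative assumptions):

```python
# Retry handling with a dead-letter queue: a message that keeps failing is
# parked aside instead of blocking the queue forever (illustrative sketch).

from collections import deque

def drain(queue, handler, max_retries=3):
    dead_letters = []
    attempts = {}
    while queue:
        msg = queue.popleft()
        try:
            handler(msg)
        except Exception:
            attempts[msg] = attempts.get(msg, 0) + 1
            if attempts[msg] >= max_retries:
                dead_letters.append(msg)      # poison message: park it
            else:
                queue.append(msg)             # retry later
    return dead_letters

queue = deque(["ok-1", "bad", "ok-2"])
def handler(msg):
    if msg == "bad":
        raise ValueError("cannot process")
print(drain(queue, handler))   # ['bad']
```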



Friday, September 28, 2018

Object Storage is not inherent to a cluster. It does not participate in another cluster. Many applications utilize a cluster to scale. And they don't interact with each other by any means other than as application endpoints or a shared volume. A database, file-system or unstructured storage may be hosted elsewhere and then used with a cluster within an application. Consequently, the storage and the application form separate layers. An application that utilizes a cluster specifically for messaging is a message queue server, also called a message broker. Message Queues facilitate an alternative for applications and services to send and receive data. Instead of the sender directly calling the receiver, a message broker allows the sender to leave a message and proceed. This makes a microservice architecture even leaner and more focused while the messaging framework is put in its own layer. Message queues also enable a number of processors to operate on a variety of messages while being resilient to errors in general. Since messages can grow to arbitrary size and their numbers can be mind-boggling, the messages need to be saved where they can be retrieved and updated.
Object Storage can store messages very well. The Queues are not nested and the hierarchy within object storage allows for grouping of these messages as easily as in a queue. Generally, a database is used for the storage of these messages but it is not for the transactional semantics. Even a file system could be sufficient for these messages. Object storage, on the other hand, is otherwise perceived as backup and tertiary storage. This may come from the interpretation that this storage is not suitable for read and write intensive data transfers that are generally handled by file-system or database. However, not all data needs to be written deep into the object storage at once. The requirements for object storage need not even change while the reads and writes from the applications are handled with a background processor. There can be a middle layer as a proxy for a file system to the application while utilizing the object storage for persistence. This alleviates performance considerations to read and write deep into the private cloud each time. Therefore, a Queue Layer may use the Object Storage with slight modifications while offering the same performance as before. The Queues not only work as a staging area for application data but also as something that asynchronously dispatches it into object storage.
Queue service has been a commercially viable offering and utilizes a variety of protocols. Message Queue is an example of a Queue service that has shown substantial improvements to APIs. Since objects are also accessed via S3 web APIs, the use of such a Queue service works very well if each message is stored and retrieved individually. Traditional Queue services have usually maintained ordered delivery of messages, retries and dead-letter handling, along with journaled messages, and their Queue writes have been mostly write-throughs which reach all the way to the disk. This service may be looked at in the form of a cloud service that not only maintains its persistence in the object storage but also uses the hierarchy for isolating its queues.

Thursday, September 27, 2018

The centerpiece for the solution to the problem statement from yesterday:
The Queue can be used independent of the Object Storage.

The use of a Queue facilitates distributed communications, request routing and batched writes. It can be offloaded to hardware. Queues may utilize Message Queuing software such as RabbitMQ, ZeroMQ and their solution stacks. They need not be real web servers and can route traffic to sockets on steroids. They may be augmented globally or in a partitioned server. 
Moreover, not all the requests need to reach the object storage. In some cases, the web Queue may use temporary storage from hybrid choices. The benefits of using a web Queue include saving bandwidth, reducing server load, and improving request-response time. If a dedicated content store is required, typically the queuing and server are encapsulated into a content server. This is quite the opposite paradigm of using object storage and replicated objects to directly serve the content from the store. The distinction here is that there are two layers of functions - The first layer is the Queue layer that solves distribution using techniques such as queuing, message handling and message processor organization. The second layer is the compute and storage bundling in the form of a server or a store with shifting emphasis on code and storage. We will call this the storage engine and will get to it shortly.
The Queue would do the same as an asynchronous write without any change in the application logic and to multiple recipients.   

Wednesday, September 26, 2018

The design of Queue Layer over object storage  
Problem statement:
Object Storage is perceived as backup and tertiary storage. This may come from the interpretation that this storage is not suitable for read and write intensive data transfers that are generally handled by file-system or database. However, not all data needs to be written deep into the object storage at once. The requirements for object storage need not even change while the reads and writes from the applications are handled with a background processor. There can be a middle layer as a proxy for a file system to the application while utilizing the object storage for persistence. This alleviates performance considerations to read and write deep into the private cloud each time. That is how this Queue Layer positions itself. It offers the same performance as an enterprise message queue does to handle the workload, and while it may use its own intermediate storage, it works as a staging area for the data so that the data has a chance to be asynchronously dispatched into object storage.
Queue service has been a commercially viable offering. Message Queue is an example of a Queue service that has shown substantial improvements to APIs. Since objects are accessed via S3 APIs, the use of such a Queue service works very well. However, traditional Queue services have usually maintained ordered delivery of messages, retries and dead-letter handling, along with journaled messages, and their Queue writes have been mostly write-throughs which reach all the way to the disk. This service may be looked at in the form of a cloud service that not only maintains its persistence in the object storage but also uses the hierarchy for isolating its queues.
Queue Service works closely with a cluster-based storage layer or a database server, and such services have traditionally been long-standing products in the marketplace. RabbitMQ is a message broker that relays messages to processors. This Queue layer is well-positioned for web application traffic as well as for applications that utilize S3 APIs directly. Data reads and writes need not be synchronous anymore and callers and clients may do well to delegate them to the queues as a background task. Moreover, the object storage can leverage geographical replication of objects within its layer. As long as this queuing layer establishes sync between, say, a distributed or cluster file system and object storage with duplicity-like logic, it can roll over all data eventually to persistence.
#codingexercise
void GenerateAlternateCubes() {
    var cubes = GetCubes();
    cubes.Enumerate((i, e) => { if (i % 2 == 0) { Console.WriteLine(e); } });
}

Tuesday, September 25, 2018

We were discussing the design of a cache layer over Object Storage. We have not compared this cache layer with a message queue server but there are interesting problems common to both. For example, we have a multiple-producer single-subscriber pattern in the periodic backups to the object storage. The message queue server or broker enables this kind of publisher-subscriber pattern with retries and a dead letter queue. In addition, it journals the messages for review later. Messaging protocols are taken up a notch in performance with the use of a message queue broker and their reliance on sockets on steroids. This leaves the interaction between the caches and the storage to be handled elegantly with a well-known messaging framework. The message broker inherently comes with a scheduler to perform repeated tasks across publishers. Hence it is easy for the message queue server to perform as an orchestrator between the cache and the storage, leaving the cache to focus exclusively on the cache strategy suitable to the workloads. Journaling of messages also helps with diagnosis and replay, and probably there is no better store for these messages than the object storage itself. Since the broker operates in a cluster mode, it can scale to as many caches as available. Moreover, journaling is not necessarily available with all the messaging protocols, which counts as one of the advantages of using a message broker. Aside from the queues dedicated to handling the backup of objects from cache to storage, the message broker is also uniquely positioned to provide differentiated treatment to the queues. This introduction of quality-of-service levels expands the ability of the solution to meet varying and extreme workloads. The message queue server is not only a nice-to-have feature but also a necessity and a convenience when we have a distributed cache to work with the object storage.
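The multiple-producer single-subscriber pattern with journaling could be sketched as follows; the single-threaded drain, the sentinel shutdown, and the names are simplifications for illustration:

```python
# Many cache nodes publish dirty objects to one broker queue; a single
# subscriber drains the queue into the object store and journals each
# message for replay (a sketch of the pattern, not a real broker).

import queue

def backup_worker(q, store, journal):
    while True:
        item = q.get()
        if item is None:            # sentinel: shut down
            break
        key, data = item
        store[key] = data           # persist into the object store
        journal.append(key)         # journal for diagnosis and replay

q = queue.Queue()
store, journal = {}, []
for node in ("cache-1", "cache-2"):           # multiple producers
    q.put((f"{node}/obj", b"payload"))
q.put(None)
backup_worker(q, store, journal)              # single subscriber
print(sorted(store))   # ['cache-1/obj', 'cache-2/obj']
```

In a real deployment the worker would run on its own thread or process and the journal itself could live in the object storage, as the post suggests.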

#codingexercise
bool IsSubsetProduct(ref List<int> factors, int product)
{
    Debug.Assert(factors.All(x => x > 0));
    if (product == 1) return true;
    if (product <= 0) return false;
    if (factors.Count() == 0) return false;
    var last = factors.Last();
    factors.RemoveAt(factors.Count() - 1);
    if (last > product || product % last != 0)
    {
        return IsSubsetProduct(ref factors, product);
    }
    // copy so the exclude-last branch does not see mutations from the include-last branch
    var without = new List<int>(factors);
    return IsSubsetProduct(ref without, product) || IsSubsetProduct(ref factors, product / last);
}


Monday, September 24, 2018

We were discussing the design of the Object cache layer versus the design of the Object Storage. The design of the cache may involve a very different deployment as compared to the object storage. There is no necessity for the cache layer to be handled by one server. There can be multiple servers within the object cache, each with its own implementation of the server to determine the schedule of persistence of the objects to the store. This enables scale-out of the cache layer across different nodes so that they handle only a small subset of objects. With a cluster-based deployment or the choice of a set of proxy servers, the cache may have a wide variety of choices for its implementation. The object storage is entirely cluster-based and virtualizes the data center with its implementation of a storage pool. Its design is somewhat more restricted than the cache because it is storage oriented. The cache, on the other hand, is focused on the workloads and may choose to partition based on the object distribution to different cache servers. It can simply involve a distributed hash table and has a similar distribution strategy as the gateway service in terms of the distribution of the objects to the designated caches. A peer-to-peer network may be overlaid over the object storage to determine this object-cache distribution. There is no necessity to have a cluster-only approach to the cache layer. There are several benefits to this approach. First, each cache is concerned with a small subset of the objects. Hence it is able to serve this workload better than if everything were part of the same cache. Second, the implementation of the cache is now independent of the compute at that peer, which allows far more commodity servers than if they were part of a cluster. Third, each and every cache may store objects to the same object storage on the backend but provide the option to be bound to a specific workload, allowing scale-up where necessary.
Finally, the ability of such a cache to perform flush to disk is entirely encapsulated within the peer in a deeply isolated stack which enables far superior performance than a distributed file system. All these considerations make the cache layer stand out from the storage layer.  
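The distributed-hash-table style assignment of objects to caches can be sketched with a simple hash ring; the node names and the use of MD5 for ring positions are illustrative assumptions:

```python
# Distributing objects to cache nodes with a hash ring, so each cache
# serves only a small subset of objects (a sketch, not a production DHT).

import hashlib, bisect

def ring_position(name):
    # hash a name to a position on the ring
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

class CacheRing:
    def __init__(self, nodes):
        self.ring = sorted((ring_position(n), n) for n in nodes)

    def node_for(self, object_name):
        # first node clockwise from the object's position on the ring
        pos = ring_position(object_name)
        i = bisect.bisect(self.ring, (pos, ""))
        return self.ring[i % len(self.ring)][1]

ring = CacheRing(["cache-a", "cache-b", "cache-c"])
assignments = {obj: ring.node_for(obj) for obj in ("o1", "o2", "o3", "o4")}
```

Adding or removing a node moves only the objects adjacent to it on the ring, which is why this strategy suits commodity peers joining and leaving.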

Sunday, September 23, 2018

The cache layer can just as well utilize versioning if the updates were frequent and the objects were small and numerous. All the concerns with the persistence reside with the storage layer, including the form and representation of the objects. Whether the objects are saved as repeated full byte-ranges of the objects or as incremental updates to a previous version is entirely the storage layer's concern. On the other hand, the cache layer determines the schedule and the load of the updates. Therefore, it may choose to persist some objects more often than others. The dynamic schedule is very helpful even to the customer's workload because not all workloads can be satisfied by the same schedule. There are really two aspects to this dynamic schedule. First, the cache layer determines which objects belong to which groups based on policies that can be evaluated on the nature of the workload. For example, heavy continuous writes require frequent persistence, otherwise there is a chance that the updates might be lost. Lightweight writes with heavy reads do not require frequent persistence, otherwise the cached object will become invalidated more often than is necessary. Second, the cache layer decides between flush and backup operations. These operations are governed by the pools of treatments to objects that the cache layer maintains. The cache layer becomes smart merely by associating a group to a pool. While it may allow customizations to how the objects are mapped to groups, it reserves the administrative mapping of groups to pools of service levels. The pools have varying schedules for flush and backup operations. The flush to local disk and the backup from local disk to object storage are performed the same without any dependence on the object or the schedule. As long as the objects fall in one of the queues, they will be serviced. The flush operation is largely unknown to the object storage since it is a convenience only for the cache layer to prevent data loss.
The backup operation to object storage is however as streamlined as possible so that the data ingestion rate never goes down and the load can be met with an adequate service level agreement. The cache layer provides the ability to fine-tune the behavior at an object level, and whether it stores segments or files before sending the object to the storage layer is its own concern. The two layers have mutually independent concerns but provide synergy in the form of a wider appeal of a durable store for all data generators. This cache layer could be internalized into the object storage but there is more benefit if they are separate. Unlike a gateway service that provides address resolution of an object to a specific site, a cache layer cannot be brought into the object storage because the name resolution is different from augmented read and write paths. If the object resides in the storage and is merely accessed via address resolution, there are no changes to the write on the object. On the other hand, the cache layer is heavy on the writes to the object storage and tries to be smart about sending the objects to the store. If this were to be part of the object storage, it would unnecessarily affect the writes of all objects. The object storage incurs cost to provide the kind of storage it does. It becomes harder for the object storage to provide the quality of service to individual workloads when it is performing distributed operations in a global store. The cache layer provides benefits outside the domain of the object storage assuming the latter is already the best for its domain. Therefore, the object cache and the object storage have to be separate layers and may be implemented with different designs.
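The group-to-pool association that makes the cache layer smart might look like the following sketch; the pool names, group names and intervals are invented for illustration:

```python
# Mapping object groups to service-level pools that set flush and backup
# schedules: the cache becomes "smart" merely by this association.

POOLS = {
    "write-heavy": {"flush_secs": 5,   "backup_secs": 60},    # frequent persistence
    "read-heavy":  {"flush_secs": 300, "backup_secs": 3600},  # avoid needless invalidation
}
GROUP_TO_POOL = {"telemetry": "write-heavy", "catalog": "read-heavy"}

def schedule_for(object_group):
    # customizable group mapping, administratively reserved pool mapping
    return POOLS[GROUP_TO_POOL[object_group]]

print(schedule_for("telemetry")["flush_secs"])   # 5
```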



Saturday, September 22, 2018

We resume our discussion of using a cache layer with Object Storage. We said we could let objects age before they make it to storage. There were several benefits to this approach. First, we need not store the objects as versions. Second, we could take incremental backups. Third, along with the backup we retained the ability to reconstruct the object, though archived data is generally not retrieved. We utilize archiving techniques like deduplication to reduce the footprint of the storage. Also, the data is made invulnerable when the old data is not modified by the new data. The new data is written separately and the good data from the old is never lost. This is demonstrated by many of the public clouds. For example, Windows Azure uses extents, and many storage appliances do something similar, especially with stream storage. This holds true with reconstruction as well and not just with versioning. Consequently, even deduplication appliances practice data invulnerability with their efforts to find segments that change.
Why do we focus on reconstruction as opposed to versioning of objects from the cache? This depends entirely on the schedule of the backup of objects from the cache to the object storage and the footprint we want to keep in the storage tier. The cache layer and the storage layer are separate. Both of them can have processing that can make those respective layers smart.

When we have immutable representation of data no matter how it is stored, it can provide immense benefits to data security, audit and restore. This idea for invulnerability is seen in many different forms of technologies. It not only helps with preservation but also helps with error codes. As long as the data has not changed, the error codes do not change. This is primarily the motivation behind durability.
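The point that unchanged data keeps unchanged error codes can be illustrated with a checksum; here SHA-256 stands in for whatever code the storage system actually records:

```python
# Immutable data keeps a stable checksum, so verification detects any
# change; this is the basis of the durability argument above.

import hashlib

def checksum(data):
    return hashlib.sha256(data).hexdigest()

original = b"object contents"
recorded = checksum(original)
assert checksum(original) == recorded        # unchanged data: codes match
assert checksum(b"tampered") != recorded     # any change is detectable
print("verified")
```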

Friday, September 21, 2018

Today we take a break from our discussions on Object Storage to review a few methods for topic detection in documents.
# Divides the region proposals on the Intersection-over-union value
def divideset(proposals,iou,value):
       # Make a function that tells us if a proposal is a candidate or not.
       split_function=None
       if isinstance(value,int) or isinstance(value,float):
              split_function=lambda proposal:proposal[iou]>=value
       else:
              split_function=lambda proposal:proposal[iou]==value
       # Divide the proposals into two sets
       set1 = [proposal for proposal in proposals if split_function(proposal)]
       set2 = [proposal for proposal in proposals if not split_function(proposal)]
       return (set1, set2)


Def gen_proposals(proposals, least_squares_estimates):
       # a proposal is origin, length, breadth written as say top-left and bottom-right corner of a bounding box
       # given many known topic vectors, the classifer helps detect the best match.
       # the bounding box is adjusted to maximize the intersection over union of this topic.
       # text is flowing so we can assume bounding boxes of sentences
       # fix origin and choose fixed step sizes to determine the adherence to the regression
       # repeat for different selections of origins.
       pass

Thursday, September 20, 2018

We were discussing that data can be allowed to age in one layer before making its way to another. Let us consider now how to use this aging of objects. We transfer the objects from the cache to the storage when their updates have accumulated up to one of the scheduled transfers. This interval could be very small or very large depending on the capability of the cache. By buffering the objects in the cache without writing them to object storage, we provide the ability to study the updates on the object. This might mean byte range updates to the object. If the objects are small, the updates don't matter as the whole object can be overwritten. If the objects are large, keeping track of the byte range updates helps with the consolidation and re-organization of the updates. Why is this useful? The object storage takes the smallest unit of data storage as the object. It has no notion of what content that object has. Therefore, it sees the objects as byte ranges and keeps track of the updates to the objects in the form of byte ranges. By doing some of this early translation of overlapping byte range updates to non-overlapping updates, we make it easier for the object storage to persist the object. That is not all. How the object cache persists to the object storage may be entirely proprietary to the cache-based solution. Instead of relying on the versioning from the object storage, a cache-based solution may propose to store incremental updates. The versioning had several drawbacks. It made unnecessary copies of every version and the whole object was copied. While it was simpler to get the whole object back, it was not necessary to make copies of the byte ranges that did not change. Moreover, with the repackaging of the objects, we now have the chance to perform deduplication of what we store. This allows us to reconstruct the object with what we store because typically not all of the contents change at once.
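The translation of overlapping byte-range updates into non-overlapping ones is a standard interval merge; a minimal sketch:

```python
# Consolidate overlapping byte-range updates into non-overlapping ranges
# before handing them to the object storage (standard interval merge).

def merge_ranges(ranges):
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # overlaps or touches the previous range: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

updates = [(0, 100), (50, 150), (300, 400)]
print(merge_ranges(updates))   # [(0, 150), (300, 400)]
```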

Wednesday, September 19, 2018

Introduction:
This article is an addition to the notion of a Cache Layer for Object Storage that caches objects and translates workloads into frequently backed-up objects so that the changes are routinely persisted into the Object Storage. The notion that data can be allowed to age before making its way into the object storage is not a limiting factor. Object Storage, just like file storage and especially when file-system enabled, allows direct access for persistence anyway. The previous article referenced here merely pointed to the use cases where the reads and writes to objects are so frequent that something shallower than an Object Storage will benefit immensely.
Therefore, this article merely looks at the notion of lazy replication. If we use the cache layer and regularly save the objects from the cache into the Object Storage, it is no different than using a local filesystem for persistence and then frequently backing it up into the cloud. We have tools like duplicity that frequently back up a filesystem into object storage. Although they use archives for compaction, it is no different from copying the data from source to destination even if the source is a file system and the destination is an object store. The schedule of this copying can be made as frequent as necessary to ensure the propagation of all changes by a maximum time limit.
Let us now look at the replication within the Object Storage. After all, the replication is essentially copying objects across the sites within the storage. This copying was intended for the purposes of durability. When we set up multiple sites within a replication group, the objects get copied to these sites so that they remain durable against loss. This copying is almost immediate and very well handled within the put method of the S3 API that is used to upload objects into the object storage. Therefore, there is a multi-zone update of the object in a single put command when the replication group spans sites. When the object is uploaded, it may be saved in parts and all the bookkeeping regarding parts is also safeguarded for durability. Both the object data and the parts location information are treated as logically representing the object. There are three copies of such a representation so that if one copy is lost, another can be used. In addition, erasure codes may allow the reconstruction of an object and so the copy operation may not necessarily be a straightforward byte range copy.
Lazy replication allows for copying beyond these durability semantics. It allows for copying on a scheduled basis by allowing the data to age. There may be many updates to the object between two copy operations and this is tolerated because there is no semantic difference between the objects as long as they are copied elsewhere. Said another way, this is the equivalent of chaining object stores so that the cache layer is an object storage in itself with direct access to the data persistence and the object storage behind it as the one receiving copies of the objects that are allowed to age. Since the copy operations occur on every time interval, there is little or no data loss between the primary and the secondary object storages. We just need a connector that transfers some or all objects in a bucket from a namespace to an altogether different bucket in a different namespace in possibly a different object storage. This may be similar to file sync operations between local and remote file system which also allows for offline work to happen. The difference between the file sync operation and a lazy replication is probably just the strategy. Replication as such has several strategies even from databases where logs are used to replay the same changes in a destination database. The choice of strategy and the frequency is not necessary for the discussion that objects can be copied across object storage.
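The connector described above, which transfers objects under one bucket in a namespace to a different bucket in a possibly different store, might be sketched as follows (dicts stand in for the two object stores, and the namespace/bucket prefixes are illustrative):

```python
# A lazy-replication connector: on each scheduled run, copy objects under
# one bucket prefix into a different bucket in another store (sketch).

def replicate(src, dst, src_prefix, dst_prefix):
    copied = 0
    for key, data in src.items():
        if key.startswith(src_prefix):
            # rewrite the key into the destination namespace and bucket
            dst[dst_prefix + key[len(src_prefix):]] = data
            copied += 1
    return copied

primary = {"ns1/bucketA/obj1": b"v1", "ns1/bucketA/obj2": b"v2", "ns1/other/x": b"z"}
secondary = {}
n = replicate(primary, secondary, "ns1/bucketA/", "ns2/bucketB/")
print(n, sorted(secondary))   # 2 ['ns2/bucketB/obj1', 'ns2/bucketB/obj2']
```

Run on a timer, this gives the scheduled sync between chained object stores that the post describes; a real connector would also track versions or timestamps to copy only what changed.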
When Object Storages are linked this way, it may seem contrary to the notion that a single Object Storage represents limitless storage with zero maintenance, where an object once saved will always be found, avoiding the use of unnecessary copies. However, the performance impact of using an Object Storage directly, as opposed to a local file system, may affect certain workloads, where it may be easier to stage the data prior to saving it in the Object Storage. Therefore this lazy replication may come in helpful to broaden the use cases of the Object Storage.

Tuesday, September 18, 2018

We were discussing the role of a Cache Service with Object Storage. The requirements for object storage need not change at all while reads and writes from the applications are handled. There can be a middle layer that acts as a proxy for a file system to the application while utilizing the object storage for persistence. This alleviates the performance cost of reading and writing deep into the private cloud each time. That is how this Cache Layer positions itself. It offers a benefit similar to query plan caching in handling the workload, and while it may use its own intermediate storage, it works as a staging area for the data so that the data has a chance to age before persisting in object storage.
The Cache can also be used independently of the Object Storage.

The use of a Cache facilitates server load balancing, request routing, and batched writes. It can be offloaded to hardware. Caches may utilize message queuing. They need not be real web servers and can route traffic directly at the socket level. They may be deployed globally or per partitioned server.
Moreover, not all requests need to reach the object storage. In some cases, a web cache may use temporary storage from hybrid choices. The benefits of using a web cache include saving bandwidth, reducing server load, and improving request-response time. If a dedicated content store is required, the caching and the server are typically encapsulated together into a content server. This is quite the opposite paradigm of using object storage and replicated objects to serve the content directly from the store. The distinction here is that there are two layers of functions: the first layer is the Cache layer, which solves distribution using techniques such as caching, asset copying, and load balancing; the second layer is the compute-and-storage bundling in the form of a server or a store with shifting emphasis on code and storage. We will call this the storage engine and will get to it shortly.
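The request-routing function of the cache layer can be sketched with a simple hash-based router. The names route_request and backends are hypothetical; production routers typically use consistent hashing so that adding or removing a server reshuffles only a fraction of the keys.

```python
# A minimal sketch of hash-based request routing in the cache layer:
# the same object key always lands on the same cache server, so its
# cached copy is actually found on repeat requests.
import hashlib

def route_request(key, backends):
    """Deterministically pick a backend for the given object key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return backends[int(digest, 16) % len(backends)]

backends = ["cache-1", "cache-2", "cache-3"]
server = route_request("bucket/report.txt", backends)
```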
The Cache would provide the same effect as an asynchronous write, without any change in the application logic.
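That asynchronous-write behavior can be sketched as a write-behind cache. WriteBehindCache and flush are hypothetical names for illustration, with a dict standing in for the object storage behind the cache.

```python
# A minimal sketch of a write-behind cache: the application sees a
# synchronous put, while persistence to the backing store happens
# later in a batched flush.
from collections import deque

class WriteBehindCache:
    def __init__(self, backing_store):
        self.local = {}
        self.pending = deque()
        self.backing_store = backing_store

    def put(self, key, data):
        # Returns immediately: the object lands in local storage and
        # is queued for later persistence, as an asynchronous write
        # would be, with no change to the application logic.
        self.local[key] = data
        self.pending.append(key)

    def flush(self):
        # Drain the queue of pending writes to the object storage.
        while self.pending:
            key = self.pending.popleft()
            self.backing_store[key] = self.local[key]

store = {}
cache = WriteBehindCache(store)
cache.put("bucket/log.txt", b"entry")
cache.flush()
```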

Monday, September 17, 2018

Object Storage is a limitless store. Data from any workload is anticipated to grow over time if it is saved continuously. Consequently, the backend, and particularly cloud services, are better prepared for this task. While a flush to the local file system with an asynchronous write may be extremely cheap compared to persistence in the cloud as an S3 object, there is no reason to keep rolling local filesystem data over to object storage by hand.
Object storage is a zero-maintenance storage. There is no planning for capacity, and the elastic nature of its services may be taken for granted. The automation of asynchronous writes, of flushing to object storage, and of syncing the data in the cache with that in object storage is now self-contained and packaged into this cache layer.
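The automated flush this layer packages up can be sketched as an age-based rollover. flush_aged and max_age are hypothetical names; unlike the lazy-replication copy, this moves entries out of the cache once they have aged.

```python
# A minimal sketch of the age-based flush the cache layer automates:
# only data that has aged past a threshold is rolled over from the
# cache to object storage, so hot data stays local.
import time

def flush_aged(cache, object_storage, max_age, now=None):
    """Move cache entries older than max_age seconds to object storage."""
    now = time.time() if now is None else now
    aged = [k for k, (_, written_at) in cache.items()
            if now - written_at >= max_age]
    for key in aged:
        data, _ = cache.pop(key)
        object_storage[key] = data
    return aged

cache = {"hot": (b"recent", 95.0), "cold": (b"stale", 10.0)}
storage = {}
# At t=100 with a 60-second threshold, only the stale entry moves.
flush_aged(cache, storage, max_age=60, now=100.0)
```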
The cloud services are elastic. Both the storage and the cache services could be deployed in the cloud which not only gives the same benefits to one application or client but also to every department, organization, application, service and workload. 
Object storage coupled with this cache layer is also suited to dynamically addressing the needs of both clients and applications, because the former may be a global store while the latter can determine the frequency of persistence depending on the workload. Different applications may tune the caching to their requirements.
Performance increases dramatically when the results are returned as close to the origin of the requests as possible instead of going deep into the technology stack. This has been one of the arguments for web caches and web proxies in general.
Such a service is hard to mimic individually within each application or client. Moreover, optimizations only happen when the compute and storage are elastic, so that requests can be studied, cached, and replayed independently of the applications.
The move from tertiary to secondary storage is not a straightforward shift from NAS storage to object storage; it involves some form of chores over time. A dedicated product like this takes that concern out of the picture.
Returning to the summarization technique: linear regression for bounding-box adjustment in regions of interest fits a line to the data points, so that candidate bounding boxes can be scored against the fitted line.
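A minimal sketch of that least-squares line fit, assuming a hypothetical fit_line helper. The points could be, for example, (bounding-box size, goodness-of-fit) pairs from the scatter plot described earlier.

```python
# Ordinary least-squares fit of y = slope * x + intercept,
# used to score bounding boxes against the fitted line.

def fit_line(points):
    """Fit a line to (x, y) pairs by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Perfectly collinear sample points recover slope 2 and intercept 1.
slope, intercept = fit_line([(0, 1), (1, 3), (2, 5)])
```

A bounding box whose point sits far from the fitted line would then be scored down, which is the selection criterion the earlier pseudocode relies on.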