Thursday, February 27, 2020

Archival using a data stream store:

As new data is added to the store and old records are retired, the data storage grows to a very large size. The active records within the store may be only a fraction of the total and are more often than not segregated by a time window. Consequently, software engineers perform a technique called archiving, which moves older and unused records to tertiary storage. This technique is robust and involves some interesting considerations, as discussed in the earlier post.
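To make the idea concrete, here is a minimal sketch, assuming a hypothetical in-memory record store keyed by record id and an arbitrary ninety-day active window; records whose timestamps fall outside the window are moved to a stand-in tertiary store:

from datetime import datetime, timedelta

# Hypothetical in-memory stores standing in for primary and tertiary storage.
primary_store = {}    # record_id -> (timestamp, payload)
tertiary_store = {}   # record_id -> (timestamp, payload)

ACTIVE_WINDOW = timedelta(days=90)  # assumed width of the active time window

def archive_old_records(now=None):
    """Move records that fall outside the active time window to tertiary storage."""
    now = now or datetime.utcnow()
    cutoff = now - ACTIVE_WINDOW
    old_ids = [rid for rid, (ts, _) in primary_store.items() if ts < cutoff]
    for record_id in old_ids:
        tertiary_store[record_id] = primary_store.pop(record_id)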

With programmability for streams, it is relatively easy to translate the operations described in the earlier post to a stream store. The streams have bands of cold, warm and hot data with progressive frontiers that make it easy to adjust the width of each region. The stream store is already considered durable and fit for archival, so adjusting the width of a band alone can remove the need to move data. Some number of segments from the cold band can then become candidates for off-site archival.
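The bands might be modeled as in the following sketch, which uses made-up segment and frontier offsets; adjusting the frontiers changes the width of each band without moving any data, and the segments that lie entirely behind the cold frontier become the candidates for off-site archival:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Segment:
    segment_id: int
    start_offset: int   # offset of the first event in the segment
    end_offset: int     # offset just past the last event

@dataclass
class Stream:
    segments: List[Segment] = field(default_factory=list)
    cold_frontier: int = 0   # offsets below this are cold
    warm_frontier: int = 0   # offsets between the two frontiers are warm; above is hot

    def adjust_frontiers(self, cold: int, warm: int) -> None:
        """Widen or narrow the bands without moving any data."""
        assert cold <= warm
        self.cold_frontier, self.warm_frontier = cold, warm

    def archival_candidates(self) -> List[Segment]:
        """Segments that lie entirely within the cold band."""
        return [s for s in self.segments if s.end_offset <= self.cold_frontier]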

Tiering enables policies to be specified for generational mark-down of data and its movement between tiers. This allows the hardware backing each tier to be differentiated to suit different kinds of storage traffic. By providing tiers, the storage space is prioritized based on media cost and usage. Archival systems are considered low-cost storage because the data they hold is usually cold.
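Such a policy could be expressed as a simple age-to-tier mapping, as in this sketch with illustrative tier names, media types and thresholds; a background job would evaluate it periodically and move data between tiers accordingly:

from datetime import timedelta

# Illustrative tiers, ordered from most to least expensive media.
TIER_POLICY = [
    ("hot",     timedelta(days=7),    "nvme"),    # recent, frequently accessed
    ("warm",    timedelta(days=90),   "hdd"),     # older, occasionally accessed
    ("cold",    timedelta(days=365),  "object"),  # rarely accessed
    ("archive", None,                 "tape"),    # everything older
]

def tier_for_age(age: timedelta) -> str:
    """Return the tier that data of the given age should live in."""
    for tier, max_age, _media in TIER_POLICY:
        if max_age is None or age <= max_age:
            return tier
    return TIER_POLICY[-1][0]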
Data warehouses used to be the graveyard for online transactional data. As data is passed to this environment, it changes from current values to historical data. In this way a system of record for historical data is created, and this is then used for all kinds of DSS processing. The Corporate Information Factory (CIF) that the data warehouse evolved into had two prominent features: the virtual operational data store (VODS) and the addition of unstructured data. The VODS was a feature that allowed organizations to access data on the fly without building an infrastructure, which meant that corporate communications could now be combined with corporate transactions to paint a more complete picture. The CIF had an archival feature whereby data would be transferred from the data warehouse to nearline storage using a cross-media storage manager (CMSM) and then retired to archival storage.
Stream stores do not have native storage of their own. They are hosted on Tier 2, so their data appears as files and blobs and is subsequently sent to its own tertiary storage. If the stream store were native on disk, its archival would target the cold end of the streams.
Between files and blobs, we suggest that object storage is better suited for archival. Object storage is best suited to using blobs as inputs for backup and archival, and it fits very well into Tier 2 of the tiering described earlier.
Here we suggest that the storage class make use of dedicated long-term media on the storage cluster and a corresponding service to automatically promote objects as they age.
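A minimal sketch of such an auto-promotion service follows, assuming a hypothetical object catalog and an arbitrary age threshold; a periodic pass moves any object past the threshold onto the long-term storage class:

import time

# Hypothetical object catalog: name -> (storage_class, created_epoch_seconds)
objects = {}

PROMOTION_AGE_SECONDS = 180 * 24 * 3600  # assumed age threshold for long-term media

def promote_aging_objects(now=None):
    """One pass of the background service: promote objects past the age
    threshold to the dedicated long-term storage class on the cluster."""
    now = now or time.time()
    for name, (storage_class, created) in objects.items():
        if storage_class != "long-term" and now - created > PROMOTION_AGE_SECONDS:
            objects[name] = ("long-term", created)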
