Cluster computing

Wednesday, July 25, 2018

We were discussing the use of object storage as a time-series data store. The notion of buckets in a time-series database translates well to object storage. As one bucket fills, data can start filling the next. With hot, warm and cold labels, it is easy to track the progression of data as it ages. This data can then serve all search queries over time series, just like events in a time-series database.
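The progression of buckets described above can be sketched in a few lines of Python. This is only an in-memory illustration, not how any particular product implements it: the class names, the `capacity` and `warm_limit` parameters, and the rule that the oldest warm buckets turn cold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp, payload); a hypothetical event shape


@dataclass
class Bucket:
    label: str = "hot"  # one of "hot", "warm", "cold"
    events: List[Event] = field(default_factory=list)


class BucketRoller:
    """Fills one hot bucket at a time and ages older buckets to warm, then cold."""

    def __init__(self, capacity: int, warm_limit: int) -> None:
        self.capacity = capacity      # assumed: max events per bucket
        self.warm_limit = warm_limit  # assumed: max number of warm buckets
        self.buckets: List[Bucket] = [Bucket()]

    def append(self, event: Event) -> None:
        # When the hot bucket is full, roll over and start filling a new one.
        if len(self.buckets[-1].events) >= self.capacity:
            self._roll()
        self.buckets[-1].events.append(event)

    def _roll(self) -> None:
        self.buckets[-1].label = "warm"
        warm = [b for b in self.buckets if b.label == "warm"]
        # Buckets are kept oldest first, so the excess warm ones are the oldest.
        for bucket in warm[: max(0, len(warm) - self.warm_limit)]:
            bucket.label = "cold"
        self.buckets.append(Bucket())

    def search(self, start: float, end: float) -> List[Event]:
        # A time-range query reads every bucket regardless of its label.
        return [e for b in self.buckets for e in b.events if start <= e[0] < end]
```

With `capacity=2` and `warm_limit=1`, appending five events leaves three buckets labelled cold, warm and hot, and a time-range search still sees the events in all of them.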
Most time-series databases prefer to use the filesystem directly for their index and event store rather than relying on a separate NoSQL database. An NFS file system can also be exported as an object store. This means existing filesystem-based data files can be served as buckets and objects once they are set up to do so. Object storage products allow a filesystem to be used in this way.
This is helpful for existing data. In addition, time-series database products such as log stores can write their indexes directly to object storage, which then provides more benefits than the filesystem did.
Since time-series databases create buckets progressively as they fill each one with events, they are mostly concerned with individual buckets. There is no nesting of buckets; it is simply a progression. This suits the hierarchy of buckets and objects. Most time-series buckets are allocated within user-defined indexes, which is very similar to namespaces in object storage. There does not need to be a direct mapping between an object storage bucket and a time-series bucket. The latter may even appear as objects within an object store; the emphasis here is on the one-level hierarchy between buckets and events. The format of the stored events may be proprietary, so their storage as objects is opaque to the storage layer. Promoting them to an object store not only improves storage but also makes them directly available over HTTP, without having to route each request through the time-series database's controllers. This removes some of the burden from the time-series database layer and even facilitates querying.
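One way to picture a time-series bucket appearing as an object is a key scheme that keeps the index as a namespace-like prefix and the bucket's time span as the object name. The sketch below is hypothetical: the `index/start-end` key layout, the `BucketRef` type and the helper names are assumptions for illustration, and the keys are plain strings rather than calls to any real object storage API.

```python
from typing import Iterable, List, NamedTuple


class BucketRef(NamedTuple):
    index: str  # user-defined index, used like a namespace
    start: int  # earliest event time, assumed non-negative epoch seconds
    end: int    # latest event time, assumed non-negative epoch seconds


def object_key(ref: BucketRef) -> str:
    # One level only: the index prefix, then the bucket itself as the object.
    return f"{ref.index}/{ref.start}-{ref.end}"


def parse_key(key: str) -> BucketRef:
    index, span = key.rsplit("/", 1)
    start, end = span.split("-")
    return BucketRef(index, int(start), int(end))


def keys_overlapping(keys: Iterable[str], index: str, start: int, end: int) -> List[str]:
    """Return the keys in `index` whose time span overlaps [start, end]."""
    hits = []
    for key in keys:
        ref = parse_key(key)
        if ref.index == index and ref.start <= end and ref.end >= start:
            hits.append(key)
    # Oldest bucket first, mirroring the progression of buckets.
    return sorted(hits, key=lambda k: parse_key(k).start)
```

Because the time span is part of the key, a query for a time range can pick out the relevant objects from a listing without opening any of them; the event format inside each object stays opaque.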
There may be some concern over moving data up and down the protocol layers in order to serve it over HTTP. In addition, remote data may need to be copied to local storage before it can be searched. However, these tasks can be delegated to the object storage and its query package, so that the time-series database focuses only on the semantics and leaves the optimization to the storage. Most time-series databases shy away from conventional database products simply because of the dedicated nature of their offering and the scale of billions of events. Here the entire object storage can be local and serve in place of the filesystem that the time-series database uses.
