Monday, October 15, 2018

Object storage as a time-series data store
Introduction: Niche products such as log indexes and time-series databases often rely on unstructured storage. Because they focus on their primary purpose, they have little reason to invest in the storage layer itself. Object storage serves as a virtually limitless, low-maintenance store that not only replaces the file system as the traditional store for these products but also brings established storage best practices with it. As a storage tier, object storage is well suited not only to take the place of other storage used by these products but also to bring some of their features into the storage tier. This article examines the premise of using object storage directly as the base tier of log index products.
Description: We begin by examining the requirements for a log index store.
First, log indexes are maintained as time-series databases. The data is organized into hot, warm and cold buckets, which denote the progression of the entries made to the store. Each index is maintained from a continuous rolling input of data that fills one bucket after another. Each entry is assigned a timestamp if one cannot be determined from the data, either by parsing or from tags specified by the user. The result is called the raw entry, and it is parsed for fields that can be extracted and used with the index. Object storage also allows key-value pairs to be written in its objects. Indexes on raw entries can be maintained together with the data or in separately named objects. The organization of time-series artifacts maps naturally onto the namespace-bucket-object hierarchy of object storage.
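The timestamping and bucket layout described above can be sketched in a few lines. This is only an illustration under stated assumptions: the timestamp pattern, the key layout, the one-day and thirty-day tier thresholds, and the function names `entry_timestamp`, `object_key` and `tier` are all hypothetical and not taken from any particular log index product.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed format: an ISO-8601 UTC timestamp at the start of the raw line.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\s")


def entry_timestamp(raw, received_at):
    """Return the timestamp parsed from the entry, else the arrival time."""
    match = _TS.match(raw)
    if match:
        parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        return parsed.replace(tzinfo=timezone.utc)
    return received_at


def object_key(index, ts, seq):
    """Build a time-partitioned key so a prefix listing selects a time range."""
    return f"{index}/{ts:%Y/%m/%d/%H}/{ts:%Y%m%dT%H%M%S}Z-{seq:06d}.log"


def tier(ts, now, hot=timedelta(days=1), warm=timedelta(days=30)):
    """Classify an entry as hot, warm or cold by age (thresholds are assumptions)."""
    age = now - ts
    if age < hot:
        return "hot"
    if age < warm:
        return "warm"
    return "cold"
```

With a layout like this, listing objects under a prefix such as `main/2018/10/15/` would return one day of entries for the hypothetical `main` index, which is the kind of range selection a time-series store needs.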
Second, most log stores accept input directly over HTTP through their own proprietary application calls. Object storage already has the well-known S3 APIs for reading and writing data, which makes the calling convention uniform not just for the log indexing service but also for its callers. HTTP is not the only way to input data. A low-level connector that speaks file-system protocols for data input and output may also improve access for these time-series databases, and a file-system-backed object storage that supports file-system protocols may serve this lower-level data path. A log store’s existing file system may be replicated directly into object storage; if connectors are available to accept data from sockets, files, queues and other sources, this amounts to a tighter integration in which the log index service becomes native to the object storage.
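To make the lower-level data path concrete, here is a minimal sketch of a toy store in which buckets are directories and objects are files, so the same put/get convention can be served by plain file-system open and close. The class `FileBackedStore` and its methods are hypothetical; this is not the S3 API or any product's connector, only an illustration of the mapping.

```python
import os


class FileBackedStore:
    """Toy object store: buckets are directories, objects are files."""

    def __init__(self, root):
        self.root = root  # assumed mount point of the backing file system

    def _path(self, bucket, key):
        return os.path.join(self.root, bucket, *key.split("/"))

    def put(self, bucket, key, data):
        """Write an object using ordinary file open/close."""
        path = self._path(bucket, key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get(self, bucket, key):
        """Read an object back as bytes."""
        with open(self._path(bucket, key), "rb") as f:
            return f.read()

    def list(self, bucket, prefix=""):
        """Return the sorted keys in a bucket that start with the prefix."""
        base = os.path.join(self.root, bucket)
        keys = []
        for dirpath, _, files in os.walk(base):
            for name in files:
                rel = os.path.relpath(os.path.join(dirpath, name), base)
                key = rel.replace(os.sep, "/")
                if key.startswith(prefix):
                    keys.append(key)
        return sorted(keys)
```

A caller that writes through `put` and one that copies files straight into the directory tree end up with the same objects, which is the sense in which an existing log store file system could be replicated into the store.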
Third, the data does not need to be collected from the data sources. In fact, log appenders are known to send data to multiple destinations, and object storage is merely another destination for them. For example, there is an S3 API based log appender, http://s3appender.codeplex.com/, that can be used directly in many applications and services because they use log4j. The only refinement we mention here is that the log4j appenders need not go over HTTP and can use low-level protocols such as file-system open and close.
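The appender idea can be illustrated with Python's standard `logging` module, used here as a stand-in for log4j. The sketch below is an assumption-laden analogue, not the s3appender mentioned above: `ObjectStoreHandler`, its `write_object` callback, the key naming and the batch size are all hypothetical. The callback could wrap an HTTP upload or a plain file write, so the handler itself does not care which data path is used.

```python
import logging


class ObjectStoreHandler(logging.Handler):
    """Buffer formatted records and write each batch as one object."""

    def __init__(self, write_object, prefix, batch_size=100):
        super().__init__()
        self.write_object = write_object  # callable(key, data_bytes)
        self.prefix = prefix
        self.batch_size = batch_size
        self._buffer = []
        self._seq = 0

    def emit(self, record):
        self._buffer.append(self.format(record))
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # The handler lock is re-entrant, so this is safe when called from emit.
        self.acquire()
        try:
            if not self._buffer:
                return
            key = f"{self.prefix}/batch-{self._seq:06d}.log"
            data = ("\n".join(self._buffer) + "\n").encode("utf-8")
            self.write_object(key, data)
            self._buffer = []
            self._seq += 1
        finally:
            self.release()
```

Attaching this handler alongside an ordinary `logging.StreamHandler` on the same logger sends each record to both destinations, which mirrors the multiple-destination behavior of log appenders.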
Fourth, object storage brings storage best practices in terms of durability, redundancy, availability and organization, and these can be leveraged for simultaneous log analytics activities that were generally limited on a single log index server.
Conclusion: Object storage is well suited for saving, parsing, searching and reporting from logs.
Reference: https://1drv.ms/w/s!Ashlm-Nw-wnWt2h4_zHbC-u_MKIn
