Cluster computing: A search service over blobs

The cloud attracts unstructured data in the form of blobs in object storage. Since this is inevitable, and unstructured content escapes the usual organization associated with relational data, some form of search service could help the end-user who must search blobs from different data sources. This document focuses on the technical considerations for making this leap and leaves the business case out of scope.

Desktop search, enterprise search and even internet search speak to the versatility and popularity of search tools. These tools are popular for many reasons, but they all target the end-user who would otherwise have to navigate a large collection without such a tool. The query layer for a search service sometimes requires its own pipeline, albeit over the same dataset. Relational data management, by contrast, provided a foundation on which SQL queries could be written.

Let us consider object storage as a destination for logs, where the ability to search the store can improve production support drastically. A query layer can directly re-use the object storage as its data source. The store is effectively limitless and needs no maintenance from its users. Log stores are typically time-series databases. A time-series database creates a new bucket as each one fills with events, and this can be done just as easily with object storage. The namespace-bucket-object hierarchy is well suited for time-series data. There is no limit to the number of objects within a bucket, and we can roll over buckets in the same hot-warm-cold manner that time-series databases do. Moreover, once the data is in object storage, it is easily accessible to all users for reading over HTTP. The only caveat is that production support may ask for the object storage that persists cached objects to be kept separate from the object storage that persists logs. This is quite reasonable and may be accommodated on-premises or in the cloud, depending on the worth of the data and the cost incurred. The log stores can be periodically trimmed as well. In addition, the entire querying stack for reading these entries can be built on copies or selections of buckets and objects. More about saving logs and their indexes in object storage is available at: https://1drv.ms/w/s!Ashlm-Nw-wnWt3eeNpsNk1f3BZVM
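The bucket rollover, hot-warm-cold tiering, trimming and searching over a selection of buckets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the object store is modeled as an in-memory dictionary rather than a real service, and the names `LogStore`, `bucket_for`, `tier_for`, the `logs-YYYY-MM-DD` naming scheme, and the one-day and seven-day thresholds are all hypothetical choices made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BUCKET_FORMAT = "logs-%Y-%m-%d"  # assumed naming scheme: one bucket per UTC day


def bucket_for(ts: datetime) -> str:
    """Map an event time to the daily bucket that should hold it."""
    return ts.astimezone(timezone.utc).strftime(BUCKET_FORMAT)


def bucket_age_days(bucket: str, now: datetime) -> int:
    """Whole days elapsed between the start of the bucket's day and now."""
    day = datetime.strptime(bucket, BUCKET_FORMAT).replace(tzinfo=timezone.utc)
    return (now.astimezone(timezone.utc) - day).days


def tier_for(bucket: str, now: datetime, hot_days: int = 1, warm_days: int = 7) -> str:
    """Classify a bucket as hot, warm or cold by age (thresholds are assumptions)."""
    age = bucket_age_days(bucket, now)
    if age < hot_days:
        return "hot"
    if age < warm_days:
        return "warm"
    return "cold"


class LogStore:
    """In-memory stand-in for an object store: bucket name -> object key -> log line."""

    def __init__(self):
        self.buckets = defaultdict(dict)

    def put(self, ts: datetime, line: str):
        """Write one log line as an object; a new day rolls over to a new bucket."""
        bucket = bucket_for(ts)
        stamp = ts.astimezone(timezone.utc).isoformat()
        key = f"{stamp}-{len(self.buckets[bucket])}"
        self.buckets[bucket][key] = line
        return bucket, key

    def search(self, term: str, tiers, now: datetime):
        """Scan only the buckets whose tier is in `tiers` for lines containing `term`."""
        hits = []
        for bucket in sorted(self.buckets):
            if tier_for(bucket, now) not in tiers:
                continue
            for key, line in sorted(self.buckets[bucket].items()):
                if term in line:
                    hits.append((bucket, key, line))
        return hits

    def trim(self, now: datetime, keep_days: int):
        """Drop buckets that are at least `keep_days` old; return their names."""
        expired = sorted(b for b in self.buckets if bucket_age_days(b, now) >= keep_days)
        for bucket in expired:
            del self.buckets[bucket]
        return expired


if __name__ == "__main__":
    now = datetime(2021, 8, 9, 12, 0, tzinfo=timezone.utc)
    store = LogStore()
    store.put(now, "ERROR disk full")
    store.put(now - timedelta(days=3), "ERROR timeout")
    store.put(now - timedelta(days=30), "ERROR old failure")
    # Search recent buckets only, then trim anything older than two weeks.
    for hit in store.search("ERROR", {"hot", "warm"}, now):
        print(hit)
    print(store.trim(now, keep_days=14))
```

Because the bucket name encodes the time window, the query layer can select buckets by name alone before it reads any object, which is what makes a search over "copies or selections of buckets" cheap to scope. A real deployment would replace the dictionary with calls to the object store's own API and would likely search a prebuilt index rather than scan every line.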


Monday, August 9, 2021
