Saturday, September 8, 2018

The bridge between relational and object storage   

We review the data transfer between relational and object storage. We argue that the natural migration of relational data to object storage is via a data warehouse. We further state that this bridge is better suited to a star schema design as input than to other forms of structured data. Finally, all three data stores may exist in the cloud as virtual stores, and each can be massively scalable.

Role of Object Storage: 

Object storage is very good for durability of data. Virtual data warehouses often use the S3 API directly to stash interim analytical byproducts. By nature, these warehouses accumulate data over time for analytics. Since the S3 APIs are a popular means of programmatic access for storing files in object storage, much of the data in a warehouse can be translated into smaller files or blobs and then stored in object storage.

A connector works similarly, but when the tables are in a star schema they are easier to migrate. The purpose of the connector is to move data quickly and steadily between a structured store as the source and object storage as the destination. The connector therefore needs to fragment the data and upload it in multiple parts, and the multipart upload feature of the S3 API is a convenience here. The connector merely automates these transfers, which would otherwise have been pushed to the object storage rather than pulled by it. This difference is enormous in terms of the convenience of stashing and the reusability of the objects for reading. The automation can also be flexible enough to store the fragments of data in the form most likely to be used during repeated analytics.

A star schema means that there are many dimension tables joined to a central fact table for easier growth and dimensional analysis. This makes the data stand out as independent dimensions or facts, which can be migrated in parallel. In addition to the programmatic access for stashing, object storage is widely popular over conventional file storage, making it all the more appealing for durable stashes or full copies of data. While transactional data may have highly normalized tables, there is nothing preventing it from being translated into an unraveled star schema. Besides, such translations are already a part of the chain by which transactional data ends up in a warehouse.
The object storage does not have to concern itself with the data store directly and can exist and operate without any impact on the warehouse or the database, even though it may well grow a lot bigger than the database.
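
The fragment-and-upload step described above can be sketched as follows. This is only an illustration under stated assumptions: the rows are serialized as CSV, and the names `encode_row` and `fragment_rows` are hypothetical rather than part of any library. In a real connector each yielded part would be handed to the S3 multipart upload operations (CreateMultipartUpload, UploadPart, CompleteMultipartUpload); the sketch stops short of the network call so that it runs on its own.

```python
import csv
import io

# S3 multipart uploads require every part except the last to be at least 5 MiB.
MIN_PART_SIZE = 5 * 1024 * 1024


def encode_row(row):
    """Serialize one table row as a UTF-8 encoded CSV line (an assumed format)."""
    line = io.StringIO()
    csv.writer(line).writerow(row)
    return line.getvalue().encode("utf-8")


def fragment_rows(rows, part_size=MIN_PART_SIZE):
    """Yield (part_number, data) pairs built from whole rows.

    Every part is at least part_size bytes, except possibly the last one.
    Part numbers start at 1, as they do in S3 multipart uploads.
    """
    chunks, size, part_number = [], 0, 1
    for row in rows:
        data = encode_row(row)
        chunks.append(data)
        size += len(data)
        if size >= part_size:
            yield part_number, b"".join(chunks)
            part_number += 1
            chunks, size = [], 0
    if chunks:
        # Flush whatever remains as the final, possibly smaller, part.
        yield part_number, b"".join(chunks)
```

Keeping whole rows inside each part means any single part can later be read on its own, which matches the goal of storing fragments in the form most likely to be reused for analytics.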

With this context, let us consider how to write such a connector. It is tempting to translate the dimensions of the snowflake design directly into objects. There are only two aspects to be concerned about. The first is the horizontal and vertical partitioning of a dimension, and the second is the order and coordination of transfers from all dimensions. The latter may be done in parallel, drawing on the expertise built into database migration wizards. Here the emphasis is on the checklist of migrations, so that the transfer is merely a form of archival. Specifically, to transfer a table we repeat the following steps: while there are records to be migrated from the source to the destination, select a record and check whether it exists in the destination. If it doesn’t, we write it as a key-value pair and repeat the process. We make this failproof by checking at every step. Additionally, where the source is a staging area, we may choose to delete the record from staging so that the staging area stays trimmed. This technique is sufficient to write the data in the form of key-values, which are better suited to object storage.
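
The per-table transfer steps above can be sketched as follows. This is a minimal illustration, not a definitive connector: it assumes the source is a SQLite table, that the destination behaves like a dictionary of key to value (a stand-in for an object store bucket), and that keys take the form `table/primary-key`. The function name `migrate_table` and its parameters are hypothetical.

```python
import json
import sqlite3


def migrate_table(conn, table, key_column, destination, delete_from_staging=False):
    """Copy rows of `table` into `destination` as key-value pairs.

    A row is written only if its key is not already in the destination, so
    the function can be rerun safely after a partial failure. Returns the
    number of rows newly written. `table` and `key_column` are interpolated
    into SQL, so they are assumed to come from trusted configuration.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    written = 0
    for values in cursor.fetchall():
        record = dict(zip(columns, values))
        key = f"{table}/{record[key_column]}"
        if key not in destination:  # check before every write
            destination[key] = json.dumps(record, sort_keys=True)
            written += 1
        if delete_from_staging:
            # The record is now known to be in the destination, so trim staging.
            conn.execute(
                f"DELETE FROM {table} WHERE {key_column} = ?",
                (record[key_column],),
            )
    conn.commit()
    return written
```

Because each dimension table is handled by an independent call, several calls could be run in parallel, one per dimension, provided each uses its own database connection.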

Conclusion:

Data is precious, and almost all platforms compete for a slice of it. Data is sticky, so the platforms and tools built around the data grow in size. Object storage is a universally accepted store for its purpose, and the connector improves the migration of this data to the store. Although the word migration is used to indicate transfer, it does not necessarily mean taking anything away from the origin.
