Let us now discuss data virtualization for queries over object storage. A query is usually directed at a single data source, but query execution, like any other application, should be able to retrieve data without knowing technical details such as where that data is located. The location usually depends on the organization that has a business justification for growing and maintaining the data. Not everyone wants to dive into a data lake right at the beginning of a venture, the way most organizations do. Organizations first have to grapple with their technical needs and changing business priorities before the data and its management solution can be called stable.
Object storage, unlike databases, offers almost limitless capacity, with enough replication groups to serve organizations that need their own copy of the data. In addition, the namespace-bucket-object hierarchy provides whatever level of separation organizations need.
The role of object storage in data virtualization is limited to the physical storage level, where we determine which site or zone to fetch the data from. A logical data virtualization layer that knows which namespace or bucket to go to within an object storage does not come out of the box with the object storage. A gateway would serve that purpose. Queries can then run against a virtualized view of the data by querying the gateway, which in turn fetches the data from the corresponding location.
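To make this split of responsibilities concrete, here is a minimal Python sketch, assuming a toy in-memory object store. The names `ObjectStore`, `VirtualizationGateway`, `fetch`, the `preferred_zone` parameter and the zone names are hypothetical and do not correspond to any real object storage API. The store alone decides which zone a replica is read from, while the gateway only knows which namespace, bucket and key a logical dataset maps to.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical object identifier: (namespace, bucket, key)
ObjectId = Tuple[str, str, str]


class ObjectStore:
    """Toy object storage with replicas in several zones (a stand-in, not a real client)."""

    def __init__(self, zones: List[str]) -> None:
        self._zones: Dict[str, Dict[ObjectId, bytes]] = {z: {} for z in zones}

    def put(self, zone: str, oid: ObjectId, data: bytes) -> None:
        self._zones[zone][oid] = data

    def get(self, oid: ObjectId, preferred_zone: Optional[str] = None) -> bytes:
        # Physical-level decision: try the preferred zone first, then any other replica.
        order = list(self._zones)
        if preferred_zone in self._zones:
            order.remove(preferred_zone)
            order.insert(0, preferred_zone)
        for zone in order:
            if oid in self._zones[zone]:
                return self._zones[zone][oid]
        raise KeyError(f"object {oid} not found in any zone")


@dataclass
class VirtualizationGateway:
    """Logical layer: maps dataset names to a namespace/bucket/key inside the store."""

    store: ObjectStore
    catalog: Dict[str, ObjectId]

    def fetch(self, dataset: str, preferred_zone: Optional[str] = None) -> bytes:
        if dataset not in self.catalog:
            raise KeyError(f"unknown dataset {dataset!r}")
        # The gateway resolves the logical name; the store picks the zone.
        return self.store.get(self.catalog[dataset], preferred_zone)


# Usage: the object exists only in zone-b, so a request preferring zone-a falls back.
store = ObjectStore(["zone-a", "zone-b"])
store.put("zone-b", ("finance", "sales", "2023/q1.csv"), b"region,amount\nwest,100\n")
gateway = VirtualizationGateway(store, {"sales_q1": ("finance", "sales", "2023/q1.csv")})
data = gateway.fetch("sales_q1", preferred_zone="zone-a")
```

A real gateway would call the object storage's own client instead of a dictionary, but the shape stays the same: the query names a dataset, and neither the query nor the gateway needs to know which zone served the bytes.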
There are several levels of abstraction. First, the destination data source may lie within a single object storage. Second, the destination data sources may span different object storages. Third, the destination may span different kinds of storage, such as an object storage and a cluster file system. In all these cases, the virtualization logic resides outside the storage and can be written in different forms. Commercial products of this kind, such as Denodo, show that businesses need this capability. Finally, the logic within the virtualization layer can be customized so that queries get the most out of it. We refer the reader to this article for the uses of data virtualization, while here we discuss the role of object storage.
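The third level of abstraction, where one logical view spans an object storage and a file system, can be sketched in a similar way. This is an illustration under stated assumptions, not how Denodo or any other product works: the `Backend` protocol, the `obj://` and `fs://` schemes and the classes below are hypothetical, the object storage is an in-memory dictionary, and a local directory stands in for a cluster file system.

```python
import csv
import io
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Protocol


class Backend(Protocol):
    """Hypothetical minimal interface every storage backend implements."""

    def read(self, path: str) -> bytes: ...


class InMemoryObjectBackend:
    """Stand-in for an object storage; path is 'namespace/bucket/key'."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects

    def read(self, path: str) -> bytes:
        return self._objects[path]


class FileSystemBackend:
    """Stand-in for a cluster file system mounted under a local root directory."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def read(self, path: str) -> bytes:
        return (self._root / path).read_bytes()


class VirtualView:
    """Logical table whose CSV partitions live on different backends ('scheme://path')."""

    def __init__(self, backends: Dict[str, Backend], partitions: List[str]) -> None:
        self._backends = backends
        self._partitions = partitions

    def _read_partition(self, uri: str) -> bytes:
        scheme, sep, path = uri.partition("://")
        if not sep or scheme not in self._backends:
            raise ValueError(f"no backend registered for {uri!r}")
        return self._backends[scheme].read(path)

    def rows(self) -> Iterator[Dict[str, str]]:
        # Stream rows from every partition, regardless of where it is stored.
        for uri in self._partitions:
            text = self._read_partition(uri).decode("utf-8")
            yield from csv.DictReader(io.StringIO(text))

    def query(self, predicate: Callable[[Dict[str, str]], bool]) -> List[Dict[str, str]]:
        return [row for row in self.rows() if predicate(row)]


# Usage: one partition in the object store, one on the "file system".
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sales").mkdir()
    (root / "sales" / "q2.csv").write_text("region,amount\neast,250\n")
    view = VirtualView(
        {
            "obj": InMemoryObjectBackend({"finance/sales/q1.csv": b"region,amount\nwest,100\neast,40\n"}),
            "fs": FileSystemBackend(root),
        },
        ["obj://finance/sales/q1.csv", "fs://sales/q2.csv"],
    )
    east = view.query(lambda r: r["region"] == "east")
```

Because each backend only has to implement `read`, adding a second object storage (the second level of abstraction) would mean registering one more scheme, while the query code stays unchanged.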