Cluster computing

Friday, October 26, 2018

The notion that there can be hybrid storage for data virtualization does not jive with the positioning of object storage as a limitless single storage. However, customers choose on-premise or cloud object storage for different reasons. There is no question that some data is more suited to be in the cloud while others can remain on-premise. Even if the data were to be within the same object storage but in different zones, they would have different address and the resolver will need to resolve to one of them. Therefore, whether the resolution is between zones, between storage or between storage types, the resolver has to deliberate over the choices.
The nature of the query language determines the kind of resolving that the data virtualization needs to do. In addition, the type of storage that the virtualization layer spans also depends on the query language. LogParser was able to enumerate unstructured data but it is definitely not mainstream for unstructured data. Commands and scripts have been used for unstructured data in most places and they have worked well. Solutions like Saltstack enabled the same command and scripts to run over files in different storages even if they were on different operating systems. SaltStack used ZeroMQ for messaging and the commands would be executed by agents on those computers. The master on the SaltStack resolved the minions on which the commands and scripts would be run. This is no different from data virtualization layer that needs to resolve the types to different storages.
In order to explain the difference between data virtualization over structured and unstructured storage types, we look at metadata in structure storage. All data types used are registered. Whether they are system builtin types or user defined types, the catalog helps with the resolution. The notion that json documents don’t have a rigid type and that the document model can be extensible is not lost to the resolver. It merely asks for a inverted list of documents corresponding to a template that can be looked up. Furthermore, the logic to resolve json fields to documents does not necessarily have to be within the resolver. The inverted list and the lookup can be provided by the index layer while the resolver merely delegates it. Therefore, the resolver can span different storage types as long as the resolution is delegated to the storage types. The query language for structure and unstructured may or may not be unified with a universal query language but that is beside the consideration that we can different storage types as different stacks and yet have the data virtualization over them.
Query rewrites has not been covered in the topic above. A query describing the selection of entries with the help of predicates does not necessarily have to be bound to structured or unstructured query languages. Yet the convenience and universal appeal of one language may dominate another. Therefore, in such cases whether the query language is agnostic or predominantly biased, it can be modified or rewritten to suit the needs of the storage stacks described earlier. This implies that not only the data types but also the query commands may be translated to suit the delegations to the storage stacks. If the commands can be honored, then they will return results and since the resolver in the virtualization layer may check all of the registered storage stacks, we will eventually get the result if the data matches.

Cluster computing

Friday, October 26, 2018

No comments:

Post a Comment