Cluster computing

Thursday, October 25, 2018

Many systems use the technique of reflection to find out whether a module has a particular data type. An object storage does not have any such mechanism. However, there is nothing preventing the data virtualization layer to maintain a registry in the object storage or the data type defining modules themselves so that those modules can be loaded and introspected to determine if there is such a data type. This technique is common in many run-time. Therefore, the intelligence to add to the data virtualization layer can draw inspiration from these well-known examples
Most of the implementations for a layer is usually in the form of a micro-service. This helps modular design as well as testability. Moreover, the microservice provides http access for other services. This has been proven to be helpful from the point of view of production support. The services that depend on the data virtualization to be running can also directly go to the object storage via http. Therefore, the data virtualization layer merely acts like a proxy. The proxy can be transparent, thin or fat. It can even play the man in the middle and therefore it requires very little change from the caller.
The implementation of a service for data virtualization is not uncommon – both on–premise and in the cloud. In general, most applications benefit from being modular. Service oriented architecture and micro-services play important roles in making applications modular. This is in fact the embrace of cloud technologies as the modules are hosted on their own virtual machines and containers. With the move towards cloud, we get the added benefits of elasticity and billing on a pay per use model that we otherwise do not have. The benefits of managed instances, hosts and containers is never lost on the application modularity. Therefore, data virtualization can also be implemented as a service that can span both on-premise as well as cloud storage.
The data virtualization service is also an opt-in because the consumers can go directly to the storage. For example, the callers will either use the proxy or they won’t so long as they take care of resolving their queries with fully qualified namespaces and data types. It is merely a convenience and very much suited for encapsulating in a module. The testability of the data virtualization also improves significantly.
Unlike other services, data virtualization is exclusively dependent on the nature of data retrieval from the query layer. While data transformation services such as Extract-Transform-Load prepare data in forms that are more suitable for consumer services, there is no read-write operation here. The purpose of fully resolving data types so that queries can be executed depends exclusively on the queries. This is why the language of the queries is important. If it were something standard like the SQL query language then it becomes helpful to bridge that language over unstructured data. LogParser came very close to that by viewing the data as an enumerable but the reason the language for LogParser became highly restrictive was that it needed to be LogParser friendly. If the types mentioned in the log parser queries did not match the data from the Component Object Model (COM) interfaces, Log Parser would not be able to understand the query. If there were a data virtualization layer within the LogParser that mashed-up the datatype to the data set by using one or more interfaces, it would have expanded the type of queries being written with the LogParser. Here, we utilize the same concept for data virtualization over Object Storage.

Cluster computing

Thursday, October 25, 2018

No comments:

Post a Comment