Sunday, June 21, 2020

The file system management interface exists because users find themselves managers of their own artifacts. They create many files, and then they have to find, organize, secure, service and migrate them, often performing the same action on one or more artifacts at a time. Many language runtimes recognized this need as much as operating systems did, and the FileSystemObject interface became a widely recognized standard across languages, runtimes and operating systems.
With the move from personal computing to cloud computing, we now have files, blobs and streams. Filesystems and object storage differ in the file operations they support: tools such as find and grep are not well-suited to object storage. However, the ability to search object storage is not limited by the API. The S3 command-line interface can be used with options such as cp to dump the contents of an object to stdout, where it can be piped to grep. In such cases, it becomes useful to extend the APIs.
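As a minimal sketch of extending the API in this way, the helper below runs a grep-like search across a set of objects. The `fetch` callable is an assumption standing in for whatever client retrieves an object's bytes (for S3 it could wrap a get_object call); the function itself knows nothing about the storage backend.

```python
import re

def grep_objects(fetch, keys, pattern):
    """Search each object's contents for a regex, like grep over files.

    fetch: a callable taking a key and returning the object's bytes
           (hypothetical; in practice it could wrap an S3 client call).
    Returns a list of (key, line_number, line) tuples for matching lines.
    """
    regex = re.compile(pattern)
    matches = []
    for key in keys:
        text = fetch(key).decode("utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                matches.append((key, number, line))
    return matches
```

Because the storage access is injected, the same search works over a local dictionary in tests, an object store in production, or any other key-to-bytes source.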
Most developers prefer the filesystem for the ability to save with a name and a hierarchy. In addition, some set up watches on the file system. In object storage, we have equivalents of paths, and we can enable versioning as well as retention. As long as there are tools, SDKs and APIs promoting object storage, we have the ability to establish it as a storage tier as popular as the filesystem. There is no longer the chore of maintaining a file-system mount and the location it points to. The storage is also virtual, since it can be stretched over many virtual datacenters. Completing the tooling, such as grep equivalents, SDKs and connectors, will improve its usage considerably.
File systems have long been the destination for storing artifacts on disk, and while the file system has evolved to stretch over clusters and not just remote servers, it remains inadequate as blob storage. Data writers have to self-organize and interpret their files while frequently relying on metadata stored separately from the files. Files also tend to become binaries with proprietary interpretations. Files can only be bundled in an archive, and there is no object-oriented design over the data. If the storage were to support organizational units in terms of objects, without requiring hierarchical declarations and with support for is-a or has-a relationships, it would become more usable than files.
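To make the contrast concrete, here is a small sketch, under assumed names, of what such organizational units might look like: metadata travels with each object rather than living in a separate store, subclassing expresses is-a, and membership expresses has-a without requiring any path hierarchy.

```python
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    # Metadata is kept alongside the data instead of in a separate store.
    name: str
    metadata: dict = field(default_factory=dict)

@dataclass
class Document(StoredObject):     # is-a StoredObject
    body: bytes = b""

@dataclass
class Collection(StoredObject):   # has-a set of member objects
    members: list = field(default_factory=list)

    def find(self, predicate):
        # Membership is a relationship, not a directory path.
        return [m for m in self.members if predicate(m)]
```

The names `StoredObject`, `Document` and `Collection` are hypothetical; the point is only that relationships and metadata can be first-class, which an archive of files cannot express.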
Streams are considered a nascent technology, but one that is more natural to storage, because most storage sees append-only writes over a long period of time. Stream storage provides the ability to read and write, but has yet to provide a stream management interface similar to those seen on file systems, which automate workflows over these containers, such as finding, copying and so on.
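A minimal sketch of such a management layer, using an in-memory stand-in for the store, might expose find and copy alongside the usual append and read. The class and method names here are assumptions, not any particular product's API.

```python
class StreamStore:
    """In-memory stand-in for an append-only stream store."""

    def __init__(self):
        self._streams = {}

    def append(self, name, event):
        # Streams only grow; there is no in-place update.
        self._streams.setdefault(name, []).append(event)

    def read(self, name):
        return list(self._streams.get(name, []))

    # Management operations analogous to find and cp on a file system.
    def find(self, predicate):
        return [n for n in self._streams if predicate(n)]

    def copy(self, source, destination):
        for event in self.read(source):
            self.append(destination, event)
```

Note that copy is itself expressed in terms of the append-only primitives, which is why such workflows layer naturally on top of a stream store.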
The stream data store is equally suited to being a sink and a participant in a data pipeline. The stream management operations are suited to the latter. The use of connectors and appenders enhances the use of a storage product for data transfers. Connectors funnel data from different data sources and are typically one per type of source. Perhaps the simplest form of appender is an adapter pattern that appears as a proxy between the stream store as the sender and the application as the receiver. The publisher-subscriber pattern recognizes that the receiver need not be a single entity and can be a distributed application. The backup pattern buffers the data with scheduled asynchronous writes better suited to the stream store and with little or no disruption to the data sink. The compression and deduplication pattern is about the archival form of data. The replication pattern is about the cloning and syncing of data between similar storage. Each of these workflows translates to a function of the stream management layer. The implementation of this layer could even live in the applications that use the client to the stream store.
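The backup pattern above can be sketched as an appender that adapts between the application and the store: writes accumulate in a buffer and reach the sink in batches, so the sink sees little disruption. The `sink_append` callable and the size-triggered flush are assumptions for illustration; a real deployment would flush asynchronously on a schedule.

```python
class BufferedAppender:
    """Adapter between an application and a stream store's append call.

    Buffers writes and flushes them in batches, as in the backup
    pattern, so the data sink sees little or no disruption.
    """

    def __init__(self, sink_append, batch_size=3):
        self._sink_append = sink_append  # e.g. a stream store's append
        self._batch_size = batch_size
        self._buffer = []

    def append(self, event):
        self._buffer.append(event)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        # A real deployment would run this on a schedule, asynchronously.
        for event in self._buffer:
            self._sink_append(event)
        self._buffer.clear()
```

Because the appender only depends on a callable, the same buffering logic can sit in front of any sink, which is one way the management layer could live inside the applications that use the stream store's client.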
Sample Stream workflow automation: https://github.com/ravibeta/JavaSamples/StreamDuplicity