Namespace, Buckets, Objects and their use with Querying
A file system does not lend itself to the same querying capabilities
and performance as database tables do. Directories and files in a file system
are enumerated by iteration, whereas database tables have indexes that allow
faster access than a sequential scan. Nothing works quite like a database for
efficient querying, both for historical reasons and for the physical reason
that local data access is far more efficient than remote data access,
especially when the data is organized at its finest granularity and
managed with metadata and query caching. We have realized cloud databases where
remote access does not affect the service level agreement for
business transactions, but we have yet to realize database-like queries over an
object store.
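To make the contrast between enumeration and indexed access concrete, here is a minimal sketch in Python. The file names, the sorted list standing in for an index, and both helper functions are illustrative assumptions, not part of any particular file system or database.

```python
import bisect

def scan_lookup(names, target):
    """Sequential scan: visit entries one by one, as a directory enumeration would."""
    steps = 0
    for name in names:
        steps += 1
        if name == target:
            return True, steps
    return False, steps

def index_lookup(sorted_names, target):
    """Indexed lookup: binary search over sorted keys, roughly what an index allows."""
    pos = bisect.bisect_left(sorted_names, target)
    return pos < len(sorted_names) and sorted_names[pos] == target

# Hypothetical listing of 100,000 files; zero-padded names are already sorted.
names = [f"file-{i:06d}.dat" for i in range(100_000)]

found, steps = scan_lookup(names, "file-099999.dat")  # touches every entry
indexed = index_lookup(names, "file-099999.dat")      # needs about 17 comparisons
```

The scan cost grows linearly with the number of entries, while the binary search grows logarithmically, which is the gap the paragraph above describes.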
How, then, do we use object storage as a data store, and why
do we need to enable it for querying?
Object storage is immensely popular in the cloud, just as the
file system was for stashing data. As a low-level primitive, it helped the
evolution of data warehouses in the cloud. Structured and unstructured data of
many kinds make their way into object storage. However, aside
from being available over HTTP, the data generally goes dark and opaque. Since
object storage has none of the built-in data processing capabilities of
relational or NoSQL databases, it does not become part of the higher-level,
business-purpose software stack and remains infrastructure.
Do databases need to be built on top of object storage? That
may be an interesting concept, but the data management techniques within a
database, namely locking, logging, indexes, catalog, caching, query plans and
so on, are tightly coupled to the database and its central view of hierarchical
metadata. Objects, on the other hand, carry their metadata with them. The
absence of an index at the object storage level may be compensated for by a
query processing layer, stacked on top of the object storage, that creates new
objects. Even so, such query processing is inherently different from a binary
search on a clustered index.
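One way to picture a query processing layer that compensates for the missing index is sketched below. Everything here is hypothetical: `ObjectStore` is a toy in-memory stand-in for a bucket, and the `_index/` key prefix, `build_index` and `query` are names invented for illustration, not the API of any real object store.

```python
import json

class ObjectStore:
    """Toy in-memory bucket: each key maps to (data bytes, metadata dict)."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        self._objects[key] = (data, dict(metadata or {}))

    def get(self, key):
        return self._objects[key]

    def keys(self):
        return list(self._objects)

def build_index(store, field, index_key):
    """Scan every object's metadata once and store the result as a new object."""
    index = {}
    for key in store.keys():
        if key.startswith("_index/"):  # skip index objects themselves
            continue
        _, metadata = store.get(key)
        if field in metadata:
            index.setdefault(metadata[field], []).append(key)
    # The index is itself just another object in the same store.
    store.put(index_key, json.dumps(index).encode("utf-8"),
              {"kind": "index", "field": field})

def query(store, index_key, value):
    """Answer a lookup by reading the index object instead of scanning the bucket."""
    data, _ = store.get(index_key)
    return json.loads(data.decode("utf-8")).get(value, [])

store = ObjectStore()
store.put("logs/a.txt", b"...", {"owner": "alice"})
store.put("logs/b.txt", b"...", {"owner": "bob"})
store.put("logs/c.txt", b"...", {"owner": "alice"})
build_index(store, "owner", "_index/owner")
alice_keys = query(store, "_index/owner", "alice")
```

Note that the index object is a snapshot: it goes stale as soon as objects are added or changed, and the store has no locking or logging to keep it consistent. That is one reason this kind of lookup differs from a search on a clustered index maintained by a database.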
Will the querying capabilities on the object storage
increase its adoption?
We are adding compute to storage. There is no limit to the
possibilities once we take advantage of the virtualization that some object
stores enable. Even without compute, the storage solution of such object stores
satisfies immense and diverse requirements. With compute and data processing
capabilities offered by the platform, the operations expand beyond create,
update and delete to standard query operations that can support a
dashboard of charts and graphs, participate in streaming queries or improve the
metadata of the objects. For example, the usage statistics of an object may now
be part of its metadata.
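As a rough illustration of usage statistics living in object metadata, the sketch below counts reads per object and uses the counts for a simple query. `TrackedStore`, the `read_count` and `last_read` fields, and `most_read` are assumptions made up for this example rather than features of a specific product.

```python
import time

class TrackedStore:
    """Toy in-memory store that records usage statistics in each object's metadata."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = {
            "data": data,
            "metadata": {"read_count": 0, "last_read": None},
        }

    def get(self, key):
        entry = self._objects[key]
        # Every read updates the statistics carried with the object.
        entry["metadata"]["read_count"] += 1
        entry["metadata"]["last_read"] = time.time()
        return entry["data"]

    def metadata(self, key):
        return dict(self._objects[key]["metadata"])

    def most_read(self, limit=3):
        """A query a dashboard might run: keys ordered by read count, highest first."""
        ranked = sorted(
            self._objects,
            key=lambda k: self._objects[k]["metadata"]["read_count"],
            reverse=True,
        )
        return ranked[:limit]

store = TrackedStore()
store.put("reports/q1.csv", b"a,b\n1,2\n")
store.put("reports/q2.csv", b"a,b\n3,4\n")
store.get("reports/q2.csv")
store.get("reports/q2.csv")
store.get("reports/q1.csv")
top = store.most_read(limit=1)
```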
Conclusion: A data processing kit specific to object storage
may be a great library to include with an object storage solution.
#codingexercise
Find the number of ways to cover a strip of n units using tiles that are one or two units long:
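// Counts the ways to cover n units with tiles of length one or two.
int GetCount(uint n)
{
    if (n == 0) return 0; // nothing to tile; counted as zero ways here
    if (n == 1) return 1; // a single one-unit tile
    if (n == 2) return 2; // two one-unit tiles, or one two-unit tile
    return GetCount(n - 1) + GetCount(n - 2); // last tile covers one or two units
}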