Wednesday, August 26, 2015


How do we search the object store?

My previous post touched on this question a couple of times but did not offer a solution. Let us face it: the S3 APIs only let us search objects by their prefix-qualified name, and even that is limited to matching the beginning of the key (a prefix, optionally grouped by a delimiter) rather than any attribute of the object. Consequently, a lot of time may go into devising naming conventions. The trouble with a naming convention is that it is static, and it may even require renaming objects whenever an attribute changes or a conflict arises.
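To make the limitation concrete, here is a minimal sketch. It assumes a hypothetical naming convention `<owner>/<yyyy-MM-dd>/<name>` and models the listing call as a filter on the key prefix only. `NamingConvention`, `listByPrefix`, `listByDate` and `renameOwner` are illustrative names, not part of any S3 SDK.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

class NamingConvention {
    // Hypothetical convention: attributes are baked into the key as "<owner>/<yyyy-MM-dd>/<name>".
    static String key(String owner, String date, String name) {
        return owner + "/" + date + "/" + name;
    }

    // Models a prefix-only listing: the one filter the store is assumed to apply for us.
    static List<String> listByPrefix(Collection<String> keys, String prefix) {
        return keys.stream()
                .filter(k -> k.startsWith(prefix))
                .sorted()
                .collect(Collectors.toList());
    }

    // The date is not the leading component, so it cannot be expressed as a prefix:
    // every key has to be listed and parsed on the client.
    static List<String> listByDate(Collection<String> keys, String date) {
        return keys.stream()
                .filter(k -> {
                    String[] parts = k.split("/", 3);
                    return parts.length == 3 && parts[1].equals(date);
                })
                .sorted()
                .collect(Collectors.toList());
    }

    // Changing an attribute changes the name: the object would have to be copied
    // under the returned key and the old key deleted.
    static String renameOwner(String key, String newOwner) {
        String[] parts = key.split("/", 3);
        return key(newOwner, parts[1], parts[2]);
    }
}
```

With keys such as `alice/2015-08-26/report.csv`, a query by owner is a cheap prefix listing. A query by date, or a change of owner, falls back to a full scan or a rename.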

On the other hand, the metadata of each object is available through an iterator. This means that we can visit the objects one after another and match each object's metadata against the query. For example, to find all objects created by a certain owner, we scan all the objects and compare each one's owner or created_by field with the value in the query.
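A minimal sketch of that scan follows. It assumes each object is exposed as a key plus a map of string metadata; the `StoredObject` record and the `created_by` field name are hypothetical. Every object's metadata is read once, so the query is linear in the number of objects in the store.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class MetadataScan {
    // Hypothetical view of an object: its key and its user-defined metadata.
    record StoredObject(String key, Map<String, String> metadata) {}

    // Walks the objects one at a time and keeps the keys of those whose
    // metadata holds the requested value for the requested field.
    static List<String> findBy(Iterator<StoredObject> objects, String field, String value) {
        List<String> matches = new ArrayList<>();
        while (objects.hasNext()) {
            StoredObject o = objects.next();
            if (value.equals(o.metadata().get(field))) {
                matches.add(o.key());
            }
        }
        return matches;
    }
}
```

For instance, `findBy(objects, "created_by", "alice")` returns the keys of Alice's objects, but only after touching every object in the store.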

Arguably, the name and metadata of an object are smaller than the average object. So we could keep a mirror of the object store with an empty file for each corresponding object. The mirror then holds the name and metadata of every object without its content.
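Here is a sketch of such a mirror. For brevity it is kept in memory rather than as empty files: it copies each object's name and metadata and drops the payload. `StoredObject` and `Mirror` are hypothetical types, not part of any object-store API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Mirror {
    // Hypothetical full object: name, metadata and the (possibly large) payload.
    record StoredObject(String key, Map<String, String> metadata, byte[] data) {}

    // Builds a sorted name -> metadata map, the in-memory equivalent of one
    // empty file per object. The payload bytes are never copied.
    static TreeMap<String, Map<String, String>> build(List<StoredObject> objects) {
        TreeMap<String, Map<String, String>> mirror = new TreeMap<>();
        for (StoredObject o : objects) {
            mirror.put(o.key(), Map.copyOf(o.metadata()));
        }
        return mirror;
    }
}
```

Queries such as the owner scan can then run against the mirror, which costs only as much as the names and metadata themselves.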

Another approach is to build an index over the prefix-qualified names. A SQL table with the object identifier and its metadata attributes as columns, or a related key-value table of (object, attribute, value) rows, would suffice. The table would be indexed on the prefix, and also on the values of those metadata fields that all objects share.
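The key-value variant can be sketched in Java under stated assumptions. Each `MetadataRow` stands for one row of the key-value table, and a sorted map from attribute and value to object keys plays the role of a database index on metadata values. `MetadataRow` and `MetadataIndex` are hypothetical names; a real deployment would use the database's own index.

```java
import java.util.*;

class MetadataIndex {
    // One row of a key-value metadata table: (object key, attribute, value).
    record MetadataRow(String objectKey, String attribute, String value) {}

    // attribute -> value -> object keys, all sorted, standing in for an index.
    private final Map<String, TreeMap<String, TreeSet<String>>> index = new HashMap<>();

    void add(MetadataRow row) {
        index.computeIfAbsent(row.attribute(), a -> new TreeMap<>())
             .computeIfAbsent(row.value(), v -> new TreeSet<>())
             .add(row.objectKey());
    }

    // Equality lookup, like WHERE attribute = value, without scanning other rows.
    SortedSet<String> lookup(String attribute, String value) {
        TreeMap<String, TreeSet<String>> byValue = index.get(attribute);
        if (byValue == null || !byValue.containsKey(value)) {
            return new TreeSet<>();
        }
        return Collections.unmodifiableSortedSet(byValue.get(value));
    }
}
```

Unlike the iterator scan, a lookup goes straight to the matching keys.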

With an index on the prefix-qualified name, searching and sorting in T-SQL become far easier. A clustered index on the name or object key keeps the rows in key order, so a prefix query reads one contiguous range, which further reduces disk access.
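The benefit of a sorted index on the name can be shown with an in-memory analogy rather than T-SQL. When keys are kept in order, a prefix query becomes a range scan that touches only the matching keys and returns them already sorted. Here a `TreeMap` stands in for the clustered index; `PrefixIndex` and its methods are hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

class PrefixIndex {
    // Object key -> row payload (here just a size in bytes), kept sorted by key
    // the way a clustered index keeps rows in key order.
    private final TreeMap<String, Long> byKey = new TreeMap<>();

    void put(String key, long size) {
        byKey.put(key, size);
    }

    // All keys that start with the prefix form one contiguous range
    // [prefix, prefix + '\uffff'), assuming keys never contain '\uffff'.
    // The returned view covers only that range, in key order.
    NavigableMap<String, Long> withPrefix(String prefix) {
        return byKey.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }
}
```

Sorting comes for free: iterating the returned view yields keys in ascending order without a separate sort step.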

In addition, a table of names and metadata lends itself to standard query operators. Operators such as Select, Join, Union, Intersect, Except, Distinct, Range, SequenceEqual, Skip, SkipWhile and Where can be applied directly to the object keys, which makes it easier to arrive at a final result set of the objects of interest. Aggregate operations and various kinds of positional access can be performed as well.
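The operators named above are the .NET LINQ names. Roughly the same ideas map onto Java collections and streams. This sketch assumes that per-attribute lookups have already produced sets of object keys; `QueryOps` and the example sets are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

class QueryOps {
    // Intersect: keys of objects that satisfy both predicates.
    static SortedSet<String> intersect(Set<String> a, Set<String> b) {
        SortedSet<String> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    // Union: keys of objects that satisfy either predicate (duplicates collapse, as with Distinct).
    static SortedSet<String> union(Set<String> a, Set<String> b) {
        SortedSet<String> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    // Except: keys in a but not in b.
    static SortedSet<String> except(Set<String> a, Set<String> b) {
        SortedSet<String> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    // Skip/Take style positional access, e.g. paging through sorted keys.
    static List<String> page(SortedSet<String> keys, int skip, int take) {
        return keys.stream().skip(skip).limit(take).collect(Collectors.toList());
    }

    // An aggregate: total size of the selected objects, given a key -> size map.
    static long totalSize(Collection<String> keys, Map<String, Long> sizes) {
        return keys.stream().mapToLong(k -> sizes.getOrDefault(k, 0L)).sum();
    }
}
```

For example, intersecting the keys owned by one user with the keys created in a given year yields that user's objects for the year, and `page` can then return them a screenful at a time.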

Lastly, such an index can itself be stored as an object in the object store, so we don’t need a separate database for this purpose.
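Here is a sketch of that idea. It assumes a store that offers only put and get of byte arrays by key, modeled as a plain `Map<String, byte[]>`. The reserved key `_index/metadata` and the tab-separated format are hypothetical choices. The index, reduced here to object key and owner, is written back as an ordinary object and parsed again when a query needs it.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

class IndexAsObject {
    static final String INDEX_KEY = "_index/metadata"; // hypothetical reserved key

    // Serializes (object key -> owner) as "key\towner\n" lines.
    // Assumes neither keys nor owners contain tabs or newlines.
    static byte[] serialize(SortedMap<String, String> ownerByKey) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : ownerByKey.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    static SortedMap<String, String> deserialize(byte[] bytes) {
        SortedMap<String, String> m = new TreeMap<>();
        for (String line : new String(bytes, StandardCharsets.UTF_8).split("\n")) {
            if (line.isEmpty()) continue;
            String[] kv = line.split("\t", 2);
            m.put(kv[0], kv[1]);
        }
        return m;
    }

    // The index lives in the same store as the data, as just another object.
    static void saveIndex(Map<String, byte[]> store, SortedMap<String, String> index) {
        store.put(INDEX_KEY, serialize(index));
    }

    static SortedMap<String, String> loadIndex(Map<String, byte[]> store) {
        byte[] bytes = store.get(INDEX_KEY);
        return bytes == null ? new TreeMap<>() : deserialize(bytes);
    }
}
```

In this sketch, keeping the index object in step with writes to the data objects is left to the caller.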

#codingexercise
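// Returns the node holding the largest key in a binary search tree, or null
// if the tree is empty. In a BST the maximum is the rightmost node.
class Node {
    int key;
    Node left, right;
}

Node GetMax(Node root) {
    if (root == null) return null;   // empty tree: no maximum
    while (root.right != null)       // keep stepping to the right child
        root = root.right;
    return root;
}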
