Cluster computing

Friday, July 6, 2018

Namespaces, Buckets, Objects and Their Use in Querying

A file system does not lend itself to the same querying capabilities and performance that database tables offer. Directories and files in a file system are enumerated by iteration. Database tables have indexes that allow faster access than a sequential scan. Nothing works quite like a database for efficient querying. This holds historically, and it also follows from the physics: local data access is far more efficient than remote data access, especially when the data is organized at its finest granularity and managed with metadata and query caching. We have built cloud databases where remote access does not hurt the service level agreement for business transactions, but we have yet to achieve database-like queries over an object store.
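To make the difference between iteration and an index concrete, here is a minimal Java sketch. It assumes object keys held in memory, and the names ScanVsIndex, sequentialScan and indexedLookup are illustrative only. The sketch counts how many entries an iteration visits, compared with how many probes a binary search over a sorted key index needs, for the same key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Contrasts an iteration-style lookup (like enumerating files in a directory)
// with an index-style lookup (binary search over keys kept in sorted order).
public class ScanVsIndex {

    // Sequential scan: visits entries one by one until a match is found.
    // Returns the number of entries visited, or -1 if the key is absent.
    static int sequentialScan(List<String> entries, String key) {
        int visited = 0;
        for (String e : entries) {
            visited++;
            if (e.equals(key)) return visited;
        }
        return -1;
    }

    // Indexed lookup: binary search over a sorted array of keys.
    // Returns the number of probes made, or -1 if the key is absent.
    static int indexedLookup(String[] sortedKeys, String key) {
        int lo = 0, hi = sortedKeys.length - 1, probes = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            probes++;
            int cmp = sortedKeys[mid].compareTo(key);
            if (cmp == 0) return probes;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < 1024; i++) entries.add(String.format("file-%04d", i));
        String[] index = entries.toArray(new String[0]);
        Arrays.sort(index); // models building the index; zero-padding keeps numeric order
        System.out.println("scan visits: " + sequentialScan(entries, "file-1000"));
        System.out.println("index probes: " + indexedLookup(index, "file-1000"));
    }
}
```

With 1,024 keys, the scan visits 1,001 entries to reach file-1000, while the binary search needs at most 11 probes. A database index such as a B+ tree keeps this logarithmic behavior and also stays efficient on disk.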

We are adding compute to storage. There is no limit to the possibilities once we take advantage of the virtualization that some object stores enable. Even without compute, such object stores already satisfy immense and diverse storage requirements. With compute and data processing offered by the platform itself, the operations expand beyond create, update and delete. They can include standard query operations that feed a dashboard of charts and graphs, participate in streaming queries, or enrich the metadata of the objects. For example, the usage statistics of an object may now become part of its metadata.
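The following is a minimal sketch of the usage-statistics idea, assuming an in-memory store. MeteredObjectStore, the "access-count" metadata key and frequentlyRead are hypothetical names and are not the API of S3 or of any real object store. Each read updates the object's metadata, and a query then runs over that metadata rather than over the data.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical in-memory object store that illustrates compute next to storage:
// reads update a usage statistic in the object's metadata, and queries can
// filter on that metadata. Not the API of any real object store.
public class MeteredObjectStore {
    static final class StoredObject {
        final byte[] data;
        final Map<String, Long> metadata = new HashMap<>();
        StoredObject(byte[] data) { this.data = data; }
    }

    private final Map<String, StoredObject> objects = new HashMap<>();

    void put(String key, byte[] data) {
        StoredObject obj = new StoredObject(data);
        obj.metadata.put("access-count", 0L);
        objects.put(key, obj);
    }

    // Each read increments the access count kept in the object's metadata.
    byte[] get(String key) {
        StoredObject obj = objects.get(key);
        if (obj == null) return null;
        obj.metadata.merge("access-count", 1L, Long::sum);
        return obj.data;
    }

    long accessCount(String key) {
        return objects.get(key).metadata.get("access-count");
    }

    // A query over metadata: keys read at least minReads times, in sorted order.
    List<String> frequentlyRead(long minReads) {
        return objects.entrySet().stream()
                .filter(e -> e.getValue().metadata.get("access-count") >= minReads)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        MeteredObjectStore store = new MeteredObjectStore();
        store.put("a.txt", "alpha".getBytes());
        store.put("b.txt", "beta".getBytes());
        store.get("a.txt");
        store.get("a.txt");
        store.get("b.txt");
        System.out.println(store.frequentlyRead(2)); // [a.txt]
    }
}
```

The counter is updated synchronously on every read only to keep the sketch short. A platform could instead derive the same statistic asynchronously, for example from access logs, so that reads stay cheap.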

When we iterate over namespaces, buckets and objects, we often have to visit each one of them sequentially. Unlike an index, there is no central data structure that speeds up the traversal, nor are the entries kept in sorted order. These S3 artifacts could instead be stored over a layer that provides data structures for faster lookup. One such example is the B+ tree. Its internal nodes hold separator keys that divide the key space into ranges, and its leaves hold the records in key order, linked together for range scans. Another example is the skip list, which links not only adjacent records but also records farther apart. Each higher level typically skips about twice as many records as the level below it. Such techniques speed up lookup because they avoid visiting each element one after the other.
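The skip list idea can be sketched in Java as follows. The sketch assumes string object keys kept in one in-memory list, and the SkipList class and its listPrefix method are illustrative rather than part of any object store. Node heights come from coin flips, so each level holds about half the keys of the level below. Listing a prefix jumps straight to the first matching key and then walks the bottom level in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal skip list over String keys (e.g. object keys within a bucket).
// Level 0 links every key in sorted order; each higher level keeps roughly
// half of the keys of the level below, so a search can skip ahead instead of
// visiting each key in turn. A sketch only: no deletion, not thread-safe.
public class SkipList {
    private static final int MAX_LEVEL = 16;

    private static final class Node {
        final String key;
        final Node[] next;
        Node(String key, int levels) { this.key = key; this.next = new Node[levels]; }
    }

    private final Node head = new Node(null, MAX_LEVEL); // sentinel, holds no key
    private final Random random;
    private int level = 1; // number of levels currently in use

    SkipList(long seed) { this.random = new Random(seed); }

    // Coin flips decide a new node's height: each extra level has probability 1/2.
    private int randomLevel() {
        int lvl = 1;
        while (lvl < MAX_LEVEL && random.nextBoolean()) lvl++;
        return lvl;
    }

    void insert(String key) {
        Node[] update = new Node[MAX_LEVEL];
        Node x = head;
        // Descend from the top level, remembering the last node before key on each level.
        for (int i = level - 1; i >= 0; i--) {
            while (x.next[i] != null && x.next[i].key.compareTo(key) < 0) x = x.next[i];
            update[i] = x;
        }
        if (x.next[0] != null && x.next[0].key.equals(key)) return; // already present
        int lvl = randomLevel();
        if (lvl > level) {
            for (int i = level; i < lvl; i++) update[i] = head;
            level = lvl;
        }
        Node n = new Node(key, lvl);
        for (int i = 0; i < lvl; i++) {
            n.next[i] = update[i].next[i];
            update[i].next[i] = n;
        }
    }

    // Returns the first node whose key is >= key, using the upper levels to skip ahead.
    private Node ceiling(String key) {
        Node x = head;
        for (int i = level - 1; i >= 0; i--) {
            while (x.next[i] != null && x.next[i].key.compareTo(key) < 0) x = x.next[i];
        }
        return x.next[0];
    }

    boolean contains(String key) {
        Node n = ceiling(key);
        return n != null && n.key.equals(key);
    }

    // Lists keys that start with prefix, like listing objects under a prefix.
    List<String> listPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        for (Node n = ceiling(prefix); n != null && n.key.startsWith(prefix); n = n.next[0]) {
            out.add(n.key);
        }
        return out;
    }

    public static void main(String[] args) {
        SkipList keys = new SkipList(42);
        for (String k : new String[] {"logs/2018/07/06.txt", "img/cat.png", "logs/2018/07/05.txt", "img/dog.png"}) {
            keys.insert(k);
        }
        System.out.println(keys.contains("img/cat.png"));      // true
        System.out.println(keys.listPrefix("logs/2018/07/")); // [logs/2018/07/05.txt, logs/2018/07/06.txt]
    }
}
```

A B+ tree offers the same sorted, logarithmic access, and its wide nodes map well onto disk pages. The skip list is shown here only because it is shorter to write out.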
#codingexercise

Find the count of common elements between two arrays, compared position by position after sorting, before the first mismatch

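using System.Collections.Generic;
using System.Diagnostics;

static class Exercise
{
    // Sorts copies of both lists and counts the positions that match
    // before the first mismatch. The callers' lists are left unchanged.
    public static int GetCountMoves(List<int> A, List<int> B)
    {
        Debug.Assert(A != null);
        Debug.Assert(B != null);
        Debug.Assert(A.Count == B.Count);

        var a = new List<int>(A);
        var b = new List<int>(B);
        a.Sort();
        b.Sort();

        int result = 0;
        for (int i = 0; i < a.Count && i < b.Count; i++)
        {
            if (a[i] == b[i]) result++;
            else break; // stop at the first mismatch
        }
        return result;
    }
}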
