Cluster computing

Monday, May 28, 2018

We were discussing the use of a table and standard query operators to allow developers to expose resources that users can query themselves. Users can pass the filter criteria directly in the URL query parameters via one or more of the well-known URL patterns. The use of a table means we can also add rows to grow the data set and columns to add attributes. We could make the resources even more specific by using more than one column in a composite key.
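As a minimal sketch of the idea, the filter criteria in a URL query string can be applied as equality predicates over the rows of a table. The in-memory `ROWS` table, its column names, the `example.com` URL and the `filter_rows` helper are all hypothetical and only serve the illustration; a real service would translate the criteria into a database query instead.

```python
from urllib.parse import urlparse, parse_qsl

# Hypothetical in-memory table: each row maps column name -> value.
ROWS = [
    {"region": "us-west", "tier": "gold", "name": "alpha"},
    {"region": "us-west", "tier": "silver", "name": "beta"},
    {"region": "eu-north", "tier": "gold", "name": "gamma"},
]

def filter_rows(url, rows):
    """Return rows whose columns equal every key=value pair in the URL query."""
    criteria = parse_qsl(urlparse(url).query)
    return [
        row for row in rows
        # A row matches only if it has the column and the value is equal.
        if all(col in row and str(row[col]) == val for col, val in criteria)
    ]

matches = filter_rows(
    "https://example.com/resources?region=us-west&tier=gold", ROWS
)
print([row["name"] for row in matches])  # prints ['alpha']
```

Filtering on both `region` and `tier` here behaves like a lookup on a composite key made of those two columns.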
Storing the table in a cloud database only improves its appeal: the database becomes a managed service that is available from all geographical regions, with high availability even for large datasets. The only focus that remains for the developers in this case is application optimization.
The notions of a Big Table in a multi-model database and of BigQuery are also evidence of the popularity of this simpler paradigm. Let us take a moment to review these concepts.
BigTable has been a Google offering for more than a decade. It is a NoSQL database in which the latency for data access is kept low even in the face of petabytes of data and millions of operations per second. Data is retrieved using scan operations, and reads and writes complete in under 10 milliseconds. The best-practice limits include 4KB per row key, about 100 column families per table, 16KB per column qualifier, 10MB per cell, and 100MB for all values in a row. BigTable is known to power Analytics, Maps and Gmail.
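The size limits quoted above can be checked before a row is written. The following is only a sketch: `check_row` is a hypothetical helper, not part of any BigTable client library, the cell layout is an assumed dictionary keyed by (family, qualifier), and 1KB is assumed to mean 1024 bytes. The limit on column families is a table-level property and is not covered here.

```python
KB = 1024        # assumption: 1KB = 1024 bytes
MB = 1024 * KB

# Best-practice limits as quoted in the text above.
MAX_ROW_KEY_BYTES = 4 * KB
MAX_QUALIFIER_BYTES = 16 * KB
MAX_CELL_BYTES = 10 * MB
MAX_ROW_BYTES = 100 * MB

def check_row(row_key, cells):
    """Return a list of limit violations for one row.

    cells maps (family, qualifier_bytes) -> value_bytes (hypothetical layout).
    """
    problems = []
    if len(row_key) > MAX_ROW_KEY_BYTES:
        problems.append("row key exceeds 4KB")
    total = 0
    for (family, qualifier), value in cells.items():
        if len(qualifier) > MAX_QUALIFIER_BYTES:
            problems.append(f"qualifier in family {family!r} exceeds 16KB")
        if len(value) > MAX_CELL_BYTES:
            problems.append(f"cell in family {family!r} exceeds 10MB")
        total += len(value)
    if total > MAX_ROW_BYTES:
        problems.append("row values exceed 100MB in total")
    return problems

print(check_row(b"user#42", {("profile", b"name"): b"Ada"}))  # prints []
```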
BigQuery is Google's data warehouse offering and is less than a decade old. It can store terabytes of data and allows queries to be written in SQL. It can power a wide variety of functionality for an analytics dashboard. Its primary model is the relational database model, with a key-value store as a secondary model, and its tables are append-only. It can query large amounts of data for analysis quickly, but it takes comparatively longer to query small, specific transactional data. Query execution time can be on the order of seconds. BigQuery has two forms of cost: storage cost and query cost.
The choice of storage could be driven by the following rules:
if (your_data_is_structured &&
    your_workload_is_analytics &&
    you_need_updates_or_low_latency) use BigTable;
if (your_data_is_structured &&
    your_workload_is_analytics &&
    you_do_not_need_updates_or_low_latency) use BigQuery;
if (your_data_is_structured &&
    your_workload_is_not_analytics &&
    your_data_is_relational &&
    you_need_horizontal_scalability) use Cloud Spanner;
if (your_data_is_structured &&
    your_workload_is_not_analytics &&
    your_data_is_relational &&
    you_do_not_need_horizontal_scalability) use Cloud SQL;
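The rules above can be written as a small runnable function. This is a sketch of the same decision logic only: the function name `choose_storage` and its parameter names are made up for illustration, and it returns `None` for cases the rules do not cover (unstructured data, or structured non-analytics data that is not relational).

```python
from typing import Optional

def choose_storage(
    structured: bool,
    analytics: bool,
    needs_updates_or_low_latency: bool = False,
    relational: bool = False,
    needs_horizontal_scalability: bool = False,
) -> Optional[str]:
    """Map workload traits to a storage product, following the rules above."""
    if not structured:
        return None  # the rules above cover only structured data
    if analytics:
        return "BigTable" if needs_updates_or_low_latency else "BigQuery"
    if relational:
        return "Cloud Spanner" if needs_horizontal_scalability else "Cloud SQL"
    return None  # structured, non-analytics, non-relational: not covered

print(choose_storage(structured=True, analytics=True))  # prints BigQuery
print(choose_storage(structured=True, analytics=False,
                     relational=True,
                     needs_horizontal_scalability=True))  # prints Cloud Spanner
```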
