Cluster computing

Friday, June 1, 2018

We mentioned BigTable and BigQuery in earlier posts, where we discussed the purpose of each as forms of storage and processing. It is worth noting that, for a developer, a table and a query are familiar notions. They are the equivalent of data structures and algorithms, or of storage and compute, at any level of the software stack. The only refinement is that the table serves the transactional operations on entries, which are kept separate from the read-only analytical processing, some of which may involve heavy aggregations. The two are performed independently and consequently do not interfere with each other. One advantage of this separation is that the analytical costs are borne by the users of the query. But that is not all. The query parameters can be passed all the way from the end users of the system, even via web APIs. This means that the same logic can be reused for different callers. New queries can be written when earlier ones do not suffice. It becomes cheap to write queries for the desired results without affecting any of the existing data-gathering operations. Exposing query parameters and logic to end users enables new use cases that the system does not need to know about or implement, while giving those users as much detail as possible.
Cloud databases and their capabilities make storing data considerably more appealing, because storage is effectively unbounded while the guarantees of a database still hold. In addition, the developer's notion of storage is simplified as far as possible, with the associated maintenance routines taken care of as much as possible.
Many systems cannot afford the luxury of cloud databases, because they sit too low in the system architecture or cannot afford the tools and processes surrounding data handling for a database. In such cases, we can treat the notion of a table as a data source and use alternate forms of storage. These data sources are essentially collections, and we need only as many guarantees around a collection as the producer and the consumer of the data want. There is no limitation on the storage as long as we can separate the processing of the data into read-only operations.
Finally, there is a universal appeal to keeping a collection of entries: it becomes easy to implement logic with standard query operators, whose methods such as .Where() and .Sum() express predicates and aggregations. The acceptance of these operators across stacks and languages indicates that there are common expressions of querying logic, which lets us adopt the most flexible form for future growth.
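To make the idea of query operators over a plain collection concrete, here is a minimal sketch in Python. The names Entry, where, total and amount_for_region are hypothetical and chosen only for illustration; they mirror the spirit of .Where() and .Sum() and are not any particular library's API. The collection is an immutable tuple, so the query stays a read-only operation, and the region parameter stands in for a value that a caller could supply.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Entry:
    # Hypothetical record type; the fields are assumptions for illustration.
    region: str
    amount: int


def where(entries: Iterable[Entry],
          predicate: Callable[[Entry], bool]) -> Iterator[Entry]:
    """Lazily yield the entries that satisfy the predicate (a .Where() analogue)."""
    return (e for e in entries if predicate(e))


def total(entries: Iterable[Entry],
          selector: Callable[[Entry], int]) -> int:
    """Sum the selected value of each entry (a .Sum() analogue)."""
    return sum(selector(e) for e in entries)


def amount_for_region(entries: Iterable[Entry], region: str) -> int:
    """A reusable query: the caller supplies only the parameter."""
    matching = where(entries, lambda e: e.region == region)
    return total(matching, lambda e: e.amount)


# A read-only data source: a tuple of frozen entries cannot be modified by a query.
ENTRIES = (
    Entry("east", 10),
    Entry("west", 5),
    Entry("east", 7),
)

if __name__ == "__main__":
    print(amount_for_region(ENTRIES, "east"))
```

Because the operators only read from the iterable they are given, the same query can run against any storage that can be exposed as a collection, and new queries can be composed without touching the code that gathers the data.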
#codingexercise
We discussed yesterday a technique to partition an array into the most nearly balanced sub-arrays possible by making two passes over the array.
Another way to do this is to evaluate each candidate split position and keep the one that gives the best result.
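A minimal sketch of evaluating every candidate split follows. It assumes that "balanced" means the two contiguous sub-arrays have the smallest absolute difference between their sums, and that both sub-arrays must be non-empty; the post does not spell out either point, and the function name best_split is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def best_split(values: Sequence[int]) -> Tuple[int, int]:
    """Return (index, difference) for the most balanced split.

    The split is values[:index] and values[index:]. Balance is assumed
    to mean the smallest absolute difference between the two sums.
    """
    if len(values) < 2:
        raise ValueError("need at least two elements to split")

    total = sum(values)
    best_index = 1
    best_diff: Optional[int] = None
    left = 0  # running sum of values[:i]

    # Try every position that leaves both sides non-empty.
    for i in range(1, len(values)):
        left += values[i - 1]
        # The right sum is total - left, so the difference is |total - 2 * left|.
        diff = abs(total - 2 * left)
        if best_diff is None or diff < best_diff:
            best_index, best_diff = i, diff

    return best_index, best_diff


if __name__ == "__main__":
    print(best_split([1, 2, 3, 4, 5]))
```

Keeping a running left sum means each candidate is evaluated in constant time, so the whole scan is linear in the length of the array. On ties, the earliest split position is kept.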
