Cluster computing

Monday, December 4, 2017

We were discussing the case for standard query operators. Standard query operators have an interesting attribute: they are often used with lambda expressions. Lambda expressions are anonymous functions that can be passed around and returned like any other first-class object in the programming world. Software developers love lambdas because they make expressing logic so much more fun and convenient. They use them to create delegates and expression trees. The former treats logic as a data type, and the latter represents logic in a tree-like data structure where each node is an expression. Both can be compiled and run, and expression trees even allow dynamic modification of executable code.
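To make the two ideas concrete, here is a minimal sketch in Python rather than in the language the operators come from. The names `where`, `is_even` and `SwapModulus` are made up for illustration, and Python's standard `ast` module stands in for an expression tree library. The first half passes logic around as a value, the second half holds similar logic as a tree, modifies one node and compiles the result.

```python
import ast

# A lambda used like a delegate: logic stored in a variable and passed around.
is_even = lambda n: n % 2 == 0

def where(items, predicate):
    """Hypothetical stand-in for a Where-style query operator."""
    return [item for item in items if predicate(item)]

evens = where([1, 2, 3, 4, 5, 6], is_even)

# The same kind of logic held as a tree, where each node is an expression.
tree = ast.parse("lambda n: n % 2 == 0", mode="eval")

class SwapModulus(ast.NodeTransformer):
    """Rewrite the constant 2 to 3, changing the logic before it is compiled."""
    def visit_Constant(self, node):
        if node.value == 2:
            return ast.copy_location(ast.Constant(value=3), node)
        return node

modified = ast.fix_missing_locations(SwapModulus().visit(tree))

# Compiling and evaluating the modified tree yields an ordinary callable.
is_multiple_of_three = eval(compile(modified, "<expr>", "eval"))
threes = where([1, 2, 3, 4, 5, 6], is_multiple_of_three)
```

The tree version costs more ceremony, but it is what makes the logic inspectable and rewritable before it runs.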

When data is processed by logic written in lambdas, the processing is predictable, consistent and free from the burden of managing resources. Considerations such as where the logic is deployed, how it is hosted, the topology of the resources and the chores that go with maintaining them no longer weigh on the developer. Even business owners appreciate it when their logic is no longer held hostage by technology, infrastructure or vendors. Furthermore, by organizing or re-organizing code with delegates and expression trees alongside the well-accepted object-oriented programming concepts, the business can move swiftly to market with new and better proposals.

It is also important to observe here that data technologies and processing never operate in a vacuum. They work within the parameters of their ecosystem through a feedback cycle. Lambda expressions have lately gained acceptance even in the cloud computing world, in the form of serverless computing.

In a database, the query optimizer does something very similar. It decides the set of tree transformations that are applied to the abstract syntax tree after it has been annotated and bound. These transformations include choosing the join order, rewriting selects and so on. It not only chooses the order in which the operations are applied but also introduces new operators. It therefore seems a good candidate for the kind of expression tree manipulation described above.
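One such tree transformation can be sketched as follows. This is a toy illustration, not how any particular database implements its optimizer: the node types `Scan`, `Select` and `Join` and the function `push_down` are hypothetical, and the rewrite chosen here (moving a select below a join so it filters rows earlier) is just one example of rewriting selects. A select is assumed to read columns from a single table.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Select:
    table: str        # table the predicate reads (simplifying assumption)
    predicate: str
    child: "Node"

@dataclass(frozen=True)
class Join:
    left: "Node"
    right: "Node"

Node = Union[Scan, Select, Join]

def tables(node):
    """Return the set of table names reachable under a plan node."""
    if isinstance(node, Scan):
        return {node.table}
    if isinstance(node, Select):
        return tables(node.child)
    return tables(node.left) | tables(node.right)

def push_down(node):
    """Rewrite the plan so each select sits as close to its table as possible."""
    if isinstance(node, Scan):
        return node
    if isinstance(node, Join):
        return Join(push_down(node.left), push_down(node.right))
    child = push_down(node.child)
    if isinstance(child, Join):
        # Move the select to whichever side of the join holds its table.
        if node.table in tables(child.left):
            moved = Select(node.table, node.predicate, child.left)
            return Join(push_down(moved), child.right)
        if node.table in tables(child.right):
            moved = Select(node.table, node.predicate, child.right)
            return Join(child.left, push_down(moved))
    return Select(node.table, node.predicate, child)

plan = Select("orders", "amount > 100", Join(Scan("customers"), Scan("orders")))
optimized = push_down(plan)
```

Here `optimized` has the same operators as `plan` but in a different shape: the join is now the root and the select wraps only the `orders` scan.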

It’s important to note the distinction between databases and serverless computing. One tries to push the computations as close to the data as possible, while the other tries to distance the computations from the resources used. Traditionally, databases have had to scale up, at least for online processing, while the alternatives have scaled out via batch-oriented processing. Databases have required large amounts of memory and tend to consume as much memory as is added. The alternatives have distributed the computations across commodity servers.

#codingexercise

Yesterday we were given three sorted arrays and wanted to find one element from each array such that the three are as close to each other as possible. One way to do this is to traverse all three arrays together while keeping track of the smallest difference between the maximum and minimum of the candidate set of three elements seen so far. We advance one element at a time, incrementing the index of whichever array currently holds the minimum value.

By advancing only the minimum element, we make sure the sweep is progressive and exhaustive.
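The sweep can be sketched as follows. This is a minimal version that assumes "closest" means the smallest difference between the largest and smallest of the three chosen elements, and the function name `closest_triple` is made up for this example.

```python
def closest_triple(a, b, c):
    """Pick one element from each sorted list so that max - min is smallest."""
    if not (a and b and c):
        raise ValueError("all three arrays must be non-empty")
    i = j = k = 0
    best = (a[0], b[0], c[0])
    best_spread = max(best) - min(best)
    while i < len(a) and j < len(b) and k < len(c):
        x, y, z = a[i], b[j], c[k]
        spread = max(x, y, z) - min(x, y, z)
        if spread < best_spread:
            best, best_spread = (x, y, z), spread
        if spread == 0:
            break  # three identical elements, cannot do better
        # Advance only the array that holds the current minimum.
        smallest = min(x, y, z)
        if x == smallest:
            i += 1
        elif y == smallest:
            j += 1
        else:
            k += 1
    return best

result = closest_triple([1, 4, 10], [2, 15, 20], [10, 12])
```

Each step advances exactly one index, so the loop runs at most len(a) + len(b) + len(c) times.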

We don't have to sweep separately for a mode that spans all three arrays, because the difference between the maximum and minimum of three identical elements is guaranteed to be zero. For every such occurrence, we then count the number of identical elements in each array.
