Cluster computing

Friday, December 1, 2017

The argument for standard query operators.

Recently I came across a mindset at a company that databases are bad and services are good. There is not much difference between the two if we set aside the syntax of the query and view the results as an enumerable. Even popular relational databases are hosted as a service, with programmability features so you can leverage them in your code. With the introduction of microservices, it became easy to host not only a dedicated database but also a dedicated database server instance. Using microservices with Mesos-based clusters and shared volumes, we can now run many copies of the server for high availability and failover.

This is possibly great for small and segregated data, but larger companies often make massive investments in their data, standardizing tools, processes and workflows to manage it better. In such cases consumers of the data don't talk to the database directly but go through a service that may even sit behind a message bus. If the consumers proliferate, they end up creating and sharing many different instances of services for the same data, each with its own view rather than the actual table. APIs for these services are domain based rather than query friendly interfaces that let you work directly with the data. As services are organized, data may get translated or massaged as it makes its way from one to another. I have seen several forms of organizing services, from service-oriented architecture at the enterprise level to fine-grained individual microservices. It is possible to have a bouquet of microservices that take care of most data processing for the business requirements. For such services, the data may be at most one or two fields of an entity along with its identifier.
This works very well to alleviate the onus and rigidity that come with organization, with the interactions between components, and with the various chores needed to keep the system flexible enough to suit changing business needs. The flat ring of services, on the other hand, is business friendly to begin with and lets services do their work. The graph of service dependencies may become heavily connected, but at least it is better understood and carries very little of the stickiness that comes with ownership of data. Therefore, a vast majority of services may now be decoupled from any data ownership considerations, and those that do own data may find it convenient not to remain database specific and can even form a chain if necessary.

Enterprise architects strive to lay down the rules for different services, but most are all the more willing to embrace their company's initiatives, including investments in the cloud or making their services more consistent with the others. Unless a team specifically asks for one-off treatment by way of non-traditional databases or special requirements, they are all the more excited to use cookie cutters or to corral the processing into a service. If instead these same architects were also to take on the responsibility of opening up some services with APIs that implement standard query operators on their data, akin to what a well-known managed language does or what web developers practice with REST APIs using standard query parameters, they would do away with much of the case-by-case needs that come their way. In essence, promoting standard query operators for data, over and above the business interactions with the service, seems a win-win for everyone.
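
To make the idea concrete, here is a minimal sketch of what a handful of standard query operators over an enumerable might look like. The `Query` class, its method names (`where`, `select`, `order_by`, `take`) and the `orders` sample data are all hypothetical, chosen only for illustration; they are not taken from any particular service or library.

```python
from itertools import islice


class Query:
    """Hypothetical wrapper exposing a few standard query operators over any iterable."""

    def __init__(self, source):
        self._source = source

    def __iter__(self):
        return iter(self._source)

    def where(self, predicate):
        # Keep only the items for which the predicate holds (lazy).
        return Query(item for item in self._source if predicate(item))

    def select(self, projection):
        # Project each item into a new shape (lazy).
        return Query(projection(item) for item in self._source)

    def order_by(self, key):
        # Sorting has to consume the whole source before yielding anything.
        return Query(sorted(self._source, key=key))

    def take(self, count):
        # Yield at most `count` items.
        return Query(islice(self._source, count))

    def to_list(self):
        return list(self._source)


# Illustrative data only; a real service would enumerate its own entities.
orders = [
    {"id": 1, "customer": "acme", "total": 250.0},
    {"id": 2, "customer": "globex", "total": 75.0},
    {"id": 3, "customer": "acme", "total": 120.0},
    {"id": 4, "customer": "initech", "total": 310.0},
]

result = (
    Query(orders)
    .where(lambda o: o["total"] > 100)
    .order_by(lambda o: o["total"])
    .select(lambda o: o["id"])
    .take(2)
    .to_list()
)
print(result)  # [3, 1]
```

A service that exposed the same small set of operators, whether through a client library or through query parameters on its REST endpoints, would let consumers compose their own views of the data instead of asking for a new domain-specific API each time.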

#codingexercise
Yesterday we were given three sorted arrays and found one element from each array such that each is closest to a given element.
Now if we want to find one element from each array such that the three are closest to each other, we can reuse the earlier GetClosest methods, iterating over every element of one of the arrays until the criterion is satisfied. We check the absolute value of the difference to the candidate value. Alternatively, we could traverse all three arrays together while keeping track of the minimum and maximum of the current candidate set of three elements and the smallest difference between them encountered so far. We traverse by advancing one element at a time: we increment the index of the array that holds the minimum value.

By advancing only the minimum element, we make sure the sweep is progressive and exhaustive.
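
The sweep can be sketched as follows. This is one possible reading of the approach, under the assumption that "closest to each other" means minimizing the difference between the largest and smallest of the three chosen elements; the function name `closest_triplet` is hypothetical.

```python
def closest_triplet(a, b, c):
    """Return one element from each sorted array with the smallest max-min spread."""
    if not (a and b and c):
        raise ValueError("all three arrays must be non-empty")

    i = j = k = 0
    best = None
    best_spread = float("inf")

    while i < len(a) and j < len(b) and k < len(c):
        x, y, z = a[i], b[j], c[k]
        lo, hi = min(x, y, z), max(x, y, z)

        # Record the candidate if it is tighter than anything seen so far.
        if hi - lo < best_spread:
            best_spread = hi - lo
            best = (x, y, z)
        if best_spread == 0:
            break  # cannot do better than three equal values

        # Advance only the array holding the minimum: moving either of the
        # others could only widen the spread for this minimum.
        if x == lo:
            i += 1
        elif y == lo:
            j += 1
        else:
            k += 1

    return best


print(closest_triplet([1, 4, 10], [2, 15, 20], [10, 12]))  # (10, 15, 10)
```

Each step advances exactly one index, so the loop runs at most len(a) + len(b) + len(c) times.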
