Cluster computing

Saturday, October 21, 2017

MongoDB reported that the traditional relational store for a catalog suffered from the following issues:
1) The field inventory is a local view only until it makes its way to the central store.
2) The relational store involves a once-a-day sync or some other periodic refresh.
3) Stale views are served until the refresh happens, which is often not fast enough for consumers.
4) The stale view interferes with analytics and aggregation reports.
5) Downstream internal and external apps have to work around the delays and stale views with sub-optimal logic.

MongoDB overcomes all of these by organizing the catalog into Item, Variant, Price, Hierarchy, Facet and Vendors in a document store. Applications can then search for data either via prejoined objects in a cache or via indexing through a search engine built on the Lucene/Solr architecture.
Applications generally want a central service with a single view of the catalog, one that is used by most services and back channels. Moreover, they want this inventory to be in sync in real time, so they expect their writes to be local and real-time in a read-dominated workload. That said, refreshes take the form of bulk writes via point-in-time loads, and bulk inserts work well with a relational database. Therefore MongoDB's document store is synced with the relational store periodically, say nightly. Real-time updates and views happen on the document store only and do not reach the relational store. Similarly, analytics and aggregations pull data from the document store.
The catalog initially stored one document per SKU per store. With a hundred million items in a thousand stores, that results in a hundred billion entries. Instead, each document now covers one SKU for a specific key/store grouping. This improves geo-distribution and lowers the number of documents. In an architecture where writes went to every shard for every region, reads and writes were kept local and low-latency by using tags to route them to their respective regions.
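The document-count arithmetic and the regrouping can be sketched roughly as follows. This is only an illustration in plain Python, not MongoDB's actual schema: the function names, the field names ("sku", "region", "stores") and the choice of region as the grouping key are all assumptions made for the example.

```python
from collections import defaultdict


def count_documents(num_skus, num_groups):
    """Documents needed when there is one document per SKU per group."""
    return num_skus * num_groups


def group_by_sku_and_region(rows, store_region):
    """Fold per-store inventory rows into one document per (SKU, region).

    rows: iterable of (sku, store_id, quantity) tuples (hypothetical shape).
    store_region: dict mapping store_id -> region name (hypothetical).
    """
    grouped = defaultdict(dict)
    for sku, store_id, quantity in rows:
        grouped[(sku, store_region[store_id])][store_id] = quantity
    return [
        {"sku": sku, "region": region, "stores": stores}
        for (sku, region), stores in grouped.items()
    ]


# One document per SKU per store: 100 million SKUs x 1,000 stores.
per_store = count_documents(100_000_000, 1_000)

# Hypothetical sample: three store-level rows collapse into two documents.
docs = group_by_sku_and_region(
    [("A", 1, 5), ("A", 2, 3), ("A", 3, 7)],
    {1: "east", 2: "east", 3: "west"},
)
```

With grouping, the document count scales with the number of groups rather than the number of stores, and a grouping key such as region is also a natural candidate for tag-based routing.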
Even the shopping cart is modeled as a single document.

#codingexercise
We talked about finding the largest number whose digits have a product equal to a given number.
We said we can try the digits from 2 to 9 in increasing order to find the divisors of the number and their repetitions, taking as many of each divisor as possible. In increasing order of digits, this translates to (digit1, repetition1), (digit2, repetition2) ... pairs. The ones with the higher repetitions are included in the final solution.
Each (digit, repetition) pair is either included in the final solution or not. Consequently, we can write a dynamic-programming routine that recursively tries both paths and returns either the count from the sub-problem plus the repetitions from the current choice, or merely the count from the sub-problem, as the case may be.
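A minimal sketch of the divisor step is below. It is a plain greedy pass rather than the include/exclude dynamic programming described above, and it assumes the digit 1 is not allowed (otherwise the answer could be padded indefinitely). The function names are made up for the example.

```python
def digit_factors(n):
    """Try digits 2..9 in increasing order, dividing out each as often as possible.

    Returns a list of (digit, repetitions) pairs, or None when n cannot be
    written as a product of single digits.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    pairs = []
    for digit in range(2, 10):
        repetitions = 0
        while n % digit == 0:
            n //= digit
            repetitions += 1
        if repetitions:
            pairs.append((digit, repetitions))
    return pairs if n == 1 else None


def largest_number(n):
    """Largest number (no digit 1) whose digits multiply to n, or None."""
    pairs = digit_factors(n)
    if pairs is None:
        return None
    if not pairs:
        return 1  # n == 1: the empty product, reported as the single digit 1
    # More digits means a larger number, so keep every factor and
    # write the larger digits first.
    return int("".join(str(d) * r for d, r in sorted(pairs, reverse=True)))
```

For example, 36 factors into (2, 2) and (3, 2), which gives 3322, while 11 has no single-digit factorization and gives None. Because the smaller digits are divided out first, only 2, 3, 5 and 7 ever appear in the pairs, which is what maximizes the digit count.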

An essay on personal assistant: https://1drv.ms/w/s!Ashlm-Nw-wnWsUKrRyxwuyYPFiz5
