Cluster computing

Sunday, October 22, 2017

We were discussing MongoDB. Their catalog initially stored 1 document per SKU per store. if there are hundred million items in a thousand store, it results in a hundred billion entries. Instead the documents were now one each for a specific key/store grouping SKU. This improves geo distribution and results in lower number of docs. In order to keep reads/writes local with low latency in an architecture where writes go to every shard for every region, they were streamlined to go to their respective regions with the help of tags.
Even the shopping cart is modeled as a single document. In order to improve availability, inventory was shared with regions. The inventory was stored with geocode and there was option to make it more granular by using store Is instead of geocode. The shopping cart is also made more available by replication across regions so that the writes are local even as the shopping cart moves between regions. Each shard has one replica per region. The primary servers are distributed per region. The shopping carts are also shared by geocodes or optionally by storeid
User activities are logged for insight. These include search terms, items viewed or wished, cart added or removed. orders submitted, sharing on social network and as impression or clickstream. Data will be used to compute user / product history product map, user preferences, recommendations or trends.
Insights from logs had proven difficult. Log files are usually forgotten. moreover the user activity is voluminous When the activity is recorded in data warehouse system, it provides good reporting but expensive to scale. NoSQL helps good scaling and reporting but there is still no data store for real time queries and aggregations from applications. Moreover there are delays in moving logs to the log processor and delays in processing data from the log processor to the analytics.from the analytics, the output is limited by schema before it is put in a relational store. From this store there is limited read capability by the application. Instead of using a time series database, the activity store and reporting is improved right from the box by MongoDB.
#codingexercise
Yesterday we talked about finding the largest number from the digits whose product equals the number.
for a single number as a standalone we determined the answer using digits from 2 to 9 and their repetition while omitting 1 by using dynamic programming
when we have an increasing continuous sequence of inputs, we said we could re-use the results from the factors. specifically we order the factors by their counts and use the ones that maximize the count. The combinations of factors that result in the product need not all be enumerated. we just take as many of those with higher counts as necessary to make the product.
another way to think about this is to use the same dynamic programming as for the digits by recursively checking if the factor inclusion or exclusion returns higher count of digits. we bail if the product is reached and return 1 or not met and return 0. for inclusion we add the contribution from the factor to the recursionand for exclusion we return the recursion.

#implementations of K-means: https://1drv.ms/w/s!Ashlm-Nw-wnWsWbQF0tucrJAUlGz

Cluster computing

Sunday, October 22, 2017

No comments:

Post a Comment