Cluster computing

Wednesday, October 25, 2017

We were discussing how MongoDB logs user activity for insight.
All user activity is recorded through the HVDF API, staged in the User History store in MongoDB, and pulled by external analytics such as Hadoop through the MongoDB-Hadoop connector. Hadoop also provides the internal analytics for aggregation. Data stores for the Product Map, User Preferences, Recommendations and Trends then hold the aggregations and make them available for personalization, which the apps use when interacting with the customer.
Thus MongoDB powers applications for products and inventory, recommended products, customer profiles and session management. Hadoop powers analysis for elastic pricing, recommendation models, predictive analytics and clickstream history. The user activity model is a JSON document with attributes for geoCode, sessionId, device, userId, type of activity, itemId, sku, order, location, tags, and timestamp. Recent activity for a user is a simple query:
db.activity.find({userId:"123"}).sort({time: -1}).limit(1000)
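To make the shape of that query concrete, here is a minimal in-memory sketch in Python. It does not talk to MongoDB. The sample documents, their values, and the helper name recent_activity are assumptions made for illustration, and only a few of the attributes listed above are filled in.

```python
from datetime import datetime, timedelta

# Hypothetical sample documents; all values are invented for illustration.
base = datetime(2017, 10, 25, 12, 0, 0)

activities = [
    {"userId": "123", "sessionId": "s1", "device": "mobile", "type": "VIEW",
     "itemId": "item-1", "sku": "sku-1", "time": base},
    {"userId": "123", "sessionId": "s1", "device": "mobile", "type": "ORDER",
     "itemId": "item-1", "sku": "sku-1", "time": base + timedelta(minutes=5)},
    {"userId": "456", "sessionId": "s2", "device": "desktop", "type": "VIEW",
     "itemId": "item-2", "sku": "sku-2", "time": base + timedelta(minutes=1)},
]


def recent_activity(docs, user_id, limit=1000):
    """In-memory equivalent of find({userId: ...}).sort({time: -1}).limit(limit)."""
    matching = [doc for doc in docs if doc["userId"] == user_id]
    # Newest first, mirroring sort({time: -1}).
    matching.sort(key=lambda doc: doc["time"], reverse=True)
    return matching[:limit]


if __name__ == "__main__":
    for doc in recent_activity(activities, "123", limit=10):
        print(doc["time"].isoformat(), doc["type"], doc["itemId"])
```

The filter, the descending sort and the cut-off correspond one to one to find, sort and limit in the shell query.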
Indexes that can be used include userId+time, itemId+time, and time.
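A rough way to see why a compound userId+time index suits the recent-activity lookup is to model the index as a sorted list of (userId, time) keys. The sketch below is only an analogy in plain Python, not how MongoDB stores its indexes. The names build_index and recent_via_index, the integer timestamps and the sample documents are all assumptions.

```python
import bisect


def build_index(docs):
    """Model a compound (userId, time) index as sorted keys plus document positions."""
    entries = sorted(((doc["userId"], doc["time"]), pos) for pos, doc in enumerate(docs))
    keys = [key for key, _ in entries]
    positions = [pos for _, pos in entries]
    return keys, positions


def recent_via_index(docs, index, user_id, limit):
    """Find one user's entries by binary search, then walk them newest first."""
    keys, positions = index
    # All keys for user_id form one contiguous run in the sorted list.
    lo = bisect.bisect_left(keys, (user_id,))
    hi = bisect.bisect_right(keys, (user_id, float("inf")))
    newest_first = reversed(positions[lo:hi])
    return [docs[pos] for pos in newest_first][:limit]


# Hypothetical documents with integer timestamps.
docs = [
    {"userId": "123", "itemId": "a", "time": 10},
    {"userId": "456", "itemId": "b", "time": 20},
    {"userId": "123", "itemId": "c", "time": 30},
    {"userId": "123", "itemId": "d", "time": 20},
]
index = build_index(docs)

if __name__ == "__main__":
    print([d["itemId"] for d in recent_via_index(docs, index, "123", 2)])
```

Because the keys are ordered by user first and time second, one user's activity is already sorted by time inside its run, so no separate sort step is needed.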
Aggregations are fast and run in near real time. Queries such as the number of recent views for a user, the total sales for a user, or the number of views or purchases for an item now return in near real time.
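The three aggregations mentioned here can be sketched in memory as follows. This is plain Python rather than a MongoDB aggregation pipeline. The activity type values "VIEW" and "ORDER", the order.total field and the function names are assumptions, since the post does not spell out those details.

```python
from collections import Counter, defaultdict

# Hypothetical activity documents; type values and order.total are assumed.
events = [
    {"userId": "123", "type": "VIEW", "itemId": "item-1"},
    {"userId": "123", "type": "VIEW", "itemId": "item-2"},
    {"userId": "123", "type": "ORDER", "itemId": "item-1", "order": {"total": 20.0}},
    {"userId": "456", "type": "VIEW", "itemId": "item-1"},
    {"userId": "456", "type": "ORDER", "itemId": "item-2", "order": {"total": 5.5}},
]


def views_per_user(docs):
    """Count VIEW activities grouped by userId."""
    return Counter(d["userId"] for d in docs if d["type"] == "VIEW")


def sales_per_user(docs):
    """Sum order totals grouped by userId."""
    totals = defaultdict(float)
    for d in docs:
        if d["type"] == "ORDER":
            totals[d["userId"]] += d["order"]["total"]
    return dict(totals)


def activity_per_item(docs):
    """Count activities of each type grouped by itemId."""
    counts = defaultdict(Counter)
    for d in docs:
        counts[d["itemId"]][d["type"]] += 1
    return {item: dict(by_type) for item, by_type in counts.items()}


if __name__ == "__main__":
    print(views_per_user(events))
    print(sales_per_user(events))
    print(activity_per_item(events))
```

Each function is a filter followed by a group-and-accumulate step, which is the same shape a match stage followed by a group stage would take in the database.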
A batch query over NoSQL, such as a map-reduce calculation for unique visitors, also performs well.
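As an illustration of what a map-reduce calculation for unique visitors might look like, here is a small single-process sketch. It assumes that "unique visitors" means distinct userId values per itemId, which the post does not state. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def map_visit(doc):
    """Map step: emit (itemId, {userId}) for each activity document."""
    yield doc["itemId"], {doc["userId"]}


def reduce_visitors(item_id, partial_sets):
    """Reduce step: union the user sets; safe to apply again to its own output."""
    merged = set()
    for users in partial_sets:
        merged |= users
    return merged


def finalize(item_id, users):
    """Finalize step: turn the merged set into a count."""
    return len(users)


def unique_visitors(docs):
    """Run map, group by key, reduce and finalize in a single process."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_visit(doc):
            grouped[key].append(value)
    return {key: finalize(key, reduce_visitors(key, values))
            for key, values in grouped.items()}


# Hypothetical visits; user 123 views item-1 twice.
visits = [
    {"userId": "123", "itemId": "item-1"},
    {"userId": "123", "itemId": "item-1"},
    {"userId": "456", "itemId": "item-1"},
    {"userId": "456", "itemId": "item-2"},
]

if __name__ == "__main__":
    print(unique_visitors(visits))
```

The reduce step returns a set rather than a count so that partial results from different batches can be merged without counting the same visitor twice; the count is taken only at the end.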
This design works well for any kind of deployment. For example, we can have a local deployment, an AWS deployment or a remote deployment of the database, and each can be scaled from a standalone instance to a replica set to a sharded cluster; in all these cases, querying does not suffer. A replica set is a group of MongoDB instances that host the same data set; it provides redundancy and high availability. A sharded cluster stores data across multiple machines. When the data sets are large and the throughput is high, sharding eases the load. Tools that help monitor instances and deployments include MongoDB Monitoring Service (MMS), mongostat, mongotop, iostat and plugins for popular frameworks. These help troubleshoot failures as soon as they occur.
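The idea of spreading a data set across machines can be sketched with a simple hash-based router. This is a simplified illustration, not MongoDB's actual chunk-based routing. Using userId as the shard key, the SHA-256 hash and the names shard_for and route are all assumptions.

```python
import hashlib


def shard_for(shard_key, num_shards):
    """Pick a shard number from a stable hash of the key (illustrative only)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(docs, num_shards):
    """Distribute documents across shards by hashing their userId."""
    shards = {n: [] for n in range(num_shards)}
    for doc in docs:
        shards[shard_for(doc["userId"], num_shards)].append(doc)
    return shards


# Hypothetical activity documents.
sample = [
    {"userId": "123", "itemId": "item-1"},
    {"userId": "123", "itemId": "item-2"},
    {"userId": "456", "itemId": "item-1"},
    {"userId": "789", "itemId": "item-3"},
]

if __name__ == "__main__":
    for shard, docs in route(sample, 3).items():
        print(shard, [d["userId"] for d in docs])
```

Because the hash is deterministic, every document for a given user lands on the same shard, so a query for that user's recent activity only needs to touch one machine.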
In the second iteration we can also compare distances and exit early: if a coprime was located in the first iteration and its distance is more than the current distance, we break out of the loop.
int dist = 0;
// First iteration as described yesterday
// Second iteration as described yesterday
// within second iteration
if (dist > 0 && Math.abs(n - i) < dist) {
    break;
}
