Cluster computing

Thursday, October 26, 2017

Yesterday we were discussing how user activities are logged for insight by MongoDB.
All user activity is recorded using HVDF API which is staged in User History store of MongoDB and pulled by external analytics such as Hadoop using MongoDB-Hadoop connector. Internal analytics for aggregation is also provided by Hadoop. Data store for Product Map, User preferences, Recommendations and Trends then store and make the aggregations available for personalization that the apps can use for interacting with the customer.
Today we look at monitoring and scaling in MongoDB detail. The user-activity-analysis-insight design works well for any kind of deployments. For example, we can have a local deployment, or an AWS deployment or a remote deployment of the database and they can be scaled from standalone to a replica set to a sharded cluster and in all these cases, the querying does not suffer. A replica set is group of MongoDB instances that host the same data set. It just provides redundancy and high availability. A sharded cluster stores data across multiple machines. When the data sets are large and the throughput is high, sharding eases the load. Tools to help monitor instances and deployments include Mongo Monitoring Service, MongoStat, Mongotop, IOStat and plugins for popular frameworks. These help troubleshoot failures immediately. The observation here is that MongoDB enables comprehensive monitoring through its own services, API and user interface. Being a database, the data from monitoring may also be made available for external query and dashboards such as Grafana stack. Some of the popular metrics involved here include data size versus disk size, active set size versus RAM size, disk I/O and write lock. The goal for the metrics and monitoring is that they should be able to account for the highest possible traffic.
MongoDB recommends that we add replicas to reduce latency to users, add read capacity and increase data safety. If the read does not scale, the data may potentially get stale. Adding or removing replica may be seamless.
#codingexercise
Find if one binary tree is a subtree of another
boolean IsSubTree (Node root1, Node root2)
{
Var original = new List <Node>();
Var sequence = new List <Node>();
Inorder (root1, ref original);
Inorder (root2, ref sequence);
return original.Intersect (sequence).equals (sequence);
}

Cluster computing

Thursday, October 26, 2017

No comments:

Post a Comment