Tuesday, October 31, 2017

We were discussing Master Data Management. Some of the top players in this space include Informatica, IBM InfoSphere, Microsoft, SAP Master Data Governance, and Riversand. Informatica offers an end-to-end MDM solution with an ecosystem of applications, and it does not require the catalog to be in a single domain. InfoSphere has been a long-standing player and its product is considered mature, with strong collaborative and operational capabilities; it plays well with other IBM solutions and their ecosystem. SAP consolidates governance over the master data with emphasis on data quality and consistency; it supports collaborative workflows and is noted for supplier-side features such as supplier onboarding. Microsoft Master Data Services, which ships with SQL Server, makes it easy to create master lists of data, with the benefit that the data is made reliable and centralized so that it can participate in intelligent analysis. Most products require some changes to existing workflows to enable customers to make the transition.

#codingexercise
Two of the nodes of a BST are swapped. Correct the BST: 
Node CorrectBST(Node root)
{
    if (root == null) return null;
    var inorder = new List<Node>();
    InOrder(root, ref inorder);
    var swapped = FindSwappedAsTupleFromSequence(inorder);
    // swap the data of the two nodes
    var temp = swapped.first.data;
    swapped.first.data = swapped.second.data;
    swapped.second.data = temp;
    return root;
}
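The two helpers above are assumed rather than defined. A minimal sketch of how they might look, assuming a Node type with left, right and data fields (InOrder and FindSwappedAsTupleFromSequence are hypothetical names carried over from the call sites):

void InOrder(Node root, ref List<Node> inorder)
{
    if (root == null) return;
    InOrder(root.left, ref inorder);
    inorder.Add(root);
    InOrder(root.right, ref inorder);
}

// In a valid BST the inorder sequence is strictly increasing, so the two swapped
// nodes show up as at most two "descents" in the sequence.
(Node first, Node second) FindSwappedAsTupleFromSequence(List<Node> inorder)
{
    Node first = null, second = null;
    for (int i = 0; i + 1 < inorder.Count; i++)
    {
        if (inorder[i].data > inorder[i + 1].data)
        {
            if (first == null) { first = inorder[i]; second = inorder[i + 1]; }
            else { second = inorder[i + 1]; } // second descent: take its lower element as the partner
        }
    }
    return (first, second);
}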

Monday, October 30, 2017

We were discussing storing the product catalog in the cloud. MongoDB provides cloud-scale deployment. Workflows to get data into MongoDB, however, required organizations to come up with Extract-Transform-Load operations that did not deal with JSON natively. Cloud-scale deployment of MongoDB gives us the benefit of relational as well as document stores. Catalogs can indeed be considered a collection of documents, which are reclassified or elaborately named and tagged items. These digital assets also need to be served via browse and search operations, which differ in their queries. Cloud-scale deployments come with extensive monitoring support. This kind of monitoring specifically looks at data size versus disk size, active set size versus RAM size, disk I/O, and write lock, and generally accounts for and tests against the highest possible traffic. Replicas are added to reduce latency and increase read capacity.


#codingexercise
We were discussing how to find whether a given tree is a subtree of another tree. We gave a recursive solution and an iterative solution. The recursive solution did not address the case when the subtree is not a leaf in the original tree. Since we know that the original and the subtree are distinct, we can improve the recursive solution by adding a condition that returns true when the subtree node is null and the original is not.
We could also make this more lenient by admitting mirror trees as well. A mirror tree is one where the left and right children may be swapped.

        4
    2       6
  1   3   5   7
InOrder : 1 2 3 4 5 6 7
bool isEqual(Node root1, Node root2)
{
    if (root1 == null && root2 == null) return true;
    if (root1 == null) return false;
    if (root2 == null) return true; // modified
    return root1.data == root2.data &&
           ((isEqual(root1.left, root2.left) && isEqual(root1.right, root2.right)) ||
            (isEqual(root1.left, root2.right) && isEqual(root1.right, root2.left)));
    // as opposed to: return root1.data == root2.data && isEqual(root1.left, root2.left) && isEqual(root1.right, root2.right);
}

Sunday, October 29, 2017

We were discussing Master Data Management and how it differs between MongoDB and traditional databases. MongoDB provides two important functions over the catalog, namely browsing and searching. Both of these services could do with improvements. While the browsing service could expand on the standard query operator capabilities, including direct SQL invocations, the search capabilities could be made similar to those of Splunk, which can query events using search expressions and operators that have the same look and feel as customary tools on Unix. These two distinct services are provided over a single managed view of the catalog. The challenge with browsing over a catalog, unlike other datasets, is that the data is quite large and difficult to navigate. Traditional REST APIs solve this with a few query parameters such as page, offset, and search-term, leaving cursor-like logic to the callers. Instead, I propose pushing the standard query operators to the service itself.
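As a rough sketch of what pushing the standard query operators into the browsing service could look like, here is a paginated browse using the MongoDB .NET driver's LINQ support (the database, collection and CatalogItem fields are hypothetical names for illustration):

using System.Collections.Generic;
using System.Linq;
using MongoDB.Driver;
using MongoDB.Driver.Linq;

public class CatalogItem
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Category { get; set; }
    public decimal Price { get; set; }
}

public static class CatalogBrowse
{
    // The caller supplies page and pageSize; the service translates them into
    // standard query operators instead of handing cursor logic back to the caller.
    public static List<CatalogItem> Browse(IMongoCollection<CatalogItem> items,
                                           string category, int page, int pageSize)
    {
        return items.AsQueryable()
                    .Where(i => i.Category == category)
                    .OrderBy(i => i.Name)
                    .Skip(page * pageSize)
                    .Take(pageSize)
                    .ToList();
    }
}

The search side can be exposed in the same shape over a text index and a text-search filter, keeping both services behind one managed view of the catalog.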
The browsing service has to be optimized for the mobile experience as well as the desktop experience. These optimizations include smaller page sizes, reorganized page layouts, smaller resources, smaller images and bigger thumbnails, mobile-optimized styling, faster page load times, and judicious use of properties on the page.
Since much of the content for a catalog consists of static resources, web proxies, geo-sharding, content delivery networks, and app-fabric-like caches are immensely popular. However, none of these are strictly required if the data store is cloud based. For example, a cloud database typically offers the same service level agreement regardless of the region from which the query is issued.
#codingexercise
We were discussing how to find whether a given tree is a subtree of another tree. We gave a recursive solution and an iterative solution. The recursive solution did not address the case when the subtree is not a leaf in the original tree. Since we know that the original and the subtree are distinct, we can improve the recursive solution by adding a condition that returns true when the subtree node is null and the original is not.

        4
    2       6
  1   3   5   7
InOrder : 1 2 3 4 5 6 7
bool isEqual(Node root1, Node root2)
{
    if (root1 == null && root2 == null) return true;
    if (root1 == null) return false;
    if (root2 == null) return true; // modified
    return root1.data == root2.data && isEqual(root1.left, root2.left) && isEqual(root1.right, root2.right);
}

Saturday, October 28, 2017

We were discussing Master Data management and how it differs between MongoDB and traditional databases.
Master Data Management participates in:
Planning and architecture
Execution and deployment
Quality and monitoring
Governance and Control
Standards and Compliance

Due to the lagging embrace of cloud-based master data management technologies (with the possible exception of Snowflake, owing to its architecture), Riversand still continues to enjoy widespread popularity as an on-premise solution.
Riversand offers data modeling, data synchronization, data standardization, and flexible workflows within the tool. It offers scalability, performance and availability based on its comprehensive Product Information Management (PIM) web services. Functionality is provided in layers of information management:
   - print/translation workflows at the bottom layer
   - workflow and security for access to the assets, their editing, insertions and bulk insertions
   - flexible integration capabilities such as full/data exports and multiple exports
   - integration portals for imports/exports, data pools and platforms
   - a digital asset management layer for asset on-boarding and delivery to channels
   - and lastly, data management for searches, saved searches, channel-based or localized content, and the ability to author variants, categories, attributes and relationships on stored assets.



#codingexercise
Find if one binary tree is a subtree of another
We were discussing the recursive solution and why it does not apply when the subtree is not a leaf in the original. There is also an iterative solution that can be modified to suit this purpose.
bool IsEqualIterative(Node root1, Node root2)
{
    var stk1 = new Stack<Node>();
    var stk2 = new Stack<Node>();
    var current1 = root1;
    var current2 = root2;

    while (current1 != null || current2 != null || stk1.Count > 0)
    {
        if (current1 == null && current2 != null) return false;
        if (current1 != null && current2 == null) return false;

        if (current1 != null)
        {
            if (current1.data != current2.data) return false;
            stk1.Push(current1);
            stk2.Push(current2);
            current1 = current1.left;
            current2 = current2.left;
            continue;
        }

        if (stk1.Count > 0)
        {
            // we can introduce sequence-equal-so-far logic here
            current1 = stk1.Pop().right;
            current2 = stk2.Pop().right;
        }
    }
    return true;
}

Friday, October 27, 2017

We have been discussing MongoDB improvements in catalog management. Let us quickly recap the initiatives taken by MongoDB and how it differs from a regular Master Data Management system. Specifically, we can compare it with the MDM from Riversand which we discussed earlier:
Comparisons
MongoDB:
   - organizes Catalog as a one stop shop in its store
            -- no sub-catalogs or fragmentation or ETL or MessageBus
   - equally available to Application servers, API data and services and webservers.
   - also available behind the store for supply chain management and data warehouse analytics
   - catalog available for browsing as well as searching via Lucene search index
   - geo-sharding with persisted shard ids or more granular store ids for improving high availability and horizontal scalability
   - local real-time writes and tuned for read-dominated workload
   - bulk writes for refresh
   - RelationalDB stores point in time loads while overnight they are pushed to catalog information management and made available for real-time views
   - NoSQL powers insight and analytics based on aggregations
   - provides front-end data store for real time queries and aggregations from applications.
   - comes with incredible monitoring and scaling

Both MDM and MongoDB Catalog store hierarchy and facet in addition to Item and SKUs as a way of organizing items.

Riversand MDM
   - rebuildable catalog via change data capture
   - .Net powered comprehensive web services
   - data as a service model
   - Reliance on traditional relational databases only


#codingexercise
Find if one binary tree is a subtree of another
bool IsSubTree(Node root1, Node root2)
{
    var original = new List<Node>();
    var sequence = new List<Node>();
    InOrder(root1, ref original);
    InOrder(root2, ref sequence);
    return original.Intersect(sequence).SequenceEqual(sequence);
}
        4
    2       6
  1   3   5   7
InOrder : 1 2 3 4 5 6 7
Caveat: this works for 2,3,4 but for 2,4,6 we need the sequence-equals check on the intersection.
bool isEqual(Node root1, Node root2)
{
    if (root1 == null && root2 == null) return true;
    if (root1 == null) return false;
    if (root2 == null) return false;
    return root1.data == root2.data && isEqual(root1.left, root2.left) && isEqual(root1.right, root2.right);
}
Note that we cannot apply the isEqual solution as the isSubtree alternative because the subtree may not be a leaf in the original.
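One way to make that sequence-equals caveat concrete is to check whether the subtree's inorder values appear as one contiguous run inside the original's inorder values; a rough sketch (comparing node data rather than node references, since the two trees are distinct objects):

bool ContainsContiguousRun(List<Node> original, List<Node> sequence)
{
    if (sequence.Count == 0) return true;
    for (int start = 0; start + sequence.Count <= original.Count; start++)
    {
        int k = 0;
        while (k < sequence.Count && original[start + k].data == sequence[k].data)
            k++;
        if (k == sequence.Count) return true; // found the whole run
    }
    return false;
}

Even this is only a necessary condition; the usual strengthening is to require the same containment of the preorder sequences as well.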

Thursday, October 26, 2017

Yesterday we were discussing how user activities are logged for insight by MongoDB. 
All user activity is recorded using the HVDF API, which is staged in the User History store of MongoDB and pulled by external analytics such as Hadoop using the MongoDB-Hadoop connector. Internal analytics for aggregation is also provided by Hadoop. Data stores for Product Map, User Preferences, Recommendations and Trends then store the aggregations and make them available for personalization that the apps can use for interacting with the customer.
Today we look at monitoring and scaling in MongoDB in more detail. The user-activity-analysis-insight design works well for any kind of deployment. For example, we can have a local deployment, an AWS deployment, or a remote deployment of the database, and they can be scaled from a standalone instance to a replica set to a sharded cluster; in all these cases, the querying does not suffer. A replica set is a group of MongoDB instances that host the same data set; it provides redundancy and high availability. A sharded cluster stores data across multiple machines. When the data sets are large and the throughput is high, sharding eases the load. Tools that help monitor instances and deployments include the MongoDB Monitoring Service, mongostat, mongotop, iostat, and plugins for popular frameworks. These help troubleshoot failures immediately. The observation here is that MongoDB enables comprehensive monitoring through its own services, API, and user interface. Being a database, the data from monitoring may also be made available for external query and dashboards such as the Grafana stack. Some of the popular metrics involved here include data size versus disk size, active set size versus RAM size, disk I/O, and write lock. The goal for the metrics and monitoring is that they should be able to account for the highest possible traffic.
MongoDB recommends that we add replicas to reduce latency to users, add read capacity, and increase data safety. If reads do not scale, the data may become stale. Adding or removing a replica can be seamless.
#codingexercise
Find if one binary tree is a subtree of another
bool IsSubTree(Node root1, Node root2)
{
    var original = new List<Node>();
    var sequence = new List<Node>();
    InOrder(root1, ref original);
    InOrder(root2, ref sequence);
    return original.Intersect(sequence).SequenceEqual(sequence);
}

Wednesday, October 25, 2017

We were discussing how user activities are logged for insight by MongoDB. 
All user activity is recorded using the HVDF API, which is staged in the User History store of MongoDB and pulled by external analytics such as Hadoop using the MongoDB-Hadoop connector. Internal analytics for aggregation is also provided by Hadoop. Data stores for Product Map, User Preferences, Recommendations and Trends then store the aggregations and make them available for personalization that the apps can use for interacting with the customer.
Thus MongoDB powers applications for products and inventory, recommended products, customer profile and session management. Hadoop powers analysis for elastic pricing, recommendation models, predictive analytics and clickstream history. The user activity model is a json document with attributes for geoCode, sessionId, device, userId, type of activity, itemId, sku, order, location, tags, and timestamp. Recent activity for a user is a simple query as db.activity.find({userId:"123"}).sort({time: -1}).limit(1000)
Indices that can be used include userId+time, itemId+time, time. 
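As a sketch of what those compound indexes could look like with the MongoDB .NET driver (the connection string, database and collection names are assumed; the field names follow the activity document above):

using MongoDB.Bson;
using MongoDB.Driver;

// Compound indexes to support the recent-activity queries:
// userId + time, itemId + time, and time alone.
var client = new MongoClient("mongodb://localhost:27017");
var activity = client.GetDatabase("store").GetCollection<BsonDocument>("activity");
var keys = Builders<BsonDocument>.IndexKeys;

activity.Indexes.CreateMany(new[]
{
    new CreateIndexModel<BsonDocument>(keys.Ascending("userId").Descending("time")),
    new CreateIndexModel<BsonDocument>(keys.Ascending("itemId").Descending("time")),
    new CreateIndexModel<BsonDocument>(keys.Descending("time"))
});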
Aggregations are very fast and in real time. Queries like finding the recent number of views for a user, the total sales for a user, or the number of views/purchases for an item are now near real-time.
A batch query over NoSQL such as a map-reduce calculation for unique visitors is also performant.
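The same unique-visitor style of question can also be expressed as an aggregation pipeline rather than map-reduce; a sketch with the .NET driver (field names assumed from the activity document, and activity being the collection handle from the previous sketch):

// Distinct visitors per item: first group by (itemId, userId) to deduplicate,
// then group by itemId and count the deduplicated users.
var pipeline = new[]
{
    new BsonDocument("$group", new BsonDocument
    {
        { "_id", new BsonDocument { { "itemId", "$itemId" }, { "userId", "$userId" } } }
    }),
    new BsonDocument("$group", new BsonDocument
    {
        { "_id", "$_id.itemId" },
        { "uniqueVisitors", new BsonDocument("$sum", 1) }
    })
};
var uniqueVisitorsPerItem = activity.Aggregate<BsonDocument>(pipeline).ToList();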
This design works well for any kind of deployment. For example, we can have a local deployment, an AWS deployment, or a remote deployment of the database, and they can be scaled from a standalone instance to a replica set to a sharded cluster; in all these cases, the querying does not suffer. A replica set is a group of MongoDB instances that host the same data set; it provides redundancy and high availability. A sharded cluster stores data across multiple machines. When the data sets are large and the throughput is high, sharding eases the load. Tools that help monitor instances and deployments include the MongoDB Monitoring Service, mongostat, mongotop, iostat, and plugins for popular frameworks. These help troubleshoot failures immediately.
Returning to the coprime coding exercise: in the second iteration we can even compare the distance and exit when a coprime was already located in the first iteration and the distance for that coprime is more than the current distance.
int dist = 0;
// First iteration as described yesterday
// Second iteration as described yesterday
    // within the second iteration
    if (dist > 0 && Math.Abs(n - i) < dist) {
        break;
    }


Monday, October 23, 2017

We were discussing how user activities are logged for insight by MongoDB. These activities include search terms, items viewed or wished for, cart additions or removals, orders submitted, sharing on social networks, and impressions or clickstreams. Instead of using a time series database, the activity store and reporting are improved right out of the box by MongoDB. Moreover, the user activities are used to compute user/product history, product map, user preferences, recommendations and trends. The challenges associated with recording user activities include the following:
1) they are too voluminous and usually available only in logs
2) when they are made available elsewhere, they hamper performance.
3) If a warehouse is provided to record user activities, it helps reporting but becomes costly to scale
4) a compromise is available in NoSQL stores
5) but it is harder to provide a NoSQL store at the front-end where real-time queries are performed.
MongoDB addresses these with 
1) providing a dedicated store for large stream of data samples with variable schema and controlling the retention period
2) computing aggregations and derivatives separately
3) providing low latency to update data.
All user activity is recorded using the HVDF API, which is staged in the User History store of MongoDB and pulled by external analytics such as Hadoop using the MongoDB-Hadoop connector. Internal analytics for aggregation is also provided by Hadoop. Data stores for Product Map, User Preferences, Recommendations and Trends then store the aggregations and make them available for personalization that the apps can use for interacting with the customer.
This completes the Data->Insight->Actions translation for user activities.


#codingexercise
We were implementing the furthest coprime of a given number such that it lies in the range 2 – 250.
We made a single pass over the range of 2 to 250 to determine which numbers were coprime.
However, we only need the farthest. Consequently, we could split the range into the part above the number and the part below it, so long as the input number falls within the range, and we could bail at the first coprime encountered from either extreme. In the second iteration we can even compare the distance and exit when a coprime was already located in the first iteration and the distance for that coprime is more than the current distance.
int dist = 0;
// First iteration as described yesterday
// Second iteration as described yesterday
    // within the second iteration
    if (dist > 0 && Math.Abs(n - i) < dist) {
        break;
    }
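Putting the two iterations together, a complete sketch of the idea (Gcd and FurthestCoprime are my own helper names; for example FurthestCoprime(100) would return 249):

int Gcd(int a, int b)
{
    return b == 0 ? a : Gcd(b, a % b);
}

// Furthest coprime of n in the range 2..250: take the first coprime found while
// scanning up from the low end and the first found while scanning down from the
// high end, and keep the one farther from n.
int FurthestCoprime(int n)
{
    int best = -1;
    int dist = 0;
    // First iteration: from the low extreme upward
    for (int i = 2; i <= 250; i++)
    {
        if (Gcd(i, n) == 1) { best = i; dist = Math.Abs(n - i); break; }
    }
    // Second iteration: from the high extreme downward
    for (int i = 250; i >= 2; i--)
    {
        if (dist > 0 && Math.Abs(n - i) < dist) break; // nothing on this side can beat the first candidate
        if (Gcd(i, n) == 1) { best = i; break; }
    }
    return best;
}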

Sunday, October 22, 2017

We were discussing MongoDB. Their catalog initially stored one document per SKU per store. If there are a hundred million items in a thousand stores, that results in a hundred billion entries. Instead, the documents became one per SKU per store grouping. This improves geo-distribution and results in a lower number of documents. In order to keep reads/writes local with low latency in an architecture where writes go to every shard for every region, they were streamlined to go to their respective regions with the help of tags.
Even the shopping cart is modeled as a single document. In order to improve availability, the inventory was sharded by region. The inventory was stored with a geocode, and there was an option to make it more granular by using store ids instead of geocodes. The shopping cart is also made more available by replication across regions so that the writes are local even as the shopping cart moves between regions. Each shard has one replica per region. The primary servers are distributed per region. The shopping carts are also sharded by geocode or optionally by store id.
User activities are logged for insight. These include search terms, items viewed or wished for, cart additions or removals, orders submitted, sharing on social networks, and impressions or clickstreams. This data will be used to compute user/product history, product map, user preferences, recommendations and trends.
Insights from logs had proven difficult. Log files are usually forgotten; moreover, the user activity is voluminous. When the activity is recorded in a data warehouse system, it provides good reporting but is expensive to scale. NoSQL helps with scaling and reporting, but there is still no data store for real-time queries and aggregations from applications. Moreover, there are delays in moving logs to the log processor and delays in processing data from the log processor to the analytics. From the analytics, the output is limited by schema before it is put in a relational store. From this store there is limited read capability for the application. Instead of using a time series database, the activity store and reporting are improved right out of the box by MongoDB.
#codingexercise
Yesterday we talked about finding the largest number formed from digits whose product equals the given number.
For a single number as a standalone input, we determined the answer using digits from 2 to 9 and their repetitions, omitting 1, by using dynamic programming.
When we have an increasing continuous sequence of inputs, we said we could re-use the results from the factors. Specifically, we order the factors by their digit counts and use the ones that maximize the count. The combinations of factors that result in the product need not all be enumerated; we just take as many of those with higher counts as necessary to make the product.
Another way to think about this is to use the same dynamic programming as for the digits, by recursively checking whether the inclusion or exclusion of a factor returns a higher count of digits. We bail if the product is reached and return 1, or if it is not met and return 0. For inclusion we add the contribution from the factor to the recursion, and for exclusion we return the recursion as is.
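For the standalone case, here is a greedy sketch of that idea (my own formulation: take the smallest factors first to maximize the digit count, then arrange the digits in decreasing order to maximize the number):

// Largest number whose digits multiply to n, e.g. 100 -> 5522 (2 x 2 x 5 x 5); 26 -> -1
long LargestFromDigitProduct(int n)
{
    if (n >= 0 && n <= 1) return n;
    var digits = new List<int>();
    for (int d = 2; d <= 9 && n > 1; d++)      // smallest factors first => most digits
    {
        while (n % d == 0) { digits.Add(d); n = n / d; }
    }
    if (n != 1) return -1;                     // a leftover prime factor greater than 9
    digits.Sort((a, b) => b.CompareTo(a));     // largest digits in the most significant positions
    long result = 0;
    foreach (var d in digits) result = result * 10 + d;
    return result;
}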

#implementations of K-means: https://1drv.ms/w/s!Ashlm-Nw-wnWsWbQF0tucrJAUlGz 

Saturday, October 21, 2017

MongoDB reported that the traditional relational store for a catalog suffered from the following issues:
1) The field inventory is a local view only until it makes its way to the central store.
2) The relational store involves a once-a-day sync or some other periodic refresh
3) Stale views are served until the refresh happens which is often not fast enough for consumers.
4) The stale view interferes with analytics and aggregations reports.
5) Downstream internal and external apps have to work around the delays and stale views with sub-optimal logic.
MongoDB overcomes all these and organizes the catalog into Item, Variant, Price, Hierarchy, Facet and Vendors in a document store. Then it allows the applications to search for data via prejoined objects in cache or via indexing through a search engine built on a Lucene/Solr architecture.
Applications generally want a central service with a single view of the catalog. It is used by most services and back channels. Moreover, they want this inventory to be in sync in real time. Therefore they expect their writes to be local and real-time in a read-dominated workload. That said, refresh is in the form of bulk writes via point-in-time loads. Bulk inserts work well with a relational database. Therefore MongoDB's document data store gets synced with the relational data store periodically, say nightly. The real-time updates and views all happen on the document store only and don't reach the relational store. Similarly, analytics and aggregation pull data from the document store.
The catalog initially stored one document per SKU per store. If there are a hundred million items in a thousand stores, that results in a hundred billion entries. Instead, the documents became one per SKU per store grouping. This improves geo-distribution and results in a lower number of documents. In order to keep reads/writes local with low latency in an architecture where writes go to every shard for every region, they were streamlined to go to their respective regions with the help of tags.
Even the shopping cart is modeled as a single document. 

#codingexercise
We talked about finding the largest number from the digits whose product equals the number.
We said we can try different digits from 2 to 9 in increasing order to find the divisors of the numbers and their repetitions and we take as many of the divisors as possible. In increasing order of digits, this translates to (digit1, repetition1), (digit2, repetition2) ... pairs. The ones with the higher repetitions are included in the final solution.
Each digit, repetition pair is either included in the final solution or not. Consequently we can write a dynamic programming that recursively tries both paths and returns the count of the sub-problem plus the repetitions from the current choice or merely the count from the sub-problem as the case maybe.

An essay on personal assistant: https://1drv.ms/w/s!Ashlm-Nw-wnWsUKrRyxwuyYPFiz5

Friday, October 20, 2017

We started looking at storing catalogs in MongoDB. The catalog often gets fragmented, processes become redundant, and ETL and message buses start getting added, which leads to inefficiencies. MongoDB overcomes this with a single unified view and one central service that provides not only the catalog but also a search engine over the catalog. In this way, the catalog becomes one central service with a flexible schema, high read volume, write-spike tolerance during catalog updates, and advanced indexing and querying as well as geographical distribution for HA and low latency.
MongoDB reported that the traditional relational store for a catalog suffered from the following issues:
1) The field inventory is a local view only until it makes its way to the central store.
2) The relational store involves a once-a-day sync or some other periodic refresh
3) Stale views are served until the refresh happens which is often not fast enough for consumers.
4) The stale view interferes with analytics and aggregations reports.
5) Downstream internal and external apps have to work around the delays and stale views with sub-optimal logic.

In this sense, MongoDB tries to provide real-time inventory which is used directly by most services and channels. It even provides local real-time writes while supporting a read-dominated workload. Greater geographical distribution and horizontal scalability help improve performance.

#codingexercise

Yesterday we were discussing a method to find the smallest number k such that the product of the digits of k is equal to n. We said we could do this by finding the digits from 1 to 9 that divide the number n and listing them sorted in increasing order, with repetitions, such that the product can be formed. By trying this with the decreasing order of digits from 9 to 1, we get the numbers with fewer digits first.
The same solution works similarly if we want to find the largest number k instead of the smallest number. In this case we try to maximize the number of digits in the final solution and therefore we order the digits to try from smallest to biggest.
Moreover, we can also extend this to consecutive integers or multiples by avoiding redundant calculations as before. For example,  we  could re-use the results from the factors computed earlier using memoization. Instead of trying the combinations of the factors to see which one gives the largest number, we can improve that by choosing an increasing order of factors and then working with their results so that we know we will get the largest number first. Since the count of digits is important over their value, we can use the results from the factors. For example,
100 = 4 x 25
100 = 4 x 5 x 5
100 = 10 x 10 
100 = 2 x 5 x 2 x 5
While the individual digits can contribute at most one digit each, the factors can contribute multiple digits. We can choose the factors that contribute the most digits and their repetitions in the final solution. In this case we can skip re-deriving the answer from individual digits and look up the memoization table for the factors. Here we could directly choose the result as f(100) = f(2) x f(50).

Thursday, October 19, 2017

MongoDB provides all aspects of data storage for online retail stores. It can categorize data based on merchandising, content, inventory, customer, channel, sales and fulfilment, insight and social. Out of these, merchandising, inventory, customer and insight are the most heavily used for the peak holiday sales season. In addition, the supply chain management system and data warehouse can also integrate well with this database.
We started looking at storing catalogs in MongoDB. The problem with the catalog, as with any master data management, is that the data becomes rigid. If the catalog is accessed more than 100 times per second, it causes performance issues. When other departments need to access it, ETL, message bus and API services are put in place for this store. This results in fragmented data, redundant processes, heterogeneous data processing and higher costs both in terms of time and money. Even users begin to see degradation in page load times. To solve the local performance problem, more caches, more message buses and more ETL operations are added, which only complicates it. A single view of the product and one central service becomes harder to achieve. The purpose of the catalog is to form a single view of the product with one central service, a flexible schema, high read volume, write-spike tolerance during catalog updates, advanced indexing and querying, and geographical distribution for HA and low latency.
MongoDB is not a traditional MDM but it addresses all the above requirements. A traditional Master Data Management system has a well-known organization of the catalog with support for hierarchy and dimensions. MongoDB organizes the catalog in the form of Items, Pricing, Promotions, Variants, and Ratings and Reviews. In JSON these appear as nested fields and are pre-joined into objects. The objects live in the cache as well. A search engine provides search over the catalog. Functional data access is provided by the Product API. The API and the engine separately cover all operations on the catalog. The API can then be used by downstream units such as the Online Store, Marketing, Inventory, SCMS and other APIs. The search engine is built on a Lucene/Solr architecture. A Lucene index keeps track of terms and their occurrence locations, but the index needs to be rebuilt each time the catalog changes. The Product API can retrieve results directly from the catalog or via the search engine. In both cases, the customer only issues a single query.
#codingexercise

Yesterday we were discussing a method to find the smallest number k such that the product of the digits of k is equal to n. We said we could do this by finding the digits from 1 to 9 that divide the number n and listing them sorted in increasing order, with repetitions, such that the product can be formed. By trying this with the decreasing order of digits from 9 to 1, we get the numbers with fewer digits first.
We extended this to say that if we try the above method repeatedly for consecutive integers or multiples, we do a lot of redundant calculations. Instead we  could re-use the results from the factors computed earlier. However, we also said that we have to try the combinations of the factors to see which one gives the smallest numbers. We can improve that by choosing a decreasing order of factors and then working with their results so that we know we will get the smallest number first.
For example 100 = 20  * 5 and 100 = 10 * 10
We get the answer from the decreasing order of factors.

Wednesday, October 18, 2017

MongoDB provides all aspects of data storage for online retail stores. It can categorize data based on merchandising, content, inventory, customer, channel, sales and fulfilment, insight and social. Out of these, merchandising, inventory, customer and insight are the most heavily used for the peak holiday sales season. In addition, the supply chain management system and data warehouse can also integrate well with this database.
We now look up some of these functions in a table. For example, a catalog is important for the online retail store to allow users to browse the inventory. The problem with the catalog, as with any master data management, is that the data becomes rigid. If the catalog is accessed more than 100 times per second, it causes performance issues. When other departments need to access it, ETL, message bus and API services are put in place for this store. This results in fragmented data, redundant processes, heterogeneous data processing and higher costs both in terms of time and money. Even users begin to see degradation in page load times. Another concern with the catalog is that many more catalogs are formed as it grows. This leads to more caches, more message buses and more ETL operations. A single view of the product and one central service becomes harder to achieve. The purpose of the catalog is to form a single view of the product with one central service, a flexible schema, high read volume, write-spike tolerance during catalog updates, advanced indexing and querying, and geographical distribution for HA and low latency.
#codingexercise
Find the smallest number k such that the product of digits of k is equal to n
For example, if the input is 100, then the output is 455
                      if the input is 26, then the output is -1

We know that the smallest number possible must have its digits sorted in increasing order, with the smallest digit appearing in the most significant position. Therefore, starting from 9 down to 1, we can divide the number repeatedly and keep track of each digit that is a divisor as well as its repetitions. We stop when we have exhausted the digits we started with or the number is reduced to 1 or 0. We store the digits in a stack because the sequence of digits and their repetitions needs to be reversed to form the answer. The termination condition for success is when n has been reduced to 1.
int GetDivisors(int n)
{
if (n >= 0 && n <= 9) return n;
var divisors = new Stack<int>();
for ( int i = 9; i >=2 && n >1; i--)
{
    while(n % i == 0)
    {
         divisors.push(i);
         n = n  / i;
    }
}
if (n != 1) return -1;
return GetNumberFormedFromDivisors(divisors);
}
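GetNumberFormedFromDivisors is left undefined above; a possible sketch (the name comes from the call site, the body is my own):

int GetNumberFormedFromDivisors(Stack<int> divisors)
{
    // The digits were pushed from 9 down to 2, so popping yields them in
    // increasing order, which places the smallest digit in the most
    // significant position and therefore forms the smallest number.
    int result = 0;
    while (divisors.Count > 0)
    {
        result = result * 10 + divisors.Pop();
    }
    return result;
}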
This method works well for a single input but if we have a contiguous sequence of inputs, we can utilize the results of the previous computations by memoizing the results and putting them in a dp table. The catch however will be to get all factors and see which combination is smallest.
For example
100 is 20 x 5
20 is already computed as 4 x 5

Tuesday, October 17, 2017

We were reviewing how MongoDB enables integration with social engineering applications while also enabling data and service integration for the user data inflow, with APIs for data and services during the peak annual sales season. At the heart of their store, they can categorize data based on merchandising, content, inventory, customer, channel, sales and fulfilment, insight and social. Out of these, merchandising, inventory, customer and insight are the most heavily used for the peak holiday sales season. In addition, the supply chain management system and data warehouse also integrate well with this storage.
The information layer improves the experience with look and feel, navigation, customization, personalization, branding, promotion, chats and ads. Each of these is a separate initiative. The customer's perspective helps us understand their context. For example, a user would like to browse and search the store online. This is part of his research. The shopping cart helps him select the items. The checkout experience is streamlined towards purchase. The tracking of the item improves the receiving experience. There is a feedback loop and there are maintenance considerations with the use of the item. The dialog with the customer is the focus of the assistance efforts. All these customer experiences translate to software interactions in two or three major categories on the enterprise side. For example, the Market/Offer will require services separate from Sell/Fulfill. The Sell/Fulfill will require data services different from Insight. Note that the services are easily enumerable within each category and different from each other, while they share the same information management backend, which is facilitated by MongoDB as a one-stop shop for information management, specifically in the areas of merchandising, inventory, customer and insight. The merchandising fulfils Market/Offer services such as guide, offer, semantic search, recommend, rule-based decisions, pricing and coupons. The Sell/Fulfill provides ways for order, payments, fraud detection, fulfilment and business rules. The Insight provides ways for session capture and activity monitoring.
#codingexercise continued
Yesterday we described finding the largest submatrix in a binary matrix. 
We said the conventional way was to find connected components. Today we explain this:
int maxSize = 0;
for (int i = 0; i < ROW; i++)
   for (int j = 0; j < COL; j++)
      if (A[i, j] == 1 && !visited[i, j])
      {
         int size = 0;
         DFS(A, ROW, COL, i, j, ref visited, ref size);
         if (size > maxSize)
             maxSize = size;
      }
This can be modified to record the cells of the submatrix contributing to the maximum size, as shown in the sketch below.
Given a cell in a connected component, the minimum values of the x and y co-ordinates give us the top-left corner of the bounding submatrix, and the maximum values of the x and y co-ordinates give us the bottom-right corner.
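The DFS helper is assumed in the loop above; a sketch that matches the call site (the comment marks where the bounding-box tracking just described could go):

void DFS(int[,] A, int ROW, int COL, int i, int j, ref bool[,] visited, ref int size)
{
    if (i < 0 || i >= ROW || j < 0 || j >= COL) return;
    if (A[i, j] != 1 || visited[i, j]) return;
    visited[i, j] = true;
    size++;
    // To record the bounding submatrix, update running minima/maxima of i and j here,
    // e.g. minRow = Math.Min(minRow, i), maxCol = Math.Max(maxCol, j), and so on.
    DFS(A, ROW, COL, i + 1, j, ref visited, ref size);
    DFS(A, ROW, COL, i - 1, j, ref visited, ref size);
    DFS(A, ROW, COL, i, j + 1, ref visited, ref size);
    DFS(A, ROW, COL, i, j - 1, ref visited, ref size);
}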

Monday, October 16, 2017

We were reviewing how MongoDB enables integration with social engineering applications while also enabling data and service integration for the user data inflow, with APIs for data and services during the peak annual sales season. At the heart of their store, they can categorize data based on merchandising, content, inventory, customer, channel, sales and fulfilment, insight and social. Out of these, merchandising, inventory, customer and insight are the most heavily used for the peak holiday sales season. In addition, the supply chain management system and data warehouse also integrate well with this storage.
The information layer improves the experience with look and feel, navigation, customization, personalization, branding, promotion, chats and ads.


#codingexercise
Maximize the count of 1 in a binary matrix by flipping any submatrix exactly once. 

The usual way of answering this is to find the largest connected component of zeroes by an adjacency breadth-first traversal and then find the top-left and bottom-right inclusive boundaries based on the minimum and maximum of the co-ordinates.

But here is a twist courtesy an online solution: 
Step 1. Calculate a similarly sized matrix as the input but with values equal to the count of 1s in the submatrix starting at (0,0) and ending with that cell as the bottom-right corner. Let us call this the 'ones' matrix. 
Step 2. Exhaustively iterate over all submatrices with (i,j) as the top-left and (i+k-1, j+k-1) as the bottom-right corner, and determine the number of ones in the whole matrix after a flip of this submatrix. Let us call the count of ones within the chosen submatrix 'cal'. We are not flipping just one square submatrix, since i and j can each range from 1 upward as long as (i+k-1, j+k-1) stays within the input matrix. Also, we count the ones of the whole input matrix. For now, let us follow the steps. 
Step 3. Within each iteration, the resulting value of the whole input matrix after the flip comes out to be: 
                  ones[R,C] - ones in the chosen submatrix + zeroes in the chosen sub matrix 
               = ones[R,C] - cal(i, j, i+k-1, j+k-1) +  (k x k -cal(i, j, i+k-1, j+k-1)) 
               = ones[R, C] + k x k - 2 x cal(i, j, i+k-1, j+k-1) 
Step 4. From the value computed in step 3 for each iteration, we determine the max found so far and return it as the final result.
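A sketch of these steps for a fixed flip size k, using a prefix-sum 'ones' matrix and the corrected formula above (the helper and variable names are my own):

// Maximize the total count of 1s after flipping exactly one k x k submatrix.
int MaxOnesAfterFlip(int[,] A, int R, int C, int k)
{
    // Step 1: ones[i, j] = number of 1s in the submatrix (1,1)..(i,j), 1-indexed.
    var ones = new int[R + 1, C + 1];
    for (int i = 1; i <= R; i++)
        for (int j = 1; j <= C; j++)
            ones[i, j] = A[i - 1, j - 1] + ones[i - 1, j] + ones[i, j - 1] - ones[i - 1, j - 1];

    // cal(i, j): number of 1s in the k x k submatrix whose top-left corner is (i, j).
    int Cal(int i, int j) =>
        ones[i + k - 1, j + k - 1] - ones[i - 1, j + k - 1] - ones[i + k - 1, j - 1] + ones[i - 1, j - 1];

    // Steps 2-4: try every placement and keep the best total.
    int best = 0;
    for (int i = 1; i + k - 1 <= R; i++)
        for (int j = 1; j + k - 1 <= C; j++)
            best = Math.Max(best, ones[R, C] + k * k - 2 * Cal(i, j));
    return best;
}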