Friday, March 26, 2021

MDM as a service

Cloud-based master data management technologies have seen a lagging embrace (with the possible exception of Snowflake, owing to its architecture), so Riversand continues to enjoy widespread popularity as an on-premises solution.

Riversand offers data modeling, data synchronization, data standardization, and flexible workflows within the tool. It offers scalability, performance, and availability based on its comprehensive Product Information Management (PIM) web services. Functionality is provided in layers of information management: print/translation workflows at the bottom; workflow and security for access to assets, including editing, insertions, and bulk insertions; flexible integration capabilities such as full data exports and multiple exports; integration portals for imports/exports, data pools, and platforms; a digital asset management layer for asset onboarding and delivery to channels; and lastly, data management for searches, saved searches, channel-based or localized content, and the ability to author variants, categories, attributes, and relationships to stored assets.

The top players in MDM other than Riversand include Informatica, IBM InfoSphere, Microsoft, and SAP. Informatica offers an end-to-end MDM solution with an ecosystem of applications, and it does not require the catalog to be in a single domain. InfoSphere has been a long-time player; its product is considered mature, with more power for collaborative and operational capabilities, and it plays well with other IBM solutions and their ecosystem. SAP consolidates governance over master data with an emphasis on data quality and consistency; it supports collaborative workflows and is noted for supplier-side features such as supplier onboarding. Microsoft's master data services, included with SQL Server, make it easy to create master lists of data, with the benefit that the data is made reliable and centralized so that it can participate in intelligent analysis. Most products require changes to existing workflows to some degree to enable customers to make the transition.

MDM in the form of software-as-a-service in a multi-tenant private cloud requires large investments in datacenters and a significant number of customers. Instead, the option to create elastic virtual data warehouses in the cloud with some home-grown web services seems far cheaper for those companies. If the costs could be articulated and compared, businesses would not hesitate to make a decision. The on-premises solution has a significant TCO to contend with.


Thursday, March 25, 2021

Comparison of MDM with MongoDB...

 Comparison with MongoDB: 

MongoDB provides all aspects of data storage for online retail stores. It can categorize data based on merchandising, content, inventory, customer, channel, sales and fulfillment, insight, and social. Of these, merchandising, inventory, customer, and insight are the most heavily used during the peak holiday sales season. In addition, supply chain management systems and data warehouses can also integrate well with this database.
MongoDB is not a traditional master data management product, but it addresses most of the requirements. A traditional master data management system has a well-known organization of the catalog with support for hierarchy and dimensions. MongoDB organizes the catalog in the form of Items, Pricing, Promotions, Variants, and Ratings and Reviews. In JSON these appear as nested fields that are pre-joined into objects, and the objects live in the cache as well. A search engine provides search over the catalog, and functional data access is provided by the Product API. Together, the API and the engine cover all operations on the catalog. The API can then be used by downstream units such as the online store, marketing, inventory, SCMS, and other APIs. The search engine is built on the Lucene/Solr architecture. A Lucene index keeps track of terms and their occurrence locations, but the index needs to be rebuilt each time the catalog changes. The Product API can retrieve results directly from the catalog or via the search engine. In both cases, the customer issues only a single query.
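The pre-joined catalog object described above can be sketched as a nested structure. The field names and values below are hypothetical, chosen only to mirror the Items/Pricing/Variants/Ratings shape the text describes:

```java
import java.util.List;
import java.util.Map;

public class CatalogDocument {
    public static void main(String[] args) {
        // Hypothetical catalog item: pricing, variants, and ratings are
        // pre-joined into one document, so a single read returns everything.
        Map<String, Object> item = Map.of(
            "itemId", "SKU-1001",
            "name", "Trail Running Shoe",
            "pricing", Map.of("list", 129.99, "sale", 99.99),
            "variants", List.of(
                Map.of("color", "blue", "size", 9),
                Map.of("color", "red", "size", 10)),
            "ratings", Map.of("average", 4.5, "count", 312)
        );
        // No joins at query time: navigate the nested fields directly.
        System.out.println(((Map<?, ?>) item.get("pricing")).get("list"));
    }
}
```

This is the "single query" property in miniature: the customer-facing read touches one object rather than assembling rows from several tables.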


MongoDB offers the catalog as a one-stop shop from its store: there are no sub-catalogs, fragmentation, ETL, or message bus. It is highly available to application servers, API data services, and web servers. It is also available behind the store for supply chain management and data warehouse analytics, which typically have their own analysis stacks. The catalog is available for browsing as well as searching via the Lucene search index, and queries can be written with keywords to narrow down results from the catalog. MongoDB allows geo-sharding with persisted shard ids, or more granular store ids, for improved high availability and horizontal scalability. It provides local real-time writes, is tuned for the read-dominated workload, and performs bulk writes for a refresh. The relational database stores point-in-time loads, which are pushed overnight into catalog information management and made available for real-time views. The NoSQL store powers insight and analytics based on aggregations and provides a front-end data store for real-time queries and aggregations from applications. All of this comes with comprehensive monitoring and scaling.

Both an MDM and the MongoDB catalog store hierarchy and facets, in addition to items and SKUs, as ways of organizing items.
MDM providers like Riversand, on the other hand, offer a rebuildable catalog via change data capture, comprehensive .NET-powered web services, a data-as-a-service model, and reliance on traditional relational databases only.

Wednesday, March 24, 2021

The design of a management suite over MDM (continued ...)

 Role of MDM: 

MDM is an enhanced suite of Product Information Management (PIM) web services. Functionality is provided in layers of information management: print/translation workflows at the bottom; workflow and security for access to assets, including editing, insertions, and bulk insertions; flexible integration capabilities such as full data exports and multiple exports; integration portals for imports/exports, data pools, and platforms; a digital asset management layer for asset onboarding and delivery to channels; and lastly, data management for searches, saved searches, channel-based or localized content, and the ability to author variants, categories, attributes, and relationships to stored assets.

Some MDMs do not require the catalog to be in a single domain. They have been long-time players, and their products have matured with more power for collaborative and operational capabilities. The trouble with MDM users is that they still prefer to use MS Excel. This introduces ETL-based workflows and siloed views of data. Materialized views don't help because they are not updated in time. Also, any separation of stages in data manipulation introduces human errors and inconsistencies, in addition to delays in reaching the data.


The unification:  

Together, master data management and the storage engine provide an out-of-the-box as well as customizable solution that targets all aspects of an online product catalog. The data is available for retrieval and iteration directly out of the store, while hierarchy and dimensions may be maintained by a web service over the object storage.

The problem with a catalog, as with any master data management, is that the data becomes rigid. At high request rates (say, over a hundred reads per second), the catalog store becomes a performance bottleneck. When other departments need access, ETL pipelines, message buses, and API services are put in place in front of this store. This results in fragmented data, redundant processes, heterogeneous data processing, and higher costs in both time and money. Even end users begin to see degradation in page load times. To solve local performance problems, more caches, more message buses, and more ETL operations are added, which only complicates matters. A single view of the product and one central service become harder to achieve. The purpose of the catalog is to form a single view of the product with one central service, a flexible schema, high read volume, tolerance for write spikes during catalog updates, advanced indexing and querying, and geographic distribution for high availability and low latency.
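The local caches that teams bolt on in front of the catalog, as described above, are typically small LRU stores. A minimal sketch, assuming an access-ordered LinkedHashMap stands in for a real cache layer (the SKUs and capacity are invented):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A toy LRU read cache of the kind placed in front of a catalog service.
// Capacity and eviction policy here are illustrative assumptions.
public class CatalogCache extends LinkedHashMap<String, String> {
    private final int capacity;

    public CatalogCache(int capacity) {
        super(16, 0.75f, true); // access-order = true gives LRU behavior
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity; // evict least-recently-used past capacity
    }

    public static void main(String[] args) {
        CatalogCache cache = new CatalogCache(2);
        cache.put("SKU-1", "shoe");
        cache.put("SKU-2", "shirt");
        cache.get("SKU-1");        // touch SKU-1, so SKU-2 becomes eldest
        cache.put("SKU-3", "hat"); // evicts SKU-2
        System.out.println(cache.containsKey("SKU-2"));
        System.out.println(cache.containsKey("SKU-1"));
    }
}
```

Every such cache is another copy of the data to keep consistent, which is exactly the fragmentation problem the paragraph above describes.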

Tuesday, March 23, 2021

 The design of a management suite over MDM   

Problem statement:  

Management of monitored agents and devices is a critical aspect of many software products. Such products are referred to as an operations manager or system center and provide a single console to view and edit the inventory. Additionally, tasks can be performed on the managed objects in the local management group by specifying them on the console. Objects may be grouped, and actions can be taken collectively on the group so that they need not be repeated on one member after another.

This article describes some of the salient features and best practices for the implementation of a management suite. The simplest operations manager is a single management group for a set of managed objects whose details and state are maintained in a database and managed by a single server or microservice.

Architecture: 

The core components of a management server usually comprise a scan engine, a collecting server, and a user interface. The server imports collected scan data from all the managed objects into a database. The user interface allows that data to be studied and issues to be resolved by performing tasks on the managed objects. The argument we make here is that this datastore is best replaced by an MDM. Furthermore, web applications and services that do not have transactional semantics access storage that is conducive to an object store. A management suite has a specific separation of concerns between the read-writes performed for state management and the read-only reporting stack. An MDM, with its family of RESTful API services, is well suited as a management offering over the inventory of managed objects and devices. MDM has traditionally proven effective in the organization of inventory. For example, a product catalog stores details on items sold by a business, such as Name, Code, Category, Sub-Category, Standard Cost, and ListPrice. Since large businesses have a lot of merchandise differing in category, style, color, grade, model, make, and serial code, the catalog needs a lot of organization to properly access an item. Such catalogs often require storage upwards of terabytes of data and were previously stored in relational databases.

Yet the nature of access to this data has traditionally been read-only. Moreover, since the catalog is large, browsing is more often filtered than a retrieval of all items. The catalog operations involve searching, grouping, and sorting based on criteria that look remarkably like SQL queries. These queries, the massive catalog of products, and the organizational requirements are challenges that do not necessarily need the encumbrance of a relational database management system and can be handled exclusively, and served better, by an object store.
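The SQL-like searching, grouping, and sorting described above can be served directly over in-memory objects without a relational engine. A sketch using the catalog fields named earlier (Name, Code, Category, ListPrice); the item data is invented:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CatalogQuery {
    // Field names follow the catalog example in the text; values are made up.
    record Item(String name, String code, String category, double listPrice) {}

    public static void main(String[] args) {
        List<Item> catalog = List.of(
            new Item("Road Bike", "BK-R01", "Bikes", 1200.0),
            new Item("Mountain Bike", "BK-M01", "Bikes", 950.0),
            new Item("Helmet", "AC-H01", "Accessories", 60.0));

        // Filter (WHERE), sort (ORDER BY), and group (GROUP BY) -- the
        // read-only operations the text argues need no relational database.
        Map<String, List<String>> byCategory = catalog.stream()
            .filter(i -> i.listPrice() < 1000.0)
            .sorted((a, b) -> a.name().compareTo(b.name()))
            .collect(Collectors.groupingBy(Item::category,
                     Collectors.mapping(Item::name, Collectors.toList())));

        System.out.println(byCategory.get("Bikes"));
        System.out.println(byCategory.get("Accessories"));
    }
}
```

The same pipeline shape maps onto listing objects returned from an object store's API.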

Object storage is limitless storage. The storage is elastic, and adding capacity is as trivial as adding nodes to a cluster. Object storage does not differentiate between nodes because the storage pool virtualizes all the storage. Therefore, a product catalog can grow to any size while the physical storage array lends itself to easy maintenance.

Object storage offers better features and cost management than most alternatives, such as relational databases, document datastores, and private web services. The existence of catalogs in the market on a variety of stacks, and not just from master data management software providers, indicates that this is a lucrative and commercially viable offering. From database servers to appliances, the catalog is viewed as web-accessible storage, but each imposes some form of constraints on its access. Object storage not only facilitates access but places no such restraints on it. It does not treat the catalog as mere content, and it provides redundancy as a content distribution network would.

Monday, March 22, 2021

 Components of a management suite: 

Problem statement: This article describes some of the salient features and best practices for the implementation of a management suite whose design was referenced here. The simplest operations manager is a single management group for a set of managed objects whose details and state are maintained in a database and managed by a single server or microservice.

Architecture: The core components of a management server usually comprise a scan engine, a collecting server, and a user interface. The scan engine evaluates based on policies that can be specified as conditions. Conditions are built on predicates and can be independent of one another. These predicates are simple Boolean operands and can be stored in a database as a regular text column that is parsed before the queries are applied or executed. The order of execution of the logic follows the listing order of the rules. Old rules can be deleted, and new rules can be added. The output of the rules evaluation determines the result of the scan. The scan engine design can optionally provide the ability to run one rule or all rules in the listing. It can even organize the rules into suites, which can be created thematically depending on the type of scan to be performed.
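The rule evaluation described above can be sketched as follows. For brevity, the predicates are represented directly as Predicate objects rather than parsed from a text column, and the rule names and scanned attributes are invented:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class ScanEngine {
    public static void main(String[] args) {
        // Attributes collected from a hypothetical managed object.
        Map<String, Object> scanned = Map.of("diskUsagePct", 92, "agentRunning", true);

        // LinkedHashMap preserves insertion order, matching the text's point
        // that rules execute in their listing order; rules can be added or removed.
        Map<String, Predicate<Map<String, Object>>> rules = new LinkedHashMap<>();
        rules.put("disk-below-90", s -> (int) s.get("diskUsagePct") < 90);
        rules.put("agent-running", s -> (boolean) s.get("agentRunning"));

        // The scan result is the output of evaluating each independent predicate.
        rules.forEach((name, rule) ->
            System.out.println(name + ": " + (rule.test(scanned) ? "pass" : "fail")));
    }
}
```

Running one rule, a named subset, or a themed suite is then just a matter of which entries are selected from the listing before evaluation.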

The user interface exposes the rule categories on a dashboard so that the user may select one of them to execute. The invocation of a category displays a progress bar for the scan. The evaluation of the scan produces a set of findings and results. These findings are listed based on the source data that failed the conditions associated with a check. There is a correlation between a check and its finding, so it is easy to navigate between the two. Similarly, there is a correlation between a finding and its result, making it easy to interpret the result with a transitive correlation back to the check, which may also describe a remedial action to be performed. All checks, findings, and result entries are stored in database tables, so they are flexible, dynamically updated, and can be specified in large numbers. The user interface can also enable the view of grouped objects.

The remedial actions on the managed objects might be taken independently of the management suite, so it is helpful to run the scans multiple times to see improvements in their state. Repeated runs of the management suite can be seen from the new entries in the database tables.

The operational aspect of the management suite may be exposed via a dashboard that lists the number of scans, their breakdown by successes and failures, and, if necessary, the execution time. These statistics can also be aggregated over the last week or last month so that the runs and their corresponding numbers can be viewed over a period.
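The dashboard aggregation described above can be sketched with a stream over recorded runs. The ScanRun shape, dates, and window are illustrative assumptions:

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ScanStats {
    // Hypothetical record of one scan run: when it ran and whether it passed.
    record ScanRun(LocalDate date, boolean success) {}

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2021, 3, 22);
        List<ScanRun> runs = List.of(
            new ScanRun(today.minusDays(1), true),
            new ScanRun(today.minusDays(2), false),
            new ScanRun(today.minusDays(3), true),
            new ScanRun(today.minusDays(30), true)); // outside the weekly window

        // Keep only the last week's runs, then split counts by outcome.
        Map<Boolean, Long> lastWeek = runs.stream()
            .filter(r -> !r.date().isBefore(today.minusDays(7)))
            .collect(Collectors.partitioningBy(ScanRun::success, Collectors.counting()));

        System.out.println("successes: " + lastWeek.get(true));
        System.out.println("failures: " + lastWeek.get(false));
    }
}
```

Swapping the seven-day window for thirty days gives the monthly view the text mentions.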

Tasks and their corresponding invocations on the managed objects are left out of this discussion, but briefly, they are asynchronous tasks that may need to be repeated over the members of a group of managed objects. A message broker helps tremendously in tracking asynchronous tasks, especially in a fan-out model. The message broker may also have an associated set of processors that update state in a transactional database. The state of the managed objects can then become the source of truth for the management suite.
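The fan-out pattern above can be sketched in miniature. A real deployment would use a message broker with durable queues; here a thread pool stands in for it, and the host names and task are invented:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FanOut {
    public static void main(String[] args) throws Exception {
        // One task fanned out to every member of a managed group.
        List<String> group = List.of("host-a", "host-b", "host-c");
        ExecutorService pool = Executors.newFixedThreadPool(3);

        // invokeAll submits one callable per member and blocks until all finish,
        // returning futures in submission order for per-member tracking.
        List<Future<String>> results = pool.invokeAll(group.stream()
            .map(host -> (Callable<String>) () -> host + ": restarted")
            .toList());

        for (Future<String> f : results) {
            System.out.println(f.get()); // each member's outcome
        }
        pool.shutdown();
    }
}
```

The per-member results are what the broker's processors would persist to the transactional database to form the source of truth.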

Conclusion: Some of the components of a management suite are described here. The management suite can be expanded with the addition of serverless computing or microservices for added functionality. 

 

Sunday, March 21, 2021

Sample program to count the number of different triplets (a, b, c) in a given array in which a occurs before b and b occurs before c.

Solution: Generate all combinations in positional lexicographical order for the given array using the getCombinations method described above, and select those of size 3. When selecting elements, save only their indexes so that we can determine that they are in increasing positional order.


import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Solution {

    // Enumerates all subsets of the N positions via bitmask; each combination
    // stores the selected indexes, which are added in increasing order.
    public static void getCombinations(List<Integer> elements, int N, List<List<Integer>> combinations) {
        for (int i = 0; i < (1 << N); i++) {
            List<Integer> combination = new ArrayList<>();
            for (int j = 0; j < elements.size(); j++) {
                if ((i & (1 << j)) > 0) {
                    combination.add(j);
                }
            }
            combinations.add(combination);
        }
    }

    public static void main(String[] args) {
        List<Integer> elements = Arrays.asList(1, 2, 3, 4);
        List<List<Integer>> indices = new ArrayList<>();
        getCombinations(elements, elements.size(), indices);
        indices.stream().filter(x -> x.size() == 3)
                        .filter(x -> x.get(0) < x.get(1) && x.get(1) < x.get(2))
                        .forEach(x -> printList(elements, x));
    }

    public static void printList(List<Integer> elements, List<Integer> indices) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indices.size(); i++) {
            sb.append(elements.get(indices.get(i))).append(" ");
        }
        System.out.println(sb.toString());
    }
}

/* sample output:

1 2 3 

1 2 4 

1 3 4 

2 3 4

*/


Saturday, March 20, 2021

Building a chat application on Android development platform: (continued...)

 Addendum: 

Firebase Cloud Messaging's Android extension provides message broker abilities for a client application, and it works well on Android. Client applications use the FirebaseMessaging API and must be built with Android Studio 1.4 or higher. Firebase can be added to the project if the device or emulator has Google Play services installed.


The Android app can be connected to Firebase in one of two ways: Option 1, the Firebase console, or Option 2, the Android Studio Firebase Assistant. Option 1 requires downloading the Firebase configuration file from the Firebase console and moving it into the Android project. The configuration file takes effect when the Google services plugin 'com.google.gms:google-services:4.3.5' is added to the Gradle file. The Firebase SDK can be added to the application with implementation platform('com.google.firebase:firebase-bom:26.7.0'). Option 2 uses the Firebase Assistant, which has preconfigured workflows to add a Firebase product to the application.
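The Gradle wiring for Option 1 can be sketched as below. The plugin and BoM versions are the ones the post names; the firebase-messaging dependency line is an assumption for a messaging app (the BoM supplies its version):

```groovy
// app/build.gradle (module level) -- versions as cited above
plugins {
    id 'com.android.application'
    id 'com.google.gms.google-services'  // Google services plugin 4.3.5
}

dependencies {
    // Import the Firebase BoM so individual Firebase libraries omit versions
    implementation platform('com.google.firebase:firebase-bom:26.7.0')
    implementation 'com.google.firebase:firebase-messaging'
}
```

The downloaded google-services.json configuration file sits alongside this module-level build file.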

If Kotlin is used in the Android application, the Kotlin extension (KTX) libraries need to be added.  


Any application written using Firebase needs to be registered. Once the configuration file is added, the application manifest also needs to be edited. A service that extends FirebaseMessagingService needs to be added; it extends the base class to add functionality for receiving notifications in the foreground, receiving a data payload, and sending upstream messages. Metadata elements can optionally be added to the manifest to set the icon and color. Notification channels can be added on a per-queue basis.


When the application starts for the first time, the SDK generates a registration token for the client application. This new token is surfaced via the overridden onNewToken method. Tokens can be rotated, so the application must always retrieve the latest registration token. The current token can be retrieved using FirebaseMessaging.getInstance().getToken(). Token autogeneration can be prevented by disabling Analytics collection and Firebase Cloud Messaging auto-initialization.

 

The storage can also be provisioned via Firebase. Cloud Firestore is just right for this case. It is a flexible, scalable database for mobile, web, and server development from Firebase and Google Cloud. One of the primary advantages of this database is that it caches the data that the application is using, which makes it easy to access the data even when the device is offline.


It is a cloud-hosted NoSQL database accessed directly via native SDKs as well as REST APIs. Data is stored in documents, documents are kept in collections, and the data primitives support complex objects. Realtime listeners can be registered to retrieve just the updates instead of the entire database.