Thursday, November 2, 2017

We were discussing Master Data Management. Some of the top players in this space include Informatica, IBM InfoSphere, Microsoft, SAP Master and Riversand. Informatica offers an end-to-end MDM solution with an ecosystem of applications. It does not require the catalog to be in a single domain. InfoSphere has been in this space for a long time, and its product is considered mature, with stronger collaborative and operational capabilities. It plays well with other IBM solutions and their ecosystem. SAP consolidates governance over the master data with an emphasis on data quality and consistency. It supports collaborative workflows and is noted for supplier-side features such as supplier onboarding. Microsoft's data services, which include SQL Server, make it easy to create master lists of data, with the benefit that the data is made reliable and centralized so that it can participate in intelligent analysis. Most products require some degree of change to existing workflows before customers can make the transition.
The trouble with MDM users is that they still prefer to use MS Excel. This introduces ETL-based workflows and siloed views of data. Materialized views don't help because they are not updated in time. Also, any separation of data manipulation into stages introduces human errors and inconsistencies, in addition to delaying access to the data. The logic in the ETL also has to be idempotent, because it is needlessly exercised in full even if only a few rows are to be inserted. Moreover, the operation on each row now has to be made robust by making sure the corresponding row does not already exist. For example, to move a record from source to destination, there must be a check to see if it already exists in the destination, then an insert if it does not, and then a delete from the source. The delete cannot happen unless the record has already been inserted, and each of these operations has to be done for each row. Error checking for the workflow now includes checks against duplicate entries, against syntactically or semantically equivalent entries, and that the state of an entry progresses forward only. These kinds of checks could all be avoided if the work were left to a service rather than an ETL workflow, but programmatic access to a service is not always preferred, so we have ADO.NET clients or cursor-like tools that translate to LINQ queries.
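The per-row move described above (check the destination, insert if absent, delete from the source only afterwards) can be sketched roughly as follows. This is a minimal illustration, not an actual ETL implementation: the in-memory dictionaries stand in for the source and destination tables, and the names `RecordMover`, `MoveRecord` and `MoveAll` are hypothetical. It also assumes that two rows with the same key are the same record.

```csharp
using System.Collections.Generic;

// Hypothetical helper: dictionaries stand in for source and destination tables.
static class RecordMover
{
    // Moves one record by key. Safe to re-run after a partial failure.
    public static bool MoveRecord(
        IDictionary<int, string> source,
        IDictionary<int, string> destination,
        int key)
    {
        // Nothing to move, for example because an earlier run already finished.
        if (!source.TryGetValue(key, out string value))
            return false;

        // Insert only if absent, so a retry does not create a duplicate.
        if (!destination.ContainsKey(key))
            destination.Add(key, value);

        // Delete only once the record is known to be in the destination.
        source.Remove(key);
        return true;
    }

    // Moves every record; returns how many rows were taken out of the source.
    public static int MoveAll(
        IDictionary<int, string> source,
        IDictionary<int, string> destination)
    {
        int moved = 0;
        // Iterate over a snapshot of the keys because the source shrinks as we go.
        foreach (int key in new List<int>(source.Keys))
        {
            if (MoveRecord(source, destination, key))
                moved++;
        }
        return moved;
    }
}
```

Running `MoveAll` a second time finds an empty source and changes nothing, which is the idempotence the workflow needs. Against a real database the insert and the delete would presumably also be wrapped in a transaction, which this sketch leaves out.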
#codingexercise
Two of the nodes of a BST are swapped. Correct the BST: 

// Performed during an inorder traversal
void CorrectBSTHelper(Node root, ref Node prev, ref Node first, ref Node middle, ref Node second)
{
    if (root == null) return;
    // Inorder: left subtree, then this node, then right subtree
    CorrectBSTHelper(root.left, ref prev, ref first, ref middle, ref second);

    if (prev != null && root.data < prev.data)
    {
        if (first == null)
        {
            // First inversion: prev is out of place and root is its neighbor
            first = prev;
            middle = root;
        }
        else
        {
            // Second inversion: root is the other swapped node
            second = root;
        }
        // We can only check prev and not next, but the incorrect node may be
        // first, in the middle, or last in a sequence.
        // By the time next becomes root, prev is already not null, so the
        // checks above could be changed to:
        //   prev == null
        //   prev != null && root < prev
        //   prev != null && root > prev
        // returning only first and second as they are found.
        // This avoids having to keep track of middle.
    }
    prev = root;
    CorrectBSTHelper(root.right, ref prev, ref first, ref middle, ref second);
}

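void CorrectBST(Node root)
{
    Node first = null;
    Node middle = null;
    Node second = null;
    Node prev = null;
    CorrectBSTHelper(root, ref prev, ref first, ref middle, ref second);
    // Swap is assumed to exchange the data of the two nodes.
    if (first != null && second != null)
        Swap(first, second);   // the swapped nodes were not adjacent in inorder
    else if (first != null && middle != null)
        Swap(first, middle);   // the swapped nodes were adjacent in inorder
}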

      4
  2      6
1  5 3    7
Inorder: 1 2 5 4 3 6 7 (5 and 3 are swapped)
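The sample tree above can be checked with a small self-contained sketch. It is not the helper from the exercise verbatim: it uses the simplification of tracking only `first` and `second`, overwriting `second` on every inversion so that `middle` is not needed. The `Node` constructor and the names `BstRepair`, `Find`, `Correct`, `Inorder` and `Demo` are assumptions made for this illustration.

```csharp
using System.Collections.Generic;

class Node
{
    public int data;
    public Node left, right;
    public Node(int data, Node left = null, Node right = null)
    {
        this.data = data;
        this.left = left;
        this.right = right;
    }
}

static class BstRepair
{
    // Inorder walk that records the two out-of-place nodes.
    static void Find(Node root, ref Node prev, ref Node first, ref Node second)
    {
        if (root == null) return;
        Find(root.left, ref prev, ref first, ref second);
        if (prev != null && root.data < prev.data)
        {
            if (first == null) first = prev; // larger node of the first inversion
            second = root;                   // smaller node of the latest inversion
        }
        prev = root;
        Find(root.right, ref prev, ref first, ref second);
    }

    // Restores the BST by exchanging the data of the two misplaced nodes.
    public static void Correct(Node root)
    {
        Node prev = null, first = null, second = null;
        Find(root, ref prev, ref first, ref second);
        if (first != null && second != null)
        {
            int tmp = first.data;
            first.data = second.data;
            second.data = tmp;
        }
    }

    // Collects the values in inorder so the result can be inspected.
    public static List<int> Inorder(Node root)
    {
        var result = new List<int>();
        Collect(root, result);
        return result;
    }

    static void Collect(Node node, List<int> result)
    {
        if (node == null) return;
        Collect(node.left, result);
        result.Add(node.data);
        Collect(node.right, result);
    }

    // Builds the sample tree (5 and 3 swapped), repairs it, returns the inorder.
    public static string Demo()
    {
        var root = new Node(4,
            new Node(2, new Node(1), new Node(5)),
            new Node(6, new Node(3), new Node(7)));
        Correct(root);
        return string.Join(" ", Inorder(root)); // "1 2 3 4 5 6 7"
    }
}
```

On the sample tree the first inversion is 5 before 4 and the second is 4 before 3, so `first` ends up at the node holding 5 and `second` at the node holding 3. When the two swapped nodes are adjacent in inorder there is only one inversion, and the same code swaps that pair.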
