Saturday, June 30, 2018

#codingexercise
Problem: Given two containers with unit measures a and b that are used to distribute a commodity with a total quantity of d to n customers, determine the maximum number of customers whose individual requirements, expressed as counts of a and b, can be satisfied.
Solution: use a greedy strategy that takes the aggregated demand of a and b for each person in increasing order, which maximizes the customer count.

List<int> GetSatisfiedCustomers(List<Tuple<int,int>> demands, int a, int b, int d)
{
  Debug.Assert(demands != null);
  Debug.Assert(demands.Count > 0);
  Debug.Assert(d > 0);
  // aggregate each customer's demand into a single quantity, remembering the index
  var aggregated = demands
      .Select((item, index) => Tuple.Create(item.Item1 * a + item.Item2 * b, index))
      .ToList();
  aggregated.Sort(); // by aggregated demand, ascending
  var customers = new List<int>();
  foreach (var demandByPerson in aggregated)
  {
      if (demandByPerson.Item1 <= d)
      {
          d -= demandByPerson.Item1;          // serve the smallest remaining demand first
          customers.Add(demandByPerson.Item2);
      }
  }
  return customers;
}
The above method merely represents the individual demand of each user in unit measurements and then sorts the demands in increasing order. Taking the demands in increasing order lets us include as many customers as possible.
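A hypothetical usage, with the unit measures a and b passed in as parameters:

var demands = new List<Tuple<int,int>>
{
    Tuple.Create(1, 2), // customer 0
    Tuple.Create(3, 1), // customer 1
    Tuple.Create(0, 1)  // customer 2
};
// with a = 2 and b = 3 the aggregated demands are 8, 9 and 3;
// sorted ascending, customers 2 and 0 fit within d = 12
var satisfied = GetSatisfiedCustomers(demands, 2, 3, 12);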

Friday, June 29, 2018

We resume our discussion on the Storj network and its messaging.
The messaging system implemented by the Storj network is Quasar. The publisher-subscriber model is topic-based and implemented using Bloom filters. The topics are predetermined and made available at the protocol level, which helps with implementing Storj over Kademlia. Three new message types are added that correspond to the subscribe, update and publish methods. The topics can therefore be extended to include such things as contract parameters and bandwidth reservation. The Kademlia messages facilitate the creation and propagation of filters. Each node maintains information about the topics to which it subscribes as well as those to which its neighbors subscribe. Three neighbors are chosen to form the initial subscribe list. The response to the subscribe includes the current filter list. The requesting node then merges these lists. This is a pull operation. The update message does the reverse: it pushes the filter changes to the three nearest neighbors. It usually follows the subscribe message, so the loop helps all the nodes learn what is relevant to their neighbors and to those reachable from them. Publish messages broadcast a message to the network. These messages are sent to the three nearest neighbors and include a topic parameter. If the topic is found in the node's own filter, the node processes the message. If it is found in the other filters in the filter list, it is forwarded to the three neighbors. If nothing matches, it is forwarded to a randomly selected peer. To prevent the message from coming back, nodes add their id to the message. Publish messages also include a time-to-live to prevent spam attacks.
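To make the forwarding rules concrete, here is a minimal sketch of the publish-handling decision, not Storj's actual code: the names are hypothetical, a HashSet<string> stands in for the Bloom filter, and neighbor selection and transport are stubbed out.

using System.Collections.Generic;

class QuasarNodeSketch
{
    // topics this node subscribes to (a Bloom filter in the real protocol)
    private readonly HashSet<string> ownTopics = new HashSet<string>();
    // merged filters learned from the three nearest neighbors via subscribe/update
    private readonly HashSet<string> neighborTopics = new HashSet<string>();

    public void OnPublish(string topic, string payload, HashSet<string> seenNodeIds, int ttl, string myNodeId)
    {
        // the time-to-live prevents spam; the id list prevents the message from coming back
        if (ttl <= 0 || seenNodeIds.Contains(myNodeId))
            return;
        seenNodeIds.Add(myNodeId);

        if (ownTopics.Contains(topic))
            Process(payload);                // the topic is found: handle the message locally
        else if (neighborTopics.Contains(topic))
            ForwardToThreeNearestNeighbors(topic, payload, seenNodeIds, ttl - 1);
        else
            ForwardToRandomPeer(topic, payload, seenNodeIds, ttl - 1);
    }

    private void Process(string payload) { /* application-specific handling */ }
    private void ForwardToThreeNearestNeighbors(string topic, string payload, HashSet<string> seen, int ttl) { /* transport stub */ }
    private void ForwardToRandomPeer(string topic, string payload, HashSet<string> seen, int ttl) { /* transport stub */ }
}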
We were comparing Kademlia with other P2P networks. In general they provide a good base for large-scale data sharing and application-level multicasting. Some of the desirable features of P2P networks include selection of peers, redundant storage, efficient location, hierarchical namespaces, authentication as well as anonymity of users. In terms of performance, P2P networks have desirable properties such as efficient routing, self-organization, massive scalability and robustness in deployments, fault tolerance, load balancing and explicit notions of locality. Perhaps the biggest takeaway is that a P2P network is an overlay network with no restriction on size, and there are two classes: structured and unstructured. Structured P2P means that the network topology is tightly controlled and the content is placed not at random peers but at specified locations, which makes subsequent queries more efficient. DHTs fall in this category, where the location of the data objects is deterministic and the keys are unique. In distributed computing we saw the benefit of arranging the nodes in a ring with hashes at different nodes. Napster was probably the first example to realize the distributed file-sharing benefit with the assertion that requests for popular content do not need to be sent to a central server. P2P file-sharing systems are self-scaling.
#codingexercise
Given a paper of size A x B, cut the paper into squares of any size and determine the minimum number of squares.
uint GetMinCount(uint a, uint b)
{
    uint count = 0;
    uint remainder = 0;
    uint min = Math.Min(a, b);
    uint max = Math.Max(a, b);
    while (min > 0)
    {
        count += max / min;    // slice off as many min x min squares as fit
        remainder = max % min;
        max = min;             // continue, Euclid-style, on the leftover strip
        min = remainder;
    }
    return count;
}
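For example, GetMinCount(5, 3) returns 4: one 3x3 square leaves a 2x3 strip, which takes one 2x2 square and then two 1x1 squares. Note that this Euclid-style greedy matches the true minimum for many inputs but not all; 11x13 is a known size where slicing off the largest squares first is not optimal.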

Thursday, June 28, 2018

We resume our discussion on the Storj network.
We were looking at audits. The data owner must send a message to have the farmer issue the audit. This was done by extending the Kademlia message set with a new one. The message contains a hash of the data and a challenge. The farmer responds with a Merkle proof, and the data owner issues payment to the farmer. The contract helps determine the exchange. Since it is between the data owner and the farmer, it includes all the elements for the nodes to form a relationship, transfer the data, respond to audits and settle on payments. Parties interested in forming a contract may use the publisher-subscriber model that is common in most messaging systems. Contracts are signed and stored by both parties.
The messaging system implemented by the Storj network is Quasar. The publisher-subscriber model is topic-based and implemented using Bloom filters. The topics are predetermined and made available at the protocol level, which helps with implementing Storj over Kademlia. Three new message types are added that correspond to the subscribe, update and publish methods. The topics can therefore be extended to include such things as contract parameters and bandwidth reservation. The Kademlia messages facilitate the creation and propagation of filters. Each node maintains information about the topics to which it subscribes as well as those to which its neighbors subscribe. Three neighbors are chosen to form the initial subscribe list. The response to the subscribe includes the current filter list. The requesting node then merges these lists. This is a pull operation. The update message does the reverse: it pushes the filter changes to the three nearest neighbors. It usually follows the subscribe message, so the loop helps all the nodes learn what is relevant to their neighbors and to those reachable from them. Publish messages broadcast a message to the network. These messages are sent to the three nearest neighbors and include a topic parameter. If the topic is found in the node's own filter, the node processes the message. If it is found in the other filters in the filter list, it is forwarded to the three neighbors. If nothing matches, it is forwarded to a randomly selected peer. To prevent the message from coming back, nodes add their id to the message. Publish messages also include a time-to-live to prevent spam attacks.
One of the things we quickly see in comparison with messaging systems such as RabbitMQ/ZeroMQ is that the distributed computing notions, such as virtual time and propagation, are universal and required in these cases. Some of the fallacies we have to watch out for include the notions that the network is reliable, there is no latency, the bandwidth is infinite, the network is secure, the topology does not change, there is only one administrator, transport cost is zero, and the network is homogeneous.

Wednesday, June 27, 2018

We resume our discussion on the Storj network.
We were looking at contracts and negotiations. A contract is between the data owner and the farmer. The latter is a term used to describe the one who houses the data. The terms farmer and banker are interchangeable because he provides storage for one of the distributed ledgers and hopes to make money from mining it. The banker or farmer does not need to know the contents of the data. The data owner retains complete control over the encryption key, which determines access to the data. The data owner keeps a set of challenges, selects one of them and sends it to the farmer. The farmer uses the challenge and the data to generate the Merkle pre-leaf and uses it with the other leaves to generate a Merkle proof. This proof is sent back to the data owner. The data owner uses the root and tree depth to verify the proof.
The proofs may be compact but the tree may not be. Moreover, the shard may be hashed many times to generate pre-leaves. Therefore an optimization is used to generate partial audits using subsets of the data, which reduces the overhead in compute and storage. Unlike full audits, partial audits give only a probabilistic assurance that the farmer retains the entire file. This means there could be false positives; since the probability is known, it gives a confidence level.
The data owner must send a message to have the farmer issue the audit. This was done by extending the Kademlia message set with a new one. The message contains a hash of the data and a challenge. The farmer responds with a Merkle proof, and the data owner issues payment to the farmer. The contract helps determine the exchange. Since it is between the data owner and the farmer, it includes all the elements for the nodes to form a relationship, transfer the data, respond to audits and settle on payments. Parties interested in forming a contract may use the publisher-subscriber model that is common in most messaging systems. Contracts are signed and stored by both parties.
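As an illustration of the verification step, here is a minimal sketch, not Storj's actual implementation: the farmer's response is modeled as a list of sibling hashes, each with a flag saying whether the sibling sits on the left, and the data owner folds the pre-leaf hash up to the known root.

using System.Linq;
using System.Security.Cryptography;

static class MerkleAuditSketch
{
    static byte[] Hash(byte[] data)
    {
        using (var sha = SHA256.Create())
            return sha.ComputeHash(data);
    }

    // leaf: the hash of (challenge || shard) computed by the farmer
    // proof: sibling hashes from the leaf level up to just below the root
    public static bool Verify(byte[] leaf, (byte[] sibling, bool siblingOnLeft)[] proof, byte[] root)
    {
        var current = leaf;
        foreach (var (sibling, siblingOnLeft) in proof)
        {
            var pair = siblingOnLeft ? sibling.Concat(current) : current.Concat(sibling);
            current = Hash(pair.ToArray());
        }
        return current.SequenceEqual(root); // the tree depth is implied by proof.Length
    }
}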
#codingexercise
Connect N ropes with minimum cost (a C# sketch follows below)
          Make a min-heap of the n ropes
          While the size of the heap is > 1
                       Extract the minimum cost rope
                       Extract the next minimum cost rope
                       Insert a rope with their combined cost and add that cost to the total
          The last rope remaining is the union, formed at minimum total cost
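A minimal sketch of this greedy in C#, assuming .NET 6's PriorityQueue is available:

long ConnectRopes(int[] ropes)
{
    var heap = new PriorityQueue<long, long>();
    foreach (var r in ropes)
        heap.Enqueue(r, r);
    long cost = 0;
    while (heap.Count > 1)
    {
        long first = heap.Dequeue();          // extract minimum cost rope
        long second = heap.Dequeue();         // extract next minimum cost rope
        long combined = first + second;
        cost += combined;                     // pay to connect them
        heap.Enqueue(combined, combined);     // insert the rope with combined cost
    }
    return cost;
}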
Find the minimum number of coins from an infinite supply of varying denominations to form change for a value V (a C# sketch follows below)
          sort the coins from big to small denominations
          while V > 0:
                pick a coin of the largest denomination that is not larger than V and add it to the collection to be returned
                reduce V by the value of the coin and repeat
          return the collection
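A minimal sketch of the coin greedy, which is optimal for canonical coin systems such as 1, 5, 10, 25, though not for arbitrary denominations:

List<int> MakeChange(int[] denominations, int V)
{
    Array.Sort(denominations);
    Array.Reverse(denominations);   // big to small
    var coins = new List<int>();
    foreach (var d in denominations)
        while (V >= d)              // pick the largest coin not exceeding V
        {
            coins.Add(d);
            V -= d;
        }
    return coins;                   // a zero remainder assumes a 1-unit coin exists
}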

Tuesday, June 26, 2018

We were discussing two new operators:

one for the use of additional columns to generate groups that facilitate a hierarchical view of entities, and another for additional rows to describe new pseudo-entities transformed from existing entities, which is helpful to view entities in the form of layers.
The ability to view the entities as a hierarchical organization lets us expand our logic to perform tree-based searches and to assign different roles to entities so that the data may carry augmented information to improve the queries. Let us take an example. If we had an employee table that listed all the employees of an organization, then additional columns such as manager_level_001_id and manager_level_002_id and so on could provide information that does not repeatedly need to be calculated in subsequent queries. More importantly, the addition of columns for groups may be in a temporary table so that we don't necessarily write on the original data. Besides, if storage is unlimited, it helps to memoize the results of processing that is going to be repeated. This favors a non-recursive tree algorithm to list the managers all the way to the root for any employee, as well as the reuse of the logic that determines the next-level manager, as opposed to recursive queries across levels. This form of data extract, transform and load is popular in data reporting stacks that are used to publish charts and graphs from data.
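As a small illustration of that non-recursive walk, here is a sketch assuming a hypothetical employee table flattened into an id-to-manager-id dictionary; the resulting chain is exactly what columns like manager_level_001_id and manager_level_002_id would memoize.

// managerOf maps an employee id to its manager's id; the root has a null manager
List<int> GetManagerChain(Dictionary<int, int?> managerOf, int employeeId)
{
    var chain = new List<int>();
    int? current = managerOf[employeeId];
    while (current.HasValue)              // iterate level by level instead of recursing
    {
        chain.Add(current.Value);         // manager_level_001, manager_level_002, ...
        current = managerOf[current.Value];
    }
    return chain;
}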
The second operator gives the ability to view entities not just as those existing but as composites that behave the same as existing entities, providing a simpler abstraction for purposes different from those involving the original entities. We could take another example here. Let us say there was a table of email addresses. We could list different email addresses for the same person and also list different distribution lists that are made up of one or more email addresses. We could also dynamically combine individuals and distribution lists into new temporary entries. Combinations may be multi-leveled. Since there is a separation of concerns involving the original entities and the composites from the query, this is considered layering. Since the selection of original entities determines the composites for use in the higher layer, this comes as a different technique from grouping and hierarchies.

Monday, June 25, 2018

Yesterday we were discussing Storj as a peer-to-peer cloud storage network. Regardless of whether we store data via a distributed hash table, in streaming storage, object storage, or some cloud database, the enablement of data usage matters most in the choice of storage.
Let us take a look at one of the most common forms of data usage, which is querying. We had brought up earlier that the query language is usually independent of the data source and even of the architecture. The language mostly exists to enable the data usage and is a form of expression of logic.
We saw two different forms of expression of this language.
First, we saw the expression in the form of standard query operators which became popular across languages, such as .Where() and .Sum() in LINQ. These were convenient primitives taking inspiration from the tried, tested and well-established SQL query language. The style makes it easy to express queries in a succinct manner, often enabling chaining to refine and improve resultsets.
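For instance, with a hypothetical collection of orders, the operators chain naturally:

var orders = new List<(string Customer, decimal Amount)>
{
    ("alice", 250m), ("bob", 75m), ("carol", 120m)
};
// filter, then aggregate: chaining refines the resultset step by step
decimal totalOver100 = orders.Where(o => o.Amount > 100m).Sum(o => o.Amount); // 370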
The second form of expression was the search query language, which has a rich history in shell scripting and log analysis, where the results of one command are piped into another command for transformation and analysis. Although similar in nature to chaining operators, the expressions of this form of query involve more search-like capabilities across heterogeneous data, such as the use of regular expressions for detecting patterns. This form of expression not only involves powerful query operators but also facilitates data extract, transform and load as the data makes its way through the expressions.
Operators used in both these forms of query expression have expanded to include more involved forms of computation such as grouping, ranking and sorting.
In particular there are two new operators possible: one for the use of additional columns to generate groups that facilitate a hierarchical view of entities, and another for additional rows to describe new pseudo-entities transformed from existing entities, which is helpful to view entities in the form of layers.
The ability to view the entities as a hierarchical organization lets us expand our logic to perform tree-based searches and to assign different roles to entities so that the data may carry augmented information to improve the queries. More importantly, the addition of columns for groups may be in a temporary table so that we don't necessarily write on the original data. This form of data extract, transform and load is popular in data reporting stacks that are used to publish charts and graphs from data.
The second form gives the ability to view entities not just as those existing but as composites that behave the same as existing entities, providing a simpler abstraction for purposes different from those involving the original entities. Since there is a separation of concerns involving the original entities and the composites, this is considered layering. Since the selection of original entities determines the composites for use in the higher layer, this comes as a different technique from grouping and hierarchies.
#codingexercise
We were discussing the count of the number of decreasing paths in a matrix when the traversal is permitted only on adjacent horizontal and vertical cells. The same approach works for increasing paths.
Solution: we can sum the number of paths with each cell as the starting location if we know how to do it for a given starting position.

initialize the sum to one for the number of paths at the current cell, since the cell by itself forms a standalone single-element path
for each of the four adjacent positions around the cell:
          if the value there is greater than the cell's:
                recurse to find the count at the adjacent cell if it is not available in the dp matrix, or insert the value there after finding it
                add the count through that adjacent cell to the sum at the cell
return the sum of paths for this position (a C# sketch of this memoized count follows)
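A minimal sketch in C#, assuming a hypothetical int matrix and counting strictly increasing paths; the decreasing case just flips the comparison:

int CountIncreasingPaths(int[,] A)
{
    int rows = A.GetLength(0), cols = A.GetLength(1);
    var dp = new int[rows, cols]; // 0 means not yet computed; every real count is >= 1
    int total = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            total += CountFrom(A, dp, i, j);
    return total;
}

int CountFrom(int[,] A, int[,] dp, int i, int j)
{
    if (dp[i, j] != 0) return dp[i, j];
    int rows = A.GetLength(0), cols = A.GetLength(1);
    int sum = 1; // the cell by itself is a standalone single-element path
    var moves = new (int dx, int dy)[] { (1, 0), (-1, 0), (0, 1), (0, -1) };
    foreach (var (dx, dy) in moves)
    {
        int x = i + dx, y = j + dy;
        if (x >= 0 && x < rows && y >= 0 && y < cols && A[x, y] > A[i, j])
            sum += CountFrom(A, dp, x, y); // every path continuing at (x, y) extends this one
    }
    dp[i, j] = sum;
    return sum;
}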

Sunday, June 24, 2018

We were reviewing the Storj network.
We were looking at contracts and negotiations. A contract is between the data owner and the farmer. The latter is a term used to describe the one who houses the data. The terms farmer and banker are interchangeable because he provides storage for one of the distributed ledgers and hopes to make money from mining it. The banker or farmer does not need to know the contents of the data. The data owner retains complete control over the encryption key, which determines access to the data. The data owner keeps a set of challenges, selects one of them and sends it to the farmer. The farmer uses the challenge and the data to generate the Merkle pre-leaf and uses it with the other leaves to generate a Merkle proof. This proof is sent back to the data owner. The data owner uses the root and tree depth to verify the proof.
The proofs may be compact but the tree may not be. Moreover, the shard may be hashed many times to generate pre-leaves. Therefore an optimization is used to generate partial audits using subsets of the data, which reduces the overhead in compute and storage. Unlike full audits, partial audits give only a probabilistic assurance that the farmer retains the entire file. This means there could be false positives; since the probability is known, it gives a confidence level.

#codingexercise
We were discussing the count of the number of decreasing paths in a matrix when the traversal is permitted only on adjacent horizontal and vertical cells.
Solution: we can sum the number of paths with each cell as the starting location if we know how to do it for a given starting position. We saw the backtracking solution and we now see the dynamic programming one, which utilizes the calculations of previously computed start locations.
initialize the sum to one for the number of paths at the current cell, since the cell by itself forms a standalone single-element path
for each of the four adjacent positions around the cell:
          if the value there is less than the cell's:
                recurse to find the count at the adjacent cell if it is not available in the dp matrix, or insert the value there after finding it
                add the count through that adjacent cell to the sum at the cell
return the sum of paths for this position

Saturday, June 23, 2018

We were reviewing the Storj network. This was initiated to address scalability and increase decentralization. It is a distributed cloud storage network and removes the notion of a centralized third-party storage provider. It provides client-side encryption, which improves data security. It maintains data integrity with a proof of retrievability. It introduces a network-first design where peers are autonomous agents and there is a protocol to enable them to negotiate contracts, transfer data, verify the integrity and availability of remote data, and reward each other with payments. It provides tools to enable all these interactions.
It distributes the storage of a file as shards on this network, and these shards are located using a distributed hash table. The shards themselves are not stored in this hash table; rather, a distributed network and messaging facilitate locating them. Storj is built on Kademlia, which provides the distributed hash table. Kademlia routes messages through low-latency paths and uses parallel asynchronous queries to avoid timeouts. It also reduces the number of configuration messages peers need to send to learn about each other. The information spreads automatically, so the number of nodes can scale. Many of the important properties of Kademlia can be formally proven. Keys are opaque hashes of some larger data and peers have node IDs. The key-value pairs are stored on the nodes with IDs close to the key, where closeness is defined by a distance metric. With the use of Kademlia, Storj focuses on the data instead. Whether the file is intact is determined via a challenge-response interaction which is dubbed an audit. Merkle trees and proofs are used to verify integrity. This scheme may introduce some overhead, so an extension is used that utilizes subsets of the data.
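As a small aside on what close means here, Kademlia defines the distance between two IDs as their bitwise XOR; a tiny illustration:

// Kademlia's notion of closeness: distance(x, y) = x XOR y, shown with small keys for brevity
ulong Distance(ulong id1, ulong id2) => id1 ^ id2;
// e.g. key 0b1010 is closer to node 0b1011 (distance 1) than to node 0b0010 (distance 8)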
Today we will look at contracts and negotiations. A contract is between the data owner and the farmer. The latter is a term used to describe the one who houses the data. The terms farmer and banker are interchangeable because he provides storage for one of the distributed ledgers and hopes to make money from mining it. The banker or farmer does not need to know the contents of the data. The data owner retains complete control over the encryption key, which determines access to the data. The data owner keeps a set of challenges, selects one of them and sends it to the farmer. The farmer uses the challenge and the data to generate the Merkle pre-leaf and uses it with the other leaves to generate a Merkle proof. This proof is sent back to the data owner. The data owner uses the root and tree depth to verify the proof.
#codingexercise
Count the number of decreasing paths in a matrix when the traversal is permitted only on adjacent horizontal and vertical cells.
Solution: we can sum the number of paths with each cell as the starting location if we know how to do it for a given starting position. We can do this with backtracking as well as with dynamic programming. Backtracking does not allow us to make a wrong choice, but we repeat the same calculations over and over again. Dynamic programming utilizes the calculations of previously computed start locations. Ideally a good starting location will be the center of the matrix.
initialize the sum to one for the number of paths at the current cell, since the cell by itself forms a standalone single-element path
for each of the four adjacent positions around the cell:
          if the value there is less than the cell's:
                recurse to find the count at the adjacent cell
                add the count through that adjacent cell to the sum at the cell
return the sum of paths for this position

Friday, June 22, 2018

We were reviewing the Storj network. This was initiated to address scalability and increase decentralization. It is a distributed cloud storage network and removes the notion of a centralized third-party storage provider. It provides client-side encryption, which improves data security, and data integrity is maintained with a proof of retrievability. It introduces a network-first design where peers are autonomous agents and there is a protocol to enable them to negotiate contracts, transfer data, verify the integrity and availability of remote data, and reward each other with payments. It provides tools to enable all these interactions. It distributes the storage of a file as shards on this network, and these shards are located using a distributed hash table. The shards themselves are not stored in this hash table; rather, a distributed network and messaging facilitate locating them. Storj is built on Kademlia, which provides the distributed hash table. Kademlia routes messages through low-latency paths and uses parallel asynchronous queries to avoid timeouts. It also reduces the number of configuration messages peers need to send to learn about each other. The information spreads automatically, so the number of nodes can scale. Many of the important properties of Kademlia can be formally proven. Keys are opaque hashes of some larger data and peers have node IDs. The key-value pairs are stored on the nodes with IDs close to the key, where closeness is defined by a distance metric. With the use of Kademlia, Storj focuses on the data instead. Whether the file is intact is determined via a challenge-response interaction which is dubbed an audit. Merkle trees and proofs are used to verify integrity. This scheme may introduce some overhead, so an extension is used that utilizes subsets of the data.

Thursday, June 21, 2018

Today we will start reviewing the Storj network. This was initiated to address scalability and increase decentralization. It is a distributed cloud storage network and removes the notion of a centralized third-party storage provider. The decentralization not only helps mitigate traditional data failures and outages but also supports new workloads such as those from blockchain. Blockchain is a distributed ledger. There is a high degree of privacy for the individual whose transactions are maintained in this ledger. It does not divulge any personally identifiable information and can still prove ownership of entries. The ledger itself is maintained by a community where no one actor can gain enough influence to submit a fraudulent transaction or alter recorded data. Therefore blockchain opens up new possibilities in many ecosystems, and the Storj network facilitates its security, privacy and data control model. In production storage, peer-to-peer networks were not popular for data because data persists based on popularity, not on utility. The Storj network introduces a challenge-response verification system combined with direct payments. In addition there is a set of federated nodes that alleviates access and performance concerns. The Storj network also brings client-side encryption.
Cloud storage is used heavily via large storage providers who act as trusted third parties to transfer and store data. Client-side encryption improves data security, while data integrity is maintained with a proof of retrievability. The Storj network introduces a network-first design where peers are autonomous agents and there is a protocol to enable them to negotiate contracts, transfer data, verify the integrity and availability of remote data, and reward each other with payments. It provides tools to enable all these interactions. Moreover, it distributes the storage of a file as shards on this network, and these shards are located using a distributed hash table. The shards themselves are not stored in this hash table; rather, a distributed network and messaging facilitate locating them.

Wednesday, June 20, 2018

We were discussing the benefits of software-defined stacks and we looked at examples including the one with an oncology-focused software maker. The software was originally installed as a single-instance multi-tenant application in the cloud. It was subsequently moved to PaaS. The PaaS platform provided backend functions such as provisioning, deployment and security. The separation of functionality helped the oncology software maker focus on application development and reduce its schedule, while the PaaS platform helped it grow.
This is true for organizations of any size. Even eBay and PayPal, with their millions of users, have found this strategy useful. As the infrastructure and IT footprint grows, such automation improves agility.
Aside from automation, SDDC can also help with load balancing, object storage, database-as-a-service, configuration management, and application management. Together they bring improved agility and standardization.
#codingexercise
int GetMaxCountSquareSubMatrixSizekCountZerosOrOnes(int[,] A, int rows, int cols, int binary)
{
    int max = int.MinValue;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            // use (i, j) as the top-left corner of the submatrix
            int count = 0;
            for (int x = i; x < rows; x++)
                for (int y = j; y < cols; y++)
                    if (A[x, y] == binary)
                        count += 1;
            if (count > max)
                max = count;
        }
    }
    return max;
}
 

Tuesday, June 19, 2018

We were discussing the benefits of software-defined stacks. One of the advantages of software-defined services is that they can be repeated over and over again in different underlying layers with no change or impact to existing workloads. Another benefit is that reconfiguration is super easy: by changing the settings we can make the same software stack behave differently. Server utilization and capacity are also improved. The energy footprint of the data center is also reduced. The automation possible with SDS reduces not only the time to deploy but also the effort involved, such as approvals and handovers.
An example of the above was demonstrated by an oncology-focused software maker. The software was originally installed as a single-instance multi-tenant application in the cloud. It was subsequently moved to PaaS. The PaaS platform provided backend functions such as provisioning, deployment and security. The separation of functionality helped the oncology software maker focus on application development and reduce its schedule, while the PaaS platform helped it grow.
This is true for organizations of any size. Even eBay and PayPal, with their millions of users, have found this strategy useful. As the infrastructure and IT footprint grows, such automation improves agility.
#codingexercise
int GetCountSquareSubMatrixSizekCountZerosOrOnes(int[,] A, int rows, int cols, int k, int binary)
{
    int total = 0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            // use (i, j) as the top-left corner of a k x k submatrix
            int count = 0;
            for (int x = i; x < i + k; x++)
                for (int y = j; y < j + k; y++)
                    if (x < rows && y < cols && A[x, y] == binary)
                        count += 1;
            if (count == k * k)
                total += 1;
        }
    }
    return total;
}
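As an aside on cost, this brute-force scan is O(rows * cols * k * k); precomputing a two-dimensional prefix sum of the matching cells would let each k x k window be counted in constant time.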

 

Monday, June 18, 2018

We were discussing virtualization and software-defined stacks. A software-defined technology stack aims to virtualize the compute, network, storage and security aspects. One of the advantages of software-defined services is that they can be repeated over and over again in different underlying layers with no change or impact to existing workloads. Another benefit is that reconfiguration is super easy: by changing the settings we can make the same software stack behave differently. Server utilization and capacity are also improved. The energy footprint of the data center is also reduced. The automation possible with SDS reduces not only the time to deploy but also the effort involved, such as approvals and handovers. That said, security and compliance need to be studied with SDS deployments because they open up immense possibilities that go against hardening.
SDS can also move up the stack. This layer does not have to be tied to the hardware and can move up into applications and business operations. As an enabler for the runtime, it can host one or more workloads depending on what it is used for.


#codingexercise
int GetCountSquareSubMatrixSizekCountZeros(int[,] A, int rows, int cols, int k)
{
    int total = 0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            // use (i, j) as the top-left corner of a k x k submatrix
            int count = 0;
            for (int x = i; x < i + k; x++)
                for (int y = j; y < j + k; y++)
                    if (x < rows && y < cols && A[x, y] == 0)
                        count += 1;
            if (count == k * k)
                total += 1;
        }
    }
    return total;
}