Cluster computing

Sunday, June 24, 2018

We were reviewing Storj network.
We were looking at contracts and negotiations. A contract is between the data owner and the farmer. The latter is the term for the party that houses the data. The terms farmer and banker are interchangeable because the farmer provides storage for one of the distributed ledgers and hopes to make money from mining it. The banker or farmer does not need to know the contents of the data. The data owner retains complete control over the encryption key, which determines access to the data. The data owner keeps a set of challenges, selects one of them and sends it to the farmer. The farmer uses the challenge and the data to generate the Merkle pre-leaf and uses it with the other leaves to generate a Merkle proof. This proof is sent back to the data owner. The data owner uses the root and tree depth to verify the proof.
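The challenge and response exchange above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Storj's actual implementation: it assumes SHA-256 as the hash, a pre-leaf computed as hash(challenge + shard), a leaf computed as hash(pre-leaf), a power-of-two number of challenges, and a farmer who stores the leaves alongside the shard. The function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Assumption: SHA-256 stands in for whatever hash the network uses.
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return all levels of the tree, leaves first and root last.
    Assumes len(leaves) is a power of two."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def prepare_audit(shard: bytes, challenges):
    """Owner side, before upload: keep the root and depth, hand the leaves to the farmer."""
    leaves = [h(h(c + shard)) for c in challenges]  # leaf = hash(pre-leaf)
    levels = build_tree(leaves)
    return levels[-1][0], len(levels) - 1, leaves

def farmer_respond(shard: bytes, challenge: bytes, index: int, leaves):
    """Farmer side: recompute the pre-leaf from the stored shard and collect sibling hashes."""
    pre_leaf = h(challenge + shard)
    siblings = []
    i = index
    for level in build_tree(leaves)[:-1]:
        siblings.append(level[i ^ 1])  # the neighbour at this level
        i //= 2
    return pre_leaf, siblings

def owner_verify(root: bytes, depth: int, index: int, pre_leaf: bytes, siblings) -> bool:
    """Owner side: fold the proof up to the root using only root, depth and index."""
    if len(siblings) != depth:
        return False
    node = h(pre_leaf)
    i = index
    for sibling in siblings:
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == root
```

A farmer who no longer holds the shard cannot recompute the pre-leaf for a fresh challenge, so the folded hash will not match the root the owner kept.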
The proofs may be compact but the tree may not be. Moreover, the shard may have to be hashed many times to generate the pre-leaves. Therefore an optimization generates partial audits from subsets of the data, which reduces the compute and storage overhead. Unlike full audits, partial audits give only a probabilistic assurance that the farmer retains the entire file. This means there can be false positives, where an audit passes even though part of the file is missing. Since the probability of that is known, it gives a confidence level.
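One way to picture the confidence level is a simple sampling model. The sketch below is an assumption for illustration only: it supposes each partial audit checks one segment chosen independently and uniformly at random, so a farmer who has dropped a fraction d of the data passes a single check with probability 1 - d. The function names are hypothetical.

```python
import math

def detection_confidence(missing_fraction: float, samples: int) -> float:
    """Probability that at least one of `samples` independent checks hits missing data."""
    if not 0.0 <= missing_fraction <= 1.0 or samples < 0:
        raise ValueError("missing_fraction must be in [0, 1] and samples >= 0")
    return 1.0 - (1.0 - missing_fraction) ** samples

def samples_needed(missing_fraction: float, confidence: float) -> int:
    """Smallest number of checks that reaches the requested confidence under this model."""
    if not 0.0 < missing_fraction < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - missing_fraction))
```

Under this model the chance of a false positive shrinks geometrically with the number of checks, which is why a handful of partial audits can stand in for a full one.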

#codingexercise
We were discussing the count of the number of decreasing paths in a matrix if the traversal is permitted only on adjacent horizontal and vertical cells
Solution: we can sum the number of paths with each cell as the starting location if we know how to do it for a given starting position. We saw the backtracking solution and we now see the dynamic programming one which utilizes the calculations of previously computed start locations.
initialize the sum to one for the number of paths at the center, where the center forms a standalone single element path
foreach of the four adjacent positions around a center:
    if the value is less than the center:
        recurse to find the count at the adjacent cell if it is not available in the dp matrix, and insert the value there after finding it
        add the count through that adjacent cell to the sum at the center
return the sum of paths for this position
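The steps above can be sketched as a memoized recursion in Python. This is a minimal sketch, assuming the matrix is a list of equal-length rows and that a path is strictly decreasing. The names count_decreasing_paths and count_from are hypothetical.

```python
def count_decreasing_paths(matrix):
    """Count strictly decreasing paths that move only to horizontal or vertical neighbours."""
    if not matrix or not matrix[0]:
        return 0
    rows, cols = len(matrix), len(matrix[0])
    # dp[r][c] caches the number of paths starting at (r, c); 0 means "not computed yet",
    # which is safe because every cell contributes at least its single element path.
    dp = [[0] * cols for _ in range(rows)]

    def count_from(r, c):
        if dp[r][c]:
            return dp[r][c]
        total = 1  # the cell on its own
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and matrix[nr][nc] < matrix[r][c]:
                total += count_from(nr, nc)
        dp[r][c] = total
        return total

    # Sum over every possible starting location.
    return sum(count_from(r, c) for r in range(rows) for c in range(cols))
```

Because values must strictly decrease along a path, the recursion can never revisit a cell, so no visited set is needed and each cell is computed once.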

Saturday, June 23, 2018

We were reviewing Storj network. This was initiated to address scalability and increase decentralization. It is a distributed cloud storage network and removes the notion of a centralized third party storage provider. It provides client side encryption which improves data security. It maintains data integrity with a proof of retrievability. It introduces a network first design where peers are autonomous agents and there is a protocol to enable them to negotiate contracts, transfer data, verify the integrity and availability of remote data and to reward with payments. It provides tools to enable all these interactions.
It distributes the storage of a file as shards on this network and these shards are located using a distributed hash table. The shards themselves are not stored in this hash table; rather, the distributed network and its messaging supply the location information. Storj is built on Kademlia, which provides the distributed hash table. Kademlia routes messages through low latency paths and uses parallel asynchronous queries to avoid timeouts. It also reduces the number of configuration messages peers must send to learn about each other. The information spreads automatically so the number of nodes can scale. Many of the important properties of Kademlia can be formally proven. Keys are opaque hashes of some larger data and peers have node IDs. The key-value pairs are stored on the nodes whose IDs are close to the key, where closeness is the XOR of the two identifiers. With the use of Kademlia, Storj focuses on the data instead. Whether the file is intact is determined via a challenge response interaction, which is dubbed an audit. Merkle trees and proofs are used to verify integrity. This scheme may introduce some overhead, so an extension is used that utilizes subsets of the data.
Today we will look at contracts and negotiations. A contract is between the data owner and the farmer. The latter is the term for the party that houses the data. The terms farmer and banker are interchangeable because the farmer provides storage for one of the distributed ledgers and hopes to make money from mining it. The banker or farmer does not need to know the contents of the data. The data owner retains complete control over the encryption key, which determines access to the data. The data owner keeps a set of challenges, selects one of them and sends it to the farmer. The farmer uses the challenge and the data to generate the Merkle pre-leaf and uses it with the other leaves to generate a Merkle proof. This proof is sent back to the data owner. The data owner uses the root and tree depth to verify the proof.
#codingexercise
Count the number of decreasing paths in a matrix if the traversal is permitted only on adjacent horizontal and vertical cells
Solution: we can sum the number of paths with each cell as the starting location if we know how to do it for a given starting position. We can do this with backtracking as well as with dynamic programming. Backtracking never commits to a wrong choice, but it repeats the same calculations over and over again. Dynamic programming reuses the calculations of previously computed start locations. In the steps below, the center is the cell currently being used as the starting location.
initialize the sum to one for the number of paths at the center, where the center forms a standalone single element path
foreach of the four adjacent positions around a center:
    if the value is less than the center:
        recurse to find the count at the adjacent cell
        add the count through that adjacent cell to the sum at the center
return the sum of paths for this position

Friday, June 22, 2018

We were reviewing Storj network. This was initiated to address scalability and increase decentralization. It is a distributed cloud storage network and removes the notion of a centralized third party storage provider. It provides client side encryption, which improves data security, and data integrity is maintained with a proof of retrievability. It introduces a network first design where peers are autonomous agents and there is a protocol to enable them to negotiate contracts, transfer data, verify the integrity and availability of remote data and to reward with payments. It provides tools to enable all these interactions. It distributes the storage of a file as shards on this network and these shards are located using a distributed hash table. The shards themselves are not stored in this hash table; rather, the distributed network and its messaging supply the location information. Storj is built on Kademlia, which provides the distributed hash table. Kademlia routes messages through low latency paths and uses parallel asynchronous queries to avoid timeouts. It also reduces the number of configuration messages peers must send to learn about each other. The information spreads automatically so the number of nodes can scale. Many of the important properties of Kademlia can be formally proven. Keys are opaque hashes of some larger data and peers have node IDs. The key-value pairs are stored on the nodes whose IDs are close to the key, where closeness is the XOR of the two identifiers. With the use of Kademlia, Storj focuses on the data instead. Whether the file is intact is determined via a challenge response interaction, which is dubbed an audit. Merkle trees and proofs are used to verify integrity. This scheme may introduce some overhead, so an extension is used that utilizes subsets of the data.
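Kademlia measures the closeness between a key and a node ID as the bitwise XOR of the two, read as an integer. The Python sketch below illustrates only that metric, assuming small integer identifiers for readability; real identifiers are much longer, and the helper names are hypothetical rather than part of any Storj or Kademlia library.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance between two identifiers: their bitwise XOR as an integer."""
    return a ^ b

def closest_nodes(key: int, node_ids, k: int = 2):
    """Return the k node IDs nearest to the key, i.e. the nodes that would hold its value."""
    return sorted(node_ids, key=lambda node_id: xor_distance(node_id, key))[:k]
```

For example, with key 0b1010 and nodes 0b1000, 0b0010, 0b1111 and 0b1011, the distances are 2, 8, 5 and 1, so the two closest nodes are 0b1011 and 0b1000.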

Thursday, June 21, 2018

Today we will start reviewing Storj network. This was initiated to address scalability and increase decentralization. It is a distributed cloud storage network and removes the notion of a centralized third party storage provider. The decentralization not only helps mitigate traditional data failures and outages but also supports new workloads such as those from blockchain. Blockchain is a distributed ledger. There is a high degree of privacy for the individual whose transactions are maintained in this ledger. The ledger does not divulge any personally identifiable information, yet the individual can still prove ownership of entries. The ledger itself is maintained by a community where no one actor can gain enough influence to submit a fraudulent transaction or alter recorded data. Therefore Blockchain opens up new possibilities in many ecosystems and Storj network facilitates its security, privacy and data control model. In production storage, peer to peer networks have not been popular because data accrues based on popularity rather than utility. Storj network introduces a challenge response verification system combined with direct payments. In addition there is a set of federated nodes that alleviate access and performance concerns. Storj network also brings client side encryption.
Cloud storage is used heavily by large storage providers who act as trusted third parties to transfer and store data. Client side encryption improves data security while data integrity will be maintained with a proof of retrievability. Storj network introduces a network first design where peers are autonomous agents and there is a protocol to enable them to negotiate contracts, transfer data, verify the integrity and availability of remote data and to reward with payments. It provides tools to enable all these interactions. Moreover, it distributes the storage of a file as shards on this network and these shards are stored using a distributed hash table. The shards are themselves not stored in this hash table, rather a distributed network and messaging facilitates it with location information.

Wednesday, June 20, 2018

We were discussing the benefits of software defined stacks and we looked at examples including the one with an oncology focused software maker. The software was originally installed as a single instance multi-tenant application in the cloud. It was subsequently moved to PaaS. The PaaS platform provided backend functions such as provisioning, deployment and security. The separation of functionalities helped the oncology software maker focus on application development and shorten its schedule, while the PaaS platform helped it grow.
This is true for organizations of any size. Even eBay and PayPal, with their millions of users, have found this strategy useful. As the infrastructure and IT footprint grow, such automation improves agility.
Aside from automations, SDDC can also help with load balancing, object storage, database-as-a-service, configuration management, and application management. Together they bring improved agility and standardization.
#codingexercise

// For every start cell (i, j), counts the cells equal to `binary` in the sub-matrix
// that runs from (i, j) to the bottom-right corner, and returns the largest count.
int GetMaxCountSquareSubMatrixSizekCountZerosOrOnes(int[,] A, int rows, int cols, int binary)
{
    int max = int.MinValue;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            // use this as the start of submatrix
            int count = 0;
            for (int x = i; x < rows; x++)
                for (int y = j; y < cols; y++)
                    if (A[x, y] == binary)
                        count += 1;
            if (count > max) max = count;
        }
    }
    return max;
}

Tuesday, June 19, 2018

We were discussing the benefits of software defined stacks. One of the advantages of software defined services is that they can be repeated over and over again in different underlying layers with no change or impact to existing workloads. Another benefit of software defined services is that reconfiguration is easy. By changing the settings we can make the same software stack behave differently. Server utilization and capacity are also improved. The energy footprint of the data center is also reduced. The automation possible with SDS reduces not only the time to deploy but also the effort involved, such as approvals and handovers.
An example of the above was demonstrated by an oncology focused software maker. The software was originally installed as a single instance multi-tenant application in the cloud. It was subsequently moved to PaaS. The PaaS platform provided backend functions such as provisioning, deployment and security. The separation of functionalities helped the oncology software maker focus on application development and shorten its schedule, while the PaaS platform helped it grow.
This is true for organizations of any size. Even eBay and PayPal, with their millions of users, have found this strategy useful. As the infrastructure and IT footprint grow, such automation improves agility.
#codingexercise

// Counts the k x k sub-matrices in which every cell equals `binary`.
int GetCountSquareSubMatrixSizekCountZerosOrOnes(int[,] A, int rows, int cols, int k, int binary)
{
    int total = 0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            // use this as the start of submatrix
            int count = 0;
            // the k x k window spans rows i..i+k-1 and columns j..j+k-1
            for (int x = i; x < i + k; x++)
                for (int y = j; y < j + k; y++)
                    if (x < rows && y < cols && A[x, y] == binary)
                        count += 1;
            if (count == k * k)
                total += 1;
        }
    }
    return total;
}

Monday, June 18, 2018

We were discussing virtualization and software defined stacks. A software defined technology stack aims to virtualize compute, network, storage and security aspects. One of the advantages of software defined services is that they can be repeated over and over again in different underlying layers with no change or impact to existing workloads. Another benefit of software defined services is that reconfiguration is easy. By changing the settings we can make the same software stack behave differently. Server utilization and capacity are also improved. The energy footprint of the data center is also reduced. The automation possible with SDS reduces not only the time to deploy but also the effort involved, such as approvals and handovers. That said, security and compliance need to be studied with SDS deployments because they open up immense possibilities that go against hardening.
SDS can also move up the stack. This layer does not have to adhere to the hardware and can move up into applications and business operations. As an enabler for runtime it can host one or more workloads depending on what it is used for.

#codingexercise

// Counts the k x k sub-matrices that contain only zeros.
int GetCountSquareSubMatrixSizekCountZeros(int[,] A, int rows, int cols, int k)
{
    int total = 0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            // use this as the start of submatrix
            int count = 0;
            // the k x k window spans rows i..i+k-1 and columns j..j+k-1
            for (int x = i; x < i + k; x++)
                for (int y = j; y < j + k; y++)
                    if (x < rows && y < cols && A[x, y] == 0)
                        count += 1;
            if (count == k * k)
                total += 1;
        }
    }
    return total;
}