Monday, February 18, 2019

Today we continue discussing the best practices from storage engineering:


473) Storage products are also prone to increasing their test matrix with new devices such as solid-state drives and emerging trends such as IoT.

474) Storage products have to appear limitless to their customers, but they cannot predict how they will be used. They frequently run into usages where customers use them inappropriately and push against internal limits, such as the number of policies that can be applied to their organizational units.

475) There was a time when content-addressable storage was popular. It involved generating a PEA file to save content that could then be looked up by its hash. The advent of object storage made it easier to access the objects directly.

476) Data is increasingly being produced as fixed content; emails and faxes are examples. The lifecycles of content from the system, personal-computing, network-centric, and content-centric eras grow progressively longer in duration.

477) Dropping and recreating user artifacts helps the user clean up. This is not the case for, say, the system catalog. Still, the storage artifacts used on behalf of the user are the same as those used for the system itself. Creating and dropping such artifacts would be helpful even when they are internal.

478) The retention policy is typically 6 months for email, 3 years for financial data, and 5 years for legal records. The retention period for object storage is user defined.

479) Object storage is touted as best for static content. Data that changes often is then said to be better suited to NoSQL or other unstructured storage. With object versioning and the availability of APIs and SDKs, this is no longer necessarily the case.

480) Data transfers have never been considered virtual storage since the data belongs to the source. Data in transit can live in queues, caches, and object storage, which suits vectorized execution.

Sunday, February 17, 2019

Today we continue discussing the best practices from storage engineering:

469) Storage products are tested for size and usage under all circumstances. These can be fine grained or aggregated and can be queried at different scopes and levels.

470) A storage product is generally the source of truth for all upstream data sources and workflows.

471) Storage products manage to be the source of truth even with different consistency models; they only need to meet the requirements of their usages.

472) Storage products have evolved from purely disk-based solutions to software-defined stacks that embrace compute and network. Consequently they are easier to test, but the complexity increases.

473) Storage products are also prone to increasing their test matrix with new devices such as solid-state drives and emerging trends such as IoT.

474) Storage products have to appear limitless to their customers, but they cannot predict how they will be used. They frequently run into usages where customers use them inappropriately and push against internal limits, such as the number of policies that can be applied to their organizational units.


#codingexercise
Friends Pairing problem:
Given n friends, each one can remain single or be paired up with some other friend. Each friend can be paired only once, so ordering is irrelevant.
The total number of ways in which the friends can be paired is given by:
int GetPairs(int n)
{
    if (n <= 2) return n; // f(1) = 1, f(2) = 2
    return GetPairs(n - 1) + GetPairs(n - 2) * (n - 1);
}
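Since the recursion above recomputes subproblems, a bottom-up sketch of the same recurrence f(n) = f(n-1) + (n-1)·f(n-2) runs in linear time (the class and method names here are illustrative, not from the original):

```java
public class FriendsPairing {
    // Bottom-up friends-pairing count: the nth friend either stays single
    // (f(n-1) ways) or pairs with any of the other n-1 friends (f(n-2) ways each).
    static int getPairsIterative(int n) {
        if (n <= 2) return n;      // f(1) = 1, f(2) = 2
        int prev2 = 1, prev1 = 2;  // f(n-2), f(n-1)
        for (int i = 3; i <= n; i++) {
            int current = prev1 + (i - 1) * prev2;
            prev2 = prev1;
            prev1 = current;
        }
        return prev1;
    }
}
```

For example, three friends can be paired in 4 ways: all single, or any one of the three possible pairs with the third friend single.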

Saturday, February 16, 2019

Today we continue discussing the best practices from storage engineering:

466) Storage products have multiple paths of data entry, usually via protocols. These are each tested using their respective protocol tools.

467) Storage products are usually part of tiered storage. As such, data aging and validation need to be covered.

468) Storage products are tested with different batches of loads. They are also tested using continuous loads with varying rates over time.

469) Storage products are tested for size and usage under all circumstances. These can be fine grained or aggregated and can be queried at different scopes and levels.

470) A storage product is generally the source of truth for all upstream data sources and workflows.

#codingexercise
The coin selection problem can scale to any constant number of coins k that can be picked in each turn, using the method below.
int GetCoinsKWithDP(List<Integer> coins, int i, int j, int k)
{
    if (i > j) return 0;
    if (i == j) return coins.get(i);
    // If k or fewer coins remain, take them all.
    if (j - i + 1 <= k) {
        int sum = 0;
        for (int c = i; c <= j; c++) {
            sum += coins.get(c);
        }
        return sum;
    }
    List<Integer> options = new ArrayList<>();
    // Take 1..k coins from the left end.
    int change = 0;
    for (int left = 0; left < k; left++) {
        change += coins.get(i + left);
        options.add(change + GetCoinsKWithDP(coins, i + left + 1, j, k));
    }
    // Take 1..k coins from the right end.
    change = 0;
    for (int right = 0; right < k; right++) {
        change += coins.get(j - right);
        options.add(change + GetCoinsKWithDP(coins, i, j - right - 1, k));
    }
    // Take from both ends, at most k coins in total.
    for (int left = 0; left < k; left++) {
        for (int right = 0; right < k - left - 1; right++) {
            change = 0;
            for (int c = 0; c <= left; c++) change += coins.get(i + c);
            for (int c = 0; c <= right; c++) change += coins.get(j - c);
            options.add(change + GetCoinsKWithDP(coins, i + left + 1, j - right - 1, k));
        }
    }
    return Collections.max(options);
}
The selections at each level of recursion in GetCoinsKWithDP can be added to a list, and alternate additions can be attributed to the other player, since the method remains the same for both.

Friday, February 15, 2019

Today we continue discussing the best practices from storage engineering:

460) The use of injectors, proxies, and man-in-the-middle tools tests the networking aspect, but storage is more concerned with temporary and permanent outages, the specific numbers associated with minimum and maximum limits, and inefficiencies when the limits are exceeded.

461) Most storage products have a networking aspect. Testing covers networking separately from the other aspects. This means timeouts, downtime, resolutions, and traversals up and down the networking layers on a host. It also includes location information.

462) The control- and data-path traces and execution-cycle statistics are critical to capture and query with a tool, so that testing can determine whether the compute is at fault. Most such tools provide data over HTTP.

463) Responsiveness and accuracy are verified not only with repetitions but also by validating against different sources of truth. The same holds for logs and read-only data.

464) When data is abundant, reporting improves interpretation. Most reports are well-structured beforehand and even rendered with templates for different representations.

465) Testing provides the added advantage of sending reports by mail for scheduled runs. These aid human review of the quality bar.

#codingexercise
int binary_search(String input, int start, int end, char val)
{
    if (start > end) return -1;
    int mid = (start + end) / 2;
    if (input.charAt(mid) == val) return mid;
    if (start == end) return -1;
    if (input.charAt(mid) < val)
        return binary_search(input, mid + 1, end, val);
    else
        return binary_search(input, start, mid - 1, val);
}
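An equivalent iterative form (a sketch in Java, assuming the string's characters are sorted; the class and method names are illustrative) avoids recursion and makes the termination condition explicit:

```java
public class Search {
    // Iterative binary search over a sorted string; returns the index of val
    // or -1 if it is absent.
    static int binarySearch(String input, char val) {
        int start = 0, end = input.length() - 1;
        while (start <= end) {
            int mid = start + (end - start) / 2; // avoids integer overflow
            if (input.charAt(mid) == val) return mid;
            if (input.charAt(mid) < val) start = mid + 1;
            else end = mid - 1;
        }
        return -1;
    }
}
```

For example, searching "abcdef" for 'd' returns index 3, and searching for an absent character returns -1.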


Thursday, February 14, 2019

Today we continue discussing the best practices from storage engineering:

455) The use of a virtual machine image as a storage artifact only highlights the use of large files in storage. Images are usually saved on the datastore in the datacenter, but nothing prevents the end user who owns the machine from taking periodic backups of the VM image with tools like duplicity. These files can then be stashed in storage products such as object storage. The ability of S3 to accept multi-part uploads eases the use of large files.

456) The use of large files helps test most of the bookkeeping associated with logic that depends on the size of the storage artifact. While performance optimizations remove redundant operations in different layers to streamline a use case, the unoptimized code path is better tested with large files.

457) In the next few sections, we cover some of the testing associated with a storage product. The use of a large number of small data files and a small number of large data files covers the most common cases of data ingested by a storage product. However, duplicates, order, and attributes also matter. Latency and throughput are also measured with their data transfer.

458) Cluster-based topology testing differs significantly from peer-to-peer networking-based topology testing. One represents capability and the other represents distribution. The tests have to articulate different loads for each.

459) The testing of software layers is achieved with simulation of the lower layers. However, integration testing is closer to real-life scenarios. Specifically, testing for data corruption, unavailability, or loss is critical to a storage product.

460) The use of injectors, proxies, man-in-the-middle test networking aspect but storage is more concerned with temporary and permanent outages, specific numbers associated with minimum and maximum limits and inefficiencies when the limits are exceeded.

#algorithm
MST-Prim
// this grows a tree
A = null
for each vertex v in G, initialize key[v] to infinity and parent[v] to nil
initialize a min-priority queue Q with all the vertices
while Q is not empty
       extract the vertex u with the minimum key, i.e., the lightest edge connecting it to the tree
       for each adjacency v of u that is still in Q
              if weight(u, v) < key[v], set key[v] to weight(u, v) and parent[v] to u
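The steps above can be sketched in Java with a lazy variant of Prim's algorithm (a sketch only; the adjacency-list representation and the {weight, vertex} queue entries are assumptions, since the pseudocode leaves the data structures open):

```java
import java.util.*;

public class Prim {
    // Lazy Prim's MST: returns the total weight of the minimum spanning tree.
    // graph[u] holds {v, weight} pairs; the graph is assumed connected and undirected.
    static int mstWeight(List<int[]>[] graph) {
        boolean[] inTree = new boolean[graph.length];
        // Queue entries are {weight, vertex}, ordered by weight; instead of
        // decrease-key, stale entries are skipped when polled.
        PriorityQueue<int[]> q = new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
        q.add(new int[]{0, 0}); // start the tree at vertex 0 with zero cost
        int total = 0;
        while (!q.isEmpty()) {
            int[] e = q.poll();
            int u = e[1];
            if (inTree[u]) continue; // stale entry: u was already connected
            inTree[u] = true;
            total += e[0];
            for (int[] edge : graph[u]) { // offer edges to vertices outside the tree
                if (!inTree[edge[0]]) q.add(new int[]{edge[1], edge[0]});
            }
        }
        return total;
    }
}
```

Skipping stale queue entries stands in for the decrease-key operation, which `java.util.PriorityQueue` does not support directly.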


Print Fibonacci using tail recursion:
uint GetTailRecursiveFibonacci(uint n, uint a = 0, uint b = 1)
{
    if (n == 0)
        return a;
    if (n == 1)
        return b;
    return GetTailRecursiveFibonacci(n-1, b, a+b);
}


Wednesday, February 13, 2019

Today we continue discussing the best practices from storage engineering:

453) While container platforms for Platform-as-a-Service (PaaS) have enabled software to be deployed without any recognition of the host and to be frequently rotated from one host to another, end users' adoption of a PaaS platform depends on the production readiness of the applications and services. The push for PaaS adoption has made little or no change to the use and proliferation of virtual machines by individual users.

454) The cloud services provider can package services such as additional storage, a regular backup schedule, a patching schedule, system management, security, and billing at the time of request for each asset. However, such services depend on the cloud where the services are requested. For a private cloud, much of the service is in-house, adding to the costs even if the inventory is free.

455) The use of a virtual machine image as a storage artifact only  highlights the use of large files in storage. They are usually saved on the datastore in the datacenter but nothing prevents the end user owning he machine take periodic backups of the vm image with tools like duplicity. These files can then be stashed in storage products like object storage. The ability of S3 to take on multi-part upload eases the use of large files.

456) The use of large files helps test most bookkeeping associated with the logic that depends on the size of the storage  artifact. While performance optimizations remove redundant operations in different layers to streamline a use case, the unoptimized code path is better tested with large files.


#codingexercise
int GetCount(uint n)
{
    if (n == 0) return 0;
    if (n == 1) return 1;
    if (n == 2) return 2;
    return GetCount(n - 1) + GetCount(n - 2);
}
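The recurrence above matches counting the ways to climb n stairs taking one or two steps at a time (an assumption, since the exercise omits its statement). A bottom-up version, sketched here in Java with illustrative names, avoids the exponential recursion:

```java
public class StairCount {
    // Iterative form of GetCount: f(n) = f(n-1) + f(n-2), with f(1) = 1, f(2) = 2.
    static int getCount(int n) {
        if (n <= 2) return Math.max(n, 0);
        int prev2 = 1, prev1 = 2; // f(n-2), f(n-1)
        for (int i = 3; i <= n; i++) {
            int current = prev1 + prev2;
            prev2 = prev1;
            prev1 = current;
        }
        return prev1;
    }
}
```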

Tuesday, February 12, 2019

Today we continue discussing the best practices from storage engineering:

451) Many organizations use one or more public clouds to meet the demand for compute resources from their employees. A large number of these requests are fine-grained, where customers request a handful of virtual machines for their private use. Usually no more than twenty percent of the customers have very large demands, ranging to about a hundred or more virtual machines.

452) The virtual machines for individual customers are sticky. Customers don't usually release their resources and even identify them by name or IP address for their day-to-day work. They host applications, services, and automations on their virtual machines and often cannot let go of a virtual machine unless its files and programs have a migration path to another compute resource. Typically they do not take this step of creating regular backups and moving the resource.

453) While container platforms for Platform-as-a-Service (PaaS) have enabled software to be deployed without any recognition of the host and to be frequently rotated from one host to another, end users' adoption of a PaaS platform depends on the production readiness of the applications and services. The push for PaaS adoption has made little or no change to the use and proliferation of virtual machines by individual users.

454) The cloud services provider can package services such as additional storage, a regular backup schedule, a patching schedule, system management, security, and billing at the time of request for each asset. However, such services depend on the cloud where the services are requested. For a private cloud, much of the service is in-house, adding to the costs even if the inventory is free.