Friday, February 15, 2019

Today we continue discussing the best practices from storage engineering:

460) Injectors, proxies, and man-in-the-middle tools test the networking aspect, but storage is more concerned with temporary and permanent outages, the specific minimum and maximum limits, and the inefficiencies that appear when those limits are exceeded.
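A minimal sketch of the fault-injection idea, assuming a hypothetical `Store` interface: the `FaultyStore` wrapper fails the first N calls to simulate an outage (temporary if retries outlast it, permanent if they do not) and enforces a maximum object size so a test can observe behavior beyond the limit. `Store`, `InMemoryStore`, `FaultyStore`, and `Retry.putWithRetry` are illustrative names, not part of any product.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage interface used only for this illustration.
interface Store {
    void put(String key, byte[] value);
    byte[] get(String key);
}

final class InMemoryStore implements Store {
    private final Map<String, byte[]> data = new HashMap<>();
    public void put(String key, byte[] value) { data.put(key, value.clone()); }
    public byte[] get(String key) { return data.get(key); }
}

// Wraps a real store and injects faults: an outage for the first
// `failuresBeforeRecovery` calls, and a hard limit on object size.
final class FaultyStore implements Store {
    private final Store inner;
    private int remainingFailures;
    private final int maxObjectSize;

    FaultyStore(Store inner, int failuresBeforeRecovery, int maxObjectSize) {
        this.inner = inner;
        this.remainingFailures = failuresBeforeRecovery;
        this.maxObjectSize = maxObjectSize;
    }

    private void maybeFail() {
        if (remainingFailures > 0) {
            remainingFailures--;
            throw new IllegalStateException("injected outage");
        }
    }

    public void put(String key, byte[] value) {
        maybeFail();
        if (value.length > maxObjectSize) {
            throw new IllegalArgumentException("object exceeds limit: " + value.length);
        }
        inner.put(key, value);
    }

    public byte[] get(String key) {
        maybeFail();
        return inner.get(key);
    }
}

final class Retry {
    // Retries a put across outages; returns the number of attempts used,
    // or rethrows the outage once maxAttempts is exhausted.
    static int putWithRetry(Store store, String key, byte[] value, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                store.put(key, value);
                return attempt;
            } catch (IllegalStateException outage) {
                if (attempt >= maxAttempts) throw outage;
            }
        }
    }
}
```

A test can then vary the outage length and the object size independently, which keeps "temporary vs permanent outage" and "limit exceeded" as separate, named cases.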

461) Most storage products have a networking aspect, and testing covers networking separately from the other aspects. This includes timeouts, downtime, name resolution, and traversal up and down the networking layers on a host. It also includes location information.
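Timeout testing usually comes down to bounding every remote call with a deadline. The sketch below assumes the call can be expressed as a `Callable`; `Deadline.callWithTimeout` is a hypothetical helper built only on `java.util.concurrent`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class Deadline {
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // abandoned calls must not keep the JVM alive
        return t;
    });

    // Runs `call` and gives up after `timeoutMillis`, interrupting the attempt.
    static <T> T callWithTimeout(Callable<T> call, long timeoutMillis) throws Exception {
        Future<T> f = POOL.submit(call);
        try {
            return f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true); // interrupt the slow call
            throw e;
        }
    }
}
```

In a networking test, the slow `Callable` would be a request against a host made unreachable or delayed by a proxy; the assertion is that the caller sees a `TimeoutException` within the budget rather than hanging.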

462) Traces of the control and data paths, along with execution-cycle statistics, are critical to capture and query with a tool so that testing can determine whether the compute is at fault. Most such tools serve this data over HTTP.
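A minimal in-process sketch of "capture and query": timings are recorded per path name (for example `"control"` and `"data"`, labels assumed here) and queried by percentile afterwards. Serving these numbers over HTTP is left out; `PathStats` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Collects per-path execution timings so a test can query them afterwards.
final class PathStats {
    private final Map<String, List<Long>> samples = new HashMap<>();

    void record(String path, long nanos) {
        samples.computeIfAbsent(path, k -> new ArrayList<>()).add(nanos);
    }

    // Times a unit of work on the named path and records the elapsed nanoseconds.
    void time(String path, Runnable work) {
        long start = System.nanoTime();
        work.run();
        record(path, System.nanoTime() - start);
    }

    int count(String path) {
        return samples.getOrDefault(path, List.of()).size();
    }

    // Nearest-rank percentile (p in 1..100) of the samples recorded for a path.
    long percentile(String path, int p) {
        List<Long> sorted = new ArrayList<>(samples.getOrDefault(path, List.of()));
        if (sorted.isEmpty()) throw new NoSuchElementException(path);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

Comparing, say, the 99th percentile of the data path with that of the control path is one way to tell whether slowness comes from compute or from I/O.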

463) Responsiveness and accuracy are verified not only through repetition but also by validating against different sources of truth. The same is true for logs and read-only data.
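A small sketch of validating one observed result set against several independent sources of truth (for example a catalog and a log, both assumed here as plain key/value maps). `CrossCheck.mismatches` is a hypothetical helper; it reports every key on which any source disagrees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Compares what the system under test returned against independent sources
// of truth and reports every key on which they disagree.
final class CrossCheck {
    static List<String> mismatches(Map<String, String> observed,
                                   List<Map<String, String>> sourcesOfTruth) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> e : observed.entrySet()) {
            for (int s = 0; s < sourcesOfTruth.size(); s++) {
                String expected = sourcesOfTruth.get(s).get(e.getKey());
                if (!Objects.equals(expected, e.getValue())) {
                    problems.add(e.getKey() + " differs from source " + s);
                }
            }
        }
        return problems;
    }
}
```

A disagreement with only one source points at that source (or at a lag in it), while disagreement with all of them points at the system under test.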

464) When data is abundant, reporting improves its interpretation. Most reports are structured beforehand, often with templates for different representations.
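A sketch of rendering the same results through two templates, a one-line summary and a detailed listing. The result shape (test name to pass/fail) and the class name `Report` are assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Renders the same results through different templates, e.g. a one-line
// summary for a dashboard and a detailed listing for a mailed report.
final class Report {
    static String summary(Map<String, Boolean> results) {
        long passed = results.values().stream().filter(Boolean::booleanValue).count();
        return String.format("%d/%d passed", passed, results.size());
    }

    // One line per test, sorted by name so that reports diff cleanly between runs.
    static String detail(Map<String, Boolean> results) {
        StringBuilder sb = new StringBuilder();
        new TreeMap<>(results).forEach((name, ok) ->
            sb.append(String.format("%-10s %s%n", name, ok ? "PASS" : "FAIL")));
        return sb.toString();
    }
}
```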

465) Testing has the added advantage of mailing reports for scheduled runs. These reports help humans review the quality bar.

#codingexercise
// Returns the index of val in the sorted string input[start..end], or -1 if absent.
int binary_search(string input, int start, int end, char val)
{
    if (start > end) return -1;
    int mid = start + (end - start) / 2;  // avoids overflow of start + end
    if (input[mid] == val) return mid;
    if (input[mid] < val)
        return binary_search(input, mid + 1, end, val);
    else
        return binary_search(input, start, mid - 1, val);
}


Thursday, February 14, 2019

Today we continue discussing the best practices from storage engineering:

455) Using a virtual machine image as a storage artifact highlights the role of large files in storage. VM images are usually saved on a datastore in the datacenter, but nothing prevents the end user who owns the machine from taking periodic backups of the VM image with tools like duplicity. These backups can then be stashed in storage products such as object storage. S3's support for multipart upload makes such large files easier to handle.
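A sketch of planning a multipart upload of a large VM image. The constants reflect the commonly documented S3 multipart limits (parts of 5 MiB to 5 GiB, except that the last part may be smaller, and at most 10,000 parts); verify them against the provider in use. `PartPlanner` is hypothetical and only computes byte ranges; it does not call any S3 API.

```java
import java.util.ArrayList;
import java.util.List;

// Plans the byte ranges of a multipart upload under S3-style limits.
final class PartPlanner {
    static final long MIN_PART = 5L * 1024 * 1024;        // 5 MiB (all but the last part)
    static final long MAX_PART = 5L * 1024 * 1024 * 1024; // 5 GiB
    static final int MAX_PARTS = 10_000;

    record Part(int number, long offset, long length) {}

    static List<Part> plan(long objectSize, long preferredPartSize) {
        long partSize = Math.max(MIN_PART, preferredPartSize);
        // Grow the part size if the object would otherwise need too many parts.
        long needed = (objectSize + MAX_PARTS - 1) / MAX_PARTS; // ceil(size / MAX_PARTS)
        partSize = Math.max(partSize, needed);
        if (partSize > MAX_PART) throw new IllegalArgumentException("object too large");

        List<Part> parts = new ArrayList<>();
        int number = 1; // part numbers start at 1
        for (long offset = 0; offset < objectSize; offset += partSize, number++) {
            parts.add(new Part(number, offset, Math.min(partSize, objectSize - offset)));
        }
        return parts;
    }
}
```

Because each part is independent, a failed part can be retried alone, which is what makes multipart upload attractive for files of this size.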

456) Large files help test most of the bookkeeping logic that depends on the size of the storage artifact. While performance optimizations remove redundant operations across layers to streamline a use case, the unoptimized code path is better exercised with large files.
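One concrete reason large files matter: size-dependent arithmetic that is correct for small files can silently overflow past 2 GiB. The sketch contrasts a deliberately buggy block-index computation (which narrows the offset to `int`) with a correct one; the 4 KiB block size and the `BlockMath` name are assumptions.

```java
// Offset-to-block bookkeeping that only breaks on large files.
final class BlockMath {
    static final int BLOCK_SIZE = 4096;

    // Buggy variant: narrows the offset to int before dividing, so offsets
    // at or beyond 2 GiB wrap around to negative values.
    static long blockIndexBuggy(long offset) {
        return ((int) offset) / BLOCK_SIZE;
    }

    // Correct variant: keeps the arithmetic in 64 bits.
    static long blockIndex(long offset) {
        return offset / BLOCK_SIZE;
    }
}
```

Both variants agree on every offset a small test file can produce, so only a file larger than 2 GiB exposes the difference.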

457) In the next few sections, we cover some of the testing associated with a storage product. A large number of small files and a small number of large files together represent the most common cases of data ingested by a storage product. However, duplicates, ordering, and attributes also matter. Latency and throughput are also measured for these data transfers.
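A sketch of a workload driver that takes the two shapes above as parameters (many small objects or few large ones) and reports per-operation latency and overall throughput. The sink is a stand-in for the storage product's ingest call; duplicates, ordering, and attributes are not modeled. `IngestBench` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Drives `count` objects of `size` bytes into a sink and reports total bytes,
// per-operation latencies, and overall throughput.
final class IngestBench {
    record Result(long bytes, List<Long> latenciesNanos, double bytesPerSecond) {}

    static Result run(Consumer<byte[]> sink, int count, int size) {
        List<Long> latencies = new ArrayList<>(count);
        long bytes = 0;
        long begin = System.nanoTime();
        for (int i = 0; i < count; i++) {
            byte[] payload = new byte[size];
            long start = System.nanoTime();
            sink.accept(payload);                   // the operation being measured
            latencies.add(System.nanoTime() - start);
            bytes += size;
        }
        long elapsed = Math.max(1, System.nanoTime() - begin); // guard against zero
        return new Result(bytes, latencies, bytes * 1e9 / elapsed);
    }
}
```

Running it once as `run(sink, 1000, 1024)` and once as `run(sink, 2, 8 * 1024 * 1024)` contrasts per-operation overhead with raw transfer rate.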

458) Cluster-based topology testing differs significantly from peer-to-peer topology testing. One exercises capability and the other exercises distribution, so the tests have to apply different loads to each.

459) Software layers are tested by simulating the layers below them. However, integration testing is closer to real-life scenarios. In particular, testing for data corruption, unavailability, or loss is critical to a storage product.
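A sketch of simulating a lower layer: an in-memory block device that keeps a CRC32 per block and can be told to flip a bit, so the layer above can be tested for corruption and unavailability without real hardware. `SimulatedDevice` is a hypothetical name; `java.util.zip.CRC32` is the standard JDK checksum.

```java
import java.util.zip.CRC32;

// An in-memory block device with checksums and an injectable corruption fault.
final class SimulatedDevice {
    private final byte[][] blocks;
    private final long[] checksums;

    SimulatedDevice(int blockCount) {
        blocks = new byte[blockCount][];
        checksums = new long[blockCount];
    }

    private static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    void write(int block, byte[] data) {
        blocks[block] = data.clone();
        checksums[block] = crc(data); // checksum kept alongside the block
    }

    // Fault injection: flip one bit without updating the checksum.
    void corrupt(int block, int byteIndex) {
        blocks[block][byteIndex] ^= 0x01;
    }

    // Verifies the checksum on every read so silent corruption surfaces as an error.
    byte[] read(int block) {
        if (blocks[block] == null) throw new IllegalStateException("block " + block + " unavailable");
        if (crc(blocks[block]) != checksums[block]) {
            throw new IllegalStateException("block " + block + " is corrupt");
        }
        return blocks[block].clone();
    }
}
```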

460) Injectors, proxies, and man-in-the-middle tools test the networking aspect, but storage is more concerned with temporary and permanent outages, the specific minimum and maximum limits, and the inefficiencies that appear when those limits are exceeded.

#algorithm
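MST-Prim(G, w, r)
// grows a single tree from root r; Q holds the vertices not yet in the tree
for each vertex u in G: key[u] = infinity, parent[u] = nil
key[r] = 0
initialize a min-priority queue Q with all vertices of G, ordered by key
while Q is not empty
       u = EXTRACT-MIN(Q)   // vertex with the lightest edge connecting it to the tree
       for each vertex v adjacent to u
              if v is in Q and w(u, v) < key[v]
                     parent[v] = u
                     key[v] = w(u, v)   // a decrease-key operation on Q
// the tree edges are A = {(v, parent[v]) : v in V - {r}}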
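A runnable Java sketch of Prim's algorithm on an adjacency matrix (0 meaning "no edge", so zero-weight edges are not representable here). Since `java.util.PriorityQueue` has no decrease-key, this variant swaps in the common lazy-deletion technique: it pushes a new entry whenever a key improves and skips stale entries when they are extracted.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

final class Prim {
    // Returns parent[] describing the MST rooted at vertex 0 (parent[0] == -1).
    static int[] mst(int[][] w) {
        int n = w.length;
        int[] key = new int[n];
        int[] parent = new int[n];
        boolean[] inTree = new boolean[n];
        Arrays.fill(key, Integer.MAX_VALUE);
        Arrays.fill(parent, -1);
        key[0] = 0;
        PriorityQueue<int[]> q = new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
        q.add(new int[]{0, 0}); // {key, vertex}
        while (!q.isEmpty()) {
            int u = q.poll()[1];
            if (inTree[u]) continue; // stale entry left behind by a key improvement
            inTree[u] = true;
            for (int v = 0; v < n; v++) {
                if (w[u][v] != 0 && !inTree[v] && w[u][v] < key[v]) {
                    key[v] = w[u][v];
                    parent[v] = u;
                    q.add(new int[]{key[v], v});
                }
            }
        }
        return parent;
    }

    // Sums the weights of the tree edges (v, parent[v]).
    static int totalWeight(int[][] w, int[] parent) {
        int total = 0;
        for (int v = 0; v < parent.length; v++) {
            if (parent[v] >= 0) total += w[v][parent[v]];
        }
        return total;
    }
}
```

For the four-vertex graph with edges 0-1 (1), 1-2 (2), 0-2 (4), 2-3 (3), and 1-3 (5), the tree is 0-1, 1-2, 2-3 with total weight 6.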


Compute the nth Fibonacci number using tail recursion:
uint GetTailRecursiveFibonacci(uint n, uint a = 0, uint b = 1)
{
    if (n == 0)
        return a;
    if (n == 1)
        return b;
    return GetTailRecursiveFibonacci(n-1, b, a+b);
}
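The JVM does not perform tail-call elimination, so in Java the same accumulator scheme is usually written as a loop. Each iteration performs exactly the rebinding `(n, a, b) -> (n - 1, b, a + b)` that the tail call performs; `Fib` is an illustrative name.

```java
final class Fib {
    // Loop form of GetTailRecursiveFibonacci: a and b play the same roles as
    // the accumulator parameters, and the result is returned when n reaches 0.
    static long fib(int n) {
        long a = 0, b = 1;
        while (n > 0) {
            long next = a + b;
            a = b;
            b = next;
            n--;
        }
        return a;
    }
}
```

Using `long` instead of `uint` also postpones overflow: a 32-bit unsigned result wraps after the 47th Fibonacci number, while `long` holds values up to the 92nd.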


Wednesday, February 13, 2019

Today we continue discussing the best practices from storage engineering:

453) While container platforms for Platform-as-a-Service (PaaS) have enabled software to be deployed without any awareness of the host and to be rotated frequently from one host to another, end users' adoption of a PaaS platform depends on the production readiness of the applications and services. The push for PaaS adoption has made little or no change to the use and proliferation of virtual machines by individual users.

454) The cloud services provider can package services such as additional storage, a regular backup schedule, a patching schedule, system management, security, and billing at the time of request for each asset. However, such services depend on the cloud in which they are requested. For a private cloud, much of the service is provided in-house, adding to the costs even if the inventory is free.

455) Using a virtual machine image as a storage artifact highlights the role of large files in storage. VM images are usually saved on a datastore in the datacenter, but nothing prevents the end user who owns the machine from taking periodic backups of the VM image with tools like duplicity. These backups can then be stashed in storage products such as object storage. S3's support for multipart upload makes such large files easier to handle.

456) Large files help test most of the bookkeeping logic that depends on the size of the storage artifact. While performance optimizations remove redundant operations across layers to streamline a use case, the unoptimized code path is better exercised with large files.


#codingexercise
// f(n) = f(n-1) + f(n-2) with f(0) = 0, f(1) = 1, f(2) = 2
int GetCount(uint n)
{
    if (n == 0) return 0;
    if (n == 1) return 1;
    if (n == 2) return 2;
    return GetCount(n - 1) + GetCount(n - 2);
}
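The recursion in GetCount recomputes the same subproblems exponentially often. A memoized Java sketch follows with the same base cases and recurrence; reading these base cases as "the number of ways to climb n stairs taking one or two steps at a time" is an interpretation, since the exercise does not state the problem. `Count` is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

// Memoized GetCount: same base cases and recurrence, but each value is
// computed once, so the cost is linear in n instead of exponential.
final class Count {
    private static final Map<Integer, Long> memo = new HashMap<>();

    static long getCount(int n) {
        if (n == 0) return 0;
        if (n == 1) return 1;
        if (n == 2) return 2;
        Long cached = memo.get(n);
        if (cached != null) return cached;
        long value = getCount(n - 1) + getCount(n - 2);
        memo.put(n, value);
        return value;
    }
}
```

For n >= 1 these values are the Fibonacci numbers shifted by one position, so GetCount(10) is 89.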

Tuesday, February 12, 2019

Today we continue discussing the best practices from storage engineering:

451) Many organizations use one or more public clouds to meet their employees' demand for compute resources. A large number of these requests are fine-grained: customers request a handful of virtual machines for their private use. Usually no more than twenty percent of customers have very large demands, ranging to about a hundred or more virtual machines.

452) The virtual machines of individual customers are sticky. Customers don’t usually release their resource and even identify it by its name or IP address in their day-to-day work. They host applications, services, and automations on their virtual machines and often cannot let go of a virtual machine unless its files and programs have a migration path to another compute resource. Typically, they do not take the step of creating regular backups and moving to a new resource.

453) While container platforms for Platform-as-a-Service (PaaS) have enabled software to be deployed without any awareness of the host and to be rotated frequently from one host to another, end users' adoption of a PaaS platform depends on the production readiness of the applications and services. The push for PaaS adoption has made little or no change to the use and proliferation of virtual machines by individual users.

454) The cloud services provider can package services such as additional storage, a regular backup schedule, a patching schedule, system management, security, and billing at the time of request for each asset. However, such services depend on the cloud in which they are requested. For a private cloud, much of the service is provided in-house, adding to the costs even if the inventory is free.

Monday, February 11, 2019

Today we continue discussing the best practices from storage engineering:

449) Object storage can serve as the storage for graph databases and object databases. Object storage then transforms from a passive storage layer into one that actively builds metadata, maintains organization, and rebuilds indexes from objects rather than from files.
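A sketch of rebuilding an index from the objects themselves: each stored object carries user-defined metadata (as most object stores allow), and the index maps one attribute's values to object keys. The `label` attribute and the vertex/edge naming below are hypothetical, chosen to echo the graph-database case; `MetadataIndex` is an illustrative name.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class MetadataIndex {
    // A hypothetical stored object: a key plus user-defined metadata.
    record StoredObject(String key, Map<String, String> metadata) {}

    // Rebuilds an index for one metadata attribute by scanning the objects,
    // rather than consulting any file-system directory structure.
    static Map<String, SortedSet<String>> rebuild(Collection<StoredObject> objects, String attribute) {
        Map<String, SortedSet<String>> index = new HashMap<>();
        for (StoredObject o : objects) {
            String value = o.metadata().get(attribute);
            if (value != null) {
                index.computeIfAbsent(value, v -> new TreeSet<>()).add(o.key());
            }
        }
        return index;
    }
}
```

Because the index is derived entirely from object metadata, it can be dropped and rebuilt at any time, which is what lets the storage layer maintain it actively.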

450) File systems have long been the destination for storing artifacts on disk, and while they have evolved to stretch over clusters and not just remote servers, they remain inadequate as blob storage. Data writers have to organize and interpret their files themselves while frequently relying on metadata stored separately from the files.

451) Files also tend to become binaries with proprietary interpretations. Files can only be bundled in an archive, and there is no object-oriented design over the data. If storage supported organizational units in terms of objects, without requiring hierarchical declarations, and supported is-a and has-a relationships, it would become more usable than files. Such modular storage enhances the use of object storage and does not compete with the uses of elastic file stores.

452) Such object storage would find niche use in spatial databases, telecommunications, and scientific computing, which require large-scale use of elementary organizational units that are not necessarily related. For example, spatial databases use polygons as a unit of organization and store large numbers of them.

453) Traditional relational databases have long been accepted for storing data that requires interpretation. However, the chores of converting data into a structured form amenable to querying can be relaxed if the storage layer natively supports rich non-hierarchical data organization and transformation into a different class of unstructured storage.

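// Best total the player to move can secure from coins[i..j] when both play optimally.
int GetCoinsDP(List<int> coins, int i, int j)
{
    if (i > j) return 0;
    if (i == j) return coins[i];
    if (j == i + 1) return Math.Max(coins[i], coins[j]);
    // Taking one end leaves the rest to the opponent, who secures GetCoinsDP of it.
    return Math.Max(
        coins[i] + Sum(coins, i + 1, j) - GetCoinsDP(coins, i + 1, j),
        coins[j] + Sum(coins, i, j - 1) - GetCoinsDP(coins, i, j - 1));
}
int Sum(List<int> coins, int i, int j)
{
    int total = 0;
    for (int k = i; k <= j; k++) total += coins[k];
    return total;
}
The choice made at each level of the GetCoinsDP recursion can be added to a list; since the same method applies to both players, alternate entries belong to the other player.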
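A bottom-up Java sketch of the same two-player coin game, which fills the table by increasing range length instead of recursing. `best[i][j]` is the largest total the player to move can secure from `coins[i..j]`; prefix sums give each range total in constant time. `CoinGame` and `firstPlayerWins` are illustrative names.

```java
final class CoinGame {
    static int best(int[] coins) {
        int n = coins.length;
        if (n == 0) return 0;
        int[] prefix = new int[n + 1];
        for (int k = 0; k < n; k++) prefix[k + 1] = prefix[k] + coins[k];
        int[][] best = new int[n][n];
        for (int len = 1; len <= n; len++) {
            for (int i = 0; i + len - 1 < n; i++) {
                int j = i + len - 1;
                if (i == j) { best[i][j] = coins[i]; continue; }
                int restAfterLeft = prefix[j + 1] - prefix[i + 1]; // sum of coins[i+1..j]
                int restAfterRight = prefix[j] - prefix[i];        // sum of coins[i..j-1]
                // Whatever the opponent secures from the remaining range is lost to us.
                best[i][j] = Math.max(
                    coins[i] + restAfterLeft - best[i + 1][j],
                    coins[j] + restAfterRight - best[i][j - 1]);
            }
        }
        return best[0][n - 1];
    }

    // The first player wins when their best total exceeds half of all coins.
    static boolean firstPlayerWins(int[] coins) {
        int total = 0;
        for (int c : coins) total += c;
        return 2 * best(coins) > total;
    }
}
```

For the sequence 5, 3, 7, 10 the first player can secure 15 of the 25 in play, by taking 10 first.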


Sunday, February 10, 2019

Today we continue discussing the best practices from storage engineering:

443) Sections of a file can be locked for multi-process access, and on virtual memory systems sections of a file can also be mapped into memory. The latter is called memory mapping, and it enables multiple processes to share the data: each sharing process's virtual memory map points to the same page of physical memory, the page that holds a copy of the disk block.
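A sketch of both ideas with `java.nio`: two independent mappings of the same file region, plus an exclusive region lock around the write. It runs within one process rather than two, and the Java documentation leaves propagation between mappings formally unspecified, although mainstream operating systems back both mappings with the same page-cache pages. `FileLock` is advisory on many platforms. `MappedShare` is an illustrative name.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedShare {
    // Writes one byte through the first mapping and reads it back through the second.
    static byte writeThroughOneMappingReadThroughAnother(Path file) throws IOException {
        try (FileChannel a = FileChannel.open(file, StandardOpenOption.CREATE,
                 StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileChannel b = FileChannel.open(file, StandardOpenOption.READ,
                 StandardOpenOption.WRITE)) {
            // READ_WRITE mapping grows the file to 4096 bytes if it is shorter.
            MappedByteBuffer writer = a.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            MappedByteBuffer reader = b.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            try (FileLock lock = a.lock(0, 4096, false)) { // exclusive lock on the region
                writer.put(0, (byte) 42);
            }
            return reader.get(0);
        }
    }
}
```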

444) File structure depends on the file type, and internal file structure is operating-system dependent. Disk access is done in units of blocks. Since logical records vary in size, several of them are packed into a single physical block, as when, for example, each logical record is a single byte. The logical record size, the physical block size, and the packing technique determine how many logical records fit in each physical block. There are three major allocation methods: contiguous, linked, and indexed. Internal fragmentation, the bytes wasted in a block that is only partly filled, is a common occurrence.
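The packing arithmetic can be made concrete for fixed-size records, assuming records may not span blocks. The sketch computes the blocking factor, the number of blocks a file needs, and the bytes lost to internal fragmentation; `Packing` is an illustrative name.

```java
final class Packing {
    // Records that fit in one block when records may not span blocks.
    static int blockingFactor(int blockSize, int recordSize) {
        return blockSize / recordSize;
    }

    static long blocksNeeded(long records, int blockSize, int recordSize) {
        int perBlock = blockingFactor(blockSize, recordSize);
        return (records + perBlock - 1) / perBlock; // ceiling division
    }

    // Allocated bytes minus bytes actually holding records.
    static long internalFragmentation(long records, int blockSize, int recordSize) {
        return blocksNeeded(records, blockSize, recordSize) * blockSize - records * recordSize;
    }
}
```

With 4096-byte blocks and 100-byte records, 40 records fit per block, so 1000 records occupy 25 blocks and waste 96 bytes in each, 2400 bytes in total. With one-byte records, fragmentation appears only in the last block.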

445) Access methods are either sequential or direct. For direct access, the block number is relative to the beginning of the file. Relative block numbers let the operating system decide where the file should be placed and help prevent users from accessing portions of the file system that are not part of their file.
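A sketch of direct access with relative block numbers under contiguous allocation (one of the three methods above, assumed here for simplicity): the file system checks the bounds and adds the file's starting block, so a process can never name a block outside its own file. `ContiguousFile` is an illustrative name.

```java
final class ContiguousFile {
    private final long startBlock;
    private final long lengthInBlocks;

    ContiguousFile(long startBlock, long lengthInBlocks) {
        this.startBlock = startBlock;
        this.lengthInBlocks = lengthInBlocks;
    }

    // Translates a relative block number into an absolute disk block.
    long toAbsolute(long relativeBlock) {
        if (relativeBlock < 0 || relativeBlock >= lengthInBlocks) {
            throw new IndexOutOfBoundsException("block " + relativeBlock + " is outside the file");
        }
        return startBlock + relativeBlock;
    }
}
```

Because callers only ever see relative numbers, the operating system is free to place the file at any start block, or move it, without changing the program.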

446) A file system is broken into partitions. Each disk on the system contains at least one partition. Partitions are like separate devices or virtual disks. Each partition contains information about the files within it, referred to as the directory structure. The directory can be viewed as a symbol table that translates file names into their directory entries. Directories are commonly organized as a tree, and each user has a current directory. A tree structure prohibits the sharing of files or directories. An acyclic graph allows directories to have shared subdirectories and files. Sharing means there is one actual file, and changes made by one user are visible to the other. Shared files can be implemented via a symbolic link, which is resolved via its path name. If links are allowed to form cycles, garbage collection may be necessary to reclaim files that are no longer reachable.
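A sketch of the directory as a symbol table, with path resolution done one component (one table lookup) at a time. Putting the same `File` object into two directories models sharing in an acyclic-graph structure; it shares by reference, the way a hard link does, rather than through a symbolic link resolved by path. `Dir` and `Dir.File` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

final class Dir {
    static final class File {
        final StringBuilder contents = new StringBuilder();
    }

    private final Map<String, Object> entries = new HashMap<>(); // name -> Dir or File

    Dir mkdir(String name) {
        Dir d = new Dir();
        entries.put(name, d);
        return d;
    }

    void link(String name, File f) {
        entries.put(name, f);
    }

    // Resolves a path such as "a/b/c"; returns null if any component is missing.
    Object resolve(String path) {
        Object current = this;
        for (String part : path.split("/")) {
            if (!(current instanceof Dir d)) return null; // a file has no children
            current = d.entries.get(part);
            if (current == null) return null;
        }
        return current;
    }
}
```

Since there is one actual `File`, text appended through `alice/notes.txt` is visible through `bob/alice-notes.txt`.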

447) Protection involves access lists and groups. Consistency is maintained by wrapping accesses in open and close operations.
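A sketch of an access-list check with groups: an operation is allowed if an entry names the user directly or names a group the user belongs to. The `"group:"` prefix used to distinguish group entries is a hypothetical convention for this example, and `Acl` is an illustrative name.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class Acl {
    enum Op { READ, WRITE, EXECUTE }

    private final Map<String, EnumSet<Op>> entries = new HashMap<>(); // principal -> ops

    void grant(String principal, Op... ops) {
        entries.computeIfAbsent(principal, p -> EnumSet.noneOf(Op.class)).addAll(Arrays.asList(ops));
    }

    // Checks the user's own entry first, then each of the user's groups.
    boolean allowed(String user, Set<String> groups, Op op) {
        if (entries.getOrDefault(user, EnumSet.noneOf(Op.class)).contains(op)) return true;
        for (String g : groups) {
            if (entries.getOrDefault("group:" + g, EnumSet.noneOf(Op.class)).contains(op)) return true;
        }
        return false;
    }
}
```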

448) A file system is layered in the following manner:
1) application programs, 2) logical file system, 3) file-organization module, 4) basic file system, 5) I/O control and 6) devices. The last layer is the hardware. The I/O control layer consists of device drivers and interrupt handlers; the basic file system issues generic commands to the appropriate device driver. The file-organization module knows about files and their logical blocks. The logical file system uses the directory structure to inform the file-organization module. The application program initiates the creation and deletion of files.

#codingexercise: 
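Two players take turns drawing a coin from either end of a sequence. Determine the winning strategy:
// Best total the player to move can secure from coins[i..j] when both play optimally.
int GetCoinsDP(List<int> coins, int i, int j)
{
    if (i > j) return 0;
    if (i == j) return coins[i];
    if (j == i + 1) return Math.Max(coins[i], coins[j]);
    // Taking one end leaves the rest to the opponent, who secures GetCoinsDP of it.
    return Math.Max(
        coins[i] + Sum(coins, i + 1, j) - GetCoinsDP(coins, i + 1, j),
        coins[j] + Sum(coins, i, j - 1) - GetCoinsDP(coins, i, j - 1));
}
int Sum(List<int> coins, int i, int j)
{
    int total = 0;
    for (int k = i; k <= j; k++) total += coins[k];
    return total;
}
The choice made at each level of the GetCoinsDP recursion can be added to a list; since the same method applies to both players, alternate entries belong to the other player.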



# potential trend with object storage: 
https://1drv.ms/w/s!Ashlm-Nw-wnWuQB7rNqURAxQf9hF

Saturday, February 9, 2019

Today we continue discussing the best practices from storage engineering:

441) File systems continue to be a good source of organizational information on storage systems. File attributes include name, type, location, size, protection, time, date, and user identification. The supported operations are creating, writing, reading, repositioning within, deleting, and truncating a file.
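The six basic operations map directly onto `java.nio.file`. This sketch creates, writes, repositions and reads, truncates, queries one attribute (size), and deletes a file in a caller-supplied directory; the file name `demo.txt` and the class name `FileOps` are illustrative.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FileOps {
    // Returns the text read after repositioning, plus the size after truncation.
    static String demo(Path dir) throws IOException {
        Path file = dir.resolve("demo.txt");
        Files.createFile(file);                                   // create
        Files.writeString(file, "hello storage");                 // write
        String text;
        try (SeekableByteChannel ch = Files.newByteChannel(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ch.position(6);                                       // reposition
            ByteBuffer buf = ByteBuffer.allocate(7);
            while (buf.hasRemaining() && ch.read(buf) > 0) { }    // read
            text = new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
            ch.truncate(5);                                       // truncate
        }
        long size = Files.size(file);                             // attribute: size
        Files.delete(file);                                       // delete
        return text + ":" + size;
    }
}
```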

442) The data structures include two levels of internal tables. There is a per-process table of all the files that each process has opened, which points to the location inside a file where data is to be read or written. This table is indexed by file handle and holds the name, permissions, access dates, and a pointer to the disk block. The other table is system-wide and holds the open count, the file pointer, and the disk location of the file.
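A simplified sketch of the two levels: each process maps small integer handles to entries in one shared system-wide table, which counts how many opens refer to each file and frees the entry on the last close. File pointers, permissions, and dates are omitted; `OpenFiles`, `SystemEntry`, and `Process` are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OpenFiles {
    static final class SystemEntry {
        final String path;
        int openCount;
        SystemEntry(String path) { this.path = path; }
    }

    final Map<String, SystemEntry> systemWide = new HashMap<>();

    // Per-process table: the list index is the file handle.
    final class Process {
        private final List<SystemEntry> handles = new ArrayList<>();

        int open(String path) {
            SystemEntry e = systemWide.computeIfAbsent(path, SystemEntry::new);
            e.openCount++;
            handles.add(e);
            return handles.size() - 1;
        }

        void close(int handle) {
            SystemEntry e = handles.set(handle, null);
            if (e != null && --e.openCount == 0) systemWide.remove(e.path); // last close frees it
        }
    }
}
```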

443) Sections of a file can be locked for multi-process access, and on virtual memory systems sections of a file can also be mapped into memory. The latter is called memory mapping, and it enables multiple processes to share the data: each sharing process's virtual memory map points to the same page of physical memory, the page that holds a copy of the disk block.

444) File structure depends on the file type, and internal file structure is operating-system dependent. Disk access is done in units of blocks. Since logical records vary in size, several of them are packed into a single physical block, as when, for example, each logical record is a single byte. The logical record size, the physical block size, and the packing technique determine how many logical records fit in each physical block. There are three major allocation methods: contiguous, linked, and indexed. Internal fragmentation, the bytes wasted in a block that is only partly filled, is a common occurrence.

445) Access methods are either sequential or direct. For direct access, the block number is relative to the beginning of the file. Relative block numbers let the operating system decide where the file should be placed and help prevent users from accessing portions of the file system that are not part of their file.

#codingexercise:
In a game between two players who take turns drawing two coins from the ends of a sequence (both from one end, or one from each end), determine the strategy to win:
// Best total the player to move can secure from coins[i..j] when both play optimally.
int GetCoins2DP(List<int> coins, int i, int j)
{
    if (i > j) return 0;
    if (i == j) return coins[i];
    if (j == i + 1) return coins[i] + coins[j];
    int total = 0;
    for (int k = i; k <= j; k++) total += coins[k];
    // Moves: two from the left, two from the right, or one from each end.
    // The opponent then secures GetCoins2DP of the remaining range.
    return total - Math.Min(Math.Min(GetCoins2DP(coins, i + 2, j), GetCoins2DP(coins, i, j - 2)),
                            GetCoins2DP(coins, i + 1, j - 1));
}
The choice made at each level of the GetCoins2DP recursion can be added to a list; since the same method applies to both players, alternate entries belong to the other player. Comparing the first player's total with the remainder then shows whether to go first or second.
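A memoized Java sketch of the two-coin variant, assuming (as in the base cases above) that a player facing a single remaining coin simply takes it. Each `(i, j)` range is solved once, with the pair packed into a `long` map key; `TwoCoinGame` and `shouldGoFirst` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

final class TwoCoinGame {
    private final int[] coins;
    private final int[] prefix;
    private final Map<Long, Integer> memo = new HashMap<>();

    TwoCoinGame(int[] coins) {
        this.coins = coins;
        prefix = new int[coins.length + 1];
        for (int k = 0; k < coins.length; k++) prefix[k + 1] = prefix[k] + coins[k];
    }

    // Largest total the player to move can secure from coins[i..j].
    int best(int i, int j) {
        if (i > j) return 0;
        if (i == j) return coins[i];
        if (j == i + 1) return coins[i] + coins[j];
        long key = ((long) i << 32) | j;
        Integer cached = memo.get(key);
        if (cached != null) return cached;
        int total = prefix[j + 1] - prefix[i];
        // The opponent moves next on whichever range we leave behind.
        int opponent = Math.min(Math.min(best(i + 2, j), best(i, j - 2)), best(i + 1, j - 1));
        int result = total - opponent;
        memo.put(key, result);
        return result;
    }

    // Going first is worthwhile when the first mover secures at least half.
    boolean shouldGoFirst() {
        return 2 * best(0, coins.length - 1) >= prefix[coins.length];
    }
}
```

For 1, 2, 3, 4 the first player takes 3 and 4 and ends with 7 of 10.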