Cluster computing

Tuesday, February 12, 2019

Today we continue discussing the best practice from storage engineering:

451) Many organizations use one or more public clouds to meet their employees' demand for compute resources. A large number of these requests are fine-grained: customers request a handful of virtual machines for their private use. Usually not more than twenty percent of the customers have very large demands, on the order of a hundred or more virtual machines.

452) The virtual machines for individual customers are sticky. Customers don’t usually release their resources and even identify them by name or IP address in their day-to-day work. They host applications, services and automations on their virtual machines and often cannot let go of a virtual machine unless its files and programs have a migration path to another compute resource. Typically they do not take the step of creating regular backups and moving from one resource to another.

453) While container platforms for Platform-as-a-Service (PaaS) have enabled software to be deployed without any knowledge of the host and to be rotated frequently from one host to another, end-user adoption of PaaS platforms depends on the production readiness of the applications and services. The push for PaaS adoption has made little or no change to the use and proliferation of virtual machines by individual users.

454) The cloud services provider can package services such as additional storage, a regular backup schedule, a patching schedule, system management, security and billing at the time of request for each asset. However, such services depend on the cloud where they are requested. For a private cloud, much of the service is provided in-house, which adds to the costs even if the inventory is free.

Monday, February 11, 2019

Today we continue discussing the best practice from storage engineering:

449) Object storage can serve as the storage for graph databases and object databases. Object storage then transforms from a passive storage layer into one that actively builds metadata, maintains organization and rebuilds indexes from objects rather than from files.

450) File systems have long been the destination for storing artifacts on disk, and while the file system has evolved to stretch over clusters and not just remote servers, it remains inadequate as blob storage. Data writers have to self-organize and interpret their files while frequently relying on metadata stored separately from the files.

451) Files also tend to become binaries with proprietary interpretations. Files can only be bundled in an archive, and there is no object-oriented design over the data. If the storage were to support organizational units in terms of objects, without requiring hierarchical declarations and with support for is-a or has-a relationships, it would become more usable than files. Such modular storage enhances the use of object storage and does not compete with the uses of elastic file stores.

452) Such object storage will find a niche in spatial databases, telecommunications and scientific computing, which require large-scale use of elementary organizational units that are not necessarily related. For example, spatial databases use polygons as a unit of organization and store large numbers of polygons.

453) Traditional relational databases have long been the accepted home for data that requires interpretation. However, the chores of converting data to a structured form that is amenable to querying can be relaxed with native support for rich non-hierarchical data organization in the storage layer and a move to a different class of unstructured storage.


Sunday, February 10, 2019

Today we continue discussing the best practice from storage engineering:

443) Sections of a file can be locked for multi-process access, and sections of a file can even be mapped into memory on virtual memory systems. The latter is called memory mapping and it enables multiple processes to share the data. Each sharing process' virtual memory map points to the same page of physical memory - the page that holds a copy of the disk block.
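The sharing described above can be sketched with Python's standard mmap module. This is only an illustration under stated assumptions: the two mappings are created in one process and stand in for two sharing processes, and the function name share_through_mapping is hypothetical.

```python
import mmap
import os
import tempfile


def share_through_mapping(length: int = mmap.PAGESIZE) -> bytes:
    """Map the same file section twice and show that a write made through
    one mapping is visible through the other (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * length)            # an empty file cannot be mapped
        # Both mappings are shared by default, so they are backed by the
        # same pages that hold a copy of the file's blocks.
        with mmap.mmap(fd, length) as writer, mmap.mmap(fd, length) as reader:
            writer[0:5] = b"hello"              # store through the first mapping
            return bytes(reader[0:5])           # load through the second mapping
    finally:
        os.close(fd)
        os.remove(path)
```

In a real setting each process would open the file and create its own mapping; the single-process version keeps the sketch short.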

444) File structure depends on the file type, and the internal file structure is operating system dependent. Disk access is done in units of blocks. Since logical records vary in size and are usually smaller than a block, several of them are packed into a single physical block; at the extreme, the logical record can be as small as a byte. The logical record size, the physical block size and the packing technique determine how many logical records are in each physical block. There are three major allocation methods: contiguous, linked and indexed. Internal fragmentation is common because space is allocated in whole blocks, so some bytes of a block are wasted.
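The packing arithmetic can be made concrete with a small sketch. It assumes fixed-length records that never span two blocks, which is only one of the possible packing techniques; the function names are hypothetical.

```python
def records_per_block(block_size: int, record_size: int) -> int:
    """Number of whole fixed-length records that fit in one physical block."""
    return block_size // record_size


def wasted_bytes_per_block(block_size: int, record_size: int) -> int:
    """Bytes left unused in every block when records do not span blocks."""
    return block_size - records_per_block(block_size, record_size) * record_size


def last_block_waste(file_size: int, block_size: int) -> int:
    """Internal fragmentation: unused bytes in the final block of a file."""
    return -file_size % block_size
```

For example, 100-byte records in 512-byte blocks give five records per block with 12 bytes unused, and a 1,949-byte file stored in 512-byte blocks wastes 99 bytes in its fourth block.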

445) Access methods are either sequential or direct. With direct access, the block number is relative to the beginning of the file. The use of relative block numbers lets the operating system decide where the file should be placed and helps to prevent users from accessing portions of the file system that are not part of their file.
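A minimal sketch of relative block numbers, assuming contiguous allocation so that the translation is a simple addition. The class name ContiguousFile and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContiguousFile:
    """A file stored in consecutive disk blocks (hypothetical model)."""
    start_block: int     # chosen by the system, invisible to the user
    length_blocks: int   # number of blocks that belong to the file

    def to_absolute(self, relative_block: int) -> int:
        """Translate a block number relative to the start of the file into
        a disk block number, rejecting blocks outside the file."""
        if not 0 <= relative_block < self.length_blocks:
            raise IndexError("relative block is outside the file")
        return self.start_block + relative_block
```

Because callers only ever supply relative numbers, the file can be placed anywhere on disk, and the range check keeps them inside their own file.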

446) The file system is broken into partitions. Each disk on the system contains at least one partition. Partitions are like separate devices or virtual disks. Each partition contains information about the files within it, and this information is referred to as the directory structure. The directory can be viewed as a symbol table that translates file names into their directory entries. Directories are represented as a tree structure. Each user has a current directory. A tree structure prohibits the sharing of files or directories. An acyclic graph allows directories to have shared sub-directories and files. Sharing means there is one actual file, and changes made by one user are visible to the other. Shared files can be implemented via a symbolic link, which is resolved via the path name. If links can form cycles, garbage collection may be necessary to determine when a file can be reclaimed.

447) Protection involves access lists and groups. Consistency is maintained by wrapping accesses to a file between open and close operations.

448) The file system is layered in the following manner:
1) application programs, 2) logical file system, 3) file-organization module, 4) basic file system, 5) I/O control and 6) devices. The last layer is the hardware. The I/O control layer consists of device drivers and interrupt handlers. The basic file system issues generic commands to the appropriate device driver. The file-organization module knows about files and their logical blocks. The logical file system uses the directory structure to inform the file-organization module. The application program is responsible for creating and deleting files.

#codingexercise:

Two players take turns drawing a coin from either end of a sequence. Determine the winning strategy:

int GetCoinsDP(List<int> coins, int i, int j)
{
    if (i > j) return 0;
    if (i == j) return coins[i];
    if (j == i + 1) return Math.Max(coins[i], coins[j]);
    // The opponent draws by the same method, so after our draw we are
    // left with the worse of the two positions the opponent can leave.
    return Math.Max(
        coins[i] + Math.Min(GetCoinsDP(coins, i + 2, j), GetCoinsDP(coins, i + 1, j - 1)),
        coins[j] + Math.Min(GetCoinsDP(coins, i + 1, j - 1), GetCoinsDP(coins, i, j - 2)));
}
The selection made at each invocation level of GetCoinsDP can be added to a list. Since the method remains the same for both players, the draws skipped between two levels belong to the other player.
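For reference, a memoized sketch of the same game in Python. It assumes that both players play optimally, so the opponent always leaves us the worse of the remaining positions; the name best_collection is hypothetical.

```python
from functools import lru_cache
from typing import Sequence


def best_collection(coins: Sequence[int]) -> int:
    """Most the player to move can collect when coins are drawn one at a
    time from either end and the opponent also plays optimally."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        if i > j:
            return 0
        if i == j:
            return coins[i]
        # After our draw the opponent draws once, leaving us the worse
        # of the two positions it can choose between.
        take_left = coins[i] + min(best(i + 2, j), best(i + 1, j - 1))
        take_right = coins[j] + min(best(i + 1, j - 1), best(i, j - 2))
        return max(take_left, take_right)

    return best(0, len(coins) - 1)
```

Caching on the pair (i, j) means each sub-sequence is solved once, which is what makes the recursion a dynamic program.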

# potential trend with object storage:

https://1drv.ms/w/s!Ashlm-Nw-wnWuQB7rNqURAxQf9hF

Saturday, February 9, 2019

Today we continue discussing the best practice from storage engineering:

441) File systems continue to be a good source of organizational information on storage systems. File attributes include name, type, location, size, protection, and time, date and user identification. The supported operations are creating, writing, reading, repositioning within, deleting and truncating a file.

442) The data structures include two levels of internal tables. The first is a per-process table of all the files that the process has opened; each entry points to the location inside the file where data is to be read or written next. This table is arranged by file handle and holds the name, permissions, access dates and a pointer to the disk block. The second is a system-wide table that holds the open count, the file pointer and the disk location of each open file.
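The two levels can be sketched as follows. This is a simplified model under stated assumptions, not an operating system's actual layout: all class and field names are hypothetical, and only the read/write offset and the open count are tracked.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class SystemEntry:
    disk_location: int   # where the file lives on disk
    open_count: int = 0  # number of outstanding opens across all processes


@dataclass
class ProcessEntry:
    name: str
    offset: int = 0      # where this process reads or writes next


class OpenFileTables:
    def __init__(self) -> None:
        self.system: dict[str, SystemEntry] = {}
        self.per_process: dict[int, dict[int, ProcessEntry]] = {}

    def open(self, pid: int, name: str, disk_location: int) -> int:
        entry = self.system.setdefault(name, SystemEntry(disk_location))
        entry.open_count += 1
        handles = self.per_process.setdefault(pid, {})
        handle = next(h for h in count() if h not in handles)  # lowest free handle
        handles[handle] = ProcessEntry(name)
        return handle

    def close(self, pid: int, handle: int) -> None:
        name = self.per_process[pid].pop(handle).name
        self.system[name].open_count -= 1
        if self.system[name].open_count == 0:
            del self.system[name]   # last close removes the system-wide entry
```

Keeping the offset per process lets two processes read the same file at different positions while sharing one system-wide entry.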


#codingexercise:
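In a game between two players where each turn draws two coins from the ends of a sequence, determine the strategy to win:

int GetCoins2DP(List<int> coins, int i, int j)
{
    if (i > j) return 0;
    if (i == j) return coins[i];
    if (j == i + 1) return coins[i] + coins[j];
    // Draw two from the left, two from the right, or one from each end.
    return Math.Max(
        coins[i] + coins[i + 1] + AfterOpponent(coins, i + 2, j),
        Math.Max(
            coins[j] + coins[j - 1] + AfterOpponent(coins, i, j - 2),
            coins[i] + coins[j] + AfterOpponent(coins, i + 1, j - 1)));
}

// Helper: what we can still collect from coins[i..j] after the opponent
// has drawn two coins by the same method.
int AfterOpponent(List<int> coins, int i, int j)
{
    if (j - i + 1 <= 2) return 0; // the opponent takes everything that is left
    return Math.Min(
        GetCoins2DP(coins, i + 2, j),
        Math.Min(GetCoins2DP(coins, i, j - 2), GetCoins2DP(coins, i + 1, j - 1)));
}

The selections made at each invocation level of GetCoins2DP can be added to a list. Since the method remains the same for both players, the draws skipped between two levels belong to the other player. Comparing the two collections then helps to determine whether to go first or second.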
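A memoized Python sketch of the two-coin variant, assuming that both players play optimally and that a player facing two or fewer coins takes all of them. The name best_collection_two is hypothetical.

```python
from functools import lru_cache
from typing import Sequence


def best_collection_two(coins: Sequence[int]) -> int:
    """Most the player to move can collect when every turn removes two
    coins from the ends: both left, both right, or one from each end."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        remaining = j - i + 1
        if remaining <= 0:
            return 0
        if remaining <= 2:
            return sum(coins[i:j + 1])
        options = []
        for left, right in ((2, 0), (0, 2), (1, 1)):
            gain = sum(coins[i:i + left]) + sum(coins[j - right + 1:j + 1])
            options.append(gain + after_opponent(i + left, j - right))
        return max(options)

    def after_opponent(i: int, j: int) -> int:
        # With two or fewer coins left the opponent takes them all.
        if j - i + 1 <= 2:
            return 0
        # Otherwise the opponent leaves us the worst of the three positions.
        return min(best(i + 2, j), best(i, j - 2), best(i + 1, j - 1))

    return best(0, len(coins) - 1)
```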

Friday, February 8, 2019

Today we continue discussing the best practice from storage engineering:

436) Event monitoring software can accelerate software development and test cycles. Event monitoring data is usually machine data generated by the IT systems. Such data can enable real-time searches to gain insights into user experience. Dashboards with charts can then help analyze the data. This data can be accessed over TCP, UDP and HTTP. Data can also be warehoused for analysis. Issues that frequently recur can be documented and searched more quickly with the availability of such data leading to faster debugging and problem solving.

437) Data is available to be collected, indexed, searched and reported. Applications can target specific interests such as security or correlations for building rules and alerts. Data is also varied such as from network, from applications, and from enterprise infrastructure. Powerful querying increases the usability of such data.

438) Queries for such key-value data can be written using Pig commands such as load/read, store/write, foreach/iterate, filter/predicate, group-cogroup, collect, join, order, distinct, union, split, stream, dump and limit.

439) Some of the differentiators of such software include a single platform, fast return on investment, the ability to use different data collectors, the use of non-traditional flat-file data stores, the ability to create and modify reports, the ability to create baselines and study changes, programmability to retrieve information as appropriate, and the ability to include compliance, security, fraud detection and so on.

440) Early warning notifications, a running rules engine and trend detection are some of the features that not only enhance popular use cases by providing feedback on deployed software but also increase customer satisfaction, since changes are incremental.

Thursday, February 7, 2019

Today we continue discussing the best practice from storage engineering:

431) There are two flavors of the release consistency model - the sequential consistency and processor consistency flavors. All of the models in this group allow a processor to read its own write early. However, the two flavors are the only ones whose straightforward implementations allow a read to return the value of another processor's write early. These models distinguish memory operations based on their type and provide stricter ordering constraints for some types of operations.

432) The Weak ordering model classifies memory operations into two categories: data operations and synchronization operations. Since the programmer is required to identify at least one of the operations as a synchronization operation, the model can reorder memory operations between these synchronization operations without affecting the program correctness.

433) The other category of models, those that relax all program orders, such as Alpha, RMO and PowerPC, all provide explicit fence instructions as their safety nets. The Alpha model provides two different fence instructions: the memory barrier and the write memory barrier. The memory barrier (MB) instruction can be used to maintain program order from any memory operation before the MB to any memory operation after the MB. The write memory barrier instruction provides this guarantee only among write operations.

434) The PowerPC model provides a single fence instruction: the SYNC instruction. This is similar to the memory barrier instruction, with the exception that when there are two reads to the same location, the second read may return the value of an older write than the first read. This model therefore requires read-modify-write semantics to enforce program order.

435) A key goal of the programmer-centric approach is to define the operations that should be distinguished as synchronization. In other words, the operations in a user's program are categorized either as synchronization operations or as data operations, in an otherwise sequentially consistent program.

#codingexercise:
We were discussing the game of drawing coins from a sequence of coins to maximize our collection against an opponent:

And the implementation for GetCoins3, where we return the combined value of three or fewer coins, can be as follows:
int GetCoins3OneTimeDraw(List<int> coins, int i, int j)
{
    if (i > j) return 0;
    if (i == j) return coins[i];
    if (j == i + 1) return coins[i] + coins[j];
    if (j == i + 2) return coins[i] + coins[i + 1] + coins[j];
    // using handpicking
    var option1 = Math.Max(
        Math.Max(coins[i] + coins[i + 1] + coins[i + 2],
                 coins[i] + coins[i + 1] + coins[j]),
        Math.Max(coins[i] + coins[j - 1] + coins[j],
                 coins[j - 2] + coins[j - 1] + coins[j]));
    // using GetCoins2
    var option2 = Math.Max(
        coins[i] + GetCoins2(coins, i + 1, j),
        GetCoins2(coins, i, j - 1) + coins[j]);

    // using GetCoins
    var option3 = Math.Max(
        coins[i] + coins[i + 1] + GetCoins(coins, i + 2, j),
        Math.Max(
            coins[i] + GetCoins(coins, i + 1, j - 1) + coins[j],
            GetCoins(coins, i, j - 2) + coins[j - 1] + coins[j]));
    return Math.Max(option1, Math.Max(option2, option3));
}

Wednesday, February 6, 2019

Today we continue discussing the best practice from storage engineering:

425) Another factor in improving data residency has been compression, but this has required representations that are amenable to the data processing internals.

426) There can be conflicts during a replication cycle. For example, server A creates an object with a particular name at roughly the same time that server B creates an object with the same name. The conflict reconciliation process kicks in at the next replication cycle. The server compares the version numbers of the updates, and whichever is higher wins the conflict. If the version numbers are the same, whichever attribute was changed at a later time wins the conflict.
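The reconciliation rule can be sketched as a comparison on version number first and change time second. The Update record and its field names are hypothetical, and returning the first update when both values tie is an assumption made only to keep the function total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    server: str        # originating server
    version: int       # version number of the update
    changed_at: float  # time of the change, e.g. seconds since the epoch
    value: str


def resolve(a: Update, b: Update) -> Update:
    """Pick the winner of a replication conflict: the higher version wins,
    and on equal versions the later change wins."""
    if a.version != b.version:
        return a if a.version > b.version else b
    # Assumption: if the timestamps also tie, keep the first update.
    return a if a.changed_at >= b.changed_at else b
```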

427) If an object is moved to a parent that is now deleted, that object is placed in the lost and found container.

428) Single-master replication has the following drawbacks: it has a single point of failure, there is geographic distance between the master and the clients performing the updates, and replication is less efficient because updates originate from a single location. With multi-master replication these can be avoided, but the masters must be made part of a topology and the way they replicate with each other must be defined.

429) Some replication techniques use background loading where the data is loaded offline before being made available online. This is especially useful when the process of loading can take a long time.

430) The availability of a service is improved by using a cluster instead of a single server. On the other hand, processes involved in background loading can use a primary server together with secondary servers. In such cases, the primary server is authoritative but a secondary server can serve the content when the primary is unavailable.

#codingexercise
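In a game of collecting coins of different values from a sequence, is it better to go first or second?

// import java.util.Arrays; import java.util.Collections; import java.util.List;
int getPlayer(List<Integer> sequence) {
    // Best total when drawing first from the whole sequence.
    int player1Collection = GetCoins(sequence, 0, sequence.size() - 1);
    // Best total over the two sequences that the first draw can leave behind.
    int player2Collection = Math.max(
        GetCoins(sequence, 1, sequence.size() - 1),
        GetCoins(sequence, 0, sequence.size() - 2));
    List<Integer> collections = Arrays.asList(player1Collection, player2Collection);
    int max = Collections.max(collections);
    // 0 means it is better to go first, 1 means it is better to go second.
    return collections.indexOf(max);
}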
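An alternative way to answer the same question, sketched in Python. It assumes one coin is drawn per turn and that both players play optimally, so the second player's total is simply whatever the first player does not collect; this differs from the comparison used in getPlayer, and the name first_or_second is hypothetical.

```python
from functools import lru_cache
from typing import Sequence


def first_or_second(coins: Sequence[int]) -> int:
    """Return 0 if going first collects at least as much as going second,
    otherwise 1, assuming optimal play with one coin drawn per turn."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        if i > j:
            return 0
        if i == j:
            return coins[i]
        take_left = coins[i] + min(best(i + 2, j), best(i + 1, j - 1))
        take_right = coins[j] + min(best(i + 1, j - 1), best(i, j - 2))
        return max(take_left, take_right)

    first = best(0, len(coins) - 1)
    second = sum(coins) - first   # the opponent collects the remainder
    return 0 if first >= second else 1
```

With an odd number of coins the second player can come out ahead, for example when a single large coin sits in the middle of three.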