Cluster computing

Monday, February 4, 2019

Today we continue discussing the best practice from storage engineering:

409) Object storage provides local real-time writes while supporting a read-dominated workload. Greater geographical distribution and horizontal scalability help improve performance.
410) The objects can live in the cache as well. A search engine can provide search over the catalog. Functional data access can be provided by the API. The API and the engine can separately cover all operations on the catalog.
411) Almost every NoSQL database publishes a comparison between its product and its competitors. The differences enumerated between these products, set against their similarities, show how a product's design affects its position in the Gartner Magic Quadrant. Most of them start with the simple idea of emphasizing one design choice over another.
412) Object storage has similar competitors, differing mainly in their support of distributed and cluster file systems. Object storage is not merely an S3 façade over storage. It brings durability, availability and content distribution to the storage while enabling multi-protocol access, with or without a file system enabled.
413) The catalog can be organized into Item, Variant, Price, Hierarchy, Facet and Vendors in the object store. Applications can search for data via prejoined objects, either in the cache or in the store, through a search engine index built on a Lucene/Solr architecture.
414) All catalog entities such as Item, Variant, Price, Hierarchy, Facet and Vendors can be represented as key-value collections or document models.
415) The unit of storage in non-relational stores is the key-value collection, and each row can have a different number of columns drawn from a column family. Performance and scalability are typically achieved through sharding and partitioning.
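The document model and sharding described in items 413 to 415 can be sketched roughly as follows. This is only a minimal illustration: the field names, the shard count and the helper names (shard_for, put, get) are assumptions made up for this example, and plain in-memory dictionaries stand in for a real key-value or object store.

```python
import hashlib

# Hypothetical catalog item stored as one prejoined document: variants,
# prices, hierarchy and facets are embedded instead of joined at read time.
item = {
    "id": "item-1001",
    "vendor": "vendor-7",
    "hierarchy": ["apparel", "shirts"],
    "facets": {"color": ["red", "blue"], "size": ["M", "L"]},
    "variants": [
        {"sku": "1001-red-M", "price": {"amount": 19.99, "currency": "USD"}},
        {"sku": "1001-blue-L", "price": {"amount": 21.99, "currency": "USD"}},
    ],
}

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a hash that is stable across processes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# One in-memory dict per shard stands in for the key-value store.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(doc: dict) -> None:
    """Store the whole document under its id on the shard the id hashes to."""
    shards[shard_for(doc["id"])][doc["id"]] = doc


def get(item_id: str) -> dict:
    """Fetch the prejoined document with a single key lookup."""
    return shards[shard_for(item_id)][item_id]


put(item)
```

A lookup such as get("item-1001") returns the item together with its variants and prices in one read, which is the point of prejoining the catalog entities.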

Sunday, February 3, 2019

Today we continue discussing the best practice from storage engineering:

406) Catalogs support data modeling, data synchronization, data standardization, and flexible workflows. Information management with a catalog is layered. Print and translation workflows form the bottom layer, followed by workflow or security for access to the assets, their editing, insertions and bulk insertions. Next come integrations or portals, then flexible integration capabilities such as full/data exports and multiple exports, then integration portals for integration with imports/exports, data pools and platforms. Above these sits the digital asset management layer for asset on-boarding and delivery to channels, and lastly data management for searches, saved searches, channel-based content or localized content, and the ability to author variants, categories, attributes and relationships to stored assets.

407) Relational databases are limited in their support for catalog workflows for the following reasons:
The field inventory is a local view only, until it makes its way to the central store.
The relational store involves a once-a-day sync or some other periodic refresh.
Stale views are served until the refresh happens, which is often not fast enough for consumers.
The stale view interferes with analytics and aggregation reports.
Downstream internal and external apps have to work around the delays and stale views with sub-optimal logic.

408) The purpose of the catalog is to form a single view of the product with one central service, a flexible schema, high read volume, tolerance for write spikes during catalog updates, advanced indexing and querying, and geographical distribution for high availability and low latency.

Saturday, February 2, 2019

Today we continue discussing the best practice from storage engineering:

401) A catalog can be maintained as a one-stop shop in a store. There need not be sub-catalogs, fragmentation, ETL or a message bus.

402) Catalogs can remain equally available to application servers, API data and services, and web servers.

403) Catalogs can also be made available behind the store for supply chain management and data warehouse analytics.

404) Catalogs can be made available for browsing as well as searching via facilitators such as a Lucene search index.

405) Catalogs can support geo-sharding with persisted shard ids or more granular store ids to improve high availability.

Friday, February 1, 2019

Today we continue discussing the best practice from storage engineering:

A strategy for a game
Consider a row of n coins of values v1 . . . vn, where n is even. We play a game against an opponent by taking turns. In each turn, a player selects either the first or last coin from the row, removes it from the row permanently, and receives the value of the coin. Determine the maximum possible amount of money we can definitely win if we move first.
Let us now take an example.
For a sequence of 8, 15, 3, 7
we know that the maximum value can be 15 + 7
but the players cannot always be greedy. For example, if player one chooses 8, the opponent chooses 15, player one chooses 7 and the opponent chooses 3, so player one ends with 15 against the opponent's 18 and falls short of the maximum. Instead, let us consider a strategy in which each player minimizes the profit of the other, so that every choice we make leaves the opponent with a progressively lower total. When we take a smaller part of the coin sequence, we have exactly the same problem on a smaller scale. Let us denote the solution to this subproblem by a function F(i, j), where the remaining coins run from position i to position j. A player going first can then collect either the coin at position i or the coin at position j.
Then we form a recursive solution as the maximum of the two choices, where in each case the opponent's reply leaves us the lesser of what remains:
F(i, j) = max(Vi + min over the opponent's replies on (i+1, j),
              Vj + min over the opponent's replies on (i, j-1))
After one turn by each player, two coins have been eliminated, with the opponent choosing the reply that leaves us the poorer of the two remaining subsequences. Since every pair of turns shortens the row and so makes progress towards termination, we can rewrite the recursion purely in terms of our own turns:
F(i, j) = max(Vi + min(F(i+2, j), F(i+1, j-1)),
              Vj + min(F(i+1, j-1), F(i, j-2)))
The function terminates when there are only two coins left from the entire even set.
int GetCoins(List<int> coins, int i, int j)
{
    if (i > j) return 0;            // empty range
    if (i == j) return coins[i];    // a single coin is left
    if (j == i + 1) return Math.Max(coins[i], coins[j]);
    // After our pick, the opponent replies so as to leave us the smaller remainder.
    return Math.Max(
        coins[i] + Math.Min(GetCoins(coins, i + 2, j), GetCoins(coins, i + 1, j - 1)),
        coins[j] + Math.Min(GetCoins(coins, i + 1, j - 1), GetCoins(coins, i, j - 2)));
}
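Taking our example of 8, 15, 3, 7:
we have max(8 + outcome of (15, 3, 7), 7 + outcome of (8, 15, 3)), where each outcome is what we still collect once the opponent has moved on that subsequence.
Similarly, on (15, 3, 7) the opponent weighs 15 + outcome of (3, 7) against 7 + outcome of (15, 3).
Similarly, on (8, 15, 3) the opponent weighs 8 + outcome of (15, 3) against 3 + outcome of (8, 15).
On (15, 3, 7) the opponent takes 15 and we are left with 7; on (8, 15, 3) either reply leaves us 15. So opening with 8 yields 8 + 7 = 15, while opening with 7 yields 7 + 15 = 22.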
We could also use a table to keep track of the choices and progression made.
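int GetBest(List<int> coins)
{
    int n = coins.Count;
    int[,] table = new int[n, n];
    int i, j, k;
    // k is the gap between i and j; fill the table diagonal by diagonal.
    for (k = 0; k < n; k++)
    {
        for (i = 0, j = k; j < n; i++, j++)
        {
            // Subproblems outside the valid range contribute 0.
            int x = ((i + 2) <= j) ? table[i + 2, j] : 0;
            int y = ((i + 1) <= (j - 1)) ? table[i + 1, j - 1] : 0;
            int z = (i <= (j - 2)) ? table[i, j - 2] : 0;
            // The opponent leaves us the smaller of the two remainders.
            table[i, j] = Math.Max(coins[i] + Math.Min(x, y), coins[j] + Math.Min(y, z));
        }
    }

    return table[0, n - 1];
}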
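
The recursion can also be memoized and extended to report which end to pick first. The following is a minimal sketch in Python rather than a definitive implementation; the name best_first_move and the "left"/"right" labels are assumptions introduced for this example. It takes the minimum over the opponent's replies, since the opponent leaves us the worse of the two remainders.

```python
from functools import lru_cache


def best_first_move(coins):
    """Return (guaranteed_total, first_pick) for the player who moves first."""
    coins = tuple(coins)

    @lru_cache(maxsize=None)
    def f(i, j):
        # Most the player to move can guarantee from coins[i..j].
        if i > j:
            return 0
        if i == j:
            return coins[i]
        take_left = coins[i] + min(f(i + 2, j), f(i + 1, j - 1))
        take_right = coins[j] + min(f(i + 1, j - 1), f(i, j - 2))
        return max(take_left, take_right)

    n = len(coins)
    if n == 0:
        return 0, None
    # Evaluate both opening moves separately so the choice can be reported.
    left = coins[0] + min(f(2, n - 1), f(1, n - 2))
    right = coins[-1] + min(f(1, n - 2), f(0, n - 3))
    return (left, "left") if left >= right else (right, "right")


print(best_first_move([8, 15, 3, 7]))  # (22, 'right')
```

For the example row the sketch picks the 7 on the right first and guarantees 22, matching the 15 + 7 noted earlier.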

This problem can be modified so that each player picks two coins at a time.
In that case the choices for picking the coins are:
1) Two from the left
2) Two from the right
3) One from the left and one from the right
The first two cases degenerate to picking a single coin with the combined value of the two coins.
The last case merely reduces the size of the original sequence.
Therefore this can be elaborated as:
int GetCoins(List<int> coins, int i, int j)
{
    if (i > j) return 0;
    if (i == j) return coins[i];
    if (j == i + 1) return coins[i] + coins[j];
    // Each option adds our two coins to the worst remainder the opponent can leave us.
    // Option 1: we take the two leftmost coins.
    var option1 = coins[i] + coins[i + 1] + Math.Min(GetCoins(coins, i + 4, j),
        Math.Min(GetCoins(coins, i + 2, j - 2), GetCoins(coins, i + 3, j - 1)));
    // Option 2: we take the two rightmost coins.
    var option2 = coins[j] + coins[j - 1] + Math.Min(GetCoins(coins, i + 2, j - 2),
        Math.Min(GetCoins(coins, i, j - 4), GetCoins(coins, i + 1, j - 3)));
    // Option 3: we take one coin from each end.
    var option3 = coins[i] + coins[j] + Math.Min(GetCoins(coins, i + 3, j - 1),
        Math.Min(GetCoins(coins, i + 1, j - 3), GetCoins(coins, i + 2, j - 2)));
    return Math.Max(option1, Math.Max(option2, option3));
}
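Taking our example of 8, 15, 3, 7,
the choices are 23 for us and 10 for the opponent, 10 for us and 23 for the opponent, or 15 for us and 18 for the opponent, so we take the two coins on the left for 23.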

Thursday, January 31, 2019

Today we continue discussing the best practice from storage engineering:

395) Catalogs also need to be served to a variety of devices. Websites tailored for mobile and desktop differ even in the content that is presented and not just in the style, markup, script or logic. There is virtually no limit on how many resources can be stored in object storage, so the variants for each device can co-exist.

396) Similar to catalogs but in the form of document collections, libraries of digital content are just as easy to collect in organizations as any other repository. Most of these document libraries use relational databases, but object storage serves the content equally well, particularly since versioning is supported.

397) These libraries differ from catalogs in that they serve not only read-only traffic but also read-write traffic on the documents in the collection. They are also internal to the organization, as opposed to public catalogs.

398) These libraries also participate in a variety of workflows which were earlier subject to the limitations of the service as well as of the portal where users sign in to access their documents. The use of object storage, on the other hand, removes such restrictions.

399) Unlike catalogs, libraries have to provide significant resource access control. Object storage with its S3 API is suitable for this purpose.

400) Unlike catalogs, libraries don’t need to be served to multiple devices. However, libraries tend to grow in number. Therefore, object storage can encompass them all and provide virtually limitless storage.

Wednesday, January 30, 2019

Today we continue discussing the best practice from storage engineering:

391) Some companies in the retail industry have a lot of catalogs. Although there is significant investment in master data management, similar solutions can be built on top of object storage. This is definitely a niche space and one that can support an emerging trend.

392) These retail companies process significant read-only traffic for their catalogs with the help of HTTP proxies and web services. That investment can be preserved so long as the read-only operations on the backend translate to fetching objects from the object store. This can help ease the transition to serving the catalogs directly from object storage.

393) Catalogs participate in a variety of workflows such as rewards services, promotions, campaigns and so on. When the catalogs are served from a service, they are subject to the limitations of that service. When a catalog is served directly from object storage, it becomes far easier to start new services.

394) Catalogs typically require no access controls since they are served to the public. This makes it more appealing to move them to object storage, where content distribution, replication and multi-site support are available out of the box.

Tuesday, January 29, 2019

Today we continue discussing the best practice from storage engineering:

383) Health data is not just sharded by customer but also maintained in isolated shared-nothing pockets with their own management systems. Integration of data to represent a whole for the same customer is the new and emerging trend in the health industry. Organizations and companies are looking to converge the data for an individual without losing privacy or failing to comply with government regulations.

384) Health data has numerous file types for the data captured from patients. These can range from small text documents to large images. Unlike cluster file systems that consolidate data to a cluster, these data artifacts are scattered throughout repositories. In addition, there is a lot of logic governing who can access what data, leading to bulky user interfaces for reading and editing by providers, insurers, administrators and end users.

385) In addition to access over health data, agencies and providers frequently exchange health records which leads to a high traffic of data from all the data sources. Virtually no data is erased from the system and historical records going back several years are maintained. The accumulation of data records also has no chance to go to a warehouse because it is always active and online.

386) The retail industry has traditionally embraced mammoth-sized databases, some even on large Storage Area Networks. Its embrace of databases and data warehouses has provided banner use cases for online transaction processing and online analytical processing. Yet vectorized execution models are gaining ground in nascent retail companies that want to wrap all purchases, rental payments and servicing fees as billing events that flow to processors. It is highly unlikely that they will switch overnight to management and analytics solutions based on key-value stores or object stores.

387) Unlike health industry data stores, retail industry data stores are all self-contained, homogeneous, full-service management systems. A new service for the retail industry merely points to existing services as its data stores, or to shared databases. Even store-front devices such as point-of-sale registers point to queues which inevitably process their messages from back-end databases.

388) While these industries may view data stores, queues, services and management systems as data sources, they did not have the opportunity until recently to consolidate their data sources with storage first design.

#codingexercise

Maximize the coins collected when two players take turns picking a coin from either end of a row.

int GetCoins(List<int> coins, int i, int j)
{
    if (i > j) return 0;            // empty range
    if (i == j) return coins[i];    // a single coin is left

    if (j == i + 1) return Math.Max(coins[i], coins[j]);
    // After our pick, the opponent replies so as to leave us the smaller remainder.
    return Math.Max(
        coins[i] + Math.Min(GetCoins(coins, i + 2, j), GetCoins(coins, i + 1, j - 1)),
        coins[j] + Math.Min(GetCoins(coins, i + 1, j - 1), GetCoins(coins, i, j - 2)));
}