Cluster computing

Friday, July 20, 2018

Let us discuss the design to serve videos from a video repository to customers:

We keep track of Users and Videos in our inventory. Videos may also have associated images such as thumbnails, so it would not be surprising to find that images far outnumber videos. The videos themselves could number under a billion. Images and videos are best served from a file storage cluster such as an Isilon OneFS cluster, although its performance should be compared with storing them in the public cloud.

The User model holds users' credentials as well as their profiles and statistics. These can be maintained in a relational database and served from the cloud by a managed database service. User authorization may also flow through OAuth, so API services for user and video management will help. A typical layering of user interface, API services, and data stores makes the service available on a variety of devices.

The videos are large in number. They will be organized into a pool and a repository. The pool holds the most watched or trending videos and is served from a web server that implements an LRU-like policy. A pool can also be maintained per user and aged. A user's watched videos can be eagerly loaded into that user's pool when the user's page loads. Dynamic queries will be honored with immediate fetches from disk, while statistics are updated so that the same videos can be served from cache next time. The storage of videos presents significant challenges. An alternative to keeping them on managed OneFS NAS storage is to place them in data center storage, partitioned by some scheme over the videos. Streaming from the data center storage will then be handled by one or more web servers dedicated to that purpose. Streaming services can maintain their own cache and performance improvements.
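The pool described above can be illustrated with a minimal sketch of an LRU-style cache in front of the repository. The class name `VideoPool`, the `fetch_from_repository` callable, and the hit and miss counters are hypothetical names chosen for illustration; a real deployment would cache at the web server or in a dedicated cache tier rather than in a Python dictionary.

```python
from collections import OrderedDict


class VideoPool:
    """Hypothetical in-memory pool of hot videos with LRU eviction."""

    def __init__(self, capacity, fetch_from_repository):
        self.capacity = capacity
        # Assumed callable: video_id -> content (stands in for a disk fetch).
        self.fetch = fetch_from_repository
        self.entries = OrderedDict()  # video_id -> content, least recent first
        self.hits = 0
        self.misses = 0

    def get(self, video_id):
        if video_id in self.entries:
            # Served from the pool: mark as most recently used.
            self.entries.move_to_end(video_id)
            self.hits += 1
            return self.entries[video_id]
        # Miss: fetch from the repository and keep it for later requests.
        self.misses += 1
        content = self.fetch(video_id)
        self.entries[video_id] = content
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return content
```

The hit and miss counters stand in for the statistics mentioned above. A per-user pool could reuse the same class with a smaller capacity and be pre-filled on page load.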

A CDN will be used wherever possible so that content is served from locations geographically close to the user. It will serve static resources and alleviate the load on the web servers. Caching will be offered at every level, as much as possible, so that the backend is not hit for every request.

Video replication across multiple regions may be offered to improve availability and reliability. Indexes may be maintained for all video metadata, including authoring and upload information from owners. The index need not be in relational data. Such information or mappings, along with statistics, can also live in key-value stores and be queried with scatter-gather operations.
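As a rough illustration of a scatter-gather query over a partitioned key-value index, the sketch below models each shard as a plain dictionary and fans a lookup out to all of them in parallel. The function name `scatter_gather` and the metadata fields `owner` and `views` are assumptions made for the example, not part of any particular store's API.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(shards, owner_id):
    """Query every shard for one owner's videos and merge the partial results."""

    def query(shard):
        # Each shard is modeled as a dict: video_id -> metadata dict.
        return [
            (video_id, meta)
            for video_id, meta in shard.items()
            if meta["owner"] == owner_id
        ]

    # Scatter: run the same query against every shard concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as executor:
        partials = list(executor.map(query, shards))

    # Gather: flatten the partial results and order by view count.
    merged = [row for partial in partials for row in partial]
    return sorted(merged, key=lambda row: row[1]["views"], reverse=True)
```

In a real system each `query` call would be a network request to a shard, and the gather step would also have to handle timeouts and partial failures.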

Grouping and collaborative filtering may be used to provide recommendations to the user. The top few videos for the most common search terms may also be maintained, and these also determine which videos are candidates for the pool. Social features for videos, such as likes and comments, can also be enabled with their own microservice and non-relational databases.
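One simple form of collaborative filtering is item co-occurrence: recommend videos that were watched by other users who watched the same videos. The sketch below is only one possible approach under that assumption; the function names and the shape of `watch_history` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_watch_counts(watch_history):
    """watch_history: dict mapping user -> set of video ids the user watched."""
    co_counts = defaultdict(Counter)
    for videos in watch_history.values():
        for a, b in permutations(sorted(videos), 2):
            co_counts[a][b] += 1  # b was watched by someone who watched a
    return co_counts


def recommend(watch_history, co_counts, user, k=3):
    """Rank unseen videos by how often they co-occur with the user's videos."""
    seen = watch_history.get(user, set())
    scores = Counter()
    for video in seen:
        for other, count in co_counts.get(video, {}).items():
            if other not in seen:
                scores[other] += count
    # Sort by score descending, then by id for a deterministic order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [video for video, _ in ranked[:k]]
```

The co-occurrence counts can be rebuilt offline in batches, and the highest-ranked results are natural candidates for a user's pool.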

There are a few forms of storage that are helpful for storing videos. I mentioned OneFS here and left the comparison with the public cloud open. We could also evaluate object storage. The files need to be available over the internet anyway, and object storage provides a convenient location and decouples programmability from storage, in either a public or a private cloud.

#codingexercise

Find the number of decreasing paths in a matrix:

For example, for the matrix

1 2
1 3

the increasing paths are 1, 1, 2, 3, {1,2}, {1,3}, {2,3}, {1,2,3}, and the decreasing paths are just the same paths in reverse.

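We can do this with recursion and memoization. getCount returns the number of decreasing paths that start at cell (x, y); the answer for the whole matrix is the sum of getCount over all cells, with every dp entry initialized to -1. isValid is assumed to check that (m, n) lies within the bounds of the matrix.

int getCount(int[,] matrix, int[,] dp, int x, int y)
{
    // dp[x, y] caches the count for this cell; -1 means not yet computed
    if (dp[x, y] != -1)
        return dp[x, y];

    // offsets of the four neighbors: right, down, up, left
    int[] dx = new int[] { 0, 1, -1, 0 };
    int[] dy = new int[] { 1, 0, 0, -1 };

    int result = 1; // element by itself
    for (int i = 0; i < dx.Length; i++)
    {
        int m = x + dx[i];
        int n = y + dy[i];
        // step only to an in-bounds neighbor with a strictly smaller value
        if (isValid(m, n, matrix) && matrix[m, n] < matrix[x, y])
        {
            result += getCount(matrix, dp, m, n);
        }
    }

    dp[x, y] = result;
    return result;
}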
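As a cross-check of the example, here is a minimal Python sketch of the same memoized recursion together with a driver that sums over every starting cell. The function name `count_decreasing_paths` and the inline bounds check are choices made for this sketch, not part of the original exercise.

```python
def count_decreasing_paths(matrix):
    """Count strictly decreasing paths (4-neighbor moves) over all start cells."""
    if not matrix or not matrix[0]:
        return 0
    rows, cols = len(matrix), len(matrix[0])
    dp = [[-1] * cols for _ in range(rows)]  # -1 means not yet computed

    def get_count(x, y):
        if dp[x][y] != -1:
            return dp[x][y]
        result = 1  # the element by itself
        for dx, dy in ((0, 1), (1, 0), (-1, 0), (0, -1)):
            m, n = x + dx, y + dy
            # Step only to an in-bounds neighbor with a strictly smaller value.
            if 0 <= m < rows and 0 <= n < cols and matrix[m][n] < matrix[x][y]:
                result += get_count(m, n)
        dp[x][y] = result
        return result

    return sum(get_count(x, y) for x in range(rows) for y in range(cols))
```

For the 2x2 example matrix `[[1, 2], [1, 3]]` this returns 8, matching the eight paths listed earlier. Because the recursion follows strictly smaller values it always terminates, although very large matrices with long monotone runs could exceed Python's default recursion limit.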
