Thursday, June 22, 2017

We were reviewing DynamoDB and Cosmos DB from our discussion on the design of online store. Then we started discussing the design of YouTube. We now continue with this discussion.
We listed the design considerations as follows:
We keep track of Users and Videos in our inventory. Videos may also have images associated such as thumbnails. It may not be surprising to find images far exceeding the number of videos. The videos could themselves number under a billion. Images and videos are best served from a file storage cluster such as Isilon OneFS cluster although performance may need to be compared with that of partitioned datacenter storage. 
User Model has information about users credentials as well as profile and statistics. These can be maintained in a relational database and can be served from the cloud with a managed service database. Users authorization may also flow via OAuth and therefore API services for user and video management may help. A typical User Interface, API services and Data stores will help with making the service available on a variety of devices. 
The Videos are large in number. They will be organized in pool and repository. The pool videos will be the most watched or trending and will be available from a web server that implements an LRU like policy. Pool can also be maintained per user and aged. Users watched videos can also be eagerly loaded into user's pool on users page load. Dynamic queries will be honored with immediate fetches from disk while statistics are updated to enable them to be served again from cache. The storage of videos presents significant challenges. There can be an alternative to letting them be managed by a managed OneFS NAS Storage. In this case, we place them in data center storage separated by some partioning mechanism of videos. Streaming from the data center storage will continue to happen with one or more web servers dedicated for such purpose. Streaming services can maintain their own cache and performance improvements.
CDN will be used wherever possible to improve geographically close network. It will serve static resources and alleviate the load on the web servers. Caching will be offered at every level and as much as possible so that the backend is not hit for all purposes.
Video replication and availability from multiple regions may be offered to improve their availability and reliability. Indexes may be maintained for all video metadata including authoring and uploading by owners.  Index need not be in relational data. Such information or mapping along with statistics can also be in key-value stores and executed with scatter-gather operations.
Grouping and collaborative filtering may be provided to provide recommendations to the user. Search term based top few videos may also be maintained for the most common search terms which also determine the candidacy of videos available in the pool. Social engineering features to videos such as likes and comments can also be enabled with their own microservice and non-relational databases.
#codingexercise
void GeneratePrimesUpto(int N)
{
var multiples = new List<int>(Enumerable.Repeat(0,N+1));
multiples[0] = 1;
multiples[1] = 1;
for (int i =2; i < N; i++)
   for (int k = 2; k<=N && i*k<=N; k++)
        if (multiples[i*k] == 0)
             multiples[i*k] = 1;
for (int k =1; k<=N; k++)
       if (multiples[k] == 0)
           Console.WriteLine("{0}", k);
}

We were discussing another problem yesterday:
Find the minimum length for a repeated brackets of number N
if x is the number of opening brackets and y is the number of closing brackets in the minimum length repeated sequence
then x*y =N or x*y+() = N
and we can increment x by 1 or y by 1
http://ideone.com/s0dxCX
This strategy does not take into consideration minimization based on consecutive x and y multiples with x == y
which means
        static int GetCountADP(int N, int m, int n)
        {
            if (N <= 0) return 0;
            if (N == 0) return 0;
            if (N == 1) return 2;
            if (m * n == N) return m + n;
            if (m * n > N) return N + N + N*2;
            int count1 = GetCountADP(N - n, m + 1, n) + 1;
            int count2 = GetCountADP(N - m, m, n + 1) + 1;
            int count3 = GetCountADP(N - 1, m, n) + 2;
            int count4 = GetCountADP(N - m*n, m+1, n+1) + m + n;
            var ret = new List<int>() { count1, count2, count3, count4 };
            int min = ret.Min();
            return min;

        }

No comments:

Post a Comment