Tuesday, June 6, 2017

We talked about the overall design of an online shopping store that can scale, starting with an earlier post. We then discussed data infrastructure and data security in a system design case in previous posts. After that we began looking at specialized but increasingly popular analytics frameworks for Big Data; for example, Spark and Hadoop can be offered as a fully managed cloud service. We continued with some more specialized infrastructure, including a dedicated private cloud, and then added serverless computing to the mix. Today we continue the discussion, this time focusing on Docker support.
OpenWhisk supports Docker actions. This means we can execute binaries on demand without provisioning virtual machines. Docker actions are best suited to cases where it is difficult to refactor an application into a smaller set of functions, which is common for existing applications and services.
Pulling an image from a registry such as Docker Hub to execute an action is slow: the latency depends on the size of the image and the network bandwidth. Contrast this with the pool of warm containers, which don't need a cold start. Moreover, some Docker images can't be posted on a public hub, because the code they run may be proprietary and publishing it would violate security. OpenWhisk mitigates both problems by providing a base image for Docker actions, and a Docker action can now also receive a zip file containing an executable.
The upshot is that we don't need to create custom images: a base image is already provided and only the executable is switched. That avoids the latency of pulling a custom image, so the action starts running sooner. And because we neither customize images nor share them publicly, we don't compromise security.
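A minimal sketch of what such an executable could look like, written in C# like the rest of this post. It assumes the convention documented for OpenWhisk native (dockerskeleton) actions: the parameters arrive as a single JSON string in the first command-line argument, and the last line printed to stdout must be a JSON object holding the result. The class ExecAction, its Handle and Run methods, and the name/greeting fields are hypothetical. To deploy it, the program would have to be published as a Linux executable that runs in the base image, named exec, zipped, and created with something like wsk action create hello exec.zip --native; check the OpenWhisk documentation for the exact packaging rules.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical body of a native OpenWhisk action shipped as "exec" inside a zip.
// Assumed convention: parameters arrive as one JSON string in args[0], and the
// last line written to stdout must be a single-line JSON object (the result).
public static class ExecAction
{
    // Parses the JSON parameters and builds the JSON result string.
    public static string Handle(string paramsJson)
    {
        using JsonDocument doc = JsonDocument.Parse(paramsJson);
        string name = "stranger";
        if (doc.RootElement.ValueKind == JsonValueKind.Object &&
            doc.RootElement.TryGetProperty("name", out JsonElement n) &&
            n.ValueKind == JsonValueKind.String)
        {
            name = n.GetString() ?? name;
        }
        var result = new Dictionary<string, string> { ["greeting"] = $"Hello, {name}!" };
        return JsonSerializer.Serialize(result);
    }

    // Entry logic; a Main method would simply call Run(args, Console.Out).
    public static void Run(string[] args, TextWriter stdout)
    {
        string input = args.Length > 0 && !string.IsNullOrWhiteSpace(args[0]) ? args[0] : "{}";
        stdout.WriteLine("starting action");   // assumed to end up in the activation logs
        stdout.WriteLine(Handle(input));        // last line: the JSON result
    }
}
```

Keeping the logic in Handle rather than in Main makes it testable without starting a container, and swapping in a different executable leaves the base image untouched.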

#codingexercise
A bot is an id that visits the site m times in the last n seconds. Given a list of log entries sorted by time, return the ids of all bots.
Yesterday we solved this by iterating over the relevant window of the log. This is a typical question on logs and events, both of which are usually stored in a time series database.
A time series database supports specialized queries over such data. Unlike relational data serving an OLTP system, a time series is a continuous stream of events, often arriving at a high rate.
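Yesterday's code isn't repeated here, but one way to iterate over the relevant window is sketched below. It assumes each log entry is a (time in seconds, id) pair, that "m times in the last n seconds" means at least m entries with timestamps in (t - n, t] for some entry time t, and that LogEntry, BotFinder and FindBots are hypothetical names. Each id keeps a queue of its own recent timestamps; old ones fall out as the window slides, so every entry is enqueued and dequeued at most once.

```csharp
using System.Collections.Generic;

// Hypothetical log entry: a timestamp in seconds and a visitor id.
public record LogEntry(long Time, string Id);

public static class BotFinder
{
    // Returns ids with at least m entries inside some window (t - n, t].
    // Assumes the log is sorted by Time in ascending order.
    public static List<string> FindBots(IList<LogEntry> log, int m, long n)
    {
        var recent = new Dictionary<string, Queue<long>>(); // per-id timestamps in the window
        var bots = new List<string>();
        var reported = new HashSet<string>();
        foreach (var e in log)
        {
            if (!recent.TryGetValue(e.Id, out var q))
                recent[e.Id] = q = new Queue<long>();
            q.Enqueue(e.Time);
            // Slide the window: drop this id's visits older than e.Time - n.
            while (q.Count > 0 && q.Peek() <= e.Time - n)
                q.Dequeue();
            // Report each bot once, the first time its window reaches m visits.
            if (q.Count >= m && reported.Add(e.Id))
                bots.Add(e.Id);
        }
        return bots;
    }
}
```

For example, with m = 3 and n = 5, an id logged at seconds 0, 2 and 3 is reported as a bot, while one logged only at seconds 1 and 10 is not.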
In the logs, well-behaved bots generally identify themselves in their user agent string and obey the rules in the site's robots.txt file. Consequently, we can split the bots in the logs into those that behave and those that don't, and the ones that behave leave an identifying string we can match. Assuming h maps each visitor id to its user agent, we can count Googlebot visitors like this:
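using System.Collections.Generic;
using System.Text.RegularExpressions;

// h maps each visitor id to the user agent string logged for it
static int CountGoogleBots(Dictionary<string, string> h)
{
    int count = 0;
    string pat = @"(?<bot_name>Google)bot\b";
    Regex r = new Regex(pat, RegexOptions.IgnoreCase);
    foreach (var kvp in h)
    {
        Match m = r.Match(kvp.Value);   // match the user agent (the value), not the id
        if (m.Success)
            count++;
    }
    return count;
}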
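The named group bot_name is captured but never used above. As a hedged extension, assuming a looser pattern is acceptable, the sketch below matches any word ending in "bot" (so Googlebot, bingbot and YandexBot all match) and tallies the self-identified bots by name. BotTally, CountByName and the lowercase normalization are my own illustrative choices, and a production crawler list would need more care, since not every bot follows the "somethingbot" naming and a few ordinary user agents may contain such a word.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BotTally
{
    // Looser than the Googlebot-only pattern: any word ending in "bot",
    // capturing the part before "bot" as bot_name.
    private static readonly Regex BotPattern =
        new Regex(@"(?<bot_name>\w+)bot\b", RegexOptions.IgnoreCase);

    // h maps visitor id -> user agent; returns the number of visitors per bot name.
    public static Dictionary<string, int> CountByName(Dictionary<string, string> h)
    {
        var tally = new Dictionary<string, int>();
        foreach (var kvp in h)
        {
            Match m = BotPattern.Match(kvp.Value);
            if (!m.Success)
                continue;   // no self-identifying token in this user agent
            string name = m.Groups["bot_name"].Value.ToLowerInvariant();
            tally[name] = tally.TryGetValue(name, out int c) ? c + 1 : 1;
        }
        return tally;
    }
}
```

Ids that the windowed check flags as bots but that never appear in this tally are candidates for the bots that don't identify themselves.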
One more:
Count the number of ways array elements can add up to a given sum, with repetitions allowed:
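int GetCount(List<int> A, int sum)
{
    // counts[i] = number of ordered sequences of elements of A (repetition allowed) summing to i.
    // Assumes all elements of A are positive.
    var counts = new int[sum + 1];   // indexed by partial sum; C# zero-initializes it
    counts[0] = 1;                   // one way to reach 0: pick nothing
    for (int i = 1; i <= sum; i++)
        for (int j = 0; j < A.Count; j++)
            if (i >= A[j])
                counts[i] += counts[i - A[j]];
    return counts[sum];
}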
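Because the outer loop runs over the partial sums, this loop order counts ordered sequences: for A = {1, 2} and a sum of 3 it counts 1+1+1, 1+2 and 2+1. If the order of the terms shouldn't matter (the usual coin-change question), a standard variant swaps the loops so that each element is introduced only once. The sketch assumes A holds distinct positive integers; CoinWays and CountCombinations are hypothetical names.

```csharp
using System.Collections.Generic;

public static class CoinWays
{
    // Counts unordered combinations (multisets) of elements of A, with repetition,
    // that add up to sum. Assumes A holds distinct positive integers.
    public static long CountCombinations(List<int> A, int sum)
    {
        var ways = new long[sum + 1];
        ways[0] = 1;   // the empty combination
        // Outer loop over elements: each element joins the count once, so
        // reorderings such as 1+2 and 2+1 are not counted separately.
        foreach (int a in A)
            for (int i = a; i <= sum; i++)
                ways[i] += ways[i - a];
        return ways[sum];
    }
}
```

For A = {2, 3, 5} and a sum of 10 this gives 4 (5+5, 2+3+5, 2+2+3+3 and 2+2+2+2+2), whereas the ordered count for the same input is 14 because it also includes every reordering.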
Alternatively, this can be done with backtracking instead of dynamic programming, as we showed in earlier posts with the Combine method that allows repetitions.
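The Combine method from those earlier posts isn't reproduced here. As a stand-in, here is a hedged backtracking sketch (BacktrackWays, Enumerate and Extend are hypothetical names) that lists every ordered sequence, so the number of results should equal what GetCount computes. It assumes positive elements and small inputs, since the number of sequences grows exponentially.

```csharp
using System.Collections.Generic;

public static class BacktrackWays
{
    // Enumerates every ordered sequence of elements of A (repetition allowed)
    // that adds up to sum. Assumes A holds positive integers.
    public static List<List<int>> Enumerate(List<int> A, int sum)
    {
        var results = new List<List<int>>();
        Extend(A, sum, new List<int>(), results);
        return results;
    }

    private static void Extend(List<int> A, int remaining, List<int> path, List<List<int>> results)
    {
        if (remaining == 0)
        {
            results.Add(new List<int>(path));   // record a copy of the complete sequence
            return;
        }
        foreach (int a in A)
        {
            if (a > remaining) continue;        // prune: this element overshoots the sum
            path.Add(a);                        // choose
            Extend(A, remaining - a, path, results);
            path.RemoveAt(path.Count - 1);      // un-choose (backtrack)
        }
    }
}
```

Enumerate(new List<int> { 1, 2 }, 3) yields [1, 1, 1], [1, 2] and [2, 1], three sequences, matching the dynamic programming count.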
