Sunday, February 5, 2017

Debate on database in a container versus database service 

Containers have been widely accepted as the new paradigm for software services and Platform as a service. To quote some trends from Datadog HQ: 
  1. Real Docker adoption is up 30% in one year 
  1. Docker now runs on 10% of the hosts we monitor 
  1. Larger companies are leading adoption 
  1. 2/3 of the companies that try Docker adopt it 
  1. Adopters 5x their container count within 9 months 
  1. The most widely used images are still registry, NGINX and Redis. MySQL moved up from its position at #9. Postgres, the second most widely used open source database, became the new #9. Running databases in containers is therefore popular. 
  1. Docker hosts often run five containers at a time. 
  1. VMs live 6x longer than Containers 

Container use for a database is suggested only along the following lines: 
  1. use of the volume API for the persisted storage layer. Volumes are efficient because they bypass the layered file system whose layers stack up to form the unified view of the image. 
  1. use of a specified directory on the host mounted at a specified location in the container. 
  1. use of a shared data volume container dedicated to a database 

As we can see, this methodology is rather limited in scope, size and scale of growth for a database, and particularly so for production readiness or availability. Data storage and access are independent of whether the MySQL software runs in a Docker container competing with other applications or not. Moreover, the data should be kept on a volume rather than on the container's local file system, because the latter adds overhead. In the end, compute and storage requirements are different for applications and databases. 

On the other hand, the same automation of failover, cluster-based replication and availability can be achieved with a database cluster, replica sets, multi-DC failover, connection pooling etc., which come with the deployment topology of a database server. 

If it makes no difference to data access, then the database server can be moved to its own container with availability during container restarts. 

MySQL did a performance study on the container with regard to the above-mentioned three usages. They measured I/O and network overhead and compared the results to a stock instance of MySQL. A heavily I/O-bound load gave rather even results: there was neither I/O nor network overhead in this case. Then the buffer pool size was scaled up and there was significant overhead from container usage as compared to the stock instance. 

There is a lot of inefficiency introduced with container networking when the data does not reside local to the host, which is usually the case when the data grows larger than most storage on host virtual machines. First, there is the overhead from the bridged network. Second, there is the overhead of accessing a remote volume over a mount point via the distributed file system. Compare this to direct access of a database via a gateway-routed connection. Sure, if the user can tolerate a minute of delay from service or container restarts and the usages are sporadic or shallow, then these don’t matter. For more involved access to a database, where query execution times on the order of milliseconds matter, disk access latencies are nothing in comparison to the monstrous delay we introduce by relaying the data over the network. Why then should we not push the database server closer to the data? 

Saturday, February 4, 2017

Today we continue to compare networking capabilities in the Azure and AWS public clouds. We compared Security, Security Groups, Network ACLs, Custom routing tables, Virtual network interfaces, DNS service, connectivity, VPN over IPSec, private connectivity over Exchange, SDK & Tools. We also reviewed CDN and Traffic Manager from Azure. Let us look at VPN Gateway.
A VPN Gateway is a type of virtual network gateway that sends encrypted traffic across a public connection. A virtual network gateway can be of two types - ExpressRoute and VPN. We are interested only in the latter category. A multisite connection establishes multiple connections to the same VPN gateway.
In such a setting, all the VPN tunnels share the available bandwidth for the gateway. The settings of a VPN gateway can be configured using PowerShell and the Azure portal.

#puzzle
There are three boxes: one of gold, one of silver and one of a mixture. The boxes are all labeled incorrectly. How can we tell them apart if we are given only one chance to pick a box and draw from it?
Solution. The box that is labeled mixed should be chosen. If its content turns out to be silver, the box labeled silver holds gold and the box labeled gold holds the mixture. If it is gold, the other two can similarly be found. A mixed box may show silver or gold on its top layer, but the box labeled mixed cannot be the mixed box, so whatever we draw from it reliably identifies its content despite the similar-looking boxes.
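
The reasoning in the solution can be checked by brute force. The following is a minimal Python sketch, assuming we model each box by its label and its true content; the names deduce, all_mislabeled and check are made up for this illustration.

```python
from itertools import permutations

LABELS = ("gold", "silver", "mixed")

def deduce(drawn):
    """Map each label to its true content, given what was drawn from the box labeled 'mixed'."""
    other = "silver" if drawn == "gold" else "gold"
    # The box labeled 'mixed' is pure `drawn`. The box labeled `other` cannot hold
    # `other` (its label is wrong) or `drawn` (already found), so it holds the mixture.
    # The remaining box, labeled `drawn`, therefore holds `other`.
    return {"mixed": drawn, other: "mixed", drawn: other}

def all_mislabeled():
    """Yield every assignment of contents to labels in which no label is correct."""
    for contents in permutations(LABELS):
        if all(label != content for label, content in zip(LABELS, contents)):
            yield dict(zip(LABELS, contents))

def check():
    """Return True if one draw from the 'mixed' box identifies all boxes in every case."""
    for truth in all_mislabeled():
        drawn = truth["mixed"]  # pure gold or pure silver, so any draw reveals it
        if deduce(drawn) != truth:
            return False
    return True

print(check())  # True
```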

#codingexercise
Given k as number of floors and n as number of eggs, find the minimum number of trials needed to determine which floor is the highest and yet safe to drop an egg from.

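// n = number of eggs, k = number of floors; returns the worst-case minimum number of trials
int GetTrials(int n, int k)
{
if (k == 1 || k == 0) return k;

// with a single egg we must try every floor from the bottom up
if ( n == 1) return k;
int min = int.MaxValue;
for ( int i = 1; i <= k; i++)
{
    // drop from floor i: if the egg breaks, search the i-1 floors below with n-1 eggs;
    // otherwise search the k-i floors above with all n eggs
    int result =  Math.Max(GetTrials(n-1, i-1), GetTrials(n, k-i));
    if (result < min)
         min = result;
}
return min + 1;
}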
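
The recursion above recomputes the same (eggs, floors) subproblems many times. Here is a hedged Python sketch of the same recurrence with memoization; the function name get_trials and the use of functools.lru_cache are choices made for this illustration only.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_trials(eggs, floors):
    """Worst-case minimum number of drops needed with `eggs` eggs and `floors` floors."""
    if floors <= 1:
        return floors          # zero floors need no drop, one floor needs one
    if eggs == 1:
        return floors          # one egg: try every floor from the bottom up
    # Drop from floor i: a break leaves i-1 floors and one egg fewer,
    # a survival leaves floors-i floors and the same eggs.
    return 1 + min(
        max(get_trials(eggs - 1, i - 1), get_trials(eggs, floors - i))
        for i in range(1, floors + 1)
    )

print(get_trials(2, 10))   # 4
print(get_trials(2, 100))  # 14
```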
Another way to look at it is to divide the floors by 2 for each egg drop. Each subsequent egg, in its lifetime, divides and narrows down the range further.
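// pseudocode to illustrate; start is the highest floor known to be safe (0 is the ground)
// and end is the lowest floor known to break the egg (floors + 1 when none is known yet)
assert(log2(floors) <= eggs);
void GetTrials(int floors, int eggs, int start, int end, ref int trials)
{
if (end - start <= 1 || eggs == 0) return;
int next = start+1;
// pick the smallest power of two above start
for (int i = 0; (1 << i) <= end; i++)
     if ((1 << i) > start) {
          next = 1 << i;
          break;
     }
// but never go past the midpoint of the remaining range
next = min(next, (start+end)/2);
if (Breaks(next))
{
end = next;
trials += 1;
eggs--;
GetTrials(floors, eggs, start, end, ref trials);
}else{
start = next;
trials += 1;
GetTrials(floors, eggs, start, end, ref trials);
}
}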
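
When eggs are plentiful, plain binary search over the floors is enough. The sketch below is a simplified Python illustration of that idea rather than a translation of the pseudocode above; the breaks callback is a hypothetical oracle that reports whether an egg dropped from a given floor breaks.

```python
def find_highest_safe_floor(floors, breaks):
    """Binary search for the highest safe floor.

    Returns (highest_safe_floor, trials, eggs_broken).
    """
    lo = 0            # highest floor known to be safe (0 is the ground)
    hi = floors + 1   # lowest floor known to break; floors + 1 is a sentinel
    trials = 0
    eggs_broken = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        trials += 1
        if breaks(mid):
            hi = mid
            eggs_broken += 1
        else:
            lo = mid
    return lo, trials, eggs_broken

# Example with 100 floors where floor 37 is the highest safe one.
safe, trials, broken = find_highest_safe_floor(100, lambda f: f > 37)
print(safe)  # 37
```

Each drop halves the remaining range, so for 100 floors no more than 7 drops are ever needed, and the number of eggs broken can never exceed the number of drops.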

Friday, February 3, 2017

Today we continue to compare networking capabilities in the Azure and AWS public clouds. We compared Security, Security Groups, Network ACLs, Custom routing tables, Virtual network interfaces, DNS service, connectivity, VPN over IPSec, private connectivity over Exchange, SDK & Tools. Now we review CDN and Traffic Manager from Azure.
CDN delivers content to end users through a robust network of global data centers. It cuts the time it takes to serve content to web applications by caching it closer to the user than the origin. Using the CDN, we can cache publicly available objects loaded from Azure blob storage, a web application, a virtual machine, an application folder or another HTTP/HTTPS location. The content is typically static, such as images, stylesheets, documents, files, client scripts and HTML pages, but dynamic content such as PDF reports and graphs can also be served. The CDN does not go back to the origin for each request and can even serve entire websites from its cache. Video generally benefits immensely from the low latency. The locations are regional and chosen to maximize the bandwidth to the clients. It should be noted that the clients can be mobile and a host of data could come from the Internet of Things (IoT), but as data grows the CDNs cache more of it locally to the clients, thus enabling a better user experience.
Traffic Manager routes incoming traffic for high performance and availability. It distributes the user traffic across service endpoints in different datacenters. These endpoints can be Azure VMs, web apps and cloud services. However, Traffic Manager can also be used with external non-Azure endpoints. Traffic Manager uses the Domain Name System to direct client requests to the most appropriate endpoint based on a traffic routing method and the health of the endpoints. It provides a range of traffic routing methods to suit different application needs, along with endpoint health monitoring and automatic failover. Thus it can improve availability and responsiveness, allow maintenance without downtime, combine on-premises and cloud-based applications and distribute traffic for large, complex deployments.
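
To make the idea of routing by method and health concrete, here is a toy Python sketch of a priority-style choice with failover. It is only an illustration of the concept, not how Traffic Manager is implemented or configured; the Endpoint class, its fields and pick_endpoint are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str             # hypothetical endpoint name, e.g. a regional deployment
    priority: int         # lower number means more preferred
    healthy: bool = True  # would be set by a health probe in a real system

def pick_endpoint(endpoints):
    """Return the name of the healthy endpoint with the best priority, or None."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority).name

endpoints = [Endpoint("primary", 1), Endpoint("secondary", 2)]
print(pick_endpoint(endpoints))  # primary
endpoints[0].healthy = False     # simulate a failed health check
print(pick_endpoint(endpoints))  # secondary
```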


#codingexercise
We look at the tree construction method for finding the different interpretations for a sequence of digits.

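// Builds a tree whose root-to-leaf paths are the interpretations of the digits in A.
// b is the text decoded so far and prev is the number to decode at this node;
// the initial call passes prev = 0, for which ToLetter is assumed to return an empty string.
Node GetTree(List<int> A, String b, int prev)
{
if (prev > 26)
     return null;
var newb = b + ToLetter(prev);
var root = new Node() { data = newb };
if (A.Count != 0)
{
     // left child: interpret the next digit on its own
     root.left = GetTree(A.GetRange(1, A.Count - 1), newb, A[0]);
     if (A.Count > 1) {
       // right child: interpret the next two digits together
       root.right = GetTree(A.GetRange(2, A.Count - 2), newb, A[0]*10 + A[1]);
     }
}
return root;
}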

Then we can print the leaves of this tree as ABA, AU, LA for {1,2,1}
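
The tree construction can be sketched in runnable form as follows. This Python version assumes the digits are all between 1 and 9 (zeros would need extra handling); Node, to_letter, get_tree and leaves are names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    text: str                       # interpretation decoded so far
    left: Optional["Node"] = None   # next digit taken on its own
    right: Optional["Node"] = None  # next two digits taken together

def to_letter(n):
    """Map 1..26 to A..Z."""
    return chr(ord("A") + n - 1)

def get_tree(digits, text="", prev=None):
    """Build the interpretation tree; prev is the number decoded at this node."""
    if prev is not None and prev > 26:
        return None
    node = Node(text if prev is None else text + to_letter(prev))
    if digits:
        node.left = get_tree(digits[1:], node.text, digits[0])
        if len(digits) > 1:
            node.right = get_tree(digits[2:], node.text, digits[0] * 10 + digits[1])
    return node

def leaves(node):
    """Collect the text of every leaf, left to right."""
    if node is None:
        return []
    if node.left is None and node.right is None:
        return [node.text]
    return leaves(node.left) + leaves(node.right)

print(leaves(get_tree([1, 2, 1])))  # ['ABA', 'AU', 'LA']
```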

Given a value N, if we want to make change for N cents, and we have infinite supply of each of S = { S1, S2, .. , Sm} valued coins, how many ways can we make the change? The order of coins doesn’t matter.
int GetCount(List<int> coins, int num, int sum)
{
if (sum == 0) return 1;
if (sum < 0) return 0;
if (num <= 0 && sum >= 1) return 0;
return GetCount(coins, num-1, sum) + GetCount(coins, num, sum-coins[num-1]);
}
We could also do the same with combinations, where we take one or more of the same coin as long as the sum is not exceeded.
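
The recursive count above can also be computed bottom-up. The following Python sketch is one common way to do it, assuming all coin values are positive integers; count_ways is a name chosen for this example.

```python
def count_ways(coins, total):
    """Number of ways to make `total` from unlimited coins, ignoring order."""
    ways = [1] + [0] * total   # ways[0] = 1: one way to make zero, using no coins
    # Processing one coin denomination at a time counts each combination once,
    # regardless of the order in which the coins would be handed over.
    for coin in coins:
        for amount in range(coin, total + 1):
            ways[amount] += ways[amount - coin]
    return ways[total]

print(count_ways([1, 2, 3], 4))      # 4
print(count_ways([2, 5, 3, 6], 10))  # 5
```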

Thursday, February 2, 2017

Today we continue to compare networking capabilities in the Azure and AWS public clouds. We compared Security, Security Groups, Network ACLs, Custom routing tables, Virtual network interfaces. We now compare DNS service, connectivity, VPN over IPSec, private connectivity over Exchange, SDK & Tools.
AWS provides Route 53 as a highly available and scalable DNS web service. We can register domain names, route internet traffic to the resources for the domain and check the health of the resources. The domain registry is with Verisign. When we register a domain name, Amazon Route 53 automatically creates a public hosted zone that has the same name as the domain, and to route traffic to our resources, we create resource record sets. When an end user requests a name, the request is routed to a name resolver, which forwards it to the DNS root name server. With the response, the name resolver forwards the request to the TLD name server for the .com domains. Again with the response, the name resolver forwards the request to the Route 53 name server, which then provides the IP address for the name. The resolver then returns this to the user.
Azure uses Anycast networking  so that each DNS query is answered by the closest available DNS server thus increasing the performance and the availability of the domain.
VPN gateway provides connectivity between the virtual network in the cloud and the on-premises site. The Azure VPN gateway is a virtual network gateway that sends encrypted traffic over a public connection. Azure provides two types of gateways - ExpressRoute and VPN. There can be only one gateway of each type in a virtual network. A VPN gateway allows point-to-site as well as multisite connections, which share the bandwidth available to the gateway. AWS provides Direct Connect links that let you create virtual interfaces directly to the AWS cloud and Amazon Virtual Private Cloud, bypassing the ISPs in the route. Both cloud providers allow public IP addresses to be assigned to virtual machine instances, thereby providing internet connectivity. In order to serve large customers with heavy bandwidth requirements, both cloud providers seem to have agreements with major telecom providers and ISVs to offer private connectivity between their clouds and the customer's premises. While AWS doesn't provide an SLA for this service, Azure provides a 99.9% SLA. Azure and AWS provide programmable SDKs as well as CLI and REST APIs. These significantly improve automation and workflow expansion.
#codingexercise
Find all possible interpretations of an array of digits.
For example, 121 can be aba, au or la.
We can build a decode tree and print the leaves for the interpretations, or enumerate them directly by backtracking as shown below:

// Prints every interpretation of the digits in A (1 -> a, ..., 26 -> z) by backtracking.
// b holds the letters chosen so far; ToLetter is assumed to map 1..26 to a letter.
void Decode(List<int> A, StringBuilder b)
{
if (A.Count == 0) {
     Console.WriteLine(b.ToString());
     return;
}
// take one digit; 0 has no letter of its own
if (A[0] != 0) {
     b.Append(ToLetter(A[0]));
     Decode(A.GetRange(1, A.Count - 1), b);
     b.Length--;
}
// take two digits when they form a number from 10 to 26
if (A.Count > 1 && A[0] != 0 && A[0] * 10 + A[1] <= 26) {
     b.Append(ToLetter(A[0] * 10 + A[1]));
     Decode(A.GetRange(2, A.Count - 2), b);
     b.Length--;
}
}
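
If only the number of interpretations is needed, a small dynamic program avoids enumerating them. This Python sketch assumes that 0 has no letter of its own and that a two-digit group must lie between 10 and 26; count_interpretations is a name chosen for this example.

```python
def count_interpretations(digits):
    """Count the ways to split `digits` into numbers from 1 to 26."""
    n = len(digits)
    ways = [0] * (n + 2)
    ways[n] = 1                     # the empty suffix has exactly one interpretation
    for i in range(n - 1, -1, -1):
        if digits[i] == 0:
            continue                # nothing can start with 0
        ways[i] = ways[i + 1]       # take digits[i] on its own
        if i + 1 < n and digits[i] * 10 + digits[i + 1] <= 26:
            ways[i] += ways[i + 2]  # take digits[i] and digits[i+1] together
    return ways[0]

print(count_interpretations([1, 2, 1]))  # 3
```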

Wednesday, February 1, 2017

Yesterday we started comparing virtual private network capabilities in public clouds. Azure started offering virtual networks in 2013, while AWS was an early starter. Azure's Virtual Network (VNet) resembles AWS VPC in many aspects, but we will also enumerate differences. That said, Azure has kicked it up a notch.

We compared subnets, which are a critical component for dividing networks. Both AWS and Azure allow a network of any size to be created as either a single public subnet or a mix of public and private subnets. The difference is in the use of wizards, which make this super easy.
As with all networks, security is important, and it can be configured at the instance, subnet and overall network level. Moreover, we can selectively permit or deny traffic using network ACLs.
The use of network ACLs is complementary to the use of security groups, which help control access. Traffic can also be managed at any level using custom routing tables. By default, the routing table of the network is used unless a custom one is specified. 
Additional IP addresses can be assigned to the same instance in a network. However, in order to do this, the instance has to be large enough to support multiple NICs. Both providers allow multiple NICs, up to 8, only on specific large instances. 

#codingexercise
Given an integer array of n integers, find sum of bit differences in all pairs that can be formed from array elements. We gave an alternate way that involved enumerating all combinations of pairs from the array using the Combine method discussed earlier
int GetSumBitDifferencesUsingCombinations(List<int> A)
{
int ret = 0;
var pairs = new List<Tuple<int,int>>();
var seq = new List<int>();
int start = 0;
int level = 0;
Combine(A, ref seq, ref pairs, start, level);
// each unordered pair stands for the two ordered pairs (x, y) and (y, x)
pairs.ForEach ( x => ret += 2 * GetDiff(x.Item1, x.Item2));
return ret;
}
// Collects every unordered pair of elements of a (chosen by index) into pairs.
// b holds the elements chosen so far and level is the position being filled.
void Combine (List<int> a, ref List<int> b, ref List<Tuple<int, int>> pairs, int start, int level)
{
for (int i = start; i < a.Count; i++)
{
b.Add(a[i]);
if (b.Count == 2) {
pairs.Add(Tuple.Create(b[0], b[1]));
} else {
Combine(a, ref b, ref pairs, i + 1, level + 1);
}
b.RemoveAt(level); // undo the choice before trying the next element
}
}
// Counts the bit positions at which a and b differ.
int GetDiff(int a, int b)
{
int ret = 0;
for (int i = 0; i < 32; i++)
{
int c = 1 << i;
if ((a & c) != (b & c))
         ret++;
}

return ret;
}
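
For comparison, the same pair enumeration is short to express in Python using itertools.combinations from the standard library. This is a sketch under the assumption that values fit in 32 bits; bit_diff and sum_bit_differences_pairs are names chosen here.

```python
from itertools import combinations

def bit_diff(a, b):
    """Number of bit positions (within 32 bits) at which a and b differ."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")

def sum_bit_differences_pairs(values):
    """Sum of bit differences over all ordered pairs of elements."""
    # combinations yields each unordered pair once, so double the sum
    # to account for both (x, y) and (y, x).
    return 2 * sum(bit_diff(a, b) for a, b in combinations(values, 2))

print(sum_bit_differences_pairs([2, 7]))     # 4
print(sum_bit_differences_pairs([1, 3, 5]))  # 8
```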

Tuesday, January 31, 2017

In the previous posts, we talked about public cloud offerings. This leads us to compare public clouds. Let us do so today with a comparison of private network offerings.

Azure started offering virtual networks in 2013, while AWS was an early starter. Azure's Virtual Network (VNet) resembles AWS VPC in many aspects, but we will also enumerate differences. That said, Azure has kicked it up a notch.

Subnets:
Subnets divide the network into smaller networks and are very helpful for creating private networks. AWS allows a VPC to be created with
1) single public subnet
2) public and private subnets
3) public and private subnets with hardware VPN access
4) private subnet only with a hardware VPN access.
Azure also gives similar flexibility to create subnets of any size. The difference is in the use of wizards between AWS and Azure.  AWS wizards have been used for a while longer.

Security:
Security is one of the most important considerations for the virtual private network. AWS provides security services that address it at the instance level, the subnet level and the overall network level. Azure has a simpler design. Inbound and outbound rules can be configured in both.

Security Groups:
Security groups help control access, and they are available from both providers. Azure calls them Network Security Groups, and they are available only for regional virtual networks. 

Network ACLs:
Azure and AWS both support network ACLs, which allow users to selectively permit or deny traffic to their networks.

Custom routing tables:
Subnets inherit the main routing table. In AWS, we can customize these within a subnet. In Azure, the support was somewhat limited until recently.

Virtual network interfaces:
Multiple NICs can be activated on virtual machine instances; however, both cloud providers allow this only on large instances.
documentation on Azure: https://docs.microsoft.com/en-us/azure/
documentation on AWS: https://aws.amazon.com/documentation/ 

#codingexercise
int GetMinSumSquares(List<int> positives, uint scalar)
{
if (positives == null || positives.Count == 0) return 0;
// inclusive: best sum that includes the current element
// exclusive: best sum that skips the current element
int inclusive = positives[0];
int exclusive = 0;
for (int i = 1; i < positives.Count; i++)
{
var min = Math.Min(inclusive, exclusive);
inclusive  = exclusive + positives[i] * positives[i];
exclusive = min;
}
return Math.Min(inclusive, exclusive);
}
Given an integer array of n integers, find sum of bit differences in all pairs that can be formed from array elements
For example bit difference for 2 represented by 010 and 7 represented by 111 is 2. Therefore an array with {2,7} will have (2,2), (2,7), (7,2) and (7,7) with sum of bit differences = 0 + 2 + 2 + 0 = 4.
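int GetSumBitDifferences(List<int> A)
{
int ret = 0;
for (int i = 0; i < 32; i++)
{
// count the elements that have bit i set
int count = 0;
for (int j = 0; j < A.Count; j++)
     if ((A[j] & (1 << i)) != 0)
         count++;

// every pairing of a set bit with an unset bit differs at bit i, in both orders
ret += (count * (A.Count - count) * 2);
}
return ret;
}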
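
The per-bit counting idea translates directly to Python. A minimal sketch, assuming 32-bit values as in the code above; sum_bit_differences and its bits parameter are names chosen for this example.

```python
def sum_bit_differences(values, bits=32):
    """Sum of bit differences over all ordered pairs, counted one bit position at a time."""
    total = 0
    for i in range(bits):
        ones = sum(1 for v in values if (v >> i) & 1)  # elements with bit i set
        zeros = len(values) - ones                     # elements with bit i clear
        total += ones * zeros * 2                      # both (x, y) and (y, x)
    return total

print(sum_bit_differences([2, 7]))     # 4
print(sum_bit_differences([1, 3, 5]))  # 8
```

This takes time proportional to the number of elements times the number of bits, instead of growing with the number of pairs.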
Another way to do this would be to enumerate all combinations of pairs from the array using the Combine method discussed earlier.
int GetSumBitDifferencesUsingCombinations(List<int> A)
{
int ret = 0;
var pairs = new List<Tuple<int,int>>();
var seq = new List<int>();
int start = 0;
int level = 0;
Combine(A, ref seq, ref pairs, start, level);
// each unordered pair stands for the two ordered pairs (x, y) and (y, x)
pairs.ForEach ( x => ret += 2 * GetDiff(x.Item1, x.Item2));
return ret;
}
Pairs in which the first and second elements are the same make no contribution to the result, so it does not matter whether the enumeration includes them.

Monday, January 30, 2017


An introduction to private cloud versus public cloud masquerading as private cloud can be found here: https://1drv.ms/w/s!Ashlm-Nw-wnWrF0qVoeSgL7Xnu1w 
Here are a few specific ways to differentiate and make the private cloud more appealing while not stealing any light from public cloud. These include
1)     Provide container resources in addition to virtual machines to multiply the number of computing resources
2)     Provide services that are customized to frequent usages by private cloud customers. This includes not only making it easier to use some services but also provisioning those that many customers often use.
3)     Anticipate customer requests and suggest compute resources based on past history and measurements.
4)     Provide additional services so that more customers are drawn to the services and not just to the cloud. Additionally, the customers won’t mind when the weight of the services is shifted between public and private cloud infrastructure as the costs dictate.
5)     Provide additional services that won’t be offered elsewhere. For example, data tiering, aging, archival, deduplication, file services, backup and restore, naming and labeling, accelerated networks etc. offer major differentiation that do not necessarily have to lean towards machine learning to make the private cloud smart.
6)     Offer major periodic maintenance and activities on behalf of the customer such as monitoring disk space and adding storage, checking for usages and making in place suggestions on the portal.
7)     Reduce the volume of service desk tickets aggressively with preemptive actions and minimizing them to only failures.  This is paying off debt so it may not translate to new services.
8)     Improve regional experiences not only with savvy resources but also with improved networks for major regions
9)     Provide transparency, accounting and auditing so that users can always choose to get more information for self-help and troubleshooting. FAQs and documentation could be improved, preferably with a search field.
10)  Enable subscriptions to any or all alerts that can be setup by the customer on various activities. This gives the user informational emails with subjects that can be used to filter and treat at appropriate levels.
#codingexercise
int GetMinSumWeightedNegatives(List<int> negatives, uint scalar)
{
if (negatives == null || negatives.Count == 0) return 0;
// inclusive: best sum that includes the current element
// exclusive: best sum that skips the current element
int inclusive = negatives[0];
int exclusive = 0;
for (int i = 1; i < negatives.Count; i++)
{
var min = Math.Min(inclusive, exclusive);
inclusive  = exclusive + (int)scalar * negatives[i];
exclusive = min;
}
return Math.Min(inclusive, exclusive);
}

It may be interesting to note that scalar weight cannot be negative unless the count of included negatives is even.