Tuesday, February 7, 2017

The following is a detailed study of Microsoft Azure Stack as inferred from an introduction to Azure by Microsoft.
Introduction: Organizations are increasingly leaning on the public cloud, or at least a hybrid cloud, to meet the demands on their IT. This writeup tries to cover nearly eighty topics of interest in the Azure Stack.
Azure Stack is a hybrid approach to cloud. On one hand, we have the Microsoft Azure public cloud, and on the other, we have Microsoft Azure Stack, either hosted or in a private cloud. Together they form a One Azure ecosystem: it facilitates unified app development and brings the Azure services into our own datacenter.
Both Azure and Azure Stack are tiered implementations: a cloud or pseudo-cloud infrastructure at the bottom, an IaaS or PaaS layer next, and Azure Resource Manager and DevOps tools on top. Microsoft Azure has developer tools for all platform services, which target web and mobile, Internet of Things, microservices, data and analytics, identity management, media streaming, high performance compute and cognitive services. These platform services all utilize the core infrastructure of compute, networking, storage and security.
Azure Resource Manager offers management of multiple resources, role-based access control, custom tagging and self-service templates.
Azure templates are like formulas that we can use for dedicated purposes. For example, we can have a quick start template for a Linux virtual machine. There is a growing community with 350 unique templates, 300 unique contributors and over 4500 visitors each day.
The compute services are made more agile with the offerings from a VM infrastructure, VM scale sets infrastructure, Container service orchestration and batch/Job orchestration. 
VM scale sets can auto-scale and support automated configuration at scale, for example with Chef and Puppet.
VM sizes are available in a variety of lettered series. Each VM consumes resources, such as a NIC and storage, that are specific to that VM.
VMs are hosted in racks, each with a power unit, a network switch and servers. The racks therefore determine fault domains. An availability set may be a group of control nodes or a group of databases.
The compute requirements of a modern cloud app typically involve load-balanced compute nodes that operate together with control nodes and databases.
VM Scale sets provide scale, customization, availability, low cost and elasticity. 
VM scale sets in Azure Resource Manager generally have a type and a capacity. App deployments allow VM extension updates just like OS updates.
Container infrastructure layering allows even more scale because it virtualizes the operating system. While traditional virtual machines provide hardware virtualization and Hyper-V allows isolation plus performance, containers are cheap and barely anything more than the applications themselves.
Azure Container Service serves both Linux and Windows containers. It has standard Docker tooling and API support, with streamlined provisioning of DC/OS and Docker Swarm.
Azure is an open cloud because it supports open source infrastructure tools such as Linux, Ubuntu and Docker, layered with databases and middleware such as Hadoop, Redis and MySQL, app frameworks and tools such as Node.js, Java and Python, applications such as Joomla and Drupal, management applications such as Chef and Puppet, and finally DevOps tools such as Jenkins, Gradle and Xamarin.
Job-based computations use larger sets of resources, such as compute pools, which involve automatic scaling and regional coverage with automatic recovery of failed tasks and input/output handling.
Azure involves a lot of fine-grained, loosely coupled microservices such as an HTTP listener, page content, authentication, usage analytics, order management, reporting, product inventory and customer databases.
Microservices can be stateful or stateless and can be deployed in a multi-cloud manner.  
#codingexercise
Remove BST keys inside a given range

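Node PruneBSTRange(Node root, int min, int max)
{
if (root == null) return null;
root.left = PruneBSTRange(root.left, min, max);
root.right = PruneBSTRange(root.right, min, max);
// A key outside [min, max] is kept.
if (root.data < min || root.data > max)
    return root;
// The key is inside the range: drop the node and join its pruned subtrees.
if (root.left == null) return root.right;
if (root.right == null) return root.left;
// Every key on the left is smaller than every key on the right,
// so hang the left subtree under the smallest node of the right subtree.
var successor = root.right;
while (successor.left != null)
    successor = successor.left;
successor.left = root.left;
return root.right;
}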
The choice of comparison operator (strict or inclusive) also decides whether the sentinels min and max themselves are removed.

Monday, February 6, 2017

Today we continue to compare Azure networking with AWS networking. We were discussing Security, Security Groups, Network ACLs, Custom routing tables and Virtual network interfaces. We also compared DNS service, connectivity, VPN over IPSec, private connectivity over Exchange, SDK & Tools.
Subnets can be created in any size. A network can have a single public subnet or a mix of public and private subnets. Traffic can be selectively permitted or denied using network access control lists (ACLs). Security can also be managed using security groups. Subnets follow the same routing as the overall network unless their routing table is customized. Each instance may have multiple IP addresses assigned; however, this requires one or more additional NICs and is usually permitted only on large instances by both cloud providers.
We can register domain names, route internet traffic to the resources for the domain and check the health of the resources using DNS services such as Route 53 from AWS. Azure uses Anycast networking so that each DNS query is answered by the closest available DNS server, thus increasing the performance and the availability of the domain. Azure additionally provides CDN and Traffic Manager.
CDN delivers content to end users through a robust network of global data centers. It cuts the time that it takes to serve up content to the web applications by caching closer to the user than the origin. Using the CDN, we can cache publicly available objects loaded from Azure blob storage, a web application, a virtual machine, an application folder or another HTTP/HTTPS location. The locations are regional and chosen to maximize the bandwidth to the clients.
Traffic Manager routes incoming traffic for high performance and availability. It distributes the user traffic for service endpoints in different datacenters using the Domain Name System.
A VPN gateway provides connectivity between the virtual network in the cloud and the on-premise site. It sends encrypted traffic over a public connection. Azure provides a VPN gateway and an ExpressRoute gateway. A VPN gateway allows point-to-site as well as multisite connections, and all the VPN tunnels share the bandwidth available to the gateway. AWS provides Direct Connect links that let you create virtual interfaces directly to the AWS cloud and Amazon Virtual Private Cloud, bypassing the ISPs in the route. Both cloud providers provide programmable SDKs as well as CLI and REST APIs.

The network is often assumed to be well-provisioned, and its usage is assumed to be effectively free as long as bandwidth is available. However, these assumptions are not always true. For example, cluster applications are often deployed in cloud environments or even across multiple data center sites, and cloud tenants would like to minimize their cost. The authors of McCAT: multi-cloud cost-aware transport propose to control the network usage of cluster applications by creating a cost-aware transport service. This service filters out data that is ultimately not used by the application. It aggregates multiple data items into one, saving bandwidth by reducing precision. And it multicasts data items to avoid redundant unicast transmissions of the same data across sites. With these three features, it aims to control and reduce network usage so as to remain in the free tier of services.
#codingexercise
Remove BST keys outside a given range

Node PruneBST(Node root, int min, int max)
{
if (root == null) return null;
root.left = PruneBST(root.left, min, max);
root.right = PruneBST(root.right, min, max);
if (root.data < min)
{
// The key and its whole left subtree fall below the range; keep only the pruned right subtree.
return root.right;
}
if (root.data > max)
{
// The key and its whole right subtree fall above the range; keep only the pruned left subtree.
return root.left;
}
return root;
}

The same code above can be modified to find the outliers of the given range.
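
As a minimal sketch of that modification, the traversal below collects the keys that fall outside the range instead of unlinking them. It is written in Python for brevity; the `Node` class, `insert` helper and `outliers` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical BST node used only for this illustration.
    data: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key into the BST rooted at root and return the root."""
    if root is None:
        return Node(key)
    if key < root.data:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def outliers(root: Optional[Node], lo: int, hi: int) -> List[int]:
    """Return, in sorted order, the keys that lie outside [lo, hi]."""
    if root is None:
        return []
    here = [root.data] if root.data < lo or root.data > hi else []
    # In-order traversal keeps the result sorted.
    return outliers(root.left, lo, hi) + here + outliers(root.right, lo, hi)


tree = None
for key in [50, 30, 70, 20, 40, 60, 80]:
    tree = insert(tree, key)
print(outliers(tree, 35, 65))
```

The keys reported here are exactly the ones PruneBST would drop for the same range.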

Sunday, February 5, 2017

Debate on database in a container versus database service 

Containers have been widely accepted as the new paradigm for software services and Platform as a service. To quote some trends from Datadog HQ: 
  1. Real Docker adoption is up 30% in one year 
  2. Docker now runs on 10% of the hosts we monitor 
  3. Larger companies are leading adoption 
  4. 2/3 of the companies that try Docker adopt it 
  5. Adopters 5x their container count within 9 months 
  6. The most widely used images are still registry, NGINX and Redis. MySQL moved up from its position at #9, and Postgres, the second most widely used open source database, became the new #9. Running databases in containers is therefore popular. 
  7. Docker hosts often run five containers at a time. 
  8. VMs live 6x longer than containers 

Container use for a database is suggested only along the following lines: 
  1. use of the volume API for the persisted storage layer. Volumes are efficient because they bypass the layers that otherwise stack up to form the unified view of the image. 
  2. use of a specified directory on the host mounted into a specified location in the container. 
  3. use of a shared data volume container dedicated to a database 

As we can see from this methodology, this is rather limited in scope, size and scale of growth for a database, and particularly for any production readiness or availability. The data storage and access is independent of whether the MySQL software runs in a Docker container competing with other applications or not. Moreover, the data should be kept on a volume rather than on the container's own file system, because the latter adds overhead. In the end, compute and storage requirements are different for applications and databases. 

On the other hand, the same automation of failover, cluster-based replication and availability can be achieved with a database cluster, replica sets, multi-DC failover, connection pooling etc., which come with the deployment topology of a database server. 

If it makes no difference to data access, then the database server can be moved to its own container with availability during container restarts. 

MySQL did a performance study on the container with regard to the above-mentioned three usages. They measured I/O and network overhead and compared the results to a stock instance of MySQL. A heavily I/O-bound load gave rather even results: there was neither I/O nor network overhead in this case. Then the buffer pool size was scaled up, and there was significant overhead from container usage as compared to the stock instance. 

There is a lot of inefficiency introduced with container networking when the data does not reside local to the host, which is usually the case when the data grows larger than most storage on host virtual machines. First, there is the overhead from the bridged network. Second, there is the overhead of accessing a remote volume over a mount point via the distributed file system. Compare this to the direct access of a database via a gateway-routed connection. Sure, if the user can tolerate a minute of delay from service or container restarts and the usages are sporadic or shallow, then these don’t matter. For more involved access to a database, where query execution on the order of milliseconds matters, disk access latencies are nothing in comparison to the monstrous network delay we introduce by relaying the data over the network. Why then should we not push the database server closer to the data? 

Saturday, February 4, 2017

Today we continue to compare networking capabilities in the Azure and AWS public clouds. We compared Security, Security Groups, Network ACLs, Custom routing tables, Virtual network interfaces, DNS service, connectivity, VPN over IPSec, private connectivity over Exchange, SDK & Tools. We also reviewed CDN and Traffic Manager from Azure. Let us look at VPN Gateway.
A VPN gateway is a type of virtual network gateway that sends encrypted traffic across a public connection. A virtual network gateway can be of two types: ExpressRoute and VPN. We are interested only in the latter category. A multisite connection establishes multiple connections to the same VPN gateway.
In such a setting, all the VPN tunnels share the available bandwidth for the gateway. The settings of a VPN gateway can be configured using PowerShell and the Azure portal.

#puzzle
There are three boxes: one of gold, one of silver and one of a mixture. These boxes are all labeled incorrectly. How can we tell them apart if we are given only one chance to pick a box and draw an item from it?
Solution. The box that is labeled mixed should be chosen. Since its label is wrong, it holds only gold or only silver. If it turns out to be silver, the box labeled silver will be gold and the other will be mixed. If it is gold, the other two can similarly be found. Drawing from either of the other boxes would not help, because the mixture may show silver or gold on its top layer and look just like a pure box.
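
The reasoning can be checked by brute force. The sketch below, in Python, enumerates every way to fill the boxes so that all three labels are wrong and shows that one item drawn from the box labeled mixed pins down a single arrangement. The function names are hypothetical and exist only for this illustration.

```python
from itertools import permutations

LABELS = ("gold", "silver", "mixed")


def mislabeled_arrangements():
    """All label -> actual content mappings in which every label is wrong."""
    return [
        dict(zip(LABELS, contents))
        for contents in permutations(LABELS)
        if all(label != actual for label, actual in zip(LABELS, contents))
    ]


def identify(drawn_from_box_labeled_mixed):
    """Deduce every box from the single item drawn out of the box labeled 'mixed'."""
    candidates = [
        arrangement
        for arrangement in mislabeled_arrangements()
        if arrangement["mixed"] == drawn_from_box_labeled_mixed
    ]
    # Exactly one arrangement is consistent with the observation.
    assert len(candidates) == 1
    return candidates[0]


print(identify("silver"))
print(identify("gold"))
```

Only two fully mislabeled arrangements exist, and they differ in what the box labeled mixed contains, which is why a single draw from that box is enough.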

#codingexercise
Given k as number of floors and n as number of eggs, find the minimum number of trials needed to determine which floor is the highest and yet safe to drop an egg from.

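int GetTrials(int n, int k)
{
// With no floors no trial is needed; with one floor a single trial suffices.
if (k == 1 || k == 0) return k;

// With one egg we must try every floor from the bottom up.
if (n == 1) return k;
int min = int.MaxValue;
for (int i = 1; i <= k; i++)
{
    // If the egg breaks at floor i, search the i-1 floors below with one egg less;
    // otherwise search the k-i floors above with the same number of eggs.
    int result = Math.Max(GetTrials(n-1, i-1), GetTrials(n, k-i));
    if (result < min)
         min = result;
}
return min + 1;
}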
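
The plain recursion recomputes the same (eggs, floors) subproblems many times. A minimal sketch of the same recurrence with memoization, in Python, is given here; `get_trials` is a hypothetical name and the argument order (eggs first, floors second) mirrors the function above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_trials(eggs: int, floors: int) -> int:
    """Minimum number of drops that always suffices in the worst case."""
    if floors <= 1:
        return floors          # zero floors: no drop; one floor: one drop
    if eggs == 1:
        return floors          # one egg: test every floor from the bottom
    return 1 + min(
        # Worst of the two outcomes of dropping from floor i:
        # breaks -> i-1 floors below with one egg less,
        # survives -> floors-i floors above with the same eggs.
        max(get_trials(eggs - 1, i - 1), get_trials(eggs, floors - i))
        for i in range(1, floors + 1)
    )


print(get_trials(2, 10))
print(get_trials(2, 100))
```

With the cache, each (eggs, floors) pair is solved once, so the work is bounded by eggs × floors² instead of growing exponentially.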
Another way to look at it is to halve the range of floors with each egg drop. Each subsequent egg, in its lifetime, divides and narrows down the range further.
// pseudocode to illustrate
// start is the highest floor known to be safe; end is the lowest floor known to break the egg
assert(log2(floors) <= eggs);
void GetTrials(int floors, int eggs, int start, int end, ref int trials)
{
if (end - start <= 1 || eggs == 0) return;
// Try the next power of two above start if there is one at or below end
// (otherwise the next floor), capped at the midpoint of the range.
int next = start + 1;
for (int i = 0; (1 << i) <= end; i++)
     if ((1 << i) > start) {
          next = 1 << i;
          break;
     }
next = min(next, (start + end) / 2);
trials += 1;
if (Breaks(next))
{
end = next;
eggs--;
}
else
{
start = next;
}
GetTrials(floors, eggs, start, end, ref trials);
}

Friday, February 3, 2017

Today we continue to compare networking capabilities in the Azure and AWS public clouds. We compared Security, Security Groups, Network ACLs, Custom routing tables, Virtual network interfaces, DNS service, connectivity, VPN over IPSec, private connectivity over Exchange, SDK & Tools. Now we review CDN and Traffic Manager from Azure.
CDN delivers content to end users through a robust network of global data centers. It cuts the time that it takes to serve up content to the web applications by caching closer to the user than the origin. Using the CDN, we can cache publicly available objects loaded from Azure blob storage, a web application, a virtual machine, an application folder or another HTTP/HTTPS location. The content is typically static, such as images, stylesheets, documents, files, client scripts and HTML pages, but dynamic content such as PDF reports and graphs can also be served. The CDN does not go back to the origin for each request and can even serve up entire websites from its cache. Generally, video benefits immensely from the low latency. The locations are regional and chosen to maximize the bandwidth to the clients. It should be noted that the clients can be mobile and a host of data could come from the Internet of Things (IoT), but as data grows the CDNs cache more locally to the clients, thus enabling a better user experience.
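
To make the caching behavior concrete, here is a minimal sketch in Python of an edge location that fetches an object from the origin only on the first request and serves later requests from its own cache. `EdgeCache` and `fetch_from_origin` are hypothetical names for this illustration; a real CDN also expires and revalidates cached objects, which is left out.

```python
from typing import Callable, Dict


class EdgeCache:
    """Toy edge location: caches whatever it has fetched from the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0

    def get(self, path: str) -> bytes:
        if path not in self._store:
            # Cache miss: go back to the origin once and remember the object.
            self.origin_requests += 1
            self._store[path] = self._fetch(path)
        return self._store[path]


# A dictionary stands in for the origin (for example a blob container).
origin = {"/img/logo.png": b"PNG-bytes", "/css/site.css": b"body{}"}
edge = EdgeCache(origin.__getitem__)

edge.get("/img/logo.png")
edge.get("/img/logo.png")
edge.get("/css/site.css")
print(edge.origin_requests)
```

Three client requests reach the edge, but the origin is contacted only once per distinct object.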
Traffic Manager routes incoming traffic for high performance and availability. It distributes the user traffic for service endpoints in different datacenters. These endpoints can be Azure VMs, Web Apps and cloud services. However, Traffic Manager can also be used with external non-Azure endpoints. Traffic Manager uses the Domain Name System to direct client requests to the most appropriate endpoint based on a traffic routing method and the health of the endpoints. It provides a range of traffic routing methods to suit different application needs, along with endpoint health monitoring and automatic failover. Thus it can improve availability and responsiveness, allow maintenance without downtime, combine on-premise and cloud-based applications and distribute traffic for large, complex deployments.
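
As an illustration of routing by method and health, the Python sketch below models a priority-style method: the DNS answer is the healthy endpoint with the lowest priority number, and an unhealthy primary causes failover to the next one. The `Endpoint` class, its fields and the host names are assumptions made for this example and are not the Traffic Manager API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    # Hypothetical endpoint record; real profiles carry more settings.
    dns_name: str
    priority: int          # lower number is preferred
    healthy: bool = True   # would be set by health probes


def resolve(endpoints: List[Endpoint]) -> Optional[str]:
    """Return the DNS name to hand back to the client, or None if all are down."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority).dns_name


profile = [
    Endpoint("app-westus.example.net", priority=1),
    Endpoint("app-eastus.example.net", priority=2),
    Endpoint("onprem.example.net", priority=3),
]
print(resolve(profile))

profile[0].healthy = False   # the primary fails its health probe
print(resolve(profile))
```

Because the decision is made at DNS resolution time, clients connect directly to the chosen endpoint afterwards.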


#codingexercise
We look at the tree construction method for finding the different interpretations for a sequence of digits.

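Node GetTree(List<int> A, String b, int prev)
{
// prev is the one- or two-digit number consumed by this node; 0 marks the root.
if (prev > 26)
     return null;
// ToLetter is assumed to map 1..26 to A..Z and 0 to the empty string.
var newb = b + prev.ToLetter();
var root = new Node() { data = newb };
if (A.Count != 0)
{
     // The left child consumes one digit, the right child consumes two.
     root.left = GetTree(A.GetRange(1, A.Count - 1), newb, A[0]);
     if (A.Count > 1) {
       root.right = GetTree(A.GetRange(2, A.Count - 2), newb, A[0]*10 + A[1]);
     }
}
return root;
}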

Then we can print the leaves of this tree as ABA, AU, LA for {1,2,1}

Given a value N, if we want to make change for N cents, and we have infinite supply of each of S = { S1, S2, .. , Sm} valued coins, how many ways can we make the change? The order of coins doesn’t matter.
int GetCount(List<int> coins, int num, int sum)
{
if (sum == 0) return 1;
if (sum < 0) return 0;
if (num <= 0 && sum >= 1) return 0;
return GetCount(coins, num-1, sum) + GetCount(coins, num, sum-coins[num-1]);
}
We could also do the same with combinations, where we take one or more of the same coin as long as the sum is not exceeded.
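
The recursion above revisits the same (num, sum) pairs repeatedly. A minimal bottom-up sketch in Python is shown here as one way to avoid that; `count_ways` is a hypothetical name, and coin values are assumed to be positive integers.

```python
from typing import List


def count_ways(coins: List[int], total: int) -> int:
    """Number of unordered ways to make `total` from unlimited coins."""
    # ways[amount] counts combinations using only the coins seen so far.
    ways = [1] + [0] * total
    for coin in coins:
        # Iterating coins in the outer loop counts each combination once,
        # regardless of the order in which the coins are picked.
        for amount in range(coin, total + 1):
            ways[amount] += ways[amount - coin]
    return ways[total]


print(count_ways([1, 2, 3], 4))
print(count_ways([2, 5, 3, 6], 10))
```

For coins {1, 2, 3} and a sum of 4 the combinations are 1+1+1+1, 1+1+2, 2+2 and 1+3.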

Thursday, February 2, 2017

Today we continue to compare networking capabilities in the Azure and AWS public clouds. We compared Security, Security Groups, Network ACLs, Custom routing tables and Virtual network interfaces. We now compare DNS service, connectivity, VPN over IPSec, private connectivity over Exchange, SDK & Tools.
AWS provides Route 53 as a highly available and scalable DNS web service. We can register domain names, route internet traffic to the resources for the domain and check the health of the resources. The domain registry is with Verisign. When we register a domain name, Amazon Route 53 automatically creates a public hosted zone that has the same name as the domain, and to route traffic to our resources, we create resource record sets. When an end user requests a name, the request is routed to a name resolver, which forwards it to a DNS root name server. Using the response, the resolver forwards the request to the TLD name server for the .com domains. Using that response in turn, the resolver forwards the request to the Route 53 name server, which provides the IP address for the name. The resolver then returns this address to the user.
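
The resolution chain can be sketched as three lookups made by the resolver. The Python toy model below uses in-memory tables in place of real name servers; the server names, the example.com zone and the 192.0.2.10 address (from the documentation address range) are all made up for this illustration.

```python
from typing import List, Tuple

# Each table stands in for one tier of name servers.
ROOT_SERVER = {"com": "tld-com"}                                   # TLD -> TLD server
TLD_SERVERS = {"tld-com": {"example.com": "ns-authoritative"}}     # domain -> hosted zone's server
AUTHORITATIVE = {"ns-authoritative": {"www.example.com": "192.0.2.10"}}


def resolve(name: str) -> Tuple[str, List[str]]:
    """Return the address for name and the servers the resolver asked, in order."""
    labels = name.split(".")
    tld = labels[-1]
    domain = ".".join(labels[-2:])

    asked = ["root"]
    tld_server = ROOT_SERVER[tld]                  # root refers us to the .com servers
    asked.append(tld_server)
    authoritative = TLD_SERVERS[tld_server][domain]  # TLD refers us to the zone's server
    asked.append(authoritative)
    address = AUTHORITATIVE[authoritative][name]     # the zone's record set answers
    return address, asked


address, path = resolve("www.example.com")
print(address, path)
```

In practice the resolver caches each referral, so most queries skip the root and TLD steps.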
Azure uses Anycast networking so that each DNS query is answered by the closest available DNS server, thus increasing the performance and the availability of the domain.
A VPN gateway provides connectivity between the virtual network in the cloud and the on-premise site. The Azure VPN gateway is a virtual network gateway that sends encrypted traffic over a public connection. Azure provides two types of gateways: ExpressRoute and VPN. There can be only one gateway of each type in a virtual network. A VPN gateway allows point-to-site as well as multisite connections, which share the bandwidth available to the gateway. AWS provides Direct Connect links that let you create virtual interfaces directly to the AWS cloud and Amazon Virtual Private Cloud, bypassing the ISPs in the route. Both cloud providers allow public IP addresses to be assigned to virtual machine instances, thereby providing internet connectivity. In order to serve large customers with heavy bandwidth requirements, both cloud providers seem to have agreements with major Telecom providers and ISVs to offer private connectivity between their clouds and the customer's premises. While AWS doesn't provide an SLA for this service, Azure provides a 99.9% SLA. Azure and AWS provide programmable SDKs as well as CLI and REST APIs. These significantly improve automation and workflow expansion.
#codingexercise
Find all possible interpretations of an array of digits.
For example, 121 can be aba, au or la.
We can build a decode tree and print its leaves for the interpretations, or enumerate them recursively with backtracking as shown below:

void Decode(List<int> A, StringBuilder b)
{
// All digits are consumed, so b holds one complete interpretation.
if (A.Count == 0) {
    Console.WriteLine(b.ToString());
    return;
}
// A zero cannot start a letter, so this branch has no interpretation.
if (A[0] == 0) return;
// Interpret the first digit alone (1..9).
b.Append(ToLetter(A[0]));
Decode(A.GetRange(1, A.Count - 1), b);
b.Length--;
// Interpret the first two digits together if they form 10..26.
if (A.Count > 1 && A[0] * 10 + A[1] <= 26) {
    b.Append(ToLetter(A[0] * 10 + A[1]));
    Decode(A.GetRange(2, A.Count - 2), b);
    b.Length--;
}
}
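
When only the number of interpretations is needed, dynamic programming over prefixes avoids enumerating them. The Python sketch below assumes the usual mapping of 1..26 to letters; `count_interpretations` is a hypothetical name.

```python
from typing import List


def count_interpretations(digits: List[int]) -> int:
    """Count the ways to split digits into numbers in 1..26."""
    n = len(digits)
    # ways[i] is the number of interpretations of the first i digits.
    ways = [0] * (n + 1)
    ways[0] = 1
    for i in range(1, n + 1):
        # The last digit on its own must be 1..9.
        if 1 <= digits[i - 1] <= 9:
            ways[i] += ways[i - 1]
        # The last two digits together must form 10..26.
        if i >= 2 and 10 <= digits[i - 2] * 10 + digits[i - 1] <= 26:
            ways[i] += ways[i - 2]
    return ways[n]


print(count_interpretations([1, 2, 1]))
```

For {1,2,1} this counts the three interpretations listed in the example, and a pair such as 27 contributes only the one-digit split.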

Wednesday, February 1, 2017

Yesterday we started comparing virtual private network capabilities in public clouds. Azure started offering virtual networks in 2013, while AWS has been an early starter. Azure's Virtual Network (VN) resembles AWS VPC in many aspects, but we will also enumerate differences. That said, Azure has kicked it up a notch.

We compared subnets, which are a critical component for dividing networks. Both AWS and Azure allow a network of any size to be created with either a single public subnet or a mix of public and private subnets. The difference is in the use of wizards, which make this super easy.
As with all networks, security is important, and it can be configured at the instance, subnet and overall network level. Moreover, we can selectively permit or deny traffic using network ACLs.
The use of network ACLs is complementary to the use of security groups, which help control access. Traffic can also be managed at any level using custom routing tables. By default, the routing table of the network is used unless a custom one is specified. 
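
A minimal sketch of how an ordered list of permit and deny rules can be evaluated is given below in Python. It assumes that rules are checked in ascending rule number, that the first match wins and that unmatched traffic is denied; the `Rule` type and its fields are hypothetical and do not correspond to either provider's API.

```python
from ipaddress import ip_address, ip_network
from typing import List, NamedTuple


class Rule(NamedTuple):
    # Hypothetical ACL entry for this illustration.
    number: int    # lower numbers are evaluated first
    cidr: str      # source address range
    port: int      # destination port
    allow: bool    # True to permit, False to deny


def is_permitted(rules: List[Rule], source: str, port: int) -> bool:
    """Apply the first matching rule; deny when nothing matches."""
    for rule in sorted(rules, key=lambda r: r.number):
        if port == rule.port and ip_address(source) in ip_network(rule.cidr):
            return rule.allow
    return False


acl = [
    Rule(100, "10.0.0.0/16", 443, True),    # permit HTTPS from the network
    Rule(90, "10.0.5.0/24", 443, False),    # but deny one subnet first
]
print(is_permitted(acl, "10.0.1.7", 443))
print(is_permitted(acl, "10.0.5.7", 443))
```

The deny rule carries the lower number, so it is evaluated before the broader permit rule that would otherwise match the same source.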
Additional IP addresses can be assigned to the same instance in a network. However, in order to do this, the instance has to be large enough to support multiple NICs. Both providers allow multiple NICs, up to 8, only on specific large instances. 

#codingexercise
Given an integer array of n integers, find sum of bit differences in all pairs that can be formed from array elements. We gave an alternate way that involved enumerating all combinations of pairs from the array using the Combine method discussed earlier
int GetSumBitDifferencesUsingCombinations(List<int> A)
{
int ret = 0;
var pairs = new List<Tuple<int,int>>();
var seq = new List<int>();
Combine(A, seq, pairs, 0);
pairs.ForEach(x => ret += GetDiff((uint)x.Item1, (uint)x.Item2));
return ret;
}
// Enumerates every pair of elements of a by position, so repeated values are handled.
void Combine(List<int> a, List<int> b, List<Tuple<int, int>> pairs, int start)
{
for (int i = start; i < a.Count; i++)
{
b.Add(a[i]);
if (b.Count == 2)
    pairs.Add(Tuple.Create(b[0], b[1]));
else
    Combine(a, b, pairs, i + 1);
b.RemoveAt(b.Count - 1);
}
}
// Counts the bit positions in which a and b differ.
int GetDiff(uint a, uint b)
{
int ret = 0;
for (int i = 0; i < 32; i++)
{
uint c = 1u << i;
if ((a & c) != (b & c))
         ret++;
}

return ret;
}
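
Enumerating pairs takes quadratic time. A per-bit count gives the same sum in linear time: if `ones` elements have a given bit set, that bit differs in `ones * (n - ones)` pairs. The Python sketch below shows both approaches side by side; the function names are hypothetical, each pair is counted once, and values are assumed to be non-negative and to fit in 32 bits.

```python
from itertools import combinations
from typing import List


def sum_bit_differences(values: List[int]) -> int:
    """Sum of differing bits over all unordered pairs, one bit position at a time."""
    n = len(values)
    total = 0
    for bit in range(32):
        ones = sum((v >> bit) & 1 for v in values)
        # A pair differs at this bit when exactly one of the two has it set.
        total += ones * (n - ones)
    return total


def sum_bit_differences_by_pairs(values: List[int]) -> int:
    """Reference version: XOR each pair and count the set bits."""
    return sum(bin(a ^ b).count("1") for a, b in combinations(values, 2))


data = [1, 3, 5]
print(sum_bit_differences(data), sum_bit_differences_by_pairs(data))
```

Both functions agree on the same input, which makes the pairwise version a convenient check for the faster one.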