Wednesday, February 15, 2017

We continue with a detailed study of Microsoft Azure stack as inferred from an introduction of Azure by Microsoft. We reviewed some more features of Azure networking. 
Global DNS name resolution is fast and highly available. Azure DNS is integrated with the Azure Resource Manager for role-based access control, tagging and template-based deployment, for both zones and record sets.
Azure virtual machines support native dual stack (IPv4 + IPv6) on both flavors of the operating system. This support is available globally and maximizes the reach of Azure applications to mobile (4G) and IoT devices.
Azure load balancer services come at different levels. There is the traffic manager for cross region direction and availability and exposure to the internet.  There is the Azure Load Balancer which provides in-region scalability and availability. There is the Azure application gateway which has URL/content based routing and load balancing. Then there is load balancing on the VMs for web servers.
Load balancers in Azure use multiple VIPs to simplify designs and reduce cost. There are multiple private VIPs on a load balancer, and the backend ports are reused using direct server return (DSR). Secondary NICs can also be attached to enable connectivity to a restricted vnet. A load balancer can now be set up with both internal and external VIPs with direct public association. Moreover, a NIC can now have multiple private IPs (static or dynamic) and multiple public IPs (static or dynamic), which unlocks scenarios for NVA partners.
Application Gateway has layer 7 application delivery controller features. It can enforce security with SSL termination and allow or block SSL protocol versions. It can manage sessions and sites with cookie-based session affinity and multi-site hosting. It can manage content with URL-based routing. It can manage the backend with rich diagnostics including access and performance logs, VM scale set support and custom health probes.
The Web Application Firewall (WAF) has also been significantly improved based on the Open Web Application Security Project (owasp.org). The WAF protects applications from web-based intrusions and is built using ModSecurity and the Core Rule Set. It is highly available and fully managed. It is preconfigured for the most common web vulnerabilities such as SQL injection and XSS attacks.
Cross-premises connectivity is maintained with P2S SSTP tunnels and IPsec S2S VPN tunnels to Azure over the Internet. If there is a private WAN, an ExpressRoute connection is maintained to Azure.
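Find the size of the largest BST in a given binary tree
int GetMaxBST(Node root)
{
    if (root == null) return 0;
    // if the whole subtree is a BST, its size is the answer
    if (IsBST(root))
        return treesize(root);
    // otherwise the largest BST lies entirely within one of the subtrees
    return Math.Max(GetMaxBST(root.left), GetMaxBST(root.right));
}

bool IsBST(Node root)
{
    // long bounds avoid overflow when data is int.MinValue or int.MaxValue
    return IsBstHelper(root, long.MinValue, long.MaxValue);
}

bool IsBstHelper(Node root, long min, long max)
{
    if (root == null) return true;
    if (root.data < min || root.data > max) return false;
    // duplicates are not allowed: left values must be smaller, right values larger
    return IsBstHelper(root.left, min, (long)root.data - 1) &&
           IsBstHelper(root.right, (long)root.data + 1, max);
}

int treesize(Node root)
{
    if (root == null) return 0;
    return treesize(root.left) + treesize(root.right) + 1;
}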

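We could also combine the operations above into a single bottom-up traversal so that each node is visited only once.
// Returns the size of the largest BST found under root.
// size is the node count of the subtree at root when it is a BST, or -1 when it is not;
// min and max are the smallest and largest values in that subtree.
int GetMaxBST(Node root, out int size, out int min, out int max)
{
    size = 0; min = int.MaxValue; max = int.MinValue;
    if (root == null) return 0; // an empty tree is a BST of size 0
    int leftBest = GetMaxBST(root.left, out int ls, out int lmin, out int lmax);
    int rightBest = GetMaxBST(root.right, out int rs, out int rmin, out int rmax);
    if (ls >= 0 && rs >= 0 &&
        (ls == 0 || lmax < root.data) && (rs == 0 || rmin > root.data))
    {
        size = ls + rs + 1;
        min = (ls == 0) ? root.data : lmin;
        max = (rs == 0) ? root.data : rmax;
        return size;
    }
    size = -1; // not a BST at this node; report the best size seen below it
    return Math.Max(leftBest, rightBest);
}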

Tuesday, February 14, 2017

We continue with a detailed study of the Microsoft Azure stack as inferred from an introduction of Azure by Microsoft. We discussed that Azure Stack is a hybrid approach to cloud. Microsoft Azure has developer tools for all platform services targeting web and mobile, Internet of Things, microservices, data and analytics, identity management, media streaming, high performance compute and cognitive services. These platform services all utilize the core infrastructure of compute, networking, storage and security. The Azure Resource Manager has multiple resources, role-based access control, custom tagging and self-service templates.
1) The compute services are made more agile with the offerings from a VM infrastructure, VM scale sets infrastructure, container service orchestration and batch/job orchestration.
2) The PaaS platform of Azure can span both Azure and AWS. It can also extend to on-premises, GCP and others. Containers, serverless and microservices are different forms of computing.
3) Azure provides a data platform that transforms data into actions. The consumers of this transformation are people as well as apps and automated systems.
4) Azure networking is divided into three areas: inside the Azure region, connecting Azure regions, and geographic reach and internet ecosystems. The networking inside the Azure region already comes with security, performance, load balancing, virtual networks and cross-premises connectivity. Azure also comes with accelerated networking.
We review some more features of Azure networking. Azure DNS has a globally distributed architecture that is resilient to multiple region failures. Global DNS name resolution is fast and highly available. Azure DNS is integrated with the Azure Resource Manager for role-based access control, tagging and template-based deployment, for both zones and record sets. Azure DNS comes with a REST API and SDKs for application integration.
Azure virtual machines support native dual stack (IPv4 + IPv6) on both flavors of the operating system. This support is available globally and maximizes the reach of Azure applications to mobile (4G) and IoT devices.
IPv6 is required by governments and their suppliers. IPv6 addresses are published in DNS as AAAA records.
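To make the A versus AAAA distinction concrete, here is a minimal sketch using Python's standard ipaddress module. The helper name record_type and the sample addresses (taken from the documentation-only ranges) are assumptions for illustration and are not part of any Azure API.

```python
import ipaddress

def record_type(address: str) -> str:
    """Return the DNS record type that would carry this address."""
    ip = ipaddress.ip_address(address)
    return "AAAA" if ip.version == 6 else "A"

# A dual-stack host publishes both record types under the same name.
dual_stack_host = ["203.0.113.10", "2001:db8::10"]
records = [(record_type(a), a) for a in dual_stack_host]
print(records)
```

A client that only speaks IPv6 would use the AAAA entry, while an IPv4-only client would use the A entry.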
Azure load balancer services come at different levels. There is the traffic manager for cross region direction and availability and exposure to the internet.  There is the Azure Load Balancer which provides in-region scalability and availability. There is the Azure application gateway which has URL/content based routing and load balancing. Then there is load balancing on the VMs for web servers.
Load balancers in Azure use multiple VIPs to simplify designs and reduce cost. There are multiple private VIPs on a load balancer, and the backend ports are reused using direct server return (DSR). DSR is useful when there is concern that the load balancer will become a bottleneck. In such cases, the servers are allowed to respond to the client directly, for example when the requests are small but the responses are large.

#codingexercise
Find the closest element to a given value in a binary search tree

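// Call with diff = int.MaxValue; on return, key holds the closest value.
void GetMinDiff(Node root, int k, ref int diff, ref int key)
{
    if (root == null) return;
    if (root.data == k)
    {
        // exact match: nothing can be closer
        diff = 0;
        key = k;
        return;
    }
    if (diff > Math.Abs(root.data - k))
    {
        diff = Math.Abs(root.data - k);
        key = root.data; // remember the closest value seen so far
    }
    // only the subtree on k's side can hold a closer value
    if (k < root.data)
        GetMinDiff(root.left, k, ref diff, ref key);
    else
        GetMinDiff(root.right, k, ref diff, ref key);
}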
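A similar idea works for the furthest element from a given value in the same binary search tree, with a modification to the comparison of the diff.
In this case, we are trying to maximize the difference. Note that the search can no longer descend toward k: the furthest element is always the minimum or the maximum of the tree, so it is enough to compare the leftmost and rightmost nodes.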
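Both searches can be sketched in Python as follows. This is a minimal sketch, not a translation of the routine above: the Node class and the function names closest and furthest are assumptions for illustration. The furthest search relies on the observation that the value furthest from k must be the smallest or the largest value in the tree.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def closest(root, k):
    """Walk one root-to-leaf path toward k, keeping the best value seen."""
    best = None
    node = root
    while node is not None:
        if best is None or abs(node.data - k) < abs(best - k):
            best = node.data
        if node.data == k:
            break
        node = node.left if k < node.data else node.right
    return best

def furthest(root, k):
    """Compare only the leftmost (minimum) and rightmost (maximum) nodes."""
    if root is None:
        return None
    lo = hi = root
    while lo.left is not None:
        lo = lo.left
    while hi.right is not None:
        hi = hi.right
    return lo.data if abs(lo.data - k) >= abs(hi.data - k) else hi.data

tree = Node(9, Node(4, Node(3), Node(6)), Node(17, None, Node(22)))
print(closest(tree, 18), furthest(tree, 18))
```

Both functions visit at most two root-to-leaf paths, so the cost is proportional to the height of the tree.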


Monday, February 13, 2017

Title: Improvements in stream processing of big data 
Introduction: In my discussion of online versus batch modes of processing for big data, shared here (http://1drv.ms/1OM29ee), I noted that the online mode is made possible by the summation form. However, I took the liberty of assuming that the merge operation of the summation form is linear after each node computes its summation. This could be improved further when the summaries are maintained as Fibonacci heaps because this data structure offers the following advantages:
  1. The merge operation takes constant time, and the potential does not change.
  2. The insert operation takes constant amortized time rather than logarithmic time.
  3. The decrease-key operation also takes constant amortized time rather than logarithmic time.
Moreover, the Fibonacci heaps have the property that the size of a subtree rooted in a node of degree k is at least the (k+2)th Fibonacci number. This lets us make approximations much faster by making approximations on each node. The distribution of the Fibonacci heap also lets us propagate these approximations between heap additions and deletions especially when the potential does not change. 
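The summation form referred to in the introduction can be illustrated with a small Python sketch. It assumes, for illustration only, that each node keeps a count, a sum and a sum of squares; the Summary class and its method names are hypothetical. The sketch shows only why summaries merge cheaply and does not implement a Fibonacci heap.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    """Per-node summary kept in summation form."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, x: float) -> None:
        # Online update: one data point at a time.
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def merge(self, other: "Summary") -> "Summary":
        # Merging two summaries is three additions, independent of data size.
        return Summary(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    def mean(self) -> float:
        return self.total / self.n

    def variance(self) -> float:
        return self.total_sq / self.n - self.mean() ** 2

node_a, node_b = Summary(), Summary()
for x in [1, 2, 3]:
    node_a.add(x)
for x in [4, 5]:
    node_b.add(x)
merged = node_a.merge(node_b)
print(merged.mean(), merged.variance())
```

Because each summary is a fixed set of sums, combining many nodes reduces to repeated pairwise merges, which is the step the heap-based organization is meant to speed up.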
In addition to the data structure, Fibonacci series also plays an interesting role in the algorithm of the online processing with its nature of utilizing the next computation based on the last two steps only. In other words, we don’t repeat the operations of the online processing in every iteration. We skip some and re-evaluate or adjust the online predictions at the end of the iteration corresponding to Fibonacci numbers. We use Fibonacci numbers versus any other series such as binomial or exponential or powers of two because intuitively we are readjusting our approximations in our online processing and Fibonacci gives us a way to adjust them based on previous and current approximations.  
Straightforward aggregations using summation forms show a property that the online prediction is improving the prediction from the past in a straight line and uses only the previous approximation in the current prediction. However, if we were to use the predictions from the iterations corresponding to the Fibonacci series, then we refine the online iterations to not just a linear extrapolation but also Fibonacci number based smoothing. 
The Fibonacci series is better than exponential because it has better behavior near the asymptote.
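One possible reading of the Fibonacci-based schedule is sketched below in Python: the cheap summation update runs on every item, while the prediction is re-evaluated only at iterations whose index is a Fibonacci number. The function names and the choice of a running mean as the prediction are assumptions for illustration, not the method described in the linked discussion.

```python
def fibonacci_checkpoints(limit):
    """Yield the distinct Fibonacci numbers 1, 2, 3, 5, 8, ... up to limit."""
    a, b = 1, 2
    while a <= limit:
        yield a
        a, b = b, a + b

def online_mean(stream):
    """Update the sum every step; refresh the estimate only at checkpoints."""
    checkpoints = set(fibonacci_checkpoints(len(stream)))
    total = 0.0
    estimate = None
    history = []
    for i, x in enumerate(stream, start=1):
        total += x                      # summation-form update on every item
        if i in checkpoints:
            estimate = total / i        # re-evaluate only at Fibonacci steps
            history.append((i, estimate))
    return estimate, history

estimate, history = online_mean([2, 4, 6, 8, 10, 12, 14, 16])
print(history)
```

The gaps between checkpoints grow as the stream gets longer, so later estimates are refreshed less often than early ones.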

Sunday, February 12, 2017

A private cloud provides resources such as compute, storage and networks to customers. These resources can be monitored for resource utilization, application performance, and operational health. The notion is borrowed from a public cloud such as AWS, where Amazon CloudWatch is a monitoring service for AWS cloud resources and the applications that run on AWS. It provides system-wide visibility and helps customers keep their applications and operations running smoothly. They can also use this service to set up alerts and notifications about the resources that interest them.
This kind of resource monitoring is different from cloud service monitoring because the former is useful for customers while the latter is useful for the cloud provider. The latter is used mostly for the health of the services which are offered to more than one customer. The former can be customer specific depending on the registration of resources and subscriptions for events.
The implementation is also very different between the two. For example, the cloud services often report metrics directly to a metrics database from which health check reports are drawn. Usually the data is aged and stored in a time series database for cumulative charts. The cost of transforming the data from collection into a time series for reporting is usually paid by the query requesting the charts.
CloudWatch, on the other hand, is an event collection framework. In real time, events can be collected, archived, filtered and used for subsequent analysis. The collected data again flows into a database from which queries can be meaningfully read and sent out. The events are much like the messages in a message broker, except that the framework is tuned for high performance and cloud scale. The events have several attributes and are often extensible as name-value pairs by the event producers, who register different types of event formats. The actual collection of each event, its firing and its subsequent handling are all done within the event framework. This kind of framework has to rely on small packet sizes for the events and on robust and fast message broker functionality within the event broker. Handlers can be registered for each type of event or for the queues on which the events are stored.
Thus, while the former is based on a time series database for metrics, the latter is based on an event collection and handling engine, also referred to as an event-driven framework.
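The event-driven side of this comparison can be sketched in a few lines of Python. This is a toy in-process illustration under stated assumptions: the EventBus class, its methods and the event names are hypothetical and do not correspond to the CloudWatch API.

```python
from collections import defaultdict

class EventBus:
    """Toy event framework: collect, archive and dispatch events by type."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.archive = []               # every event is kept for later analysis

    def register(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, **attributes):
        # Events carry extensible name-value attributes set by the producer.
        event = {"type": event_type, **attributes}
        self.archive.append(event)
        for handler in self._handlers[event_type]:
            handler(event)

alerts = []
bus = EventBus()
bus.register("cpu.high", lambda e: alerts.append(f"{e['resource']} at {e['percent']}%"))
bus.publish("cpu.high", resource="vm-1", percent=93)
bus.publish("disk.full", resource="vm-2")   # archived, but no handler is registered
print(alerts, len(bus.archive))
```

A metrics pipeline would instead append timestamped samples to a store and compute the charts at query time.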
#codingexercise
Replace every element with the least greater element to its right, or with -1 if there is none. For example,
Input:  [8, 58, 71, 18, 31, 32, 63, 92,
         43, 3, 91, 93, 25, 80, 28]
Output: [18, 63, 80, 25, 32, 43, 80, 93,
         80, 25, 93, -1, 28, -1, -1]
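void Replace(List<int> A)
{
    for (int i = 0; i < A.Count; i++)
    {
        // smallest value to the right of i that is strictly greater than A[i]
        int min = int.MaxValue;
        bool found = false;
        for (int j = i + 1; j < A.Count; j++)
        {
            if (A[j] > A[i] && A[j] <= min)
            {
                min = A[j];
                found = true;
            }
        }
        // elements to the right are still unmodified, so in-place update is safe
        A[i] = found ? min : -1;
    }
}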
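The nested loops above take quadratic time. A common alternative, sketched here in Python, scans from the right and keeps the values seen so far in sorted order so that the least greater element can be found by binary search. The function name is an assumption for illustration. Note that bisect.insort still shifts list elements on insertion, so a balanced search tree would be needed for a strict O(n log n) bound.

```python
import bisect

def replace_with_least_greater(values):
    seen = []                           # sorted values to the right of position i
    result = [0] * len(values)
    for i in range(len(values) - 1, -1, -1):
        # index of the first value in seen that is strictly greater than values[i]
        pos = bisect.bisect_right(seen, values[i])
        result[i] = seen[pos] if pos < len(seen) else -1
        bisect.insort(seen, values[i])
    return result

data = [8, 58, 71, 18, 31, 32, 63, 92, 43, 3, 91, 93, 25, 80, 28]
print(replace_with_least_greater(data))
```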

Saturday, February 11, 2017

A tutorial on asynchronous programming for DevOps:
Introduction – DevOps engineers increasingly rely on writing services for automation and for integrating new functionality on underlying systems. These services, which involve chaining one or more operations, often incur a delay that exceeds user tolerance on a web page. Consequently, the engineers are faced with the challenge of giving an early response to the user even when the resource requested by the user is not yet available. The following are some of the techniques commonly employed by these engineers:
1) Background tasks – Using a database and transactional behavior, the engineers used to chain one or more actions within the same transaction scope. However, when each action takes a long time to complete, the chained actions amount to a significant delay. Although semantically correct, this does not lend itself to a reasonable response time. Consequently, the work is split into a foreground and a background task, where a database entry implies a promise that will be fulfilled subsequently by a background task or invalidated on failure. Django-background-tasks is an example of this functionality, and it involves merely decorating a method to register it as a background task. Additionally, these registrations can specify the schedule in terms of a time period as well as the queues on which these tasks are stored. Internally they are implemented with their own background tasks table that allows retries and error handling.
2) Async tasks – Using a task parallel library such as django-utils, tasks are merely executed on a separate thread in the runtime without the additional onus of persistence and formal handling of tasks. While the main thread can service the user and be responsive, the async registered method attempts to parallelize execution without guarantees.
3) Threads and locks – Using concurrent programming, developers such as those using Python's concurrent.futures look for ways to partition the tasks or stitch execution threads together based on resource locks or time sharing. This works well to reduce the overall task execution time by identifying isolated versus shared or dependent actions. The status on the objects indicates their state. Typically this state is progressive so that errors are minimized and users can be informed of intermediate status and the progress made. Optimistic concurrency control goes a step further by not requiring locks on those resources.
4) Message broker – Using queues to store jobs, this mechanism allows actions to be completed later, often by workers independent of the original system that queued the task. This is very helpful to separate queues based on the processors handling them and for scaling out to as many workers as necessary to keep the jobs moving on the queue. Additionally, these message brokers come with options to maintain transactional behavior, hold dead letter queues, handle retries and journal the messages. Message brokers can also scale beyond one server to handle large volumes of traffic.
5) Django celery – Using this library, the onerous tasks associated with a message broker are internalized, and the developers are given a clean interface to perform their tasks in an asynchronous manner. Celery is a simple, flexible, reliable and distributed task queue that can process vast amounts of messages with a focus on real-time processing and support for task scheduling. While Django integration previously required a separate library (django-celery), Celery now supports Django out of the box.
Conclusion – Most application developers choose one or more of the strategies above depending on the service level agreements, the kind of actions, the number of actions on each request, the number of requests, the size and scale of demand and other factors. There is a tradeoff between complexity and the layering or organization of tasks beyond the synchronous programming.
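The early-response pattern shared by the first three techniques can be sketched with Python's standard concurrent.futures module. This is a minimal sketch under stated assumptions: the status dictionary stands in for the database row that records the promise, and the names provision and handle_request are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

status = {}                              # stand-in for a database table of promises
executor = ThreadPoolExecutor(max_workers=2)

def provision(request_id):
    """Long-running action executed off the request thread."""
    try:
        time.sleep(0.1)                  # simulate slow work on the underlying system
        status[request_id] = "ready"
    except Exception:
        status[request_id] = "failed"    # invalidate the promise on failure
        raise
    return request_id

def handle_request(request_id):
    """Record the promise, queue the work and respond immediately."""
    status[request_id] = "pending"
    future = executor.submit(provision, request_id)
    return {"id": request_id, "status": "pending"}, future

response, future = handle_request("req-1")
print(response)                          # the caller gets an answer right away
future.result()                          # block here only to show completion
print(status["req-1"])
executor.shutdown()
```

A message broker or Celery replaces the in-process executor with durable queues and separate worker processes, at the cost of more moving parts.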

Friday, February 10, 2017

We continue with a detailed study of the Microsoft Azure stack as inferred from an introduction of Azure by Microsoft. We discussed that Azure Stack is a hybrid approach to cloud. Microsoft Azure has developer tools for all platform services targeting web and mobile, Internet of Things, microservices, data and analytics, identity management, media streaming, high performance compute and cognitive services. These platform services all utilize the core infrastructure of compute, networking, storage and security. The Azure Resource Manager has multiple resources, role-based access control, custom tagging and self-service templates.
1) The compute services are made more agile with the offerings from a VM infrastructure, VM scale sets infrastructure, container service orchestration and batch/job orchestration.
2) The PaaS platform of Azure can span both Azure and AWS. It can also extend to on-premises, GCP and others. Containers, serverless and microservices are different forms of computing.
3) Azure provides a data platform that transforms data into actions. The consumers of this transformation are people as well as apps and automated systems.
4) Azure networking is divided into three areas: inside the Azure region, connecting Azure regions, and geographic reach and internet ecosystems. The networking inside the Azure region already comes with security, performance, load balancing, virtual networks and cross-premises connectivity. Azure also comes with accelerated networking.
Azure allows vnets, which help us define our network and our policies. They come loaded with features such as a DMZ, backend subnets and customizable routes, discussed earlier. Vnet-to-vnet traffic goes through a gateway. There is a full mesh network between vnets, so the vnets can be peered. Peering can be set up easily, and latency and throughput are the same as within a single vnet.
#codingexercise
Find if a given sorted subsequence exists in a binary search tree.
bool HasSequence(Node root, List<int> A)
{
    int index = 0;
    inOrderTraverse(root, A, ref index); // traverse in increasing order of elements
    return index == A.Count;
}
void inOrderTraverse(Node root, List<int> A, ref int index)
{
    if (root == null) return;
    inOrderTraverse(root.left, A, ref index);
    // guard against reading past the end once the whole sequence is matched
    if (index < A.Count && root.data == A[index])
        index++;
    inOrderTraverse(root.right, A, ref index);
}

Thursday, February 9, 2017

We continue with a detailed study of the Microsoft Azure stack as inferred from an introduction of Azure by Microsoft. We discussed that Azure Stack is a hybrid approach to cloud. Microsoft Azure has developer tools for all platform services targeting web and mobile, Internet of Things, microservices, data and analytics, identity management, media streaming, high performance compute and cognitive services. These platform services all utilize the core infrastructure of compute, networking, storage and security. The Azure Resource Manager has multiple resources, role-based access control, custom tagging and self-service templates.
The compute services are made more agile with the offerings from a VM infrastructure, VM scale sets infrastructure, container service orchestration and batch/job orchestration. Azure involves many fine-grained, loosely coupled microservices. Microservices can be stateful or stateless and can be deployed in a multi-cloud manner.

The PaaS platform of Azure can span both Azure and AWS. It can also extend to on-premises, GCP and others. Containers, serverless and microservices are different forms of computing. A container packages an exe or a jar. Serverless dictates the operational/cost model. Microservices follow a 3-tier model involving a thin client, SOA and pub/sub, and provide a development architecture. The core compute is provided by Batch, Container Service, VM Scale Sets and Virtual Machines, in that order. The platform is provided by Azure Functions, App Service, Service Fabric and Cloud Services.
We now look at the data platform. The purpose of this platform is to interpret data to gain intelligence so that it can guide actions. This transformation from data to actions is facilitated by layers of information management, big data stores, machine learning and analytics, and intelligence services as well as dashboards and visualizations. The data comes from sensors and devices, applications, and other data sources. The information management layer aggregates this data with Data Factory, Data Catalog, and Event Hubs. The big data stores work with Data Lake Store and SQL Data Warehouse. Machine learning and analytics involves all insight applications such as those for Machine Learning, Data Lake Analytics, HDInsight and Stream Analytics.
The intelligence layer comprises Cognitive Services, Bot Framework, and Cortana. The dashboard usually involves Power BI.
The consumers for this data transformation to actions are people as well as apps and automated systems.
Azure networking is divided into three areas: inside the Azure region, connecting Azure regions, and geographic reach and internet ecosystems. It is the latter two that the internet exchange provider spans. The connections to the Azure region are made over software-defined WAN and optical networks or advanced MPLS services. The networking inside the Azure region already comes with security, performance, load balancing, virtual networks and cross-premises connectivity.
Azure now comes with accelerated networking that provides up to 25 Gbps of throughput and reduces network latency by up to 10x. Without accelerated networking, the policies were applied in software in the host. With accelerated networking, the policies are applied in hardware accelerators.
#codingexercise
Check whether a BST has a dead end.
A dead end is a leaf after which we cannot insert any more elements. It is a value x such that both x+1 and x-1 already exist in the tree. The BST contains only positive integers, so the value 1 is a special case: 0 can never be inserted, which makes 1 a dead end whenever 2 exists.
bool HasDeadEnd(Node root)
{
    if (root == null) return false;
    var all = new List<Node>();
    ToInOrderList(root, ref all);
    // collect the values for constant-time lookups; 0 is treated as present
    // because only positive integers can be inserted
    var values = new HashSet<int>(all.Select(x => x.data));
    values.Add(0);
    var leaves = GetLeaves(all);
    foreach (var leaf in leaves)
    {
        if (values.Contains(leaf.data - 1) && values.Contains(leaf.data + 1))
            return true;
    }
    return false;
}
void ToInOrderList(Node root, ref List<Node> all)
{
    if (root == null) return;
    ToInOrderList(root.left, ref all);
    all.Add(root);
    ToInOrderList(root.right, ref all);
}
List<Node> GetLeaves(List<Node> all)
{
    // a leaf has neither a left nor a right child
    return all.Where(x => x.left == null && x.right == null).ToList();
}
Alternatively, the leaves can be collected during the traversal:
void FindLeaves(Node root, ref List<Node> leaves)
{
    if (root == null) return;
    FindLeaves(root.left, ref leaves);
    if (root.left == null && root.right == null)
        leaves.Add(root);
    FindLeaves(root.right, ref leaves);
}

The order in which the leaves are enumerated depends on the order in which the traversal is done.
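The dead-end check can also be done without building any lists by carrying the range of values that may still be inserted under each node. The Python sketch below is a different technique from the list-based routine above; the Node class and the function name has_dead_end are assumptions for illustration, and the tree is assumed to hold distinct positive integers.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def has_dead_end(root, low=1, high=float("inf")):
    """low..high is the range of values allowed at this position in the BST."""
    if root is None:
        return False
    if low == high:
        # Only one value fits here and the node holds it: nothing can be added below.
        return True
    return (has_dead_end(root.left, low, root.data - 1) or
            has_dead_end(root.right, root.data + 1, high))

with_dead_end = Node(8, Node(5, Node(2, Node(1)), Node(7)), Node(9))
without_dead_end = Node(8, Node(5, Node(2), Node(7)), Node(11))
print(has_dead_end(with_dead_end), has_dead_end(without_dead_end))
```

Starting the range at 1 handles the special case of the value 1 without any extra bookkeeping, and each node is visited at most once.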