Saturday, June 19, 2021

 Draining Service Bus 

Introduction: Service Bus is an Azure public cloud resource that acts as a message broker between publishers and subscribers. As a cloud resource, it can scale to arbitrary loads and become mission-critical. Its availability is improved by providing redundancy across regions so that when one region goes down, the service can fail over to another. A namespace comprises a variety of message holders such as Queues, Topics, Relays, and Event Hubs. They differ primarily in how they are used. For example, a queue enables one producer and one consumer to send and receive ordered messages, while a Topic allows subscriptions so that many subscribers can receive the same messages. There can be several of each type of Service Bus entity and millions of messages in transit. This article describes the proper way to fail over from one Service Bus instance to another.
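
As a hedged illustration of the difference, the following Python sketch uses the azure-servicebus SDK to publish to a topic and read from one of its subscriptions; the connection string, the topic name "orders", and the subscription name "billing" are hypothetical placeholders.

from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONN_STR = "<service-bus-namespace-connection-string>"  # placeholder, not a real value

with ServiceBusClient.from_connection_string(CONN_STR) as client:
    # Publisher: send one message to the topic.
    with client.get_topic_sender(topic_name="orders") as sender:
        sender.send_messages(ServiceBusMessage("order 42 created"))

    # One of possibly many subscribers: each subscription gets its own copy.
    with client.get_subscription_receiver(topic_name="orders",
                                          subscription_name="billing") as receiver:
        for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
            print(str(msg))
            receiver.complete_message(msg)

A queue would look the same except that get_queue_sender and get_queue_receiver are used and a single consumer drains the messages in order.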

Description: The availability of the Service Bus is further improved by its deployment to multiple low-latency availability zones within a geographical region. While deployment across regions allows different instances to be provisioned, the deployments within a region across multiple availability zones belong to the same instance. The Service Bus resource does not allow in-place enablement of zone redundancy, so some steps must be taken to preserve the structure and content of the original instance. There is no option to transfer both the structure and the content at the same time; the only way is to first replicate the structure on a target instance and then copy the data over. This happens with the help of logic that enumerates the entities and copies them one by one to the destination. The same logic applies to data migration. This control-plane and data-plane duplication can be custom-written into a single program, but it is better to leverage the built-in features of the resource that support redundancy by replicating the entities via internal automation. The steps for this migration are as follows:

Step 1: Create a new Premium Stock Keeping Unit (SKU) namespace.
Step 2: Pair the source and destination namespaces with each other.
Step 3: Sync or copy over the entities from the source to the destination namespace.
Step 4: Commit the migration.
Step 5: Drain entities from the source namespace using the post-migration name of the namespace.
Step 6: Delete the source namespace.

Pairing a source namespace and a destination namespace automatically copies over all the Service Bus entities from the source namespace to the destination namespace. It is a built-in feature of the pairing mechanism of the Service Bus. The only catch in that replication is that pairing must be across regions; if we want another instance in the same region, we perform the structure and content migration to a new instance in a different region, break the pairing, and then create a new pairing between that instance and a third instance back in the original region. Pairing does not replicate the messages that may still be held in the Service Bus entities of the source namespace during the migration and just before it is completed. These messages must be drained. Each of the Service Bus entities is enumerated and its messages are read and copied to the corresponding entity in the destination namespace, as shown in the sketch below. There are options in the programmability features of this Azure resource to automate the enumeration and transfer of messages. However, it is also possible to avoid writing this code. In that case, a maintenance window must be in effect during the migration and the following steps must be taken. The sender applications are stopped. The receiver applications process the messages currently in the source namespace and drain the queues and subscriptions. Once the source namespace is empty, the migration steps listed earlier are performed. When the migration steps are complete, the sender applications may be restarted. The senders and receivers will now automatically connect to the destination namespace with the help of the alias that was set up for post-migration handling of the traffic from senders. This completes the structure and content replication of the Service Bus.
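
The drain step can be sketched in Python with the azure-servicebus SDK. This is a minimal, hedged example that assumes text payloads and a single queue; the connection strings and the queue name are hypothetical, and the same loop would be repeated for every queue and subscription in the namespace.

from azure.servicebus import ServiceBusClient, ServiceBusMessage

SOURCE_CONN = "<post-migration-source-namespace-connection-string>"  # assumption
DEST_CONN = "<destination-namespace-connection-string>"              # assumption
QUEUE = "orders"                                                     # assumption

with ServiceBusClient.from_connection_string(SOURCE_CONN) as source, \
     ServiceBusClient.from_connection_string(DEST_CONN) as dest:
    with source.get_queue_receiver(queue_name=QUEUE) as receiver, \
         dest.get_queue_sender(queue_name=QUEUE) as sender:
        while True:
            # Pull a batch of residual messages; stop when the source queue is empty.
            batch = receiver.receive_messages(max_message_count=50, max_wait_time=5)
            if not batch:
                break
            for msg in batch:
                # Re-publish the body to the same entity in the destination,
                # then settle the original so it leaves the source namespace.
                sender.send_messages(ServiceBusMessage(str(msg)))  # assumes text bodies
                receiver.complete_message(msg)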


Friday, June 18, 2021

Azure monitoring continued...

 

Azure Monitoring also performs continuous monitoring, which refers to the processes and tools for monitoring each phase of the DevOps and IT operations lifecycles. It helps to continuously ensure the health, performance, and reliability of the application and the infrastructure as it moves from deployment to production. It builds on Continuous Integration and Continuous Deployment, which are ubiquitously embraced by organizations for software development. Azure Monitoring is a unified monitoring solution that provides transparency into the application, runtime, host, and cloud infrastructure layers. As a continuous monitoring tool, Azure Monitor allows gates and rollback of deployments based on monitoring data. Software releases for services hosted in the cloud have very short development cycles and must pass through multiple stages and environments before they are made public. Monitoring data allows any number of environments to be introduced without sacrificing the controls for software quality and gated release across environments. The data not only allows thresholds to be set but also alerts, so that appropriate action may be taken. As the software makes its way to the final production environment, the alerts increase in level and become more relevant and useful for eliminating risks from the production environment.
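
A release gate based on monitoring data can be sketched with the azure-monitor-query SDK. This is only an illustration under assumptions: the workspace id, the table name "exceptions", and the error budget of 10 are hypothetical.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder
ERROR_BUDGET = 10                              # hypothetical gate threshold

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    WORKSPACE_ID,
    "exceptions | where timestamp > ago(30m) | count",
    timespan=timedelta(minutes=30),
)
error_count = response.tables[0].rows[0][0]

# Gate decision: promote the release only while the error budget holds.
if error_count > ERROR_BUDGET:
    print(f"Gate failed: {error_count} exceptions in the last 30 minutes; roll back.")
else:
    print("Gate passed: proceed to the next environment.")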

It may be argued that tests and other forms of software quality control achieve this as the software goes through the CI/CD pipeline. While this is true, software quality is enhanced by monitoring data because it is not intrusive or vulnerable to the flakiness that many tests are prone to in different environments. The monitoring data and its visualization with dashboards need to be set up only once, even as the code and tests change over time. The investments in continuous monitoring and its implications boost the planning and predictability of software releases.

Monitoring has tremendous breadth and depth of scope, so a question about cost may arise. It can monitor individual Azure resources for their availability, performance, and operation. Some monitoring data is collected by default, and the monitoring data platform can be tapped to get more data from all participating monitoring agents. There is no cost associated with collecting, exporting, and analyzing data by default. Costs might be associated with storage if a storage account is used, with ingestion if a workspace is used, or with streaming if Azure Event Hubs are used. Monitoring also involves a cost when running a long query, creating a metric or log query alert rule, sending a notification from any alert rule, and accessing metrics through the API. At the resource level, the resource logs and platform metrics are automatically collected. At the subscription level, the activity log is automatically collected.

Thursday, June 17, 2021

Azure Monitoring continued...

The access to the data is controlled by the storage access service key, also called a blob token, that is issued at the time of writing the blob so that the destination can use that token to import the data and handle the single-cast or multi-cast as appropriate to event handlers or the appropriate Kusto cluster. Data copying and rewriting are avoided by merely exchanging the payload information and blob tokens, with the delivery service absorbing the process of fetching the permissions from GCS on behalf of the customer.
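
The blob-token exchange can be illustrated with a short-lived shared access signature from the public storage SDK; the account, key, container, and blob names below are hypothetical, and this is only an analogy for the internal mechanism described above.

from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

ACCOUNT = "monitoringstore"          # hypothetical account
ACCOUNT_KEY = "<account-key>"        # placeholder
CONTAINER = "diagnostic-logs"        # hypothetical container
BLOB = "2021/06/17/batch-001.json"   # hypothetical blob

# The producer writes the blob once and issues a read-only token for it.
token = generate_blob_sas(
    account_name=ACCOUNT,
    container_name=CONTAINER,
    blob_name=BLOB,
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# The consumer imports the data with the token; no second copy is written.
blob_url = f"https://{ACCOUNT}.blob.core.windows.net/{CONTAINER}/{BLOB}?{token}"
print(blob_url)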

The role of the distributed graph might stand out in the form of a question at this point. It is a service that is used for searching logs and for transformations. It consists of a front-end service and a backend service, with each individual component within the FE and BE cloud services implemented as a micro-service performing a specific task. The front-end service allows customers to set up query conditions such as job scope and interval period.

All the monitoring services are region-bound and can be repeated in other regions. Availability within the region such as for disaster recovery purposes requires the use of availability zones. The backend service merely schedules the workers for the appropriate handling of the logs to the customer’s storage account. 

Many miscellaneous activities are specific to the data and to whether the data is logs or metrics, such as scrubbing, logs-to-metrics transformations, normalization, and uploading; these are handled by dedicated services and serve to enhance the pipeline described so far. The monitoring architecture is generic and always requires queues, blobs, collections, schedulers, pub-subs, producer-consumers, accounts, analysis and reporting stacks, and their configurations.

Most of the resources for Azure monitoring are region-scoped. This enables Azure Monitoring to be set up in each region. Some shared data and resources across these regions may exist in a dedicated region, which would power use cases of monitoring via the Azure portal.


Wednesday, June 16, 2021

Let us take a closer look at how the monitoring data is gathered and analyzed internally within the cloud. The architecture behind Azure Monitoring is a diagnostics pipeline. This pipeline consists of an ingestion gateway, a delivery service, a distributed graph, a normalizer service and scrubber, a logs-to-metrics converter, and an uploader to a global database. The pipeline supports ingestion, streaming, transformations, and querying; this is its hallmark. All these paths are supported end-to-end via the pipeline without any interference with each other.

The idea behind the monitoring pipeline is one of queuing and pub-sub mechanisms. Logs and metrics flow from gateways to storage queues, where blobs are listened for, scrubbed, forwarded to event hubs, and uploaded to different destinations such as CosmosDB, Azure Data Lake Storage (ADLS), and delivery services. The rate of flow to the queues can be throttled, and schema hints can be propagated to the storage, where the schema and notifications power the analytics. The metrics accumulation in an MDM facilitates the logic for throttling and rate adjustments, while the schemas are mostly published and queried via Kusto.
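
The queue-plus-blob flow resembles a claim-check pattern, which the public storage SDKs make easy to sketch. The connection string, the container "log-batches", and the queue "log-notifications" are hypothetical and assumed to already exist.

from azure.storage.blob import BlobServiceClient
from azure.storage.queue import QueueClient

CONN_STR = "<storage-account-connection-string>"  # placeholder

blob_service = BlobServiceClient.from_connection_string(CONN_STR)
queue = QueueClient.from_connection_string(CONN_STR, "log-notifications")

# Producer side: write the payload as a blob, then enqueue a pointer to it.
blob = blob_service.get_blob_client("log-batches", "batch-001.json")
blob.upload_blob(b'{"level": "error", "count": 3}', overwrite=True)
queue.send_message("log-batches/batch-001.json")

# Consumer side: listen on the queue, fetch the referenced blob, then settle.
for msg in queue.receive_messages():
    container, _, name = msg.content.partition("/")
    payload = blob_service.get_blob_client(container, name).download_blob().readall()
    print(payload)
    queue.delete_message(msg)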

Configurations for different storage containers, queues, and hubs are defined between the collection and the delivery services. These are called Monikers; a Moniker is a pairing of an Event Hub and a storage account. The ingestion service is responsible for connecting the monitoring agent with its storage account. The use of this service reduces the number of monikers, the number of blob writes to storage, and the complexity of the distributed graph representation. The storage is billed in terms of transactions, and what would earlier take hundreds of transactions and blob writes now requires only tens of transactions using the ingestion or ingress service. It can also aggregate the blobs before writing them to the storage account.

The corresponding egress service is the delivery service, which can be considered an equivalent of Apache Kafka. It comes with a set of producer and consumer definitions, and this pub-sub service operates at the event level. There is an application programmability interface for consumers who would like to define the monikers instead of controlling the events. The setting up of monikers determines where and how the data is delivered, and the use of monikers reduces the bill in a way equivalent to how the ingress service did. The destinations are usually Kusto clusters and event hubs. The delivery service forms the core of the pipeline, with agents and ingestion pouring data into storage defined by monikers. At the other end of the pipeline are the event hubs and Kusto clusters.
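
The public Event Hubs SDK offers a convenient analogy for this event-level pub-sub, although the internal delivery service has its own interfaces. The namespace connection string and the hub name "diagnostics" below are hypothetical.

from azure.eventhub import EventData, EventHubConsumerClient, EventHubProducerClient

CONN_STR = "<event-hubs-namespace-connection-string>"  # placeholder
HUB = "diagnostics"                                    # hypothetical hub

# Producer: publish a small batch of events.
producer = EventHubProducerClient.from_connection_string(CONN_STR, eventhub_name=HUB)
with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"metric": "cpu", "value": 83}'))
    producer.send_batch(batch)

# Consumer: receive events for one consumer group and route them onward.
def on_event(partition_context, event):
    print(partition_context.partition_id, event.body_as_str())

consumer = EventHubConsumerClient.from_connection_string(
    CONN_STR, consumer_group="$Default", eventhub_name=HUB)
with consumer:
    consumer.receive(on_event=on_event, starting_position="-1")  # "-1" reads from the start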

Collection and storage have prerequisites. For example, when virtual machines are created, they automatically have a monitoring agent (MA) installed. This agent reaches out to a collection service with an intent to write and to define a namespace. The handshake between the monitor and the agent gives the agent the configuration necessary to direct its data to a destination Moniker, which can scale automatically for the storage account.

Unlike the collection and the storage, which are automatically provisioned, the delivery and the paths are set up by the customer using the application programmability interfaces in the extensibility SDK associated with the delivery services. The delivery service then concerns itself merely with resolving monikers, listening on the monikers, filtering events, and delivering them to the Kusto clusters and event hubs. If a destination is unreachable or unavailable, the data is handed off to the snapshot delivery service, which reads the delivery service namespaces for retries. The data is never put in memory when the delivery service forwards the data to a cache under a namespace key. The snapshot delivery service acts as the standby destination in place of the unreachable one.


Monday, June 14, 2021

 

Introduction to Azure Monitoring:

Abstract: This article is an introduction to the programmability aspects of the Azure public cloud monitoring service, which is known for its unprecedented scale and flexibility in reporting metrics from the resources hosted on the Azure Public Cloud.

Description: Azure Monitoring helps us maximize the availability of our applications and services hosted on the Azure Public Cloud. It is a complete solution for collecting, analyzing, and acting on the telemetry from the cloud environment. This monitoring program comprises an application performance management system called Application Insights, host monitoring systems called VM Insights and Container Insights, the Log Analytics solution which allows drilling down into the monitoring data, smart alerts and automated actions which help support operations at scale, and visualizations with dashboards and workbooks. The data collected from this comprehensive solution becomes part of Azure Monitor Metrics.

Azure Monitoring is not only about metrics; it is also about logs, and it allows us to gather insights, visualize, analyze, respond, and integrate. The monitoring data platform works for both metrics and logs. While events and traces become part of logs, metrics are numerical values that quantify application performance at a given point in time. The metrics store and its visualization with Metrics Explorer, and the log data and its filtering with Log Analytics, are just applications dedicated to their respective data. Azure Monitor uses the Kusto query language, which is suitable for simple log queries but also includes advanced functionality such as aggregations, joins, and smart analytics. Kusto benefits from both SQL and Splunk querying practices.
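
As a hedged example of those capabilities, the following Kusto query, issued through the azure-monitor-query SDK, combines an aggregation with a join; the workspace id and the workspace-based Application Insights table names are assumptions.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

KQL = """
AppRequests
| summarize failures = countif(Success == false), total = count() by OperationName
| join kind=inner (
    AppExceptions
    | summarize exceptions = count() by OperationName
  ) on OperationName
| project OperationName, total, failures, exceptions
"""

client = LogsQueryClient(DefaultAzureCredential())
result = client.query_workspace("<workspace-id>", KQL, timespan=timedelta(hours=1))
for table in result.tables:
    for row in table.rows:
        print(list(row))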

One of the most interesting aspects of Azure Monitoring is that it collects metrics from applications, the guest OS, Azure resource monitoring, Azure subscription monitoring, and Azure tenant monitoring, covering the depth and breadth of the systems involved. Alerts and Autoscale help determine the appropriate thresholds and actions that become part of the monitoring stack, so the data and the intelligence are together and easily navigated via the dashboard. Azure Dashboards provide a variety of eye-catching charts that illustrate the data to viewers better than raw query results. Workbooks provide a flexible canvas for data analysis and the creation of rich visual reports in the Azure Portal. The analysis is not restricted to just these two. Power BI remains a robust solution that provides analysis and interactive visualizations across a variety of data sources, and it can automatically import log data from Azure Monitor. Azure Event Hubs is a streaming platform and event ingestion service which permits real-time analytics as opposed to batching or storage-based analysis. APIs from Azure Monitor help with reading and writing data as well as configuring and retrieving alerts.

Sunday, June 13, 2021

 Networking Techniques Continued ...  

Introduction: This is a continuation of the article written about networking technologies in the cloud specifically the Azure public cloud.   

 

Description: Some networking techniques drive down costs significantly. For example, Switch Embedded Teaming can be used to combine two 10 Gb ports into a 20 Gb team. This is a boost to capacity with little or no additional load on the CPU. Such techniques have always been part of the networking industry and its history. Modems that provided point-to-point connectivity enabled the bandwidth to be increased by the cumulative addition of other modems. Multilink capabilities enabled all the modems to be active at once, while the Bandwidth Allocation Protocol allowed these modems to be aggregated one by one. These techniques allowed IT administrators to extend the service life of the existing infrastructure and hardware so that new equipment could be purchased when the organization was ready instead of when incidents occurred. The overall throughput could be improved so that high-priority applications could get the network access they need. Service levels could be added to the existing default conditions with the help of Quality-of-Service protocols that prioritized traffic to and from critical applications. The DiffServ and IntServ paradigms boosted the adoption of certain techniques and technologies in favor of others. IntServ and DiffServ are models of providing Quality-of-Service: IntServ stands for Integrated Services and DiffServ stands for Differentiated Services. In the IntServ model, QoS is applied on a per-flow basis and addresses business model and charging concerns. Even in mobile phone networks, this is evident when certain billing options are desirable but not possible. In Differentiated Services, the emphasis is on scalability, a flexible service model, and simpler signaling. Scalability here means we do not track resources for each of the considerable number of flows. The service model means we provide bands of service such as platinum, gold, and silver, as illustrated by the marking sketch below.
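
Differentiated Services marking can be shown at the application level with nothing more than a socket option: the DSCP value is written into the ToS byte so that network equipment can place the flow into a higher band. The value 46 (Expedited Forwarding) and the destination address are illustrative only, and this is a general socket technique rather than an Azure-specific API.

import socket

EF_TOS = 46 << 2  # DSCP 46 (Expedited Forwarding) shifted into the ToS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_TOS)  # mark outgoing packets
sock.sendto(b"latency-sensitive payload", ("192.0.2.10", 5060))  # documentation address
sock.close()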

 

Queuing is also a technique that helps improve service levels. Virtual Machine Multi-Queue on Windows Server improves throughput. A built-in software load balancer can distribute incoming requests, where the policies are defined using the network controller. Putting this all together, we have a network controller that sits between a gateway and a Hyper-V vSwitch and works with a load balancer to distribute traffic to other Hyper-V vSwitches, which connect several other virtual machines. The gateway works with a Hyper-V vSwitch that routes internal as well as external traffic to enterprise sites and Microsoft Azure using the IP Security protocol (IPsec) or the Generic Routing Encapsulation (GRE) protocol. The Hyper-V vSwitches in the hybrid cloud are also able to send and receive Layer 3 traffic via the gateway, which rounds out overall connectivity for the entire hybrid cloud with limitless expansion internally. This mode of connecting the hybrid cloud with the public cloud is here to stay for a while, because customers of the public cloud have significant investments in their hybrid cloud, and the public cloud cannot migrate the applications, services, devices, and organizations with their workloads without requiring code changes. Networking is probably the easiest technique that allows new applications to be written directly on the public cloud so that traffic may be eased onto the public cloud as investments in the hybrid cloud are scaled back.

 

 

Saturday, June 12, 2021

Azure Software defined networking features:


Introduction: Azure is a public cloud with a portfolio of services. Azure Networking is one of the core services in the portfolio and offers Network-as-a-service functionality. This article discusses some of the main features of this service.

Description: Networking is all about links and communication. It involves layers of protocols, a mix of network topologies, hybrid equipment, naming and resolving mechanisms, access controls and policy specifications, and a variety of management and troubleshooting tools and services. Azure ExpressRoute provides optimal routing for best performance. The default traffic is over the Microsoft global network, which is often referred to as cold-potato routing. Inter-availability-zone and inter-region connectivity provide low-latency and geographical networking. The traffic routes between Azure and the internet can be determined with routing preference. Azure Load Balancer provides high performance with low latency.

There are manageability features that allow the management of on-premises, multi-cloud, 5G, and edge deployments. Connectivity is provided via Azure Virtual WAN, and the edge and 5G scenarios are enabled with Edge Zones, Edge Zones with Carrier, or Private Edge Zones. These networks are secured by zero-trust-based network security, which involves segmentation and the use of Azure WAF and Azure Bastion. There is intelligent threat detection with Azure DDoS Protection. Private connectivity is available via Azure Private Link. Azure Network Virtual Appliances and the Remote Access Service can provide end-to-end IP tunneling.

The Network-as-a-Service offering features easy-to-use, scalable services and tools. The traffic is managed via Azure Application Gateway and protected via Azure WAF. Azure Front Door helps define and monitor global routing. Firewall capabilities are turned on with Azure Firewall. VNet NAT is used to ensure reliable network address translation and can provide outbound connectivity.

Software-defined networking is built into each Windows Server. When IT wants the ability to deploy applications quickly, SDN and the network controller can be used, and policy can be managed with PowerShell. Hyper-V and the network controller can be used to create VXLAN overlays, which do not require re-assignment of IP addresses. Hybrid SDN gateways can be used to assign and manage resources independently.

There is greater security and isolation of workloads with the use of network security groups and a distributed firewall for micro-segmentation. North-south internet traffic and east-west intranet traffic can be treated differently. User-defined routing can be configured, and service chains can be established with third-party appliances such as firewalls, load balancers, or content inspection. Cost is driven down by converging storage and network traffic on Ethernet and activating Remote Direct Memory Access (RDMA).