Sunday, August 15, 2021

Zone-Down simulation 


Introduction:  

Zone-down is a drill for availability zones in public cloud computing. Availability zones are massive assets for a public cloud: they build resilience and availability for applications and services. Each zone comprises multiple datacenters, and the zones in a region are almost indistinguishable from one another. A region may have three availability zones for redundancy and availability, and each zone may host a variety of cloud computing resources, small or big. Since there are several stakeholders in an availability zone, it is a challenge to see what happens when one fails. Even the usual method of finding that out is quite drastic: it involves powering down the zone. There are many network connections to and from cloud resources, so it is hard to find an alternative. This article proposes a solution based on diverting traffic between zones.

Proposal:  

This is a multi-tiered approach to the problem statement. 

Step 1. The first tier is a virtual network around the PaaS services from which resources have been provisioned under a subscription. This can be done with Azure Private Link, which connects resources over an internal subnet so that they do not have to be accessed publicly. Setting up this virtual network is critical to the stated goal of isolating zones that would otherwise have participated in shared zone-redundancy. If each zone is put in a separate virtual network and the networks share nothing, then traffic flowing to one network alone will prevent the utilization of the other zones, thus simulating a network zone-down. The internal traffic goes over the Microsoft backbone network and does not traverse the internet.

The following PowerShell cmdlet provisions a Private Link service:

New-AzPrivateLinkService
   -Name <String>
   -ResourceGroupName <String>
   -Location <String>
   -LoadBalancerFrontendIpConfiguration <PSFrontendIPConfiguration[]>
   -IpConfiguration <PSPrivateLinkServiceIpConfiguration[]>

It is used in a sequence of steps like this:

$vnet = Get-AzVirtualNetwork -ResourceName 'myvnet' -ResourceGroupName 'myresourcegroup'
$subnet = $vnet | Select-Object -ExpandProperty subnets | Where-Object Name -eq 'mysubnet'
$IPConfig = New-AzPrivateLinkServiceIpConfig -Name 'IP-Config' -Subnet $subnet -PrivateIpAddress '10.0.0.5'
$publicip = Get-AzPublicIpAddress -ResourceGroupName 'myresourcegroup'
$frontend = New-AzLoadBalancerFrontendIpConfig -Name 'FrontendIpConfig01' -PublicIpAddress $publicip
$lb = New-AzLoadBalancer -Name 'MyLoadBalancer' -ResourceGroupName 'myresourcegroup' -Location 'West US' -FrontendIpConfiguration $frontend
New-AzPrivateLinkService -Name 'mypls' -ResourceGroupName 'myresourcegroup' -Location 'West US' -LoadBalancerFrontendIpConfiguration $frontend -IpConfiguration $IPConfig

Step 2. The second tier is an application gateway that onboards user traffic to one of the virtual networks.

$ResourceGroup = New-AzResourceGroup -Name "ResourceGroup01" -Location "West US" -Tag @{Name = "Department"; Value = "Marketing"}
$Subnet = New-AzVirtualNetworkSubnetConfig -Name "Subnet01" -AddressPrefix 10.0.0.0/24
$VNet = New-AzVirtualNetwork -Name "VNet01" -ResourceGroupName "ResourceGroup01" -Location "West US" -AddressPrefix 10.0.0.0/16 -Subnet $Subnet
$VNet = Get-AzVirtualNetwork -Name "VNet01" -ResourceGroupName "ResourceGroup01"
$Subnet = Get-AzVirtualNetworkSubnetConfig -Name "Subnet01" -VirtualNetwork $VNet
$GatewayIPConfig = New-AzApplicationGatewayIPConfiguration -Name "GatewayIp01" -Subnet $Subnet
$Pool = New-AzApplicationGatewayBackendAddressPool -Name "Pool01" -BackendIPAddresses 10.10.10.1, 10.10.10.2, 10.10.10.3
$PoolSetting = New-AzApplicationGatewayBackendHttpSettings -Name "PoolSetting01" -Port 80 -Protocol "Http" -CookieBasedAffinity "Disabled"
$FrontEndPort = New-AzApplicationGatewayFrontendPort -Name "FrontEndPort01" -Port 80
$PublicIp = New-AzPublicIpAddress -ResourceGroupName "ResourceGroup01" -Name "PublicIpName01" -Location "West US" -AllocationMethod "Dynamic"
$FrontEndIpConfig = New-AzApplicationGatewayFrontendIPConfig -Name "FrontEndConfig01" -PublicIPAddress $PublicIp
$Listener = New-AzApplicationGatewayHttpListener -Name "ListenerName01" -Protocol "Http" -FrontendIpConfiguration $FrontEndIpConfig -FrontendPort $FrontEndPort
$Rule = New-AzApplicationGatewayRequestRoutingRule -Name "Rule01" -RuleType basic -BackendHttpSettings $PoolSetting -HttpListener $Listener -BackendAddressPool $Pool
$Sku = New-AzApplicationGatewaySku -Name "Standard_Small" -Tier Standard -Capacity 2
$Gateway = New-AzApplicationGateway -Name "AppGateway01" -ResourceGroupName "ResourceGroup01" -Location "West US" -BackendAddressPools $Pool -BackendHttpSettingsCollection $PoolSetting -FrontendIpConfigurations $FrontEndIpConfig -GatewayIpConfigurations $GatewayIPConfig -FrontendPorts $FrontEndPort -HttpListeners $Listener -RequestRoutingRules $Rule -Sku $Sku

 

An application gateway can be used with both public and private addresses. It comprises frontend IP addresses, listeners, request routing rules, HTTP settings, and backend pools. A backend pool routes requests to the backend servers that serve them. Backend pools can contain:

  

NICs
Virtual machine scale sets
Public IP addresses
Internal IP addresses
FQDNs
Multitenant backends (such as App Service)

Application Gateway backend pool members aren't tied to an availability set, and an application gateway can communicate with instances outside of the virtual network that it is in. As a result, the members of the backend pools can be across clusters, across datacenters, or outside Azure, as long as there is IP connectivity.

  

When internal IPs are used as backend pool members, virtual network peering or a VPN gateway must be used. Virtual network peering is supported and is beneficial for load-balancing traffic in other virtual networks. Different backend pools can serve different types of requests: for example, one backend pool for private network traffic and another for public IP members.

This approach does not require more than one load balancer, or a load balancer per availability zone.

Step 3. The third tier is a traffic manager that switches between sending traffic to the application gateway for experiment purposes and sending it to the default route, where the traffic is not diverted and reaches the public endpoints. The traffic manager is not a replacement for the application gateway, which is still required; it facilitates cases where there is a designated public IP address destination for the undiverted traffic and the application gateway's IP address for the diverted traffic.
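As a sketch of this third tier, a priority-routed Traffic Manager profile can hold both destinations and be flipped on demand. The profile name, DNS prefix, resource group, and endpoint targets below are hypothetical placeholders, not names from the deployment described above:

```powershell
# Create a priority-routed Traffic Manager profile (all names are placeholders).
$tmProfile = New-AzTrafficManagerProfile -Name 'ZoneDrillProfile' -ResourceGroupName 'myresourcegroup' `
    -TrafficRoutingMethod Priority -RelativeDnsName 'zonedrill' -Ttl 30 `
    -MonitorProtocol HTTP -MonitorPort 80 -MonitorPath '/'

# Priority 1: the default public endpoint that receives undiverted traffic.
New-AzTrafficManagerEndpoint -Name 'DefaultEndpoint' -ProfileName 'ZoneDrillProfile' `
    -ResourceGroupName 'myresourcegroup' -Type ExternalEndpoints `
    -Target 'myapp.westus.cloudapp.azure.com' -EndpointStatus Enabled -Priority 1

# Priority 2: the application gateway that diverts traffic into one virtual network.
New-AzTrafficManagerEndpoint -Name 'GatewayEndpoint' -ProfileName 'ZoneDrillProfile' `
    -ResourceGroupName 'myresourcegroup' -Type ExternalEndpoints `
    -Target 'appgateway.westus.cloudapp.azure.com' -EndpointStatus Enabled -Priority 2

# To start the drill, disable the default endpoint so traffic fails over to the gateway;
# re-enabling it restores the undiverted path.
Disable-AzTrafficManagerEndpoint -Name 'DefaultEndpoint' -Type ExternalEndpoints `
    -ProfileName 'ZoneDrillProfile' -ResourceGroupName 'myresourcegroup' -Force
```

With priority routing, clients always resolve to the highest-priority enabled endpoint, so toggling one endpoint is enough to switch between the experiment and the default.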

https://1drv.ms/u/s!Ashlm-Nw-wnWzWDDWMXCinJgxN0Q  

Conclusion: 

The multi-tiered approach is needed only because the virtual network is internal while the customer traffic is external, but diversion of traffic is a simple and elegant answer to the zone-drill problem statement, compared to the drastic alternative of powering down a zone.

Saturday, August 14, 2021

 Azure Cognitive Search  

This article is a continuation of the series of articles starting with the description of the SignalR service. In this article, we begin to discuss the Azure Cognitive Search service, aka Azure Search, after the last article on Azure Stream Analytics.

 

Azure Cognitive Search differs from do-it-yourself techniques in that it is a fully managed search-as-a-service, primarily for full-text search. It provides a rich user experience for searching all types of content, including vision, language, and speech. It provides machine learning features to contextually rank search results and is powered by deep learning models. It can extract and enrich content using AI-powered algorithms, and different content can be consolidated to build a single index.

 

The search service supports primarily indexing and querying. Indexing is associated with the input data path to the search service. It processes the content and converts it to JSON documents. If the content includes mixed files, searchable text can be extracted from the files. Heterogeneous content can be consolidated into a private, user-defined search index. Large amounts of data stored in external repositories, including Blob storage, Cosmos DB, or other storage, can now be indexed. The index can be protected against data loss, corruption, and disasters via the same mechanisms that are used for the content. The index is also independent of the service, so another service can read the same index if one goes down.
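As an illustration of the indexing input path, an index is defined as a JSON document and created with a REST call. This is a hedged sketch: the service name, admin key, and the 'hotels' schema below are placeholder assumptions, not values from any real deployment:

```powershell
# Placeholder service endpoint and admin key for a hypothetical search service.
$serviceUrl = 'https://myservice.search.windows.net'
$headers = @{ 'api-key' = '<admin-api-key>'; 'Content-Type' = 'application/json' }

# An index schema is a name plus a list of typed fields with attributes.
$indexDefinition = @{
    name   = 'hotels'
    fields = @(
        @{ name = 'hotelId';     type = 'Edm.String'; key = $true; filterable = $true },
        @{ name = 'description'; type = 'Edm.String'; searchable = $true; analyzer = 'en.lucene' },
        @{ name = 'category';    type = 'Edm.String'; filterable = $true; facetable = $true }
    )
} | ConvertTo-Json -Depth 4

# PUT creates (or updates) the index on the service.
Invoke-RestMethod -Method Put -Uri "$serviceUrl/indexes/hotels?api-version=2020-06-30" `
    -Headers $headers -Body $indexDefinition
```

Once the index exists, JSON documents pushed or pulled into it become searchable through the query path.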

 

We evaluate the features of Azure Cognitive Search next.

 

The indexing features of Azure Cognitive Search include a full-text search engine, persistent storage of search indexes, integrated AI, and APIs and tools. The data sources for indexing can be arbitrary as long as the mode of transferring data is a JSON document. Indexers automate data transfer from these data sources and turn it into searchable content in the primary storage. Connectors help the indexers with the data transfer and are specific to data sources such as Azure SQL Database, Cosmos DB, or Azure Blob storage. Complex data types and collections allow us to model any type of JSON data structure within a search index. The use of collections and complex types helps with one-to-many and many-to-many mappings. Analyzers can be used for linguistic analysis of the data ingested into the indexes.
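The indexer automation described above can be sketched with two REST calls: one to register a data source and one to create an indexer that pulls from it into a target index. The service name, key, connection string, container, and index name are all placeholder assumptions:

```powershell
# Placeholder service endpoint and admin key for a hypothetical search service.
$serviceUrl = 'https://myservice.search.windows.net'
$headers = @{ 'api-key' = '<admin-api-key>'; 'Content-Type' = 'application/json' }

# Register a Blob storage container as a data source (connector type 'azureblob').
$dataSource = @{
    name        = 'blobs-ds'
    type        = 'azureblob'
    credentials = @{ connectionString = '<storage-connection-string>' }
    container   = @{ name = 'content' }
} | ConvertTo-Json -Depth 3
Invoke-RestMethod -Method Post -Uri "$serviceUrl/datasources?api-version=2020-06-30" `
    -Headers $headers -Body $dataSource

# Create an indexer that crawls the data source into a target index on a schedule.
$indexer = @{
    name            = 'blobs-indexer'
    dataSourceName  = 'blobs-ds'
    targetIndexName = 'content-index'
    schedule        = @{ interval = 'PT1H' }   # re-crawl hourly
} | ConvertTo-Json -Depth 3
Invoke-RestMethod -Method Post -Uri "$serviceUrl/indexers?api-version=2020-06-30" `
    -Headers $headers -Body $indexer
```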

 

The standard Lucene analyzer is used by default, but it can be overridden with a language analyzer, a custom analyzer, or one of many predefined analyzers that produce tokens used for search. AI processing for image and text analysis can be applied in an indexing pipeline at the time of extracting text information. Some examples of built-in skills include optical character recognition and key phrase extraction. The pipeline can also be integrated with Azure Machine Learning authored skills.

 

The indexing pipeline can also generate a knowledge store. Instead of sending tokenized terms to an index, it can enrich documents and send them to a knowledge store. This store could be native to Azure in the form of Blob storage or Table storage. The purpose of the knowledge store is to support downstream analysis and processing. With a separate knowledge store, analysis and reporting stacks can be decoupled from the indexing pipeline.

Another feature of the indexing pipeline is cached content. Caching limits the processing to just the documents that are changed by specific edits to the pipeline; most usages read from the cache.

The query pipeline also has several features to enhance analysis over the Lucene search store. These include free-form text search, relevance tuning, and geo-search. Free-form text search is the primary use case for queries. The simple syntax includes logical operators, phrase operators, suffix operators, precedence operators, and others. Extensions to this search include proximity search, term boosting, and regular expressions. Scoring is another key benefit: a set of scoring rules is used to model relevance for the documents, and these rule sets can be built using tags for personalized scoring based on customer search preferences.
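As a sketch of free-form text search, a query using the simple syntax can be posted to an index's documents collection. The service name, query key, and index name below are placeholder assumptions:

```powershell
# Placeholder service endpoint and query key for a hypothetical search service.
$serviceUrl = 'https://myservice.search.windows.net'
$headers = @{ 'api-key' = '<query-api-key>'; 'Content-Type' = 'application/json' }

# Simple syntax: '+' requires a term, '|' is OR, '*' is the suffix operator,
# and parentheses control precedence.
$query = @{
    search    = 'luxury + (pool | spa*)'
    queryType = 'simple'
    top       = 10
} | ConvertTo-Json

Invoke-RestMethod -Method Post -Uri "$serviceUrl/indexes/hotels/docs/search?api-version=2020-06-30" `
    -Headers $headers -Body $query
```

Switching queryType to 'full' would enable the full Lucene syntax, which adds the proximity search, term boosting, and regular expression extensions mentioned above.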