Thursday, January 20, 2022

 

Data Import and Export from a database:

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent networking discussions on Azure DNS, which is a full-fledged general availability service. This article focuses on data import and export from a database.

Data migration is a necessity when archiving, moving from one platform to another, or providing redundancy. Since databases hold relational data, import and export must consider the schema. When only the schema needs to be transferred, all the SQL objects can be serialized as XML, and this is contained in a single DACPAC file. A data-tier application (DAC) is a logical database management entity that defines all of the SQL Server objects. With the data included along with the schema, the exported format is a BACPAC file. It can be stored in Azure Blob Storage or on premises.

The advantage of using data-tier application formats is that a DACPAC enables a DBA to list and validate behaviors across different sources and targets. It offers a chance to determine failures and data loss in the case of, say, an upgrade, and DAC tools can provide an upgrade plan. A script-driven exercise does not offer that visibility. DAC also supports versioning, which helps the developer who authors it and the DBA who uses it to maintain and manage the database lineage through its lifecycle. The primary use case for a DACPAC is the propagation of an existing database through development, test, and production environments, or in the reverse direction. A BACPAC is used to move a source database to a new database, even on another server.

For an export to be transactionally consistent, there must be no write activity occurring during the export, or the data must be exported from a copy that is transactionally consistent. If the BACPAC file is stored in Azure Blob Storage, there is a maximum size limit of 200 GB. It cannot be exported to Azure premium storage, to storage behind a firewall, or to immutable storage. The fully qualified file name must be limited to 128 characters and must exclude certain special characters. If the operation exceeds 20 hours, it may be canceled. The user must be a member of the dbmanager role or have been granted CREATE DATABASE permissions.

Azure SQL Managed Instance does not support exporting a database to a BACPAC file using the portal or PowerShell; instead, SqlPackage or SQL Server Management Studio must be used. This does not mean that there is no such functionality in the portal or PowerShell for Azure SQL Database. For example, New-AzSqlDatabaseExport and New-AzSqlDatabaseImport are available, but a large database export or import takes a long time and may fail for many reasons. The SqlPackage utility is best for scale and performance. The Data Migration Service can migrate a database from a SQL Server instance to an Azure SQL database. The bcp utility ships with SQL Server and can also be used to bulk copy data out of and back into a database.
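
A hedged sketch of such an export with the Az PowerShell cmdlets follows; the resource group, server, database, and storage account names are placeholders, and the block assumes the Az.Sql and Az.Storage modules are installed:

# A minimal sketch of a BACPAC export to Azure Blob Storage (all names below are placeholders).
$cred = Get-Credential   # SQL server administrator login
$storageKey = (Get-AzStorageAccountKey -ResourceGroupName "myResourceGroup" -Name "mystorageaccount")[0].Value

$exportRequest = New-AzSqlDatabaseExport `
    -ResourceGroupName "myResourceGroup" `
    -ServerName "myserver" `
    -DatabaseName "mydatabase" `
    -StorageKeyType "StorageAccessKey" `
    -StorageKey $storageKey `
    -StorageUri "https://mystorageaccount.blob.core.windows.net/bacpacs/mydatabase.bacpac" `
    -AdministratorLogin $cred.UserName `
    -AdministratorLoginPassword $cred.Password

# The export is a long-running operation; poll its status until it completes.
Get-AzSqlDatabaseImportExportStatus -OperationStatusLink $exportRequest.OperationStatusLink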

Like export, import speed can be maximized by providing more and faster resources, scaling up the database and compute size during the import process. They can be scaled down after the import succeeds. SqlPackage, the portal, and PowerShell help with import just as they do with export.
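
A corresponding hedged import sketch, again with placeholder names and assuming the BACPAC is already staged in Blob Storage; the service objective is deliberately set high for the import and can be scaled down afterwards:

# A minimal sketch of a BACPAC import into a new database (all names below are placeholders).
$cred = Get-Credential   # SQL server administrator login
$storageKey = (Get-AzStorageAccountKey -ResourceGroupName "myResourceGroup" -Name "mystorageaccount")[0].Value

$importRequest = New-AzSqlDatabaseImport `
    -ResourceGroupName "myResourceGroup" `
    -ServerName "myserver" `
    -DatabaseName "myimporteddatabase" `
    -DatabaseMaxSizeBytes 100GB `
    -StorageKeyType "StorageAccessKey" `
    -StorageKey $storageKey `
    -StorageUri "https://mystorageaccount.blob.core.windows.net/bacpacs/mydatabase.bacpac" `
    -Edition "Standard" `
    -ServiceObjectiveName "S6" `
    -AdministratorLogin $cred.UserName `
    -AdministratorLoginPassword $cred.Password

Get-AzSqlDatabaseImportExportStatus -OperationStatusLink $importRequest.OperationStatusLink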

The DAC operations that are supported include extract, deploy, register, unregister, and upgrade. A BACPAC primarily supports import and export operations.
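
For completeness, a hedged SqlPackage sketch of the schema-only and schema-plus-data paths, run from PowerShell with placeholder server, database, file, and credential values:

# Extract only the schema into a DACPAC (a sketch; names and the password are placeholders).
SqlPackage /Action:Extract `
    /SourceServerName:"myserver.database.windows.net" `
    /SourceDatabaseName:"mydatabase" `
    /SourceUser:"sqladmin" /SourcePassword:"<password>" `
    /TargetFile:"C:\temp\mydatabase.dacpac"

# Export the schema together with the data into a BACPAC.
SqlPackage /Action:Export `
    /SourceServerName:"myserver.database.windows.net" `
    /SourceDatabaseName:"mydatabase" `
    /SourceUser:"sqladmin" /SourcePassword:"<password>" `
    /TargetFile:"C:\temp\mydatabase.bacpac"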

 

Wednesday, January 19, 2022

 

This is a continuation of the sample queries written for Azure Public Cloud for diagnostic purposes. The topic was introduced in this article earlier.

Sample Kusto queries:

1)      When log entries do not have function names, scopes or duration of calls:

source

| where description contains "<string-before-scope-of-execution>"

| project SessionId, StartTime=timestamp

| join (source

| where description contains "<string-after-scope-of-execution>"

| project StopTime=timestamp, SessionId)

on SessionId

| project SessionId, StartTime, StopTime, duration = StopTime - StartTime

 

| summarize count() by duration = bin(duration / 1s, 10)

 

| sort by duration asc

 

| render barchart

 

2)      Since the duration column is also relevant to other queries later, it can be computed with an extend:

source  | extend duration = endTime - sourceTime

 

3)      When the log entries do not have an exact match for a literal:

source

| where EventText contains "NotifyPerformanceCounters"

| extend Tenant = extract("tenantName=([^,]+),", 1, EventText)

 

4)      If we wanted to parse values out of EventText with a pattern:

source

| parse EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices:long * "releaseTime=" releaseTime:date ")" *

| where valid in~ ("true", "false")

5)      If we wanted to read signin logs:

source                            

| evaluate bag_unpack(LocationDetails)

| where RiskLevelDuringSignIn == 'none'

   and TimeGenerated >= ago(7d)

| summarize Count = count() by city

| sort by Count desc

| take 5

        6) If we wanted to time bucket the log entries:

source

| where Modified > ago(7d)

| summarize count() by bin(Modified, 1h)

| render columnchart

 

        7) If we wanted to derive just the ids:

source

| where Modified > ago(7d)

| project  Id

 

        8) If we wanted to find the equivalent of a SQL query, we could use an example as follows:

             EXPLAIN

SELECT COUNT_BIG(*) as C FROM StormEvents

Which results in

StormEvents

| summarize C = count()

| project C

 

This works even for wild card characters such as:

EXPLAIN

SELECT * from dependencies where type like "Azure%"

Which results in

dependencies

| where type startswith "Azure"

 

And for extensibility as follows:

EXPLAIN

SELECT operationName as Name, AVG(duration) as AvgD FROM dependencies GROUP BY name

Which results in

dependencies

| summarize AvgD = avg(duration) by Name = operationName

     9)  If we wanted to process JSONPath expressions, there are KQL functions that operate on dynamic objects. For example:

datatable(input :dynamic)

[

dynamic({'key1': 123, 'key2': 'abc'}),

dynamic({'key1': 456, 'key3': 'fgh'}),

]

| extend result = bag_remove_keys(input, dynamic(['key2', 'key4']))

The keys to remove can also be specified using JSONPath notation:

| extend result = bag_remove_keys(input, dynamic(['$.key1']))

which, for the first row, results in

{'key2': 'abc'}

 

 

 

 

 

 

Tuesday, January 18, 2022

 Disaster recovery using Azure DNS and Traffic Manager:

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent networking discussions on Azure DNS, which is a full-fledged general availability service.

This article focuses on disaster recovery using Azure DNS and Azure Traffic Manager. The purpose of disaster recovery is to revive functionality after a severe loss affecting the application. The level of revival may be graded as unavailable, partially available, or fully available. A multi-region architecture provides some fault tolerance and resiliency against application or infrastructure failures by facilitating failover. The region redundancy helps achieve failover and high availability, but the approaches for disaster recovery vary from business to business. Some of the options are listed below.

-          Active-passive with cold standby: In this failover solution, the VMs and other resources in the standby environment are not active until there is a need for failover. Backups, VM images, and Resource Manager templates continue to be replicated, usually to a different region. This is cost-effective but takes time to complete a failover.

-          Active-passive with pilot light: This failover solution sets up a standby environment with minimal configuration. The setup has only the necessary services running to support a minimal, critical set of applications. The scenario can only execute minimal functionality, but it can scale up and launch more services to take the bulk of the production load if a failover occurs. Data mirroring can be set up with a site-to-site VPN.

-          Active-passive with warm standby: This solution is set up such that it can take a base load and initiate scaling until all instances are up and running. It is not scaled to take the full production workload, but it is functional. It is an enhancement over the previous approach but falls short of a full-blown one.

Two requirements that come from this planning deserve callouts. First, a deployment mechanism must be used to replicate instances, data, and configurations between the primary and standby environments. The recovery can be done natively or with third-party services. Second, a solution must be developed to divert network/web traffic from the primary site to the secondary site. This type of disaster recovery can be achieved via Azure DNS, Traffic Manager for DNS, or third-party global load balancers.

The Azure DNS manual failover solution for disaster recovery uses the standard DNS mechanism to fail over to the backup site. It assumes that both the primary and the secondary endpoints have static IP addresses that don't change often, that an Azure DNS zone exists for both the primary and secondary sites, and that the TTL is at or below the RTO SLA set in the organization. Since the DNS server is outside the failover or disaster zone, it does not get impacted by any downtime. The user is merely required to make a flip. The solution is scripted, and the low TTL set against the zone ensures that no resolver around the world caches it for long periods. For cold standby and pilot light, since some prewarming activity is involved, enough time must be given before making the flip. The use of Azure Traffic Manager automates this flip when both the primary and the secondary have a full deployment, complete with cloud services and a synchronized database. Traffic Manager routes new requests to the secondary region on service disruption. By virtue of its built-in probes for various types of health checks, Azure Traffic Manager falls back to its rules engine to perform the failover.
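
A hedged sketch of the scripted flip, assuming an Azure DNS zone named contoso.com in a resource group named dns-rg, an A record named www with a short TTL, and placeholder IP addresses:

# A minimal sketch of the manual DNS failover flip (zone, record, and address values are placeholders).
$rs = Get-AzDnsRecordSet -ResourceGroupName "dns-rg" -ZoneName "contoso.com" -Name "www" -RecordType A

# Point the record at the secondary region's static IP and keep the TTL low so resolvers pick the change up quickly.
$rs.Records[0].Ipv4Address = "52.0.0.2"
$rs.Ttl = 60
Set-AzDnsRecordSet -RecordSet $rs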

 

Monday, January 17, 2022

 

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent networking discussions on Azure Private Link, which is a full-fledged general availability service. A follow-up article discussed troubleshooting the network connectivity issues encountered with the Azure Private Endpoint.

This article discusses the troubleshooting of network connectivity problems as encountered with Azure Private Link Service from operations:

1)      When users connect to their platform-as-a-service (PaaS) offerings such as Azure Storage, Azure Cosmos DB, and Azure SQL Database using Azure Private Link, they are connecting to Azure Private Endpoints in their virtual networks. Traffic between the virtual network and the service goes over the Microsoft backbone network, which eliminates exposure over the public internet. Private Link also helps service providers deliver their own services privately to their customers; such a service can be enabled behind a Standard Load Balancer for Azure Private Link access. In this case as well, the customers can create a private endpoint inside their virtual network rather than require it to be deployed with the PaaS service, and map to it privately. This solution effectively brings those services to the consumer's virtual network over private access. By definition, a virtual network and the private link must be in the same region. Use of multiple regions to connect a virtual network is not feasible without some form of regionally peered or globally peered virtual networks. The customer on-premises networks, over VPN or Azure ExpressRoute circuits, can also access these private links, but care must be taken to ensure that the workloads are not impacted.

2)      When a connectivity problem is encountered, the setup configuration must be checked. The Private Link Center provides an option to do so. The private link is selected for diagnosis, and the virtual network and DNS information is verified to be correct. The connection state must be Approved. The resource must have connectivity to the virtual network that hosts the private link. The FQDN and the private IP address must be assigned. If these are correct, the setup is fine, but the data transfer might not be occurring. Azure Monitor displays the Bytes In or Bytes Out metrics for an Azure Private Link. After an attempt to connect to the private link, the data should show as flowing within about ten minutes. If this is not the case, the connection from the resource must be checked. The outbound connections page for the resource has a connection troubleshooting option, the Test Connection feature. This can be used with Network Watcher for troubleshooting the connection. It is preferred to test by the fully qualified domain name (FQDN) of the resource.
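
A hedged PowerShell sketch of such a connectivity check with Network Watcher, where the virtual machine, network watcher, and destination FQDN are placeholders:

# A minimal sketch of a Network Watcher connectivity test from a client VM (all names are placeholders).
$vm = Get-AzVM -ResourceGroupName "app-rg" -Name "client-vm"

Test-AzNetworkWatcherConnectivity `
    -NetworkWatcherName "NetworkWatcher_eastus" `
    -ResourceGroupName "NetworkWatcherRG" `
    -SourceId $vm.Id `
    -DestinationAddress "myservice.privatelink.database.windows.net" `
    -DestinationPort 1433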

3)      When a connection problem is identified aside from the private link or its configuration, it would need to be investigated along with name resolution, Network Security Groups (NSGs), or effective routes. Name resolution errors might occur due to DNS settings rather than IP connectivity. The steps to check the DNS settings depend on whether a private zone or a custom DNS server is used. If a private zone is used, the corresponding DNS zone record must exist. If it does not exist, it must be created. Delegation must be set up properly for the domain DNS to resolve to the child records. If a custom DNS server is used, its settings must resolve to the private IP address of the private link. The existing Azure services already have a DNS configuration to use when connecting to a public IP address. This configuration must be overridden to connect to the private link.
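
A hedged sketch of the DNS-side checks, assuming a privatelink private DNS zone and placeholder names and addresses:

# A minimal sketch of the DNS checks (zone, record, and address values are placeholders).
# 1) Verify that the FQDN resolves to the private IP address rather than a public one.
Resolve-DnsName -Name "myservice.privatelink.database.windows.net"

# 2) If the A record is missing from the private zone, create it.
$record = New-AzPrivateDnsRecordConfig -Ipv4Address "10.0.1.5"
New-AzPrivateDnsRecordSet -ResourceGroupName "dns-rg" `
    -ZoneName "privatelink.database.windows.net" `
    -Name "myservice" -RecordType A -Ttl 10 -PrivateDnsRecords $record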

These are some of the ways in which private connectivity can be made to work and used effectively as a replacement for public connectivity.

 

Sunday, January 16, 2022

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent networking discussions on Azure Private Endpoint, which is a full-fledged general availability service.

This article discusses the troubleshooting of network connectivity problems as encountered with Azure Private Endpoints from operations:

1) When users convert their storage accounts or key vaults to use Azure Private Endpoints, they take them offline from the publicly accessible internet and expose them internally to their virtual networks. If they have existing usages of those resources that span more than one region, they will not be able to keep up those usages once the resources fall off the publicly reachable addresses. This does not impact usages that are all within the virtual network, because an Azure private endpoint is a network interface that connects the other resources privately and securely to a private link service. The workloads running on other resources are given an option for private network connectivity instead of the publicly exposed connectivity. This solution effectively brings those services into the virtual network.

By definition, a virtual network and the private endpoint must be in the same region. Use of multiple regions to connect a virtual network is not feasible without some form of regionally peered or globally peered virtual networks. The customer on-premises networks, over VPN or Azure ExpressRoute circuits, can also access these private endpoints, but care must be taken to ensure that the workloads are not impacted.

2) When a connectivity problem is encountered, the setup configuration must be checked. The Private Link Center provides an option to do so. The private endpoint is selected for diagnosis, and the virtual network and DNS information is verified to be correct. The connection state must be Approved. The resource must have connectivity to the virtual network that hosts the private endpoint. The FQDN and the private IP address must be assigned. If these are correct, the setup is fine, but the data transfer might not be occurring. Azure Monitor displays the Bytes In or Bytes Out metrics for an Azure Private Endpoint. After an attempt to connect to the private endpoint, the data should show as flowing within about ten minutes. If this is not the case, the connection from the resource must be checked. The outbound connections page for the resource has a connection troubleshooting option, the Test Connection feature. This can be used with Network Watcher for troubleshooting the connection. It is preferred to test by the fully qualified domain name (FQDN) of the resource.
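
A hedged PowerShell sketch of these setup checks; the names are placeholders, and the exact property layout may differ slightly across Az module versions:

# A minimal sketch of verifying the private endpoint's connection state and private IP (names are placeholders).
$pe = Get-AzPrivateEndpoint -ResourceGroupName "app-rg" -Name "storage-pe"

# The connection state should read "Approved".
$pe.PrivateLinkServiceConnections | ForEach-Object { $_.PrivateLinkServiceConnectionState.Status }

# The network interface behind the endpoint should carry the assigned private IP address.
$nic = Get-AzNetworkInterface -ResourceId $pe.NetworkInterfaces[0].Id
$nic.IpConfigurations[0].PrivateIpAddress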

3) When a connection problem is identified aside from the private endpoint or its configuration, it would need to be investigated along with name resolution, Network Security Groups (NSGs), or effective routes. Name resolution errors might occur due to DNS settings rather than IP connectivity. The steps to check the DNS settings depend on whether a private zone or a custom DNS server is used. If a private zone is used, the corresponding DNS zone record must exist. If it does not exist, it must be created. Delegation must be set up properly for the domain DNS to resolve to the child records. If a custom DNS server is used, its settings must resolve to the private IP address of the private endpoint. The existing Azure services already have a DNS configuration to use when connecting to a public endpoint. This configuration must be overridden to connect to the private endpoint.

These are some of the ways in which private connectivity can be made to work and used effectively as a replacement for public connectivity.


Saturday, January 15, 2022

 

This is a continuation of the sample queries written for Azure Public Cloud for diagnostic purposes. The topic was introduced in this article earlier.

Sample Kusto queries:

1)      When log entries do not have function names, scopes or duration of calls:

source

| where description contains "<string-before-scope-of-execution>"

| project SessionId, StartTime=timestamp

| join (source

| where description contains "<string-after-scope-of-execution>"

| project StopTime=timestamp, SessionId)

on SessionId

| project SessionId, StartTime, StopTime, duration = StopTime - StartTime

 

| summarize count() by duration = bin(duration / 1s, 10)

 

| sort by duration asc

 

| render barchart

 

2)      Since the duration column is also relevant to other queries later, it can be computed with an extend:

source  | extend duration = endTime - sourceTime

 

3)      When the log entries do not have an exact match for a literal:

source

| where EventText contains "NotifyPerformanceCounters"

| extend Tenant = extract("tenantName=([^,]+),", 1, EventText)

 

4)      If we wanted to parse values out of EventText with a pattern:

source

| parse EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices:long * "releaseTime=" releaseTime:date ")" *

| where valid in~ ("true", "false")

5)      If we wanted to read signin logs:

source                            

| evaluate bag_unpack(LocationDetails)

| where RiskLevelDuringSignIn == 'none'

   and TimeGenerated >= ago(7d)

| summarize Count = count() by city

| sort by Count desc

| take 5

 

 

Friday, January 14, 2022

 

Virtual Network gateways in availability zones:

VPN and ExpressRoute gateways can be deployed to Azure availability zones. Previously, they were deployed regionally; now they can be deployed to the zones within a region. On one hand, this improves the resiliency, scalability, and availability of virtual network gateways; on the other hand, it opens more opportunities for the use of the gateways, particularly with Azure Traffic Manager. Deploying gateways in Azure availability zones physically and logically separates gateways within a region, while protecting the on-premises network connectivity to Azure from zone-level failures. By deploying zonal gateways to each of the three zones and spanning a Traffic Manager over the gateways, we can now route traffic with zone isolation. This helps with availability-zone-down simulations. A use of Traffic Manager to divert traffic was described earlier in this article: https://1drv.ms/w/s!Ashlm-Nw-wnWzhVd4TIY70gOs48M?e=ma9y5q

When we deploy a gateway across availability zones, we can use zone-redundant virtual network gateways. This adds zone resilience to mission-critical, scalable services. Zone-redundant and zonal gateways both rely on the Standard SKU of the Azure public IP resource. When the public IP address is created using the Standard public IP SKU, the behavior depends on whether the gateway is a VPN gateway or an ExpressRoute gateway. For a VPN gateway, two gateway instances will be deployed in any two out of the three availability zones to provide zone redundancy. An ExpressRoute gateway can span all three zones.

This can be compared to a zonal gateway, where all the gateway instances are deployed in the same zone specified by the user. The zones are identified by the numerals 1, 2, or 3, and there can be up to three zones within a region. The public IP address must still be created using the Standard public IP SKU.

When a regional gateway is deployed with a Basic public IP SKU, the gateway does not have zone redundancy built into it. In contrast, when the gateways are deployed with zone redundancy across availability zones, each availability zone is a different fault and update domain. This makes the gateway more reliable, available, and resilient to zone failures.

The Azure portal can be used to deploy these SKUs, but they will be seen only in regions that have availability zones. These gateways must be created new; existing gateways cannot be changed, migrated, or upgraded to zone-redundant or zonal gateways. Co-existence of both VPN and ExpressRoute gateways in the same virtual network is supported, but a /27 IP address range must be reserved for the gateway subnet.
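
A hedged PowerShell sketch of deploying a zone-redundant VPN gateway, assuming an existing virtual network that already contains a GatewaySubnet; all names are placeholders, and the AZ gateway SKUs (such as VpnGw2AZ) are the ones intended for zone-redundant deployments:

# A minimal sketch of a zone-redundant VPN gateway deployment (resource names are placeholders).
$vnet = Get-AzVirtualNetwork -ResourceGroupName "net-rg" -Name "hub-vnet"
$subnet = Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "GatewaySubnet"

# A Standard SKU public IP is required for the AZ gateway SKUs.
$pip = New-AzPublicIpAddress -ResourceGroupName "net-rg" -Name "vpngw-pip" `
    -Location "eastus" -Sku Standard -AllocationMethod Static

$ipconf = New-AzVirtualNetworkGatewayIpConfig -Name "gwipconf" -SubnetId $subnet.Id -PublicIpAddressId $pip.Id

# The AZ SKU makes the gateway zone-redundant; the deployment can take a long time to complete.
New-AzVirtualNetworkGateway -ResourceGroupName "net-rg" -Name "vpngw" -Location "eastus" `
    -IpConfigurations $ipconf -GatewayType Vpn -VpnType RouteBased -GatewaySku VpnGw2AZ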

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure VPN Gateway, which is a full-fledged general availability service.