Cluster computing

Tuesday, August 8, 2023

These are some more additions to the common errors faced during the authoring and deployment of Infrastructure-as-Code (IaC) artifacts, along with their resolutions:

First, one resource might pass its identifier to another because it is created earlier, and in some cases these identifiers do not exist at compile time. For example, code that assigns an RBAC role based on the managed identity of another resource might not have that identity at compile time and only finds it once the resource is created at execution time. The RBAC IaC requires a principal_id, which comes from the managed identity of the created resource. This might require two passes of execution: one to generate the principal id and another to generate the role assignment with that principal id.

The above works for newly created resources with two passes, but it is still broken for existing resources that do not have an associated managed identity, where the RBAC IaC tries to apply an empty principal id. In such cases, no matter how many times the role assignment is applied, it fails because of the invalid principal id. The workaround is to check that the principal id exists before it is applied.
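The existence check can be sketched as follows. This is a minimal illustration in Python rather than in IaC syntax; the `Resource` type, its `principal_id` field and the `role_assignment_targets` helper are hypothetical names assumed only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    # Hypothetical view of a deployed resource; principal_id is None or empty
    # when the resource has no managed identity.
    name: str
    principal_id: Optional[str] = None


def role_assignment_targets(resources):
    """Return only the resources whose managed identity already exists."""
    return [r for r in resources if r.principal_id]


resources = [
    Resource("app-new", "00000000-0000-0000-0000-000000000001"),
    Resource("app-existing-no-identity", None),
    Resource("app-existing-empty", ""),
]

targets = role_assignment_targets(resources)
skipped = [r.name for r in resources if r not in targets]

print([r.name for r in targets])  # resources that can receive a role assignment
print(skipped)                    # resources to skip until an identity is enabled
```

Filtering before the assignment keeps the empty principal id from ever reaching the role-assignment step, so repeated applies do not keep failing on the same resources.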

A second type of case occurs when the application requires an IP address to be assigned up front, because the elaborate firewall rules must be written against the IP address value rather than a reference, and the IP address is provisioned in the portal before the IaC is applied. The IaC then needs to import the existing, pre-created IP address into the state so that the IaC and the state match.

Third, there may be objects in the Key Vault that were created as part of the prerequisites for the IaC deployment and now their ids need to be reconciled with the IaC. Again, the import of that resource into the state would help the IaC provider to reconcile the actual with the expected resource.

Fourth, the friendly names are often references to actual resources that may have long been dereferenced, orphaned, changed, expired, or even deleted. The friendly names, also called keys, are just references and hold value to the author in a particular context but the same author might not guarantee that the moniker is in fact consistently used unless there are some validations and review involved.

Fifth, there are always three stages between the design and deployment of Infrastructure-as-Code, namely “init”, “plan” and “apply”, and they are distinct. Success in one stage does not guarantee success in the next, which holds especially between the plan and apply stages. Another limitation is that the plan can be easily validated on the development machine, but the apply stage can be performed only as part of pipeline jobs in commercial deployments. The workaround is to scope it down or target a different environment for applying.

Sixth, the ordering and sequence can only be partially expressed through attributes that declare dependencies between resources. Even if resources are self-descriptive, a combination of resources must be carefully put together by the system for a deterministic outcome.

These are only some of the articulations for the carefulness required for developing and deploying IaC.

Reference to previous articles on Infrastructure-as-Code included.

Sunday, August 6, 2023

Azure Database Migration Services – Errors and Resolutions:

While the previous articles introduced the Azure Database Migration Service for the purpose of migrating databases between MySQL servers, this article discusses a specific scenario and the errors and resolutions encountered.

The source server is an Azure MySQL Single Server instance. It is set up to allow access to all Azure Services. There are no virtual network rule customizations and the default access over the internet works.

The destination server is an Azure MySQL Flexible Server. It has been set up to allow private access only. It is connected to a virtual network and subnet, and it is registered in the private DNS zone privatelink.mysql.database.azure.com for lookup by name. It does not connect directly over the internet.

The size of the database is small so the duration of the migration can be assumed to be in the order of a few minutes. There is only one database with a few tables.

When an attempt is made to create the database copy activity in a new migration project on the console of the Database Migration Service, it takes the source server parameters and validates the connectivity, but fails to do so for the destination server. The DMS service is deployed to an independent virtual network without a NAT gateway.

If the flexible server had been set up to allow public access and with the allowed access to the DMS service, the migration would have proceeded smoothly. Since the connectivity is private, the DMS service must make some adjustments and there will be errors encountered.

The first error is encountered when peering the virtual networks of the DMS service and the destination server, a step necessary to allow traffic both ways over the private networks. Since the default address space of both virtual networks might be 10.0.0.0/16, the address spaces overlap and the peering cannot be made. It is necessary to create the DMS service in a subnet whose address space is different from the one that the destination MySQL server uses. Typically, this is an afterthought and one that escapes attention at the time the DMS service instance or the MySQL server is created.
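The overlap can be checked before either resource is created. The following is a small sketch using Python's standard `ipaddress` module; the address spaces and the `address_spaces_overlap` helper are illustrative assumptions, not values taken from the deployment.

```python
import ipaddress


def address_spaces_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True when the two CIDR ranges share any address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical address spaces for the DMS network and the MySQL network.
dms_default = "10.0.0.0/16"
mysql_vnet = "10.0.0.0/16"
dms_distinct = "10.1.0.0/16"

print(address_spaces_overlap(dms_default, mysql_vnet))   # overlapping ranges block the peering
print(address_spaces_overlap(dms_distinct, mysql_vnet))  # distinct ranges can be peered
```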

The second error is encountered once distinct address spaces have been created for the subnets of the service and the MySQL flexible server. Creating the service in a non-default address space of a virtual network/subnet prevents it from communicating with the source server, with an error that says something like “UnauthorizedAccessException - 40103: Invalid authorization token signature, Resource:sb://<somename>.servicebus.windows.net:<sometopic>”. Although this error is cryptic and complains about the way a signature was created for an API call, it goes away inexplicably when the service is switched to the default address space.

This leads to a catch-22 situation of not having simultaneous access to both the source and the destination. One resolution is to split the migration: one service that uses the default address space migrates from the source to an interim MySQL server, and a second service then uses the interim server as its source, preferably with the interim server in the same address space as the subnet that the second service uses. Another option is to keep the destination public for the duration of the migration and then take it private. Alternatively, the DMS service could be instantiated in the same virtual network as the destination private MySQL Flexible Server, albeit in a separate subnet with a NAT gateway to connect to the source server.

Saturday, August 5, 2023

Comparison of MySQL database migration techniques:

MySQL databases are popular on-premises as well as in the public clouds with various development teams. Routinely, they find themselves in situations where they need to migrate their databases across servers.

There are two techniques for doing so for the MySQL databases hosted in the public cloud. The first involves the native support from MySQL server instances in the form of mysqldump utility and the second involves the public cloud capabilities to migrate the data.

Between these options, the choices are usually based on habits rather than leveraging their strengths and avoiding their weaknesses. This article provides a place for both as shown.

The Azure Database Migration Service instance comes with the following.

Pros:

One instance works across all subscriptions.

Can transfer between on-premises and cloud and cloud to cloud.

Pay-per-use billing.

Provides a wizard to create data transfer activity.

Cons:

Limited features via IaC as compared to the portal but enough to get by.

Not recommended for developer instances or tiny databases that can be exported and imported via mysqldump.

binlog_expire_logs_seconds must be set to a non-zero value on the source server.

Supports only SQL login.

The steps to perform the data transfer activity between the source and destination MySQL servers are:

Create a source mysql instance mysql-src-1 in rg-mysql-1

Create a database and add a table to mysql-src-1

Create a destination mysql instance mysql-dest-1 in rg-mysql-1

Deploy the DMS service instance and project.

Create and run an activity to transfer data.

Verify the data.

The mysqldump utility, on the other hand, prepares the SQL for replay against the destination server. All the tables of the source database can be exported.

Pros:

1. The SQL statements fully describe the source database.

2. They can be edited before replaying

3. There are many options natively supported by the database server.

Cons:

1. It acquires a global read lock on all the tables at the beginning of the dump.

2. If long-running update statements are in progress when the FLUSH statement is issued, the MySQL server may stall until those statements finish.

As an alternative to mysqldump, it might be better to leverage the MySQL Shell dump utilities, which provide parallel dumping with multiple threads, file compression and progress information display.

It is best to determine the size of the database before choosing the options:

SELECT table_schema "DB Name",

ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB"

FROM information_schema.tables

GROUP BY table_schema;

Friday, August 4, 2023

The previous article covered a few errors and resolutions encountered when deploying the Azure Application Gateway. This article includes a few more.

First, the target URL in the redirect configuration is routinely suffixed with a “/” character, but this is incorrect when the settings in the same configuration have directives to include the path and the query parameters of the incoming request. With more than one consecutive path separator, the URL runs the risk of not resolving correctly. Although this seems trivial, the IaC looks normal with the trailing path separator in the redirect URL, and the problem often escapes attention until the deployment occurs in the production environment.
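The effect of the trailing separator can be illustrated with a short sketch. The concatenation in `naive_redirect` is an assumption about how the incoming path gets appended, used only to show the doubled separator; the URL, the path and both function names are hypothetical.

```python
def naive_redirect(target_url: str, incoming_path: str) -> str:
    # Plain concatenation; an assumed model of appending the incoming path.
    return target_url + incoming_path


def safe_redirect(target_url: str, incoming_path: str) -> str:
    # Join with exactly one separator between the target and the incoming path.
    return target_url.rstrip("/") + "/" + incoming_path.lstrip("/")


target = "https://example.com/app/"  # hypothetical redirect target with a trailing "/"
path = "/docs?page=2"                # hypothetical incoming path and query string

print(naive_redirect(target, path))
print(safe_redirect(target, path))
```

Dropping the trailing “/” from the redirect target in the IaC has the same effect as the second function.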

Second, the listener must be set up as basic and not multi-site if the backend pool members are not multi-tenant. This holds true no matter how many pool members are added, and it is reflected in the hostnames attribute of the listener in the IaC as well. There does not need to be any mention of hostnames, because that attribute often marks the listener as multi-site when the intent might have been just to list the DNS names.

Third, the basic HTTP settings of the application gateway might require the path to be overridden for individual backend targets of the path-based routing. Typically, this value is “/”, the trailing path separator, and it is required to get a non-404 response from the backend target. But one of the errors cited in the previous article suggests the contrary for its resolution. This calls for creating two HTTP settings, one with the path override and one without, so that the application gateway can use the one without the path override as the default while individual path-based routing backend targets use the one with the override.

It is important to note that the override works differently depending on the path rule and the override pattern. For example, if the path rule is /fn/*, then an override pattern of “/” is required. On the other hand, if the path rule is /fn*, then an override pattern of “/” should be avoided unless there is a substitution for “fn” in the path.

Fourth, the rewrite configuration section of the IaC for an application gateway could include a condition-action statement but if care is not taken to exclude unnecessary attributes, it results in an unintended action statement aside from the condition-action statement. This misconfiguration can be caught by reviewing the plan and deploying with just what is needed for the condition-action statement in which case the application gateway will deploy and behave correctly. Common unintended side-effects involve rerouting after the rewrite so that it enters the route evaluation again.

Lastly, the redirects are client facing and rewrites can be backend facing so they must be used appropriately.

Thursday, August 3, 2023

Azure Application Gateway is a sophisticated resource capable of being a firewall, reverse proxy, HTTP listener, router and more. Among the salient ways in which it is used for directing traffic to backend app services, path-based routing is one of the most widely used. But practitioners often encounter errors that they quickly blame on the gateway, and they look for documentation to overcome them. There are quite a few of these errors, and due to the high number of configuration variations involving web traffic, it is not easy to find the right fix for a specific error code.

This article talks about two such error codes that are often considered time-consuming to resolve, and explains their resolutions.

First is the error encountered when expanding `url_path_map`: there is a conflict between `backend_address_pool_name` and `redirect_configuration_name` (back-end pool not applicable when redirection specified).

Every URL path can be routed in one of two ways: it can be routed to a backend pool member, or it can be redirected to an external location. The directions this traffic takes are exactly opposite, with one going towards the backend and the other going back towards the client. That is why the same path rule cannot have both specified. In such a case, the resolution is to split the rules so that each serves either the client or the backend. The rules can split the path as well, with one targeting, say, /path/subpath1 and another targeting the remainder as /path/*. There is no syntax for excluding paths, so ordering the specific rules before the general rules is helpful. In general, the paths can be arbitrary and how the rules are sequenced is up to the author.

A sample path map would be like this:

url_path_maps = [

{

default_backend_address_pool_name = "default-pool"

default_backend_http_settings_name = "myapps-nonprod-setting"

name = "myapps-nonprod-rule"

path_rules = [

{

backend_address_pool_name = null

backend_http_settings_name = null

name = "fn-demo-7-docs"

paths = [

"/fn-demo-7/docs"

]

rewrite_rule_set_name = null

redirect_configuration_name = "fn-demo-7-appdocs"

},

{

backend_address_pool_name = "fn-demo-7"

backend_http_settings_name = "myapps-nonprod-setting"

name = "fn-demo-7"

paths = [

"/fn-demo-7/*"

]

rewrite_rule_set_name = "location-header-rewrite"

redirect_configuration_name = null

}

]

}

]
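The mutual exclusion between a backend pool and a redirect can be checked before the plan is generated. The following Python sketch mirrors the sample path map as plain dictionaries; the `find_conflicting_rules` helper and the dictionary shape are assumptions made for illustration and are not part of any IaC provider.

```python
def find_conflicting_rules(path_rules):
    """Return the names of path rules that set both a backend pool and a redirect."""
    conflicts = []
    for rule in path_rules:
        has_pool = rule.get("backend_address_pool_name") is not None
        has_redirect = rule.get("redirect_configuration_name") is not None
        if has_pool and has_redirect:
            conflicts.append(rule["name"])
    return conflicts


# Rules shaped like the sample above: one redirect rule and one backend rule.
split_rules = [
    {"name": "fn-demo-7-docs", "paths": ["/fn-demo-7/docs"],
     "backend_address_pool_name": None,
     "redirect_configuration_name": "fn-demo-7-appdocs"},
    {"name": "fn-demo-7", "paths": ["/fn-demo-7/*"],
     "backend_address_pool_name": "fn-demo-7",
     "redirect_configuration_name": None},
]

# A single rule that tries to do both, which triggers the conflict.
combined_rule = [
    {"name": "fn-demo-7", "paths": ["/fn-demo-7/*"],
     "backend_address_pool_name": "fn-demo-7",
     "redirect_configuration_name": "fn-demo-7-appdocs"},
]

print(find_conflicting_rules(split_rules))
print(find_conflicting_rules(combined_rule))
```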

The second error encountered is called ApplicationGatewayPathOverrideAndUrlModificationNotSupported and comes with the error message: The request routing rule /subscriptions/***/resourceGroups/rg-demo-7/providers/Microsoft.Network/applicationGateways/gwy-demo-7/requestRoutingRules/myapps-nonprod-rule associated with this rewrite action properties.rewriteRuleSets[0].properties.rewriteRules[0].actionSet has the override back-end path switch enabled in the HTTP setting /subscriptions/***/resourceGroups/rg-demo-07/providers/Microsoft.Network/applicationGateways/gwy-demo-7/backendHttpSettingsCollection/myapps-nonprod-setting. Either disable this switch or remove url rewrite action set properties.rewriteRuleSets[0].properties.rewriteRules[0].actionSet.urlConfiguration.

While the attempted resolution is often to remove the backend_http_settings from the URL path mappings, the fix is actually quite simple, because the message refers to a specific override within that configuration block. As shown in the example, the path override supplies a replacement path when the incoming path needs to be modified, but in this case that is not required because the rewrite only changes the response headers.

backend_http_settings = [

{

authentication_certificate = []

cookie_based_affinity = "Disabled"

host_name = ""

name = "myapps-nonprod-setting"

path = "/" -> null

pick_host_name_from_backend_address = true

port = 443

probe_name = null

protocol = "Https"

request_timeout = 20

trusted_root_certificate_names = [

"DigiCertGlobalRootG2"

]

}

]

The path override is the “/” which must be unset with null to enable the application gateway to be created.
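The condition described by the error message can be expressed as a simple pre-deployment check. This is a sketch under stated assumptions: the dictionaries below are a hypothetical, simplified shape for the HTTP setting and the rewrite rules, with an assumed `url` key standing in for the URL configuration of a rewrite action.

```python
def has_override_rewrite_conflict(http_setting, rewrite_rules):
    """Return True when a path override coexists with a URL rewrite action."""
    path_override = http_setting.get("path")
    rewrites_url = any(rule.get("url") is not None for rule in rewrite_rules)
    return bool(path_override) and rewrites_url


# Hypothetical rewrite rule that carries a URL configuration.
rewrite_rules = [{"name": "location-header-rewrite", "url": {"path": "/new"}}]

setting_with_override = {"name": "myapps-nonprod-setting", "path": "/"}
setting_without_override = {"name": "myapps-nonprod-setting", "path": None}

print(has_override_rewrite_conflict(setting_with_override, rewrite_rules))
print(has_override_rewrite_conflict(setting_without_override, rewrite_rules))
```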

These are the two errors whose resolutions are distilled from the available online documentation and forums.

Some of the common issues faced during the authoring and deployment of Infrastructure-as-Code aka IaC artifacts can be called out as follows:

First, the IaC provider might not support all the attributes of a resource that the resource provider does, or vice versa if the IaC declares attributes independently of the resource providers. They have different development cycles and there might be a lag while one catches up with the other. This might be more conspicuous when the resources are “preview-only” features instead of the mainstream “generally available” offerings.

Second, the syntax and semantics might not have parity even when there are one-to-one mappings between the IaC providers’ attributes and the resource providers’ attributes. For example, the key vault secret id might refer to the resource id, the identifier without the version or the base id of the corresponding guarded secret. In these cases, it would have been helpful if the same name was used for attributes in both places, otherwise some head-scratching is inevitable.

Third, the friendly names are often references to actual resources that may have long been dereferenced, orphaned, changed, expired, or even deleted. The friendly names, also called keys, are just references and hold value to the author in a particular context but the same author might not guarantee that the moniker is in fact consistently used unless there are some validations and review involved.

Fourth, there are always three stages between the design and deployment of Infrastructure-as-Code, namely “init”, “plan” and “apply”, and they are distinct. Success in one stage does not guarantee success in the next, which holds especially between the plan and apply stages. Another limitation is that the plan can be easily validated on the development machine, but the apply stage can be performed only as part of pipeline jobs in commercial deployments. The workaround is to scope it down or target a different environment for applying.

Fifth, the ordering and sequence can only be partially expressed through attributes that declare dependencies between resources. Even if resources are self-descriptive, a combination of resources must be carefully put together by the system for a deterministic outcome.

Sixth, there is a state drift that occurs when the resources are changed without updating the IaC. The IaC provider might enforce an overwrite of resources with what’s defined in IaC but the iterative capture of IaC requires consolidation of all changes to the development cycle and this suffers from similar limitations that are rampant with those based on communications without acknowledgments.

Seventh, both the state update and its reconciliation are necessary aspects for the deployment process and consequently occur frequently. Behavior for these stages can only be articulated with a limited set of primitives such as create before delete, prevent from delete, ignore changes and others. The simpler the deployment model by virtue of overwrite, the more complex the process to ensure that everything flows into the IaC.

Eighth, access control declarations such as role assignments and permissions can often be numerous and are fraught with errors. Including them in the IaC without the discretion to apply as few and as granular assignments as necessary only increases maintenance.

These are only some of the articulations for the carefulness required for developing and deploying IaC.

Tuesday, August 1, 2023

Multi-dimensional optimization:

Introduction: This is a continuation of the previous article on the dynamic walk for optimization. Here we try extending the same optimization to more than two random variables, represented by higher-dimensional equations. Instead of a pair-wise treatment of random variables, we now study optimization in a vector space, specifically a Hilbert space.

Description:

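The equation representing a pair of random variables is always in a quadratic form:

$f(x) = \tfrac{1}{2} x^{T} A x - b^{T} x + c$

where A is a matrix (assumed symmetric and positive-definite so that a unique minimum exists), x and b are vectors, and c is a scalar constant.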

The function is minimized by the solution to A x = b

The solution to the optimization involving this pair of random variables is therefore given by Ax = b, which is a linear system.
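A small numerical sketch can make this concrete. It assumes the quadratic form f(x) = ½ xᵀAx − bᵀx + c with a symmetric positive-definite A; the particular matrix, vector and constant below are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical symmetric positive-definite matrix, vector, and constant.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0


def f(x):
    """Quadratic form f(x) = 1/2 x^T A x - b^T x + c."""
    return 0.5 * x @ A @ x - b @ x + c


x_star = np.linalg.solve(A, b)  # solution of the linear system A x = b
gradient = A @ x_star - b       # gradient of f evaluated at x_star

print(x_star)
print(gradient)
print(f(x_star) < f(x_star + np.array([0.1, -0.1])))
```

The gradient of the quadratic form is Ax − b, so it vanishes exactly where the linear system is satisfied, and any step away from that point increases f.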

When we have several random variables, then each random variable contributes a dimension and the simple contour map changes to n-dimensional Euclidean space.

Just as the shortest distance from a point to a line is along the perpendicular, and the shortest distance from a point to a plane is along the normal to the plane, the shortest distance between a point and a subspace must also be orthogonal to the subspace. This is the basis for an optimization principle involving an n-dimensional Euclidean space called the projection theorem. This theorem might not be applicable to a general normed space, but it is applicable to a Hilbert space. A normed linear space is a vector space having a measure of distance or length defined on it. A Hilbert space is a special form of normed space that has an inner product defined on it, analogous to the dot product of two vectors in analytic geometry. Two vectors are orthogonal if their inner product is zero. With orthogonality characterizing the minimum, it is now possible to find the optimum as a minimization problem. Least-squares minimization is a technique that describes this minimization in Hilbert space.
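The orthogonality condition of the projection theorem can be checked numerically in ordinary Euclidean space. In this sketch the subspace is spanned by the columns of a hypothetical matrix S and the point p is an arbitrary choice; NumPy's `np.linalg.lstsq` is used to find the closest point.

```python
import numpy as np

# Hypothetical subspace spanned by the columns of S, and a point p outside it.
S = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
p = np.array([1.0, 2.0, 0.0])

# Coefficients of the closest point in the subspace (least-squares solution).
coeffs, *_ = np.linalg.lstsq(S, p, rcond=None)
closest = S @ coeffs
residual = p - closest

# The residual is orthogonal to every spanning vector of the subspace.
print(S.T @ residual)

# Any other point in the subspace is farther away from p.
other = S @ (coeffs + np.array([0.1, -0.2]))
print(np.linalg.norm(p - other) > np.linalg.norm(residual))
```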

The least squares regression involves the estimation of data at each point which gives a set of equations. If the data had no noise, the resulting set of equations would be written in the form of a matrix equation. Solving a matrix equation is well-known. Since the data may not fit a matrix equation perfectly, the least squares regression transforms the data point to an estimation that is as close to a matrix equation as possible. An estimation function can go through all the data points to determine if there will be a solution. The least squares regression is written as $\beta = (A^{T} A)^{-1} A^{T} Y$

An example of a least squares sample program in Python would look like this:

import numpy as np

from scipy import optimize

import matplotlib.pyplot as plt

x = np.linspace(0, 1, 101)

y = 1 + x + x * np.random.random(len(x))

A = np.vstack([x, np.ones(len(x))]).T

y = y[:, np.newaxis]

alpha = np.dot((np.dot(np.linalg.inv(np.dot(A.T, A)), A.T)), y)

print(alpha)

plt.figure(figsize = (10, 8))

plt.plot(x, y, 'b.')

plt.plot(x, alpha[0] * x + alpha[1], 'r')

plt.show()
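As a cross-check, the coefficients from the normal equations can be compared with NumPy's built-in least-squares solver. This sketch regenerates similar data with a fixed seed, which is an assumption made only for repeatability, and omits the plot.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
x = np.linspace(0, 1, 101)
y = 1 + x + x * rng.random(len(x))

A = np.vstack([x, np.ones(len(x))]).T

# Normal equations: beta = (A^T A)^-1 A^T y
beta_normal = np.linalg.inv(A.T @ A) @ A.T @ y

# Library solver, which avoids forming the explicit inverse.
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(beta_normal)
print(beta_lstsq)
```

Both approaches give the same slope and intercept here; the library solver is generally the safer choice when the columns of A are nearly dependent.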

Layering of neural networks applies the same technique at a higher level of abstraction, but it does not transform a problem from one space to another.