Sunday, August 13, 2023

 

Pattern to detect an anomaly:

 

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs

# we create 40 separable points
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# fit the model, do not regularize for illustration purposes
clf = svm.SVC(kernel="linear", C=1000)
clf.fit(X, y)

plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)

# plot the decision function
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

# plot decision boundary and margins
ax.contour(XX, YY, Z, colors="k", levels=[-1, 0, 1], alpha=0.5,
           linestyles=["--", "-", "--"])
# plot support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
           linewidth=1, facecolors="none", edgecolors="k")

plt.savefig("SOSONEP04PyPlot01.pdf")
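The code above draws a two-class separating boundary; for anomaly detection proper, the same scaffolding carries over to svm.OneClassSVM, which learns a frontier around normal data and flags points outside it. A minimal sketch, with made-up training data and parameter values chosen only for illustration:

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(6)

# normal observations clustered near the origin
X_train = 0.3 * rng.randn(100, 2)

# a few far-away points to test as candidate anomalies
X_outliers = rng.uniform(low=-4, high=4, size=(10, 2))

# nu bounds the fraction of training points treated as errors
clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)

# predict returns +1 for inliers and -1 for anomalies
pred_train = clf.predict(X_train)
pred_outliers = clf.predict(X_outliers)
```

The decision_function and contour-plotting steps from the example above apply unchanged to visualize the learned frontier.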

Saturday, August 12, 2023

 

There are different ways to add role-based access control to principals for an Azure subscription. These can be done in the IaC with something such as:

resource "azurerm_role_assignment" "azure_rbac" {
  scope                = var.scope
  role_definition_name = var.role_definition_name
  principal_id         = var.principal_id
}

The scope can be specific to a resource or a resource group or a subscription and takes the fully qualified identifier for the same.

The role definition name can be one of many built-in roles that confer permissions on the principal. For example, this could be Owner, Contributor, Reader, Storage Blob Data Reader and so on.

The principal_id is typically the object id for the user, group or AD entity. If a service principal is used, it must be the corresponding object id of the paired enterprise application; otherwise there will be an error message that states “Principals of type Application cannot validly be used in role assignments”.

There are many ways to populate the attributes of the resource definition via different IaC definitions, but the provider recognizes them generically as a role assignment.

The preferred AD entity for use with deployments is a managed identity, which can be either system- or user-assigned. The benefit of a managed identity is that it works as a credential by itself, as opposed to requiring key-secrets to be issued for an enterprise application.

Some caveats apply to IaC in general for role assignments. For example, the code that needs to assign an RBAC role based on the managed identity of another resource might not have that identity at compile time and only find it when the resource is created at execution time. The RBAC IaC requires a principal_id, for which the managed identity of the newly created resource is needed. This might require two passes of the execution: one to generate the principal id and another to generate the role assignment with that principal id.

The above works for newly created resources with two passes, but it is still broken for existing resources that might not have an associated managed identity, where the RBAC IaC tries to apply a principal id that is empty. In such cases, no matter how many times the role assignment is applied, it will fail due to the incorrect principal id. The workaround here is to check for the existence of the principal id before it is applied.
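The existence check can also be expressed outside the IaC; a minimal sketch in Python of the guard logic, with a hypothetical plan_role_assignment helper:

```python
def plan_role_assignment(scope, role_definition_name, principal_id):
    """Return a role-assignment spec, or None when the principal id has
    not been generated yet (e.g., the managed identity does not exist),
    so the assignment is skipped instead of failing on every apply."""
    if not principal_id:
        return None
    return {
        "scope": scope,
        "role_definition_name": role_definition_name,
        "principal_id": principal_id,
    }
```

In Terraform itself, the same effect is commonly achieved by conditioning the resource's count on whether the identity is present.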

 

 

Friday, August 11, 2023

 

Problem: Given two integer arrays preorder and inorder where preorder is the preorder traversal of a binary tree and inorder is the inorder traversal of the same tree, construct and return the binary tree.

 


 

 

Example 1:

   3
  / \
 9   20
    /  \
   15   7
 

Input: preorder = [3,9,20,15,7], inorder = [9,3,15,20,7]

Output: [3,9,20,null,null,15,7]

 

Example 2:

Input: preorder = [-1], inorder = [-1]

Output: [-1]

 

Solution:

// initial call: BuildTree(preorder, inorder, 0)
Node BuildTree(char[] preorder, char[] inorder, int index)
{
 Node root = new Node();
 root.data = preorder[index];
 root.left = null;
 root.right = null;

 int inIndex = indexOf(inorder, preorder[index]);

 if ( index + 1 < preorder.length &&
      isLeftSubtree(preorder[index + 1], inorder, inIndex) == true)
      root.left = BuildTree(preorder, inorder, index + 1);

 if ( inIndex + 1 < inorder.length &&
      isPredecessor(inorder[inIndex + 1], preorder, index) == false)
      root.right = BuildTree(preorder, inorder, indexOf(preorder, inorder[inIndex + 1]));

 return root;
}

// Arrays.asList does not box a char[], so a simple linear scan is used instead.
int indexOf(char[] a, char c)
{
 for (int i = 0; i < a.length; i++)
     if (a[i] == c) return i;
 return -1;
}

boolean isPredecessor(char c, char[] preorder, int index)
{
 return indexOf(preorder, c) < index;
}

boolean isLeftSubtree(char c, char[] inorder, int index)
{
 return indexOf(inorder, c) < index;
}
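For comparison, a runnable sketch in Python of the same construction, using the standard split-on-the-inorder-index technique rather than the index-scanning above (trees are modeled as (value, left, right) tuples for brevity):

```python
def build_tree(preorder, inorder):
    """Rebuild a binary tree from its preorder and inorder traversals."""
    if not preorder:
        return None
    root = preorder[0]       # the first preorder entry is the root
    i = inorder.index(root)  # everything left of i is the left subtree
    left = build_tree(preorder[1:1 + i], inorder[:i])
    right = build_tree(preorder[1 + i:], inorder[i + 1:])
    return (root, left, right)
```

For the example above, build_tree([3, 9, 20, 15, 7], [9, 3, 15, 20, 7]) yields 3 at the root, 9 as its left child, and 20 with children 15 and 7 on the right.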

 

Wednesday, August 9, 2023

 

This is a continuation of the previous articles on IaC deployment errors and resolutions. In this section, we talk about in-place edits of resources and their property assignments. Compared to destroy-and-create, in-place updates of resources are much easier and a relief to deployment supervision, because the overall resource behaves the same in relation to its dependencies and those that depend on it. The edited attributes can span a wide variety, from version to rules or features, but the address, binding and contract remain roughly the same as earlier. In-place edits must have a corresponding IaC change. One of the common errors with in-place edits is that what’s feasible via the management portal, SDK or CLI may not be available in the IaC directives. For example, a restart of a resource does not necessarily have a corresponding directive. Another example, from the popular Azure App Service, is when the access restrictions turn public access on or off. In both cases, the access restriction remains set. Sometimes this can be worked around by simply adding a rule to allow an ip address, which in turn automatically adds a deny-all for other source ip addresses. The behavior can be equivalent but does not come with independent directives in the IaC syntax and semantics.

Another example of when in-place edits are not always predefined is when the value is generated. For example, a public ip address resource deployed by the IaC might yield one ip address when run the first time and another when run subsequently. If the address determines the connectivity to the resource, the client trying to reach the resource by ip address must be updated each time. In such a case, it might be prudent to separate the updates to the resource from the generation of the ip address and create an ip address for re-purposable assignment beforehand. In those cases, the clients don’t need to change.

A third example of in-place edits that are counter-intuitive is when we want to make changes that can be masked as an in-place edit so that other resources do not need to know. For example, a group of resources can be edited in place without impact to others. All it takes is to treat a resource and its dependencies as a unit, so that even if one requires the other to be informed of the change, the overall update appears in-place to the rest of the deployment.

Another example of an in-place edit is a DNS zone record update to a different ip address. In reality, the ‘@’ record is the top-level record of its type for the subdomain represented in the DNS zone, and changing its value is essentially assigning the same name to a different resource. In these cases, a client that used to communicate by ip address might no longer reach the same resource, and even if it is using the same name, it might not be able to tell the difference when one resource is used instead of the other.

Some of the pitfalls of in-place edits arise when visibility is lost, the changes remain hidden from the deployment, or idempotency is lost because the operations are no longer stateless. Such is the case with an application gateway, where updating a backend pool member might require removing and re-adding the pool member so that the application gateway can refresh its awareness of that member. In these cases, one might rely on the start/stop behavior rather than making these part of the IaC.

These are some of the extended examples of the in-place edits. Thank you for staying tuned.

Tuesday, August 8, 2023

 

These are some more additions to the common errors faced during the authoring and deployment of Infrastructure-as-Code aka IaC artifacts along with their resolutions: 

First, resources might pass the identifier of one to another by virtue of one being created before the other, and in some cases these identifiers might not exist at compile time. For example, the code that needs to assign an RBAC role based on the managed identity of another resource might not have that identity at compile time and only find it when the resource is created at execution time. The RBAC IaC requires a principal_id, for which the managed identity of the newly created resource is needed. This might require two passes of the execution: one to generate the principal id and another to generate the role assignment with that principal id.

The above works for newly created resources with two passes, but it is still broken for existing resources that might not have an associated managed identity, where the RBAC IaC tries to apply a principal id that is empty. In such cases, no matter how many times the role assignment is applied, it will fail due to the incorrect principal id. The workaround here is to check for the existence of the principal id before it is applied.

A second type of case occurs when the application requires an ip address to be assigned, because the elaborate firewall rules are expressed in terms of ip address values rather than references, and the ip address is provisioned in the portal before the IaC is applied. The IaC then needs to import the existing pre-created ip address into the state so that the IaC and the state match.

Third, there may be objects in the Key Vault that were created as part of the prerequisites for the IaC deployment and now their ids need to be reconciled with the IaC. Again, the import of that resource into the state would help the IaC provider to reconcile the actual with the expected resource.

Fourth, the friendly names are often references to actual resources that may have long been dereferenced, orphaned, changed, expired, or even deleted. The friendly names, also called keys, are just references and hold value to the author in a particular context but the same author might not guarantee that the moniker is in fact consistently used unless there are some validations and review involved. 

Fifth, there are always three stages between design and deploy of Infrastructure-as-Code, namely “init”, “plan” and “apply”, and they are distinct. Success in one stage does not guarantee success in the next, which holds especially true between the plan and apply stages. Another limitation is that the plan can be easily validated on the development machine, but the apply stage can be performed only as part of pipeline jobs in commercial deployments. The workaround is to scope it down or target a different environment for applying.

Sixth, the ordering and sequence can only be partially manifested with corresponding attributes that explain the dependencies between resources. Even if resources are self-descriptive, a combination of resources must be carefully put together by the system for a deterministic outcome.
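The deterministic ordering called for in the last point can be sketched with a topological sort over a hypothetical resource dependency graph, here using Python’s standard graphlib:

```python
from graphlib import TopologicalSorter

# hypothetical dependencies: each resource maps to what it must wait for
deps = {
    "resource_group": set(),
    "managed_identity": {"resource_group"},
    "role_assignment": {"resource_group", "managed_identity"},
    "app_service": {"resource_group", "role_assignment"},
}

# static_order yields a creation sequence that respects every dependency
order = list(TopologicalSorter(deps).static_order())
```

IaC providers perform an equivalent ordering internally when they build the plan graph, which is why declaring the dependency attributes accurately matters.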

These are only some of the points illustrating the care required for developing and deploying IaC.

Reference to previous articles on Infrastructure-as-Code included. 

 

 

Sunday, August 6, 2023

Azure Database Migration Services – Errors and Resolutions:


While the previous articles introduced the Azure Database Migration Service for the purpose of migrating databases between MySQL servers, this article discusses a specific scenario and the errors and resolutions encountered.

The source server is an Azure MySQL Single Server instance. It is set up to allow access to all Azure Services. There are no virtual network rule customizations and the default access over the internet works.

The destination server is an Azure MySQL Flexible Server. It has been set up to allow access privately. There is a virtual network and subnet to which it is connected, and it is registered in the private DNS zone privatelink.mysql.database.azure.com for lookup by name. It does not connect directly over the internet.

The size of the database is small so the duration of the migration can be assumed to be in the order of a few minutes. There is only one database with a few tables.

When an attempt is made to create the database copy activity in a new migration project on the console of the Database Migration Service, it takes the source server parameters and validates the connectivity, but fails to do so for the destination server. The DMS service is deployed to an independent virtual network without a NAT gateway.

If the flexible server had been set up to allow public access and with the allowed access to the DMS service, the migration would have proceeded smoothly. Since the connectivity is private, the DMS service must make some adjustments and there will be errors encountered.

The first error is encountered when peering the virtual networks of the DMS service and the destination server, a step necessary to allow traffic both ways over the private networks. Since the default subnets of both resources might begin with the 10.0.0.0/16 CIDR, the peering cannot be made. It is necessary to create the DMS service in a subnet with an address space different from the one that the destination MySQL server uses. Typically, this is an afterthought and one that escapes attention at the time of creation of the DMS service instance or the MySQL server.
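The address-space overlap that blocks peering can be caught up front, for example with Python’s ipaddress module, using the default 10.0.0.0/16 CIDR mentioned above:

```python
import ipaddress

dms_space = ipaddress.ip_network("10.0.0.0/16")
mysql_space = ipaddress.ip_network("10.0.0.0/16")

# identical default address spaces overlap, so peering cannot be made
assert dms_space.overlaps(mysql_space)

# recreating the DMS service in a distinct address space clears the check
relocated = ipaddress.ip_network("10.1.0.0/16")
assert not relocated.overlaps(mysql_space)
```

Running such a check before provisioning avoids discovering the conflict only when the peering request is rejected.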

The second error is encountered once the address spaces have been made distinct for the subnets of the service and the MySQL flexible server. This error comes from the fact that creating a non-default address space for the service’s virtual network/subnet prevents the service from communicating with the source server, with an error that says something like “UnauthorizedAccessException - 40103: Invalid authorization token signature, Resource:sb://<somename>.servicebus.windows.net:<sometopic>”. Although this error is cryptic and complains about the way a signature was created for an api call, it goes away inexplicably when the service is switched to the default address space.

This leads to a catch-22 situation of not having simultaneous access to both the source and the destination. The resolution in this case is to split the migration from the source to an interim MySQL server with one service that uses the default address space, and then use another service with the interim server as the source, preferably with the interim server in the same address space as the subnet that the second service uses. Another option is to keep the destination public for the duration of the migration and then take it private. Or the DMS service could be instantiated in the same virtual network as the destination private MySQL Flexible Server, albeit in a separate subnet, with a NAT gateway to connect to the source server.

Saturday, August 5, 2023

Comparison of MySQL database migration techniques:

MySQL databases are popular on-premises as well as in the public clouds with various development teams. Routinely, they find themselves in situations where they need to migrate their databases across servers.

There are two techniques for doing so for the MySQL databases hosted in the public cloud. The first involves the native support from MySQL server instances in the form of mysqldump utility and the second involves the public cloud capabilities to migrate the data.

Between these options, the choices are usually based on habit rather than on leveraging their strengths and avoiding their weaknesses. This article gives each its place, as shown.

The Azure Database Migration Service instance comes with the following.

Pros: 

  1. One instance works across all subscriptions.
  2. Can transfer between on-premises and cloud and cloud to cloud.
  3. Pay-per-use billing.
  4. Provides a wizard to create data transfer activity.

Cons: 

  1. Limited features via IaC as compared to the portal but enough to get by.
  2. Not recommended for developer instances or tiny databases that can be exported and imported via mysqldump.
  3. binlog_expire_logs_seconds must be set to a non-zero value on the source server.
  4. Supports only sql login.

The steps to perform the data transfer activity between the source and destination MySQL servers involve:

  1. Create a source mysql instance mysql-src-1 in rg-mysql-1
  2. Create a database and add a table to mysql-src-1
  3. Create a destination mysql instance mysql-dest-1 in rg-mysql-1
  4. Deploy the DMS service instance and project.
  5. Create and run an activity to transfer data.
  6. Verify the data.

The mysqldump utilities, on the other hand, prepare the SQL for replay against the destination server. All the tables of the source database can be exported.

Pros:

  1. The SQL statements fully describe the source database.
  2. They can be edited before replaying.
  3. There are many options natively supported by the database server.

Cons:

  1. It acquires a global read lock on all the tables at the beginning of the dump.
  2. If long updating statements are running when the FLUSH statement is issued, the MySQL server may stall until those statements finish.

When using mysqldump, it might be better to leverage the MySQL Shell Dump Utilities which provide parallel dumping with multiple threads, file compression and progress information display.

It is best to determine the size of the database before choosing the options:

SELECT table_schema "DB Name",
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB"
FROM information_schema.tables
GROUP BY table_schema;