Tuesday, August 13, 2024

Problem Statement: You are given a 0-indexed integer array nums.

You may swap any two adjacent elements of nums any number of times.

A valid array meets the following conditions:

The largest element (any of the largest elements if there are multiple) is at the rightmost position in the array.

The smallest element (any of the smallest elements if there are multiple) is at the leftmost position in the array.

Return the minimum number of swaps required to make nums a valid array.

 

Example 1:

Input: nums = [3,4,5,5,3,1]

Output: 6

Explanation: Perform the following swaps:

- Swap 1: Swap the 3rd and 4th elements, nums is then [3,4,5,3,5,1].

- Swap 2: Swap the 4th and 5th elements, nums is then [3,4,5,3,1,5].

- Swap 3: Swap the 3rd and 4th elements, nums is then [3,4,5,1,3,5].

- Swap 4: Swap the 2nd and 3rd elements, nums is then [3,4,1,5,3,5].

- Swap 5: Swap the 1st and 2nd elements, nums is then [3,1,4,5,3,5].

- Swap 6: Swap the 0th and 1st elements, nums is then [1,3,4,5,3,5].

It can be shown that 6 is the minimum number of swaps required to make a valid array.

Example 2:

Input: nums = [9]

Output: 0

Explanation: The array is already valid, so we return 0.

 

Constraints:

1 <= nums.length <= 10^5

1 <= nums[i] <= 10^5

Solution: 

import java.util.Arrays;
import java.util.stream.Collectors;

class Solution {

    public int minimumSwaps(int[] nums) {
        int min = Arrays.stream(nums).min().getAsInt();
        int max = Arrays.stream(nums).max().getAsInt();
        int count = 0;
        // Loop until a smallest element is first AND a largest element is last;
        // the count guard protects against an infinite loop. Note the ||: with
        // && the loop would exit early whenever only one end was already in place.
        while ((nums[0] != min || nums[nums.length - 1] != max) && count < 2 * nums.length) {
            // Bubble the rightmost occurrence of the maximum to the end.
            var numsList = Arrays.stream(nums).boxed().collect(Collectors.toList());
            var end = numsList.lastIndexOf(max);
            for (int i = end; i < nums.length - 1; i++) {
                swap(nums, i, i + 1);
                count++;
            }

            // Bubble the leftmost occurrence of the minimum to the front.
            numsList = Arrays.stream(nums).boxed().collect(Collectors.toList());
            var start = numsList.indexOf(min);
            for (int j = start; j >= 1; j--) {
                swap(nums, j, j - 1);
                count++;
            }
        }

        return count;
    }

    public void swap(int[] nums, int i, int j) {
        int temp = nums[j];
        nums[j] = nums[i];
        nums[i] = temp;
    }
}
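The solution above performs the swaps explicitly. Since only the leftmost minimum and the rightmost maximum matter, the count can also be computed directly in O(n) without mutating the array; a minimal sketch of that formula (in Python, for brevity):

def minimum_swaps(nums):
    n = len(nums)
    lo, hi = min(nums), max(nums)
    min_idx = nums.index(lo)                # leftmost occurrence of the minimum
    max_idx = n - 1 - nums[::-1].index(hi)  # rightmost occurrence of the maximum
    swaps = min_idx + (n - 1 - max_idx)
    # When the minimum starts to the right of the maximum, one swap moves both,
    # so the total is reduced by one.
    return swaps - 1 if min_idx > max_idx else swaps

assert minimum_swaps([3, 4, 5, 5, 3, 1]) == 6
assert minimum_swaps([9]) == 0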


Input

nums =

[3,4,5,5,3,1]

Output

6

Expected

6


Input

nums =

[9]

Output

0

Expected

0


Monday, August 12, 2024

Understanding workloads for business continuity and disaster recovery (BCDR).

The Azure public cloud provides native capabilities for business continuity and disaster recovery, some of which are built into the features of the resource types used for the workload. Aside from resource-type features that reduce RTO/RPO (for a discussion of the terms used throughout the BCDR literature, please see the references), there are dedicated resources such as Azure Backup, Azure Site Recovery, and various data migration services such as Azure Data Factory and Azure Database Migration Service that provide a wizard for configuring BCDR policies, which are usually specified in a set-and-forget way. Finally, customizations are possible beyond the features of the resource types and the BCDR resources, and these can be maintained with Azure DevOps.

Organizations may find it more efficient and cost-effective to take a coarser-grained approach at the deployment-stamp level, above the native cloud resource level, tailored to their workload. This article explores some of those scenarios and the BCDR solutions that best serve them.

Scenario 1: Microservices framework. This form of deployment is preferred when the workload must update various services, hosted as APIs or UIs, independently of one another over their lifetimes. Usually there are many web applications, and a resource is dedicated to each of them in the form of an app service or a container framework. The code is either deployed directly from source via a pipeline or published as an image that the resource pulls. One of the most important aspects peculiar to this workload is the dependencies between the applications: when a disaster strikes the entire deployment, the services will not work together, even when restored individually in a different region, until those links are re-established. Take, for example, the private endpoints that provide private connectivity between caller-callee pairs of these services. Sometimes the callee is external to the network, and even to the subscription, and the endpoint establishing the connectivity is usually registered manually. There is no single button or pipeline that can recreate the deployment stamp, and certainly none that can replace the manual approval required to commission the private link. Since the individual app services keep their distinctive dependencies and responsibilities yet cannot function without the whole set, it is important to make them exportable and importable via Infrastructure-as-Code (IaC) that takes into account parameters such as subscription, resource group, virtual network, and naming-convention prefixes and suffixes, and recreates a stamp; the restore order can be derived from the dependency graph, as sketched below.
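To make those links explicit, one minimal sketch (in Python, with hypothetical service names) keeps a caller-to-callee manifest alongside the IaC and derives the restore order from it, so that callees and their private endpoints are re-commissioned before the services that depend on them:

from graphlib import TopologicalSorter

# Hypothetical manifest: each service maps to the services it depends on.
dependencies = {
    "ui": {"orders-api", "catalog-api"},
    "orders-api": {"mysql", "cache"},
    "catalog-api": {"mysql"},
    "mysql": set(),
    "cache": set(),
}

# static_order() lists dependencies before dependents, giving a safe restore order.
restore_order = list(TopologicalSorter(dependencies).static_order())
print(restore_order)  # e.g., ['mysql', 'cache', 'catalog-api', 'orders-api', 'ui']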

The second characteristic of this workload is that it typically involves a diverse set of dependencies and stacks to host its various web applications. There won't be much consistency, so the dependencies could range from a MySQL database server to producing and consuming jobs on a Databricks analytical workspace or an Airflow automation. Consequently, the dependencies must be part of the BCDR story. Since this usually involves data and scripts, both should be migrated to the new instance; migration and renaming are the two pervasive activities in BCDR for this workload type. Scripts registered in a source-control repository such as GitHub must be pulled and spun into an on-demand or scheduled workflow.

Lastly, the data used by these resources is usually proprietary and territorial in terms of ownership. This implies that backup and restore of the data might have to proceed independently, per the consensus between the owner and the users. MySQL data, for example, can be transferred to and from another instance via the Azure Database Migration Service, avoiding both the mysqldump command line with credentials and GitOps-driven az commands against the database server instance with implicit login. An approach that suits the owner and the users can be implemented outside the IaC.


Sunday, August 11, 2024

Find the minimum in a rotated sorted array:

class Solution {

    public int findMin(int[] A) {
        if (A == null || A.length == 0) { return Integer.MIN_VALUE; }
        int start = 0;
        int end = A.length - 1;
        while (start < end) {
            int mid = (start + end) / 2;

            // check monotonically increasing series
            if (A[start] <= A[end] && A[start] <= A[mid] && A[mid] <= A[end]) { return A[start]; }

            // check if only [start, end] remain
            if (mid == start || mid == end) { return Math.min(A[start], A[end]); }

            // detect rotation point
            if (A[start] > A[mid]) {
                end = mid;
            } else {
                if (A[mid] > A[mid + 1]) return A[mid + 1];
                start = mid + 1;
            }
        }
        return A[0];
    }
}

Works for:

[0 1 4 4 5 6 7]

[7 0 1 4 4 5 6]

[6 7 0 1 4 4 5]

[5 6 7 0 1 4 4]

[4 5 6 7 0 1 4]

[4 4 5 6 7 0 1]

[1 4 4 5 6 7 0]

[1 0 0 0 0 0 1]



Saturday, August 10, 2024

A self-organizing map (SOM) algorithm for scheduling meeting times as availabilities and bookings. A map is a low-dimensional representation of a training sample comprising elements e. It is represented by nodes n. The map is transformed by a regression operation that modifies the nodes' positions, one element (e) of the sample at a time. With preferences translating to nodes and availabilities to elements, the map comes to match the sample space more closely with each epoch/iteration.

from sys import argv


import numpy as np


from io_helper import read_xyz, normalize

from neuron import generate_network, get_neighborhood, get_boundary

from distance import select_closest, euclidean_distance, boundary_distance

from plot import plot_network, plot_boundary


def main():

    if len(argv) != 2:

        print("Correct use: python src/main.py <filename>.xyz")

        return -1


    problem = read_xyz(argv[1])


    boundary = som(problem, 100000)


    problem = problem.reindex(boundary)


    distance = boundary_distance(problem)


    print('Boundary found of length {}'.format(distance))



def som(problem, iterations, learning_rate=0.8):

    """Solve the xyz using a Self-Organizing Map."""


    # Obtain the normalized set of timeslots (w/ coord in [0,1])

    timeslots = problem.copy()

    # print(timeslots)

    #timeslots[['X', 'Y', 'Z']] = normalize(timeslots[['X', 'Y', 'Z']])


    # The population size is 8 times the number of timeslots

    n = timeslots.shape[0] * 8


    # Generate an adequate network of neurons:

    network = generate_network(n)

    print('Network of {} neurons created. Starting the iterations:'.format(n))


    for i in range(iterations):

        if not i % 100:

            print('\t> Iteration {}/{}'.format(i, iterations), end="\r")

        # Choose a random timeslot

        timeslot = timeslots.sample(1)[['X', 'Y', 'Z']].values

        winner_idx = select_closest(network, timeslot)

        # Generate a filter that applies changes to the winner's gaussian

        gaussian = get_neighborhood(winner_idx, n//10, network.shape[0])

        # Update the network's weights (closer to the timeslot)

        network += gaussian[:,np.newaxis] * learning_rate * (timeslot - network)

        # Decay the variables

        learning_rate = learning_rate * 0.99997

        n = n * 0.9997


        # Check for plotting interval

        if not i % 1000:

            plot_network(timeslots, network, name='diagrams/{:05d}.png'.format(i))


        # Check if any parameter has completely decayed.

        if n < 1:

            print('Radius has completely decayed, finishing execution',

            'at {} iterations'.format(i))

            break

        if learning_rate < 0.001:

            print('Learning rate has completely decayed, finishing execution',

            'at {} iterations'.format(i))

            break

    else:

        print('Completed {} iterations.'.format(iterations))


    # plot_network(timeslots, network, name='diagrams/final.png')


    boundary = get_boundary(timeslots, network)

    plot_boundary(timeslots, boundary, 'diagrams/boundary.png')

    return boundary


if __name__ == '__main__':

    main()


Reference: 

https://github.com/raja0034/som4drones


#codingexercise

https://1drv.ms/w/s!Ashlm-Nw-wnWhPBaE87l8j0YBv5OFQ?e=uCIAp9


Thursday, August 8, 2024

This is the Knuth-Morris-Pratt (KMP) method of string matching:

public void kmpMatcher(String text, String pattern) {
    int n = text.length();
    int m = pattern.length();
    int[] prefixes = computePrefixFunction(pattern);
    int matched = 0; // number of pattern characters matched so far

    for (int i = 0; i < n; i++) {
        // Fall back in the pattern while the next character does not match.
        while (matched > 0 && pattern.charAt(matched) != text.charAt(i)) {
            matched = prefixes[matched - 1];
        }
        if (pattern.charAt(matched) == text.charAt(i)) {
            matched++;
        }
        if (matched == m) {
            System.out.println("Pattern occurs at " + (i - m + 1));
            matched = prefixes[matched - 1]; // continue, allowing overlapping matches
        }
    }
}

public int[] computePrefixFunction(String pattern) {
    int m = pattern.length();
    // prefixes[q] = length of the longest proper prefix of pattern[0..q]
    // that is also a suffix of it
    int[] prefixes = new int[m];
    int k = 0;
    for (int q = 1; q < m; q++) {
        while (k > 0 && pattern.charAt(k) != pattern.charAt(q)) {
            k = prefixes[k - 1];
        }
        if (pattern.charAt(k) == pattern.charAt(q)) {
            k++;
        }
        prefixes[q] = k;
    }
    return prefixes;
}
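For example, with text "ababa" and pattern "aba", the matcher reports occurrences at offsets 0 and 2; resetting matched to prefixes[matched - 1] after a hit is what allows such overlapping matches to be found.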


Reference for drone data: https://1drv.ms/w/s!Ashlm-Nw-wnWhPFoQ0k-mnjii2Gs3Q?e=cbET9N


Tuesday, August 6, 2024

-- Demonstrate dynamic tagging for drone data vectors


USE master;

GO


IF NOT EXISTS (SELECT 1 FROM sys.server_principals WHERE name = N'DroneFleetUser')
BEGIN
    CREATE LOGIN DroneFleetUser
    WITH PASSWORD = N'LuvDr0ne!',
         CHECK_POLICY = OFF,
         CHECK_EXPIRATION = OFF,
         DEFAULT_DATABASE = DroneCatalog;
END;

GO


IF NOT EXISTS (SELECT 1 FROM sys.server_principals WHERE name = N'DroneFleetAdmin')
BEGIN
    CREATE LOGIN DroneFleetAdmin
    WITH PASSWORD = N'LuvDr0neFl@@t!',
         CHECK_POLICY = OFF,
         CHECK_EXPIRATION = OFF,
         DEFAULT_DATABASE = DroneCatalog;
END;

GO


USE DroneCatalog;

GO


CREATE USER DroneFleetUser FOR LOGIN DroneFleetUser;

GO


CREATE USER DroneFleetAdmin FOR LOGIN DroneFleetAdmin;

GO


ALTER ROLE [Drone Operators] ADD MEMBER DroneFleetUser;

GO


-- Ensure that the policy has been applied

EXEC [Application].Configuration_ApplyDynamicTagging;

GO


-- The function that has been applied is as follows:

--

-- CREATE FUNCTION [Application].DetermineDroneUserAccess(@TeamID int)

-- RETURNS TABLE

-- WITH SCHEMABINDING

-- AS

-- RETURN (SELECT 1 AS AccessResult

--         WHERE IS_ROLEMEMBER(N'db_owner') <> 0

--         OR IS_ROLEMEMBER((SELECT sp.FlightsTerritory

--                           FROM [Application].Teams AS c

--                           INNER JOIN [Application].Fleets AS sp

--                           ON c.FleetID = sp.FleetID

--                           WHERE c.TeamID = @TeamID) + N' Flights') <> 0

--     OR (ORIGINAL_LOGIN() = N'DroneFleetAdmin'

--     AND EXISTS (SELECT 1

--                 FROM [Application].Teams AS c

--         INNER JOIN [Application].Fleets AS sp

--         ON c.FleetID = sp.FleetID

--         WHERE c.TeamID = @TeamID

--         AND sp.FlightsTerritory = SESSION_CONTEXT(N'FlightsTerritory'))));

-- GO


-- The security policy that has been applied is as follows:

--

-- CREATE SECURITY POLICY [Application].FilterDroneUsersByFlightsTerritoryRole

-- ADD FILTER PREDICATE [Application].DetermineDroneUserAccess(DeliveryTeamID)

-- ON Flights.DroneUsers,

-- ADD BLOCK PREDICATE [Application].DetermineDroneUserAccess(DeliveryTeamID)

-- ON Flights.DroneUsers AFTER UPDATE;

-- GO


SELECT * FROM sys.database_principals; -- note the role for Pacific and the user for Pacific

GO


SELECT * FROM Flights.DroneUsers; -- and note count

GO


GRANT SELECT, UPDATE ON Flights.DroneUsers TO [Drone Operators];

GRANT SELECT ON [Application].Teams TO [Drone Operators];

GRANT SELECT ON [Application].Fleets TO [Drone Operators];

GRANT SELECT ON [Application].Inventories TO [Drone Operators];

GO


-- impersonate the user DroneFleetUser

EXECUTE AS USER = 'DroneFleetUser';

GO


-- Now note the count and which rows are returned

-- even though we have not changed the command


SELECT * FROM Flights.DroneUsers;

GO


-- where are those drones?

-- note the spatial results tab


SELECT c.Border

FROM [Application].Inventories AS c

WHERE c.InventoryName = N'Northwest'

UNION ALL

SELECT c.DeliveryLocation

FROM Flights.DroneUsers AS c

GO


-----------------------------------------------------------------------

-- updating rows that are accessible to a non-accessible row is blocked

-----------------------------------------------------------------------

DECLARE @DroneFleetDroneUserID INT

DECLARE @NonDroneFleetTeamID INT


-- pick a drone in the Pacific flights territory

SELECT TOP 1 @DroneFleetDroneUserID=c.DroneUserID

FROM Flights.DroneUsers c JOIN Application.Teams ci ON c.DeliveryTeamID=ci.TeamID

JOIN Application.Fleets sp ON ci.FleetID=sp.FleetID

WHERE sp.FlightsTerritory=N'Pacific'


-- pick a Team outside of the Pacific flights territory

SELECT @NonDroneFleetTeamID=c.TeamID

FROM Application.Teams c JOIN Application.Fleets sp ON c.FleetID=sp.FleetID

WHERE TeamName=N'Seattle' AND sp.FleetCode=N'WA'


UPDATE Flights.DroneUsers                    -- Attempt to update

SET DeliveryTeamID = @NonDroneFleetTeamID -- to a team that is not in the Drone Operators Territory

WHERE DroneUserID = @DroneFleetDroneUserID; -- for a drone that is in the Drone Operators Territory

GO


-- revert the impersonation

REVERT;

GO


-- Remove the user from the role

ALTER ROLE [Drone Operators] DROP MEMBER DroneFleetUser;

GO


-- Instead of permission for a role, let's give permissions to the website user

GRANT SELECT, UPDATE ON Flights.DroneUsers TO [DroneFleetAdmin];

GRANT SELECT ON [Application].Teams TO [DroneFleetAdmin];

GRANT SELECT ON [Application].Inventories TO [DroneFleetAdmin];

GO
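-- A sketch, not part of the original demo: the DroneFleetAdmin branch of
-- DetermineDroneUserAccess checks SESSION_CONTEXT(N'FlightsTerritory'), so a
-- session for that user would typically set the territory before querying:
--
-- EXECUTE AS USER = 'DroneFleetAdmin';
-- EXEC sp_set_session_context @key = N'FlightsTerritory', @value = N'Pacific';
-- SELECT * FROM Flights.DroneUsers; -- returns only Pacific-territory rows
-- REVERT;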



-- Finally, tidy up (optional)

/*

REVOKE SELECT, UPDATE ON Flights.DroneUsers FROM [Drone Operators];

REVOKE SELECT ON [Application].Teams FROM [Drone Operators];

REVOKE SELECT ON [Application].Inventories FROM [Drone Operators];

REVOKE SELECT, UPDATE ON Flights.DroneUsers FROM [DroneFleetAdmin];

REVOKE SELECT ON [Application].Teams FROM [DroneFleetAdmin];

REVOKE SELECT ON [Application].Inventories FROM [DroneFleetAdmin];

GO


DROP USER DroneFleetUser;

GO


DROP USER DroneFleetAdmin;

GO


USE master;

GO


DROP LOGIN DroneFleetUser;

GO


DROP LOGIN DroneFleetAdmin;

GO


-- Reference: DroneData: https://1drv.ms/w/s!Ashlm-Nw-wnWhPJAFzVxJMWI2f_eKw?e=BDtnPM 

#codingexercise 

https://1drv.ms/w/s!Ashlm-Nw-wnWhM0bmlY_ggTBTNTYxQ?e=K8GuKL


Monday, August 5, 2024

When describing Azure Machine Learning workspace deployments via IaC, along with their shortcomings and corresponding resolutions, it was hinted that the workspace and all of its infrastructure concerns can be resolved at deployment time so that data scientists are free to focus on business use cases. Part of this setup involves kernel creation, which can be done via scripts during the creation and assignment of compute to the data scientists. Two scripts are required: one at creation time and the other at the start of the compute. Some commands require the terminal to be restarted, so splitting the scripts makes it possible to specify each at the right stage. For example, to provision a custom kernel based on Python 3.11 and Spark 3.5, the following scripts come in useful:

#!/bin/bash
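# Creation-time script: installs Anaconda and creates the custom conda environment.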


set -e


curl https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh --output Anaconda3-2024.02-1-Linux-x86_64.sh

chmod 755 Anaconda3-2024.02-1-Linux-x86_64.sh

./Anaconda3-2024.02-1-Linux-x86_64.sh -b

# This script creates a custom conda environment and kernel based on a sample yml file.

echo "installation complete"

cat <<EOF > env.yaml

name: python3.11_spark3.5

channels:

  - conda-forge

  - defaults

dependencies:

  - python=3.11

  - numpy

  - pyspark

  - pip

  - pip:

    - azureml-core

    - ipython

    - ipykernel

    - pyspark==3.5

EOF

echo "env.yaml written"

/anaconda/condabin/conda env create -f env.yaml

echo "Initializing new conda environment"

/anaconda/condabin/conda init bash


#!/bin/bash
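# Startup script (runs at compute start, after the terminal restart): activates
# the environment and registers the custom Jupyter kernel.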


set -e

echo "Activating new conda environment"

/anaconda/envs/azureml_py38/bin/conda init --all

/anaconda/envs/azureml_py38/bin/conda init bash

export PATH="/anaconda/condabin:$PATH"

export name="python3.11_spark3.5"

conda install -p "/anaconda/envs/$name" -y ipykernel anaconda::pyspark anaconda::conda

# Activate via conda's shell hook; a bare "conda activate" fails in a fresh
# non-interactive shell.
source /anaconda/etc/profile.d/conda.sh
conda activate "$name"

echo "Installing kernel"

sudo -u azureuser -i <<'EOF'

export name="python3.11_spark3.5"

export pathToPython3="/anaconda/envs/$name/bin/python3"

$pathToPython3 -m pip install pip --upgrade

$pathToPython3 -m pip install pyopenssl --upgrade

$pathToPython3 -m pip install pyspark==3.5

$pathToPython3 -m pip install snowflake-snowpark-python==1.20.0

$pathToPython3 -m pip install snowflake-connector-python==3.11.0

$pathToPython3 -m pip install azure-keyvault

$pathToPython3 -m pip install azure-identity

$pathToPython3 -m pip install ipykernel==6.29.5

$pathToPython3 -m ipykernel install --user --name "$name" --display-name "Python 3.11 - Spark 3.5 (DSS)"

echo "Conda environment setup successfully."

EOF