Wednesday, February 7, 2024

 

This is a summary of the book titled “The Global Risks Report 2023” written by Sophie Heading and Saadia Zahidi, who lead this work at the World Economic Forum. The book picks up the legacy of the receding COVID-19 pandemic, the rise in carbon emissions and the war in Ukraine, and leads the discussion with the global inflation arising from these events. It includes a warning that not all economies and societies will bounce back from these compounding crises, which the book calls ‘polycrises’. Global risks include potential economic, population, and natural resource impacts. The most urgent among them are the rising cost of living, the failure to address climate change, irreversible ecosystem damage, societal polarization, COVID-19, financial warfare, debt crises, and potential polycrises. These risks pose significant threats to the world's health systems and economy.

The Global Risks Perception Survey (GRPS) surveyed respondents on five issues regarding short-term or long-term risks. The Executive Opinion Survey (EOS) included the views of over 12,000 business leaders. The cost of living is ranked as the world's most urgent crisis between 2023 and 2025, with the majority of respondents identifying it as a short-term risk. The report also highlights the failure to address climate change, which has been ranked high on the Global Risks Report for years. Carbon dioxide emissions and other climate change contributors are at record highs, and there is a scant chance of reaching the international goal of limiting temperature rise to 1.5°C (2.7°F) by 2030.

Climate change is the global risk the world is seen as least prepared for, with 70% of GRPS respondents rating existing measures as 'ineffective' or 'highly ineffective'. Governments are under pressure to move towards clean, renewable energy, but some nations have spent billions on fossil fuel facilities. The slow transition away from fossil fuels will have consequences, including undermining human health and ecosystems and affecting agriculture, human migration, terrorism, and conflict in Africa, Asia, and the Middle East. Short-term risks like political polarization may unfold quickly, but they still raise implications for the future. Between 2023 and 2033, pivotal risks related to climate change and the natural order will manifest, with failure to adapt being the highest-ranked global risk. Preserving ecosystems and biodiversity requires mitigating climate change, reducing fossil fuel emissions, and introducing carbon removal technologies.

The growing gap in values and equality poses a threat to both autocratic and democratic systems, as economic and social divides translate into political ones. Political polarization, particularly in economically dominant countries, can lead to political paralysis and undermine collective problem-solving. The COVID-19 pandemic has highlighted the risks to global public health systems, including antimicrobial resistance, infectious diseases, climate change-related chronic conditions, mental health crises, and anxiety over vaccinations. The receding pandemic has also led to increased work absences and heavier use of healthcare systems, which are themselves vulnerable to climate change impacts. With a growing and aging population, the demands on healthcare systems worldwide will become heavier, and a new pandemic or catastrophic weather event could push them to collapse. Public health agencies can improve healthcare systems by sharing information and resources, while governments can help people prevent disease by promoting healthy lifestyles, proper nutrition, and social connections.

The erosion of arms control and the increasing mistrust among countries and regions have led to a re-prioritization of states' military forces, potentially fracturing international relations and opening an arms race. The development of sophisticated, high-tech weapons could open new avenues of "asymmetric warfare" in the hands of rogue states or actors. A complex process of achieving useful arms control agreements will be required.

An increasing debt crisis threatens economies with instability, as countries may be unable to sustain high levels of debt under normal economic conditions. The risk of default is higher for lower-income countries, and wealthy countries may intervene, increasing geopolitical tensions. Debt crises also have a deleterious impact on investment, with advanced economies becoming more vulnerable to political polarization and conflict.

This book uses the term ‘polycrisis’ to refer to a group of related global risks that compound and develop effects greater than the sum of their individual parts, and such crises may become common in the future. Two crucial features of such a polycrisis are the extent to which nations can properly match the supply of and demand for their resources, and the severity of climate change's impact on specific resources.

Previous book summaries: BookSummary47.docx

Summarizing Software: SummarizerCodeSnippets.docx.  

#codingexercise https://1drv.ms/w/s!Ashlm-Nw-wnWhOhrXTvbOPnD2OOyCg?e=P76lF3

Tuesday, February 6, 2024

 

This article describes how to rename a storage account in the Azure public cloud. Renames, although not common, arise occasionally as erstwhile resources are repurposed or mistakes need to be corrected. For example, the prefixes and suffixes used in storage account names might change when repurposing or retargeting a different environment. An Azure storage account cannot be renamed from the portal or in any other way. If a different name is to be used for an existing storage account, a new account must be created with the desired name, the data moved, and any associated configurations applied to the new account. Connection strings or settings of any dependent resources may need to be updated as well.
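For instance, a minimal Python sketch of the copy step, assuming the new account has already been created and using placeholder account and container names, could rely on the azure-storage-blob SDK's server-side copy:

# A minimal sketch, assuming both accounts exist and the caller has access to them.
# Account names, the container name, and the credential are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()
source = BlobServiceClient("https://oldaccount.blob.core.windows.net", credential=credential)
target = BlobServiceClient("https://newaccount.blob.core.windows.net", credential=credential)

source_container = source.get_container_client("data")
target_container = target.get_container_client("data")
target_container.create_container()

# Server-side copy of every blob; for private sources, append a SAS token to the source URL.
# For very large accounts, AzCopy or Azure Data Factory (discussed below) scales better.
for blob in source_container.list_blobs():
    source_blob = source_container.get_blob_client(blob.name)
    target_container.get_blob_client(blob.name).start_copy_from_url(source_blob.url)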

Some additional information that might be helpful:

Data transfer can be facilitated with a cloud service like Azure Data Factory or a downloadable tool like AzCopy. Azure Data Factory is a cloud-based data integration service provided by Microsoft Azure. It allows users to create, schedule, and orchestrate data pipelines that move and transform data from various sources to different destinations. Azure Data Factory supports data movement between on-premises systems, cloud-based systems, and hybrid environments. It also provides capabilities for data transformation, data orchestration, monitoring, and management of data pipelines. With Azure Data Factory, users can easily automate and manage their data integration and data transformation processes in a scalable and reliable manner.

Azure Data Factory can be used to copy data between various data stores and services, such as:

  1. Copying data between Azure storage accounts: Azure Data Factory can transfer data between different Azure storage accounts, such as Azure Blob storage, Azure Data Lake Storage, and Azure File Storage.
  2. Copying data between on-premises and Azure: Azure Data Factory supports copying data between on-premises data sources, such as SQL Server or Oracle databases, and Azure data stores. This allows organizations to move data from their on-premises infrastructure to the cloud.
  3. Copying data between cloud-based data sources: Azure Data Factory can transfer data between various cloud-based data sources, including Azure SQL Database, Azure Synapse Analytics (formerly SQL Data Warehouse), and other Azure services.
  4. Copying data between cross-cloud platforms: Azure Data Factory enables data transfer between different cloud platforms, such as Azure and AWS or Azure and Google Cloud Platform. This allows organizations to integrate and consolidate data from multiple cloud providers.
  5. Copying data with transformations: Azure Data Factory supports data transformations during the copying process. It allows you to apply transformations, such as filtering, aggregating, or joining data, before transferring it to the destination.
  6. Incremental data copying: Azure Data Factory can perform incremental data copying, where only the changed or new data is transferred, rather than copying the entire dataset. This helps optimize the data transfer process and reduce costs.
  7. Fault-tolerant, retryable data transfer: Azure Data Factory can overcome rate limits encountered at the source and destination by retrying requests, and it lets the end-user choose among various actions for well-known errors encountered during data transfer, depending on the type of data at the source.

Azure Data Factory provides a flexible and scalable platform for copying data between different data sources, both within Azure and across other cloud platforms. When the transfer of data is entirely between Azure cloud resources, the Azure integration runtime can be used, and the data transfer is very fast because it travels over the Microsoft backbone network, which has very low latency.
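As a rough illustration of how such a copy can be automated programmatically, the sketch below uses the azure-mgmt-datafactory management SDK; it assumes the data factory, its linked services, and the source and sink datasets already exist, and every resource name shown is a placeholder:

# A minimal sketch, assuming the factory, linked services, and the datasets
# "SourceBlobDataset" and "SinkBlobDataset" already exist. Names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineResource
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

copy_activity = CopyActivity(
    name="CopyFromOldToNewAccount",
    inputs=[DatasetReference(reference_name="SourceBlobDataset")],
    outputs=[DatasetReference(reference_name="SinkBlobDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update("my-resource-group", "my-data-factory", "CopyPipeline", pipeline)

# Trigger a one-off run; its status can later be checked with adf_client.pipeline_runs.get.
run = adf_client.pipelines.create_run("my-resource-group", "my-data-factory", "CopyPipeline")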


 


Sunday, February 4, 2024

 

Problem statement: Get the maximum length of a concatenated string with unique characters

Problem description:

You are given an array of strings arr. A string s is formed by the concatenation of a subsequence of arr that has unique characters.

 

Return the maximum possible length of s.

 

A subsequence is an array that can be derived from another array by deleting some or no elements without changing the order of the remaining elements.

 

Example 1:

Input: arr = ["un","iq","ue"]

Output: 4

Explanation: All the valid concatenations are:

- ""

- "un"

- "iq"

- "ue"

- "uniq" ("un" + "iq")

- "ique" ("iq" + "ue")

Maximum length is 4.

 

Example 2:

Input: arr = ["cha","r","act","ers"]

Output: 6

Explanation: Possible longest valid concatenations are "chaers" ("cha" + "ers") and "acters" ("act" + "ers").

 

Example 3:

Input: arr = ["abcdefghijklmnopqrstuvwxyz"]

Output: 26

Explanation: The only string in arr has all 26 characters.

 

Constraints:

1 <= arr.length <= 16

1 <= arr[i].length <= 26

arr[i] contains only lowercase English letters.

 

Solution:

import java.util.*;
import java.lang.*;
import java.util.stream.Collectors;

class Solution {
    public int maxLength(List<String> arr) {
        var combinations = new ArrayList<List<String>>();
        getCombinations(arr, combinations);
        var selections = combinations.stream()
                                      .map(x -> toString(x))
                                      .filter(x -> isValid(x))
                                      .collect(Collectors.toList());
        return selections.stream().mapToInt(x -> x.length()).max().getAsInt();
    }

    // Enumerate all 2^N subsequences of arr using a bitmask over the indices.
    public static void getCombinations(List<String> arr, List<List<String>> combinations) {
        int N = arr.size();
        for (int i = 0; i < (1 << N); i++) {
            List<String> combination = new ArrayList<String>();
            for (int j = 0; j < N; j++) {
                if ((i & (1 << j)) > 0) {
                    combination.add(arr.get(j));
                }
            }
            combinations.add(new ArrayList<String>(combination));
        }
    }

    // Concatenate the words of a combination into a single candidate string.
    public static String toString(List<String> words) {
        StringBuilder sb = new StringBuilder();
        for (var word : words) {
            sb.append(word);
        }
        return sb.toString();
    }

    // A candidate is valid only if no character appears more than once.
    public static boolean isValid(String word) {
        Map<Character, Integer> charMap = new HashMap<>();
        for (int i = 0; i < word.length(); i++) {
            if (charMap.containsKey(Character.valueOf(word.charAt(i)))) {
                return false;
            }
            charMap.put(Character.valueOf(word.charAt(i)), 1);
        }
        return true;
    }
}

Saturday, February 3, 2024

 

This article describes how to evaluate models using Azure Machine Learning Studio. We evaluate foundation models using our own test data. Microsoft-developed foundation models are a great way to get started with data analysis on your data. The Model Catalog is the hub for both foundation models and OpenAI models. It can be used to discover, evaluate, fine-tune, deploy, and import models. Your own test data can be used to evaluate these models. The model card on any of the foundation models can be used to pass in the test data, map the columns for the input data based on the schema needed for the task, provide a compute to run the evaluation on, and submit the job. The results include evaluation metrics, and these can help decide whether you would like to fine-tune the model using your own training data. Every pre-trained model from the model catalog can be fine-tuned for a specific set of tasks such as text classification, token classification, and question answering. The data can be in JSONL, CSV, or TSV format, and the steps are just like evaluation, except that you pass in validation data for validation and test data to evaluate the fine-tuned model. Once the models are evaluated and fine-tuned, they can be deployed to endpoints for inferencing. There must be enough quota available for deployment.

OpenAI models differ from the foundation models in that they require a connection with Azure OpenAI. The process of evaluating, fine-tuning, and deploying remains the same. An Azure Machine Learning pipeline can be used to complete a machine learning task, which usually consists of three steps: prepare data, train a model, and score the model. The pipeline optimizes the workflow for speed, portability, and reuse so you can focus on machine learning instead of infrastructure and automation. A pipeline comprises components for each of the three tasks and is built using the Python SDK v2, the CLI, or the UI. All the necessary libraries, such as azure.identity, azure.ai.ml, and azure.ai.ml.dsl, can be imported. A component is a self-contained piece of code that does one step in a machine learning pipeline. For each component, we need to prepare the following: the Python script containing the execution logic, the definition of the component's interface, and other metadata of the component. The interface is defined by applying the @command_component decorator to Python functions. The studio UI displays the pipeline as a graph and the components as blocks. The input_data, training_data, and test_data are the ports of the component, which connect to other components for data streaming. Training and scoring are defined with their respective Python functions. The components can also be imported into the code. Once all the components and input data are loaded, they can be composed into a pipeline.
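A minimal sketch of this composition, assuming the mldesigner and azure-ai-ml packages and using hypothetical component and compute names, could look like the following:

# A minimal sketch, assuming azure-ai-ml and mldesigner are installed.
# Component names, ports, and the compute target are placeholders.
from mldesigner import command_component, Input, Output
from azure.ai.ml import dsl

@command_component(name="prep_data", display_name="Prepare data")
def prep_data(input_data: Input(type="uri_folder"),
              training_data: Output(type="uri_folder"),
              test_data: Output(type="uri_folder")):
    # Execution logic: split the raw input into training and test sets.
    ...

@command_component(name="train_model", display_name="Train model")
def train_model(training_data: Input(type="uri_folder"),
                model_output: Output(type="uri_folder")):
    # Execution logic: train the model and write it to model_output.
    ...

@command_component(name="score_model", display_name="Score model")
def score_model(model_input: Input(type="uri_folder"),
                test_data: Input(type="uri_folder")):
    # Execution logic: evaluate the trained model on the test data.
    ...

@dsl.pipeline(compute="cpu-cluster", description="prep -> train -> score")
def training_pipeline(pipeline_input_data):
    prep_node = prep_data(input_data=pipeline_input_data)
    train_node = train_model(training_data=prep_node.outputs.training_data)
    score_model(model_input=train_node.outputs.model_output,
                test_data=prep_node.outputs.test_data)

The resulting pipeline function can then be invoked with an Input pointing at the source data and submitted as a job through the workspace's MLClient.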

The Azure Machine Learning Studio allows us to view the pipeline graph, check its output, and debug it. The logs and outputs of each component are available for study. Optionally, components can be registered to the workspace so they can be shared and reused.
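For example, a component defined in YAML can be loaded and registered with a short snippet such as the one below; the file name and workspace details are assumptions:

# A minimal sketch, assuming train.yml describes the training component.
# Subscription, resource group, and workspace names are placeholders.
from azure.ai.ml import MLClient, load_component
from azure.identity import DefaultAzureCredential

ml_client = MLClient(DefaultAzureCredential(),
                     subscription_id="<subscription-id>",
                     resource_group_name="<resource-group>",
                     workspace_name="<workspace>")

# Load the component definition from YAML and register it for sharing and reuse.
train_component = load_component(source="train.yml")
registered = ml_client.components.create_or_update(train_component)
print(registered.name, registered.version)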

A pipeline component can also be deployed as a batch endpoint. This is helpful for running the machine learning pipeline from other platforms such as custom Java code, Azure DevOps, GitHub Actions, and Azure Data Factory. A batch endpoint serves a REST API, so it can be invoked from other platforms. By isolating the pipeline component as a batch endpoint, we can change the logic of the pipeline without affecting downstream consumers. A pipeline must first be converted to a pipeline component before being deployed as a batch endpoint. Time-based schedules can be used to take care of routine jobs. A schedule associates a job with a trigger, which can be a cron expression.
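As an illustration of such a schedule, the following sketch uses the SDK v2 entities; the schedule name and cron expression are placeholders, and ml_client and pipeline_job are assumed to exist already:

# A minimal sketch, assuming ml_client and pipeline_job are already defined.
from azure.ai.ml.entities import JobSchedule, CronTrigger

# Run the pipeline job every day at 04:00.
cron_trigger = CronTrigger(expression="0 4 * * *")

schedule = JobSchedule(name="daily-training-schedule",
                       trigger=cron_trigger,
                       create_job=pipeline_job)

ml_client.schedules.begin_create_or_update(schedule).result()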


 


 

Thursday, February 1, 2024

 

Question: How is data access performed from non-interactive clusters in an Azure Machine Learning workspace?

Answer: There are several ways to access data, and most of them resemble interactive compute usage from a Python notebook. Connection objects in the form of AzureML datastores and objects representing credentials are used with various clients, such as the Blob Service client and the AzureML client, to access the data. The difference is primarily that the interactive mode does not require a credential and uses the logged-in identity, whereas the job can be executed with different credentials.

So, the following code snippets demonstrate how to do that:

1.       With account key

# Import the azure-storage-blob package
import io
import pandas as pd
from azure.storage.blob import BlobServiceClient

# Create a BlobServiceClient with the account URL and account key
account_name = "your_storage_account_name"
account_key = "your_storage_account_key"
account_url = f"https://{account_name}.blob.core.windows.net"
blob_service_client = BlobServiceClient(account_url, credential=account_key)

# Read a blob from the storage account as text
container_name = "your_container_name"
blob_name = "your_blob_name"
blob_client = blob_service_client.get_blob_client(container_name, blob_name)
blob_data = blob_client.download_blob().readall().decode("utf-8")

# Convert the blob data to a pandas DataFrame
df = pd.read_csv(io.StringIO(blob_data))

 

2.       Without account key:

# Import the azure-identity and azure-storage-blob packages
import io
import pandas as pd
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Create a DefaultAzureCredential object that uses the compute's managed identity
credential = DefaultAzureCredential()

# Create a BlobServiceClient object with the storage account URL and credential
account_url = "https://your_storage_account_name.blob.core.windows.net"
blob_service_client = BlobServiceClient(account_url, credential=credential)

# Read a blob from the storage account as a stream
container_name = "your_container_name"
blob_name = "your_blob_name"
blob_client = blob_service_client.get_blob_client(container_name, blob_name)
blob_stream = blob_client.download_blob()

# Convert the blob stream to a pandas DataFrame
df = pd.read_csv(io.BytesIO(blob_stream.readall()))
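Alternatively, the data can be referenced through an AzureML datastore URI when the job itself is submitted, so that the datastore's stored credentials or identity-based access handle authentication. A rough sketch follows, with the datastore path, script, environment, and compute target as placeholders:

# A minimal sketch, assuming the workspace and a compute cluster already exist.
# The datastore path, script, environment, and compute name are placeholders.
from azure.ai.ml import MLClient, command, Input
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient(DefaultAzureCredential(),
                     subscription_id="<subscription-id>",
                     resource_group_name="<resource-group>",
                     workspace_name="<workspace>")

# Reference the blob through the workspace datastore rather than raw account keys.
job = command(
    command="python train.py --data ${{inputs.training_data}}",
    code="./src",
    inputs={
        "training_data": Input(
            type=AssetTypes.URI_FILE,
            path="azureml://datastores/workspaceblobstore/paths/data/your_blob_name.csv",
        )
    },
    environment="azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster",
)

ml_client.jobs.create_or_update(job)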

 

Also, it is important to make sure that the Azure ML workspace can create the UI/submissions folder for jobs in the associated storage account. Without the code uploaded and the job details persisted in the storage account, the job cannot be run.

Previous writings: IaCResolutionsPart70.docx