Sunday, February 4, 2024

 

Problem statement: Get the maximum length of a concatenated string with unique characters

Problem description:

You are given an array of strings arr. A string s is formed by the concatenation of a subsequence of arr that has unique characters.

 

Return the maximum possible length of s.

 

A subsequence is an array that can be derived from another array by deleting some or no elements without changing the order of the remaining elements.

 

Example 1:

Input: arr = ["un","iq","ue"]

Output: 4

Explanation: All the valid concatenations are:

- ""

- "un"

- "iq"

- "ue"

- "uniq" ("un" + "iq")

- "ique" ("iq" + "ue")

Maximum length is 4.

 

Example 2:

Input: arr = ["cha","r","act","ers"]

Output: 6

Explanation: Possible longest valid concatenations are "chaers" ("cha" + "ers") and "acters" ("act" + "ers").

 

Example 3:

Input: arr = ["abcdefghijklmnopqrstuvwxyz"]

Output: 26

Explanation: The only string in arr has all 26 characters.

 

Constraints:

1 <= arr.length <= 16

1 <= arr[i].length <= 26

arr[i] contains only lowercase English letters.

 

Solution:

import java.util.*;

import java.lang.*;

import java.util.stream.Collectors;


 

class Solution {

    public int maxLength(List<String> arr) {

        var combinations = new ArrayList<List<String>>();

        getCombinations(arr, combinations);

        // Concatenate each subset, keep only the strings whose characters are all unique, and pick the longest.
        var selections = combinations.stream().map(x -> toString(x)).filter(x -> isValid(x)).collect(Collectors.toList());

        return selections.stream().mapToInt(x->x.length()).max().getAsInt();

    }

   

public static void getCombinations(List<String> arr, List<List<String>> combinations) {

      int N = arr.size();
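      // Each bitmask value from 0 to 2^N - 1 selects one subset: bit j set means arr.get(j) is included.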

      for (int i = 0; i < (1 << N); i++) {

          List<String> combination = new ArrayList<String>();

          for (int j = 0; j < N; j++) {

              if ((i & (1 << j)) > 0) {

                combination.add(arr.get(j));

              }

          }

          List<String> copy = new ArrayList<String>(combination);

          combinations.add(copy);

      }

   }

 

public static String toString(List<String> words) {

    StringBuilder sb = new StringBuilder();

    for (var word : words) {

        sb.append(word);

    }

    return sb.toString();

}

public static boolean isValid(String word){

    Map<Character, Integer> charMap = new HashMap<>();

     for (int i = 0; i< word.length(); i++){

        if (charMap.containsKey(Character.valueOf(word.charAt(i)))) {

           return false;

        }

        charMap.put(Character.valueOf(word.charAt(i)), 1);

     }

     return true;

}

 

}

Saturday, February 3, 2024

 

This article describes how to evaluate models using Azure Machine Learning Studio. We evaluate foundation models using our own test data. Microsoft-developed foundation models are a convenient way to get started with analysis of your data. The Model Catalog is the hub for both foundation models and OpenAI models; it can be used to discover, evaluate, fine-tune, deploy, and import models. Your own test data can be used to evaluate these models. From the model card of any foundation model, you can pass in the test data, map the input columns to the schema required by the task, provide a compute target to run the evaluation on, and submit the job. The results include evaluation metrics, and these can help you decide whether to fine-tune the model with your own training data. Every pre-trained model in the model catalog can be fine-tuned for a specific set of tasks such as text classification, token classification, and question answering. The data can be in JSONL, CSV, or TSV format, and the steps are just like evaluation except that you pass in validation data for validation and test data to evaluate the fine-tuned model. Once the models are evaluated and fine-tuned, they can be deployed to endpoints for inferencing. There must be enough quota available for deployment.

OpenAI models differ from the foundation models in that they require a connection to Azure OpenAI; the process of evaluating, fine-tuning, and deploying remains the same. An Azure Machine Learning pipeline can be used to complete a machine learning task, which usually consists of three steps: prepare data, train a model, and score the model. The pipeline optimizes the workflow for speed, portability, and reuse so you can focus on machine learning instead of infrastructure and automation. A pipeline comprises components for each of the three tasks and is built using the Python SDK v2, the CLI, or the UI. All the necessary libraries, such as azure.identity, azure.ai.ml, and azure.ai.ml.dsl, can be imported. A component is a self-contained piece of code that performs one step in a machine learning pipeline. For each component, we need to prepare the Python script containing the execution logic, define the interface of the component, and add other metadata for the component. The interface is defined with the @command_component decorator applied to Python functions. The studio UI displays the pipeline as a graph and the components as blocks. The input_data, training_data, and test_data are the ports of the component, which connect to other components for data streaming. Training and scoring are defined with their respective Python functions, and components can also be imported into the code. Once all the components and input data are loaded, they can be composed into a pipeline, as sketched below.
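A minimal sketch of such a component-based pipeline, assuming the mldesigner and azure-ai-ml packages; the component names, compute target, and data path are placeholders, and the component bodies are elided:

from mldesigner import command_component, Input, Output
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import Input as JobInput

@command_component(name="prep_data", display_name="Prepare data")
def prep_data(input_data: Input(type="uri_folder"),
              training_data: Output(type="uri_folder"),
              test_data: Output(type="uri_folder")):
    # Execution logic goes here: read input_data, split it, and write training_data and test_data.
    ...

@command_component(name="train_model", display_name="Train model")
def train_model(training_data: Input(type="uri_folder"),
                model_output: Output(type="uri_folder")):
    # Execution logic goes here: fit a model on training_data and persist it to model_output.
    ...

@pipeline(default_compute="cpu-cluster")  # placeholder compute name
def training_pipeline(pipeline_input_data):
    prep = prep_data(input_data=pipeline_input_data)
    train = train_model(training_data=prep.outputs.training_data)
    return {"trained_model": train.outputs.model_output}

pipeline_job = training_pipeline(
    pipeline_input_data=JobInput(type="uri_folder", path="azureml://datastores/workspaceblobstore/paths/raw/")
)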

The Azure Machine Learning Studio allows us to view the pipeline graph, check its output, and debug it. The logs and outputs of each component are available for study. Optionally, components can be registered to the workspace so they can be shared and reused.

A pipeline component can also be deployed as a batch endpoint. This is helpful for running a machine learning pipeline from other platforms such as custom Java code, Azure DevOps, GitHub Actions, and Azure Data Factory. A batch endpoint serves a REST API, so it can be invoked from those platforms. By isolating the pipeline component behind a batch endpoint, we can change the logic of the pipeline without affecting downstream consumers. A pipeline must first be converted to a pipeline component before being deployed as a batch endpoint. Time-based schedules can be used to take care of routine jobs; a schedule associates a job with a trigger, which can be a cron expression, as in the sketch below.
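A rough sketch of a cron-based schedule, assuming azure-ai-ml, an authenticated MLClient named ml_client, and the pipeline_job built above; the schedule name and cron expression are placeholders:

from azure.ai.ml.entities import JobSchedule, CronTrigger

# Run the pipeline job at 06:00 UTC on weekdays (placeholder cron expression).
trigger = CronTrigger(expression="0 6 * * 1-5")
schedule = JobSchedule(name="weekday-training-schedule", trigger=trigger, create_job=pipeline_job)

ml_client.schedules.begin_create_or_update(schedule).result()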


Thursday, February 1, 2024

 

Question: How is data access performed from non-interactive clusters in an Azure Machine Learning workspace?

Answer: There are several ways to access data, and most of them resemble how interactive compute is used from a Python notebook. Connection objects in the form of AzureML datastores, together with objects representing credentials, are used with various clients such as the BlobServiceClient and the AzureML client to access the data. The difference is primarily that interactive mode uses the logged-in identity and does not require an explicit credential, whereas a job can be executed with different credentials.

So, the following code snippets demonstrate how to do that:

1.       With account key:

# Import the azure-storage-blob and pandas packages
import io

import pandas as pd

from azure.storage.blob import BlobServiceClient

# Create a BlobServiceClient with the account URL and account key
account_name = "your_storage_account_name"
account_key = "your_storage_account_key"
account_url = f"https://{account_name}.blob.core.windows.net"
blob_service_client = BlobServiceClient(account_url, credential=account_key)

# Read a blob from the storage account as bytes
container_name = "your_container_name"
blob_name = "your_blob_name"
blob_client = blob_service_client.get_blob_client(container_name, blob_name)
blob_data = blob_client.download_blob().readall()

# Convert the blob data to a pandas DataFrame
df = pd.read_csv(io.BytesIO(blob_data))

 

2.       Without account key:

# Import the azure-identity and azure-storage-blob packages

import azure.identity

import azure.storage.blob

 

# Create a DefaultAzureCredential object that uses the VM's identity

credential = azure.identity.DefaultAzureCredential()

 

# Create a BlobServiceClient object with the storage account URL and credential

account_url = "https://your_storage_account_name.blob.core.windows.net"

blob_service_client = azure.storage.blob.BlobServiceClient(account_url, credential)

 

# Read a blob from the storage account as a stream

container_name = "your_container_name"

blob_name = "your_blob_name"

blob_client = blob_service_client.get_blob_client(container_name, blob_name)

blob_stream = blob_client.download_blob()

 

# Convert the blob stream to a pandas DataFrame
import io

import pandas as pd

df = pd.read_csv(io.BytesIO(blob_stream.readall()))
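When the data sits behind an AzureML datastore, a datastore URI can also be read directly with pandas. This is a sketch that assumes the azureml-fsspec package is installed in the job environment; every URI segment below is a placeholder:

# Requires the azureml-fsspec package in the job environment
import pandas as pd

# Datastore URIs follow this pattern; all segments are placeholders.
uri = (
    "azureml://subscriptions/<subscription_id>/resourcegroups/<resource_group>"
    "/workspaces/<workspace_name>/datastores/<datastore_name>/paths/<folder>/<file>.csv"
)

# fsspec resolves the URI using the identity of the compute or the credentials stored on the datastore.
df = pd.read_csv(uri)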

 

Also, it is important to make sure that the Azure ML workspace can create the UI/submissions folder for jobs in the associated storage account. Unless the code is uploaded and the job details are persisted in the storage account, the job cannot run.

Previous writings: IaCResolutionsPart70.docx

Wednesday, January 31, 2024

 

This is a summary of the book “Chaos Kings,” written by Scott Patterson, a seasoned financial reporter and author, about how Wall Street traders make billions in the new age of crisis. The book revolves around the theme of “black swan” investing, a pessimistic brand of trading that seems to have worked very well for a few during the pandemic. Market crashes and political chaos are growing more common, and the pandemic showed that an always-pessimistic portfolio offers downside risk protection.

The coronavirus pandemic highlighted the importance of "black swan" investing strategies, which are built around unexpected events that create volatility. The concept was popularized by Nassim Taleb, who believed that unknown threats lurk at all times and that investors should be prepared. In early 2020 the pandemic spread, creating a black swan for Universa Investments, a Miami investment firm that operates the Black Swan Protection Protocol Fund; the fund, designed to generate outsized profits in a crash, reported a gain of 4,144% in three months. However, the market bust quickly reversed itself, setting the stage for future black swans. The era of instant communication and porous borders has created a breeding ground for conspiracy theories and contagions, with the pandemic-era disruptions to the supply chain illustrating the fallout from seemingly far-off risks. Climate change is leading to more dangerous extreme weather, affecting property owners and insurers. Political extremism is also on the rise, with the US electorate's mood swings indicating a rise in "polycrisis," characterized by uncertainty, instability, and dangerous feedback loops.

Mark Spitznagel, a renowned trader, learned trading discipline at a commodities exchange and worked for Everett Klipp, a star trader. Klipp taught Spitznagel the counterintuitive strategy of loving to lose money and hating to make money, which meant never holding onto a position after it took a small loss. Spitznagel became a trader on the Chicago Board of Trade (CBOT) at age 22, trading his own money and focusing on Treasury bonds. He survived the bond crash of 1994 and switched to a hedging strategy, performing well during the Asian crisis of 1997 and the 1998 blowup of Long Term Capital Management. Nassim Taleb, a Wall Street trader, also faced challenges as a trader, growing skeptical of Wall Street's theories of financial engineering. In the late 1990s, Taleb met Spitznagel and they decided to go into business together. Their firm, Empirica Capital, could have hugely profited after the September 11, 2001, attacks, but clients were reluctant to gain from the crisis.

Taleb, a contrarian investor, gained media attention with his book Fooled by Randomness and met French mathematician Benoit Mandelbrot. However, Empirica was losing money in a market without volatility, leading to Taleb's departure in 2004. He later wrote The Black Swan, which became a bestseller. Despite skepticism, Universa Investments became extremely profitable during the 2008 stock market crash, making $1 billion and outperforming the market by 115%. Universa's success was due to its larger bet against the overall market, which allowed it to replicate its gains in the future.

Before the pandemic, the firm struggled to attract investors, but in 2017 it landed a $1.5 billion investment from the California Public Employees Retirement System (CalPERS). In 2018, Volmageddon hit, and CalPERS ramped up its investment to $5 billion. However, in 2019, CalPERS closed out its investment with Universa just before the coronavirus first appeared in Wuhan, China. As the pandemic took hold in China, Universa managed to recruit new clients and snap up S&P 500 puts and VIX call options on the cheap.

The pandemic exposed the importance of a pessimistic portfolio for protection against downside risks. Universa, a hedge fund, demonstrated the effectiveness of its hedging strategy during the coronavirus market crash. Spitznagel's wealth soared to $250 million, and CalPERS's decision to withdraw funds before the crash led to internal disputes. Meanwhile, Taleb argued that bitcoin was worthless, as it had no intrinsic worth and needed constant attention from miners. Despite the crypto crash, stock investors were still riding high, with the Dow Jones Industrial Average reaching 36,000. Geopolitical minefields, such as Russia's invasion of Ukraine, were also a concern. Both investors had misread the situation, highlighting the need for a balanced approach to risk management.

Previous book summaries: BookSummary46.docx

Summarizing Software: SummarizerCodeSnippets.docx. 

Tuesday, January 30, 2024

 

Data plane access to secrets

One of the biggest consolidators of data access, aside from storage, is a key vault. Many clients access keys, secrets, and certificates from, say, an Azure Key Vault, and this calls for access control in both the management portal and the IaC. An Azure Key Vault is a service that provides secure storage and management of keys, secrets, and certificates. It offers two ways to control access to its data plane: role-based access control (RBAC) and access control policies (ACP).

RBAC is an authorization system built on Azure Resource Manager that provides fine-grained access management of Azure resources. It allows you to assign predefined or custom roles to users, groups, service principals, and managed identities at different scopes, such as management group, subscription, resource group, or individual resource.

ACP is a legacy authorization system, native to Key Vault, that provides access to keys, secrets, and certificates. It allows you to assign individual permissions to security principals at the Key Vault scope.

Azure Key Vaults should be provisioned with role-based access control (RBAC) for managing access to the key vault itself. This enables us to assign specific roles (such as owner, contributor, or reader) to users, groups, or applications. RBAC ensures fine-grained access control and aligns with the principle of least privilege by granting access based on specific roles assigned to users. On the other hand, access control policies are used within the key vault to manage access to specific key vault resources. These policies define which users, groups, or applications can perform specific operations (such as getting, setting, deleting keys or secrets). Access control policies provide a higher level of granularity for managing permissions within the key vault. Therefore, both RBAC and access control policies should be used for proper access management in Azure Key Vaults. RBAC should be used to manage access to the key vault itself, while access control policies should be used to manage access to specific resources within the key vault.
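As a minimal data plane sketch, assuming the calling identity has already been granted a data plane role such as Key Vault Secrets User, and with placeholder vault and secret names:

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# The identity must hold a data plane role (for example, Key Vault Secrets User) on this vault.
vault_url = "https://your-key-vault-name.vault.azure.net"
credential = DefaultAzureCredential()  # managed identity, CLI login, etc.
client = SecretClient(vault_url=vault_url, credential=credential)

# Read a secret through the data plane
secret = client.get_secret("your-secret-name")
print(secret.name)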

The key advantages of RBAC over ACP involve:

- A unified access control model for Azure resources

- Centralized access management and auditing

- Better control over the right to grant access to keys, secrets, and certificates

- Integration with Privileged Identity Management

- Support for deny assignments

The trouble with data plane role-based access control is that it is often neglected when control plane RBAC is assigned. Even when data plane assignments are specified, they must usually be indirect, in the form of Active Directory groups, so that memberships can be added or revoked without disturbing the assignment. On the other hand, ACPs can be specified directly on the resource, specific to a user or principal, and do not require different group memberships for different accesses. ACPs also rarely need to be captured in the IaC, since applications and code access are assumed to be authorized in IaC through the necessary RBAC anyway. In this way, when Key Vault resources are restored, ACPs can be reset without loss of functionality, and authorized users who want access can add themselves again.

Monday, January 29, 2024

 

The role of generative AI in fleet formation and optimization:

When fleet operators superimpose layers on geographical maps, they can apply AI and predictive analytics to enhance their decision-making on routes and schedules, increase efficiency, improve reliability, and enhance sustainability. The automations involve data collection, analysis, visualization, prediction, and optimization. Data-driven recommendations become available to these operators.

Specifically, these include:

Data collection: from a variety of sources such as sensors, cameras, GPS or external databases.

Data analysis: to extract patterns for vehicle performance, driver behavior, customer preferences, or traffic conditions.

Data visualization: such as dashboards, charts, maps, or reports.

Data prediction: with models for regression and classification.

Data optimization: such as route planning, maintenance scheduling, vehicle allocation, or formation adjustment.

The 3D formation and flight path between source and destination across rocky terrain, on the other hand, has very little data beyond contour maps, weather conditions, and other environmental factors, but the path can still be viewed as segments and sequences, and generative AI can encode and decode that state using transformers, as in the toy sketch below.
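A toy sketch, not taken from any fleet product, of treating a path as a sequence of waypoints and letting a small transformer predict the next segment; the feature layout (x, y, altitude) and all dimensions are illustrative assumptions:

import torch
import torch.nn as nn

class PathModel(nn.Module):
    def __init__(self, feature_dim=3, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(feature_dim, d_model)                  # (x, y, altitude) -> d_model
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, feature_dim)                   # predict the next waypoint

    def forward(self, waypoints):                                     # waypoints: (batch, seq, 3)
        h = self.encoder(self.embed(waypoints))
        return self.head(h[:, -1, :])                                 # next-waypoint prediction

# Usage: predict the next waypoint from a window of 16 observed waypoints.
model = PathModel()
window = torch.randn(8, 16, 3)       # a batch of 8 path segments
next_waypoint = model(window)        # shape: (8, 3)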

Specific use cases include: 

More information about generative AI and its applications is available via the following resources: