Monday, August 5, 2024

 When describing the Azure Machine Learning Workspace deployments via IaC and their shortcomings and corresponding resolutions, it was hinted that the workspace and all of its infrastructure concerns can be resolved at deployment time so that data scientists are free to focus on business use cases. Part of this setup involves kernel creation, which can be done via scripts during the creation and assignment of compute to the data scientists. Two scripts are required: one run at creation time and the other at the start of the compute. Some commands require the terminal to be restarted, so splitting the work across the two stages lets each script run at the right time. For example, to provision a custom kernel based on Python 3.11 and Spark 3.5, the following scripts are useful:

#!/bin/bash

  

set -e


curl https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh --output Anaconda3-2024.02-1-Linux-x86_64.sh

chmod 755 Anaconda3-2024.02-1-Linux-x86_64.sh

./Anaconda3-2024.02-1-Linux-x86_64.sh -b

# This script creates a custom conda environment and kernel based on a sample yml file.

echo "installation complete"

cat <<EOF > env.yaml

name: python3.11_spark3.5

channels:

  - conda-forge

  - defaults

dependencies:

  - python=3.11

  - numpy

  - pyspark

  - pip

  - pip:

    - azureml-core

    - ipython

    - ipykernel

    - pyspark==3.5

EOF

echo "env.yaml written"

/anaconda/condabin/conda env create -f env.yaml

echo "Initializing new conda environment"

/anaconda/condabin/conda init bash
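
After the creation script completes, the new environment can be verified before any kernel registration; a quick check (the /anaconda prefix matches the paths used above):

/anaconda/condabin/conda env list | grep python3.11_spark3.5

/anaconda/envs/python3.11_spark3.5/bin/python3 --version

The second script, which runs when the compute starts, then registers the environment as a Jupyter kernel: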


#!/bin/bash


set -e

echo "Activating new conda environment"

/anaconda/envs/azureml_py38/bin/conda init --all

export PATH="/anaconda/condabin:$PATH"

export name="python3.11_spark3.5"

conda install -p "/anaconda/envs/$name" -y ipykernel anaconda::pyspark anaconda::conda

# conda activate is a shell function, so source the hook before activating

source /anaconda/etc/profile.d/conda.sh

conda activate "$name"

echo "Installing kernel"

sudo -u azureuser -i <<'EOF'

export name="python3.11_spark3.5"

export pathToPython3="/anaconda/envs/$name/bin/python3"

$pathToPython3 -m pip install pip --upgrade

$pathToPython3 -m pip install pyopenssl --upgrade

$pathToPython3 -m pip install pyspark==3.5

$pathToPython3 -m pip install snowflake-snowpark-python==1.20.0

$pathToPython3 -m pip install snowflake-connector-python==3.11.0

$pathToPython3 -m pip install azure-keyvault

$pathToPython3 -m pip install azure-identity

$pathToPython3 -m pip install ipykernel==6.29.5

$pathToPython3 -m ipykernel install --user --name "$name" --display-name "Python 3.11 - Spark 3.5 (DSS)"

echo "Conda environment setup successfully."

EOF
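
Finally, the two scripts can be referenced from the compute instance definition so that each runs at the right stage. A minimal sketch using the v2 CLI, where the resource group, workspace, VM size, and script file names are placeholders and the setup_scripts section follows my understanding of the compute-instance YAML schema:

cat <<EOF > compute-instance.yml

name: ci-ds-01

type: computeinstance

size: Standard_DS3_v2

setup_scripts:

  creation_script:

    path: ./creation-script.sh

  startup_script:

    path: ./startup-script.sh

EOF

az ml compute create --resource-group my-rg --workspace-name my-ws --file compute-instance.yml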


Sunday, August 4, 2024

 This is a summary of the book titled “Why Not Better and Cheaper?”, written by James and Robert Rebitzer on healthcare and innovation and published by Oxford University Press in June 2023. The brothers draw on research into financial incentives, social norms, and market competition to explain why results fall short for patients and society. Opening with a poster-child story of better and cheaper innovation in residential lighting as a contrast, the authors take us on a journey through the landscape and history of healthcare and its current state. The healthcare system is at once profusely innovative and yet remarkably ineffective at discovering ways to deliver increased value at lower cost.

Innovations in treating heart conditions, for example, illustrate the significance and interrelation of these two factors: value addition and cost reduction. Across many nations, heart disease is one of the leading contributors to death, and research in this field has increased tremendously in recent decades, partly because earlier research relied on indirect markers and indicators. LDL, or “bad” cholesterol, can be reduced by a newly discovered class of drugs, including evolocumab and alirocumab, which were approved for a wide population. The initial list price for these drugs was over fourteen thousand dollars per year, an excessive cost considering the drug must be taken for life. This example shows that patients want improved outcomes in mortality and quality of life, but innovations and their delivery are shaped by the pharmacies, physicians, and insurers that influence purchasing decisions. High-cost innovations can continue to gain market share in healthcare and can coexist with low-cost innovations, and low-value innovations can achieve great market penetration while high-value ones may not. In fact, a campaign called “Choosing Wisely” grew up to call out low-value care, flagging over five hundred tests and procedures that patients should avoid, which together contribute about a hundred million dollars in waste annually. The cost-reduction problem, on the other hand, arises from a failure to adopt the processes, technologies, and skills needed to remove inefficiencies and reduce the resources used. Innovators struggle to focus on cost reduction because skilled intervention becomes necessary, which has limited the potential for pilot projects to reach the mainstream. Taken together, this causes innovations in healthcare to underperform. Financial incentives, norms, and competition explain these two symptoms.

Patents often fail to provide economic gains for innovators. Patents stimulate innovation by granting time-limited monopolies, but innovators only profit when there is demand for the product. Saving lives through the development of antibiotics that overcome antibiotic-resistant strains is an example where patents and other financial incentives have fallen short. Inventing new antibiotics is a money-losing endeavor, so companies steer toward drugs along predecessor lines that may offer no more benefit than existing ones. Similarly, vaccine development has different value for those at high and low risk. The price of a vaccine is set for the individual and fails to account for the benefit to others, so the economic value of a vaccine is miscalculated; drug makers can earn more from the treatment of a disease than from its vaccine, and the distribution of risk is seldom considered. Another example of bias is the larger number of treatment options for late-stage cancer than for early-stage cancer, when in fact the latter has higher value across a broader spectrum. Out-of-pocket costs to end users tend to be quite close to the marginal costs of manufacturing the drug.





Friday, August 2, 2024

 When describing the Azure Machine Learning Workspace deployments via IaC and their shortcomings and corresponding resolutions, it was hinted that the workspace and all of its infrastructure concerns can be resolved at deployment time so that data scientists are free to focus on business use cases. Part of this setup involves kernel creation, which can be done via scripts during the creation and assignment of compute to the data scientists. Two scripts are required: one run at creation time and the other at the start of the compute. Some commands require the terminal to be restarted, so splitting the work across the two stages lets each script run at the right time. For example, to provision a custom kernel based on Python 3.11 and Spark 3.5, the following scripts are useful:


#!/bin/bash

  

set -e


curl https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh --output Anaconda3-2024.02-1-Linux-x86_64.sh

chmod 755 Anaconda3-2024.02-1-Linux-x86_64.sh

./Anaconda3-2024.02-1-Linux-x86_64.sh -b

# This script creates a custom conda environment and kernel based on a sample yml file.

echo "installation complete"

cat <<EOF > env.yaml

name: python3.11_spark3.5

channels:

  - conda-forge

  - defaults

dependencies:

  - python=3.11

  - numpy

  - pyspark

  - pip

  - pip:

    - azureml-core

    - ipython

    - ipykernel

    - pyspark==3.5

EOF

echo "env.yaml written"

/anaconda/condabin/conda env create -f env.yaml

echo "Initializing new conda environment"

/anaconda/condabin/conda init bash


#!/bin/bash


set -e

echo "Activating new conda environment"

/anaconda/envs/azureml_py38/bin/conda init bash

# conda activate is a shell function, so source the hook before activating

source /anaconda/etc/profile.d/conda.sh

conda activate python3.11_spark3.5

conda install -n python3.11_spark3.5 -y ipykernel anaconda::pyspark

echo "Installing kernel"

sudo -u azureuser -i <<'EOF'

# Use the new environment's interpreter explicitly so the packages and the

# kernel registration land in the custom environment rather than the default one

pathToPython3="/anaconda/envs/python3.11_spark3.5/bin/python3"

$pathToPython3 -m pip install pip --upgrade

$pathToPython3 -m pip install pyopenssl --upgrade

$pathToPython3 -m pip install pyspark==3.5

$pathToPython3 -m pip install snowflake-snowpark-python==1.20.0

$pathToPython3 -m pip install snowflake-connector-python==3.11.0

$pathToPython3 -m pip install azure-keyvault

$pathToPython3 -m pip install azure-identity

$pathToPython3 -m pip install ipykernel==6.29.5

$pathToPython3 -m ipykernel install --user --name python3.11_spark3.5 --display-name "Python 3.11 - Spark 3.5 (DSS)"

echo "Conda environment set up successfully."

EOF
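
Once the compute restarts and the kernel shows up in the notebook picker, a quick smoke test from a notebook cell confirms Spark is wired up. A minimal sketch to run inside the new kernel:

from pyspark.sql import SparkSession

# Build a local Spark session and confirm the version matches the pinned 3.5 line

spark = SparkSession.builder.appName("kernel-smoke-test").getOrCreate()

print(spark.version)

# A tiny DataFrame round-trip as a functional check

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

df.show()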




Thursday, August 1, 2024

This is a summary of the book titled “The Start-up of You”, written by Reid Hoffman and Ben Casnocha and published by Crown in 2012. LinkedIn founder Reid Hoffman and venture capitalist Ben Casnocha advise that globalization and technology have changed the workplace and that professionals must draw on their entrepreneurial roots to grow their careers. This “entrepreneurial mindset” treats each day as “Day One”. Entrepreneurs need personal capital, goals, and an amenable market. No one is self-made. Success requires a strong social network. Individuals must learn to seize the moment, make friends with risk, and solicit information from relationships.

In the late 20th century, building a career was like riding an "escalator", the traditional American path. However, globalization and the digital revolution have made that path obsolete. Companies no longer offer career development support or training, and employees are now "free agents" who must adopt an entrepreneurial mindset.

Entrepreneurs need personal capital, goals, and an amenable market. They need a competitive advantage through assets, aspirations, values, and market realities. When developing a career plan, consider financial assets, hard skills, and soft skills. Review immediate and long-term goals to determine the best job for you.

Twenty-first century careers demand flexibility and the capacity to adapt. Companies may face unexpected competition or modern technology, creating fresh pressure on employees and employers. Entrepreneurs must be persistent in fulfilling their vision while adapting to market feedback and customer needs.

To adapt to changing circumstances and pivot in a new direction, entrepreneurs and career strivers can adopt a useful planning framework. They optimize their initial vision, deploy competitive advantages, and reformulate as needed.

Success in business requires a strong social network, as even solo entrepreneurs or start-up leaders need help from others. Building authentic relationships and collaborating with others is crucial for success. Entrepreneurs should seize the moment when opportunities arise, as transformative opportunities rarely crop up. Curious entrepreneurs find inspiration in unexpected places and events, making connections.

To take smart risks, entrepreneurs should weigh possible benefits against likely downsides. Risk tolerance differs from person to person and changes over time and with circumstances, so assessing risk accurately is essential. People at every job level must decide which risks are worth taking.

In conclusion, entrepreneurs need to be proactive in helping and collaborating with others, seizing opportunities, and making friends with risk. By understanding and addressing risks, entrepreneurs can create a strong professional network and navigate the challenges of their careers.

Relationships provide crucial information for businesses and leaders, as well as entrepreneurs and ambitious professionals. LinkedIn, co-founded by Reid Hoffman, is an online platform that allows people to connect with professionals and share their professional identities. Network literacy is becoming increasingly important, as it allows individuals to find and utilize information from their networks. A person's social network is a unique sensor that provides insights on assorted topics. To make informed decisions, it is essential to ask well-formed questions and be generous with the people in your network.

In the “Age of the Inconceivable”, events like the coronavirus pandemic and climate change bring disastrous consequences. People who thrive during such events harness their entrepreneurial impulses, but even born entrepreneurs need to cultivate those impulses systematically. In today's climate of breakneck change and uncertainty, traditional career strategies and paths will not work; career success requires adopting a start-up entrepreneur's mindset and adapting as the world changes.




Wednesday, July 31, 2024

 Problem 4

The relationship "friend" is often symmetric, meaning that if I am your friend, you are my friend. Implement a MapReduce algorithm to check whether this property holds. Generate a list of all non-symmetric friend relationships.


Map Input

Each input record is a 2-element list [personA, personB], where personA is a string representing the name of a person and personB is a string representing the name of one of personA's friends. Note that it may or may not be the case that personA is, in turn, listed as a friend of personB.


Reduce Output

The output should be all pairs (friend, person) such that (person, friend) appears in the dataset but (friend, person) does not.
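
For instance, judging from the sample output further below, the dataset contains a record like

["Myriel", "MlleBaptistine"]

but no reverse record ["MlleBaptistine", "Myriel"], so the pair ("MlleBaptistine", "Myriel") is reported as non-symmetric.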


You can test your solution to this problem using friends.json:


Answer:

import MapReduce

import json

import sys

# Part 1

mr = MapReduce.MapReduce()

# Part 2

def mapper(record):

    personA, personB = record

    # Key on the unordered pair so that both directions of a friendship,

    # if present, arrive at the same reducer

    key = tuple(sorted((personA, personB)))

    mr.emit_intermediate(key, (personA, personB))


# Part 3

def reducer(key, list_of_values):

    # A symmetric friendship contributes both (A, B) and (B, A); if only one

    # direction was seen, the relationship is non-symmetric and the missing

    # direction (friend, person) is emitted

    directions = set(list_of_values)

    if len(directions) == 1:

        personA, personB = directions.pop()

        mr.emit((personB, personA))


# Part 4

inputdata = open(sys.argv[1])

mr.execute(inputdata, mapper, reducer)
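
The solution is run with the dataset path as a command-line argument (a small usage sketch; the script file name is illustrative, and the course's MapReduce.py module is assumed to sit alongside it):

python asymmetric_friends.py friends.json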


Sample output:

["MlleBaptistine", "Myriel"]

["MlleBaptistine", "MmeMagloire"]

["MlleBaptistine", "Valjean"]

["Fantine", "Valjean"]

["Cosette", "Valjean"]               


