Friday, August 2, 2024

When describing Azure Machine Learning workspace deployments via IaC, along with their shortcomings and the corresponding resolutions, it was hinted that the workspace and all of its infrastructure concerns can be resolved at deployment time so that data scientists are free to focus on business use cases. Part of this setup involves kernel creation, which can be done via scripts during the creation and assignment of compute to the data scientists. Two scripts are required: one at creation time and the other at the start of the compute. Some commands require the terminal to be restarted before they take effect, so splitting the work across the two scripts provides the stages in which to specify them. For example, to provision a custom kernel based on Python 3.11 and Spark 3.5, the following scripts are useful:


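#!/bin/bash

# Creation script: runs at the creation time of the compute.
# This script creates a custom conda environment and kernel based on a sample yml file.
set -e

# Download the Anaconda installer; -f makes curl fail on an HTTP error instead of saving the error page.
curl -f https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh --output Anaconda3-2024.02-1-Linux-x86_64.sh
chmod 755 Anaconda3-2024.02-1-Linux-x86_64.sh

# -b runs the installer in batch mode, without prompts.
./Anaconda3-2024.02-1-Linux-x86_64.sh -b
echo "installation complete"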

cat <<EOF > env.yaml
name: python3.11_spark3.5
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - numpy
  - pyspark
  - pip
  - pip:
    - azureml-core
    - ipython
    - ipykernel
    - pyspark==3.5
EOF
echo "env.yaml written"

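# Create the environment described in env.yaml.
/anaconda/condabin/conda env create -f env.yaml

echo "Initializing new conda environment"
# conda init edits ~/.bashrc, so it only takes effect in a newly started shell.
/anaconda/condabin/conda init bash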
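
The `conda init bash` call that ends the creation script is an example of a command that only takes effect after the terminal restarts: it writes its setup into `~/.bashrc`, which is read when a new interactive shell starts. The following is a minimal sketch of that behavior and is not part of the scripts themselves. It assumes a throwaway rc file standing in for `~/.bashrc`, uses an explicit `source` in a child shell to stand in for what a new terminal does at startup, and `DEMO_FROM_RC` is a hypothetical variable name used only for illustration.

```shell
#!/bin/bash
# A temporary rc file stands in for ~/.bashrc (assumption for illustration).
rc_file="$(mktemp)"

# Stand-in for the block that `conda init bash` appends to ~/.bashrc.
echo 'export DEMO_FROM_RC=1' > "$rc_file"

# The shell that made the change has not read the file, so it sees nothing new.
current_value="${DEMO_FROM_RC:-unset}"

# A child shell that reads the rc file first, as a new terminal would at startup,
# sees the change. The explicit `source` here plays the role of that startup step.
fresh_value="$(bash -c 'source "$1"; echo "${DEMO_FROM_RC:-unset}"' _ "$rc_file")"

echo "current shell: $current_value"
echo "fresh shell:   $fresh_value"
rm -f "$rc_file"
```

Provided `DEMO_FROM_RC` is not already set in the environment, the current shell should report `unset` and the fresh shell `1`, which is why activation is left to the second script.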


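#!/bin/bash

# Startup script: runs at the start of the compute.
set -e

# Install ipykernel and register the kernel under the name shown in the notebook kernel picker.
python3 -m pip install ipykernel==6.29.5
python3 -m ipykernel install --user --name python3.11_spark3.5 --display-name "Python 3.11 - Spark 3.5 (DSS)"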

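echo "Activating new conda environment"
/anaconda/envs/azureml_py38/bin/conda init bash
# "conda activate" is a shell function rather than a subcommand of the conda binary,
# so load conda's shell hook into this script before activating.
eval "$(/anaconda/envs/azureml_py38/bin/conda shell.bash hook)"
conda activate python3.11_spark3.5
conda install -y ipykernel anaconda::pyspark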

echo "Installing kernel"
# Run the remaining installs as azureuser in a login shell.
sudo -u azureuser -i <<'EOF'
python3 -m pip install pip --upgrade
pip3 install pyopenssl --upgrade
pip3 install pyspark==3.5
pip3 install snowflake-snowpark-python==1.20.0
pip3 install snowflake-connector-python==3.11.0
pip3 install azure-keyvault
pip3 install azure-identity
python3 -m pip install ipykernel==6.29.5
echo "Conda environment setup successfully."
EOF
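
Once the startup script has run, it can be worth checking that the kernel was actually registered. The sketch below is one assumed way of doing that and is not part of the original scripts: `kernel_registered` is a hypothetical helper that scans `jupyter kernelspec list`-style output supplied on standard input, so it can be tried without Jupyter installed.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if a kernel named $1 appears in the listing on stdin.
# Each kernel line of `jupyter kernelspec list` has the kernel name as its first field.
kernel_registered() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example on the compute instance (assumes jupyter is on PATH):
#   jupyter kernelspec list | kernel_registered python3.11_spark3.5 && echo "kernel found"
```

Reading the listing from standard input keeps the check independent of where Jupyter is installed, so the same helper works for whichever user the kernel was registered under.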


Previous articles: https://1drv.ms/w/s!Ashlm-Nw-wnWhPIt_-X-iYdnygX-fA?e=ZCKWsR 

