When describing Azure Machine Learning workspace deployments via IaC, along with their shortcomings and corresponding resolutions, it was hinted that the workspace and all of its infrastructure concerns can be resolved at deployment time so that data scientists are free to focus on business use cases. Part of this setup involves kernel creation, which can be scripted during the creation and assignment of compute to the data scientists. Two scripts are required: one that runs when the compute is created and another that runs each time it starts. Some commands require the terminal to be restarted, so splitting the work across these two stages lets each command run at the right point. For example, to provision a custom kernel based on Python 3.11 and Spark 3.5, the following scripts come in useful:
#!/bin/bash
# Creation-time script: installs Anaconda, then creates a custom conda
# environment from an inline YAML spec.
set -e
curl https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh --output Anaconda3-2024.02-1-Linux-x86_64.sh
chmod 755 Anaconda3-2024.02-1-Linux-x86_64.sh
./Anaconda3-2024.02-1-Linux-x86_64.sh -b
echo "Anaconda installation complete"
cat <<EOF > env.yaml
name: python3.11_spark3.5
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - numpy
  - pip
  # pyspark is pinned once, via pip, to avoid a conflicting unpinned conda copy
  - pip:
      - azureml-core
      - ipython
      - ipykernel
      - pyspark==3.5
EOF
echo "env.yaml written"
/anaconda/condabin/conda env create -f env.yaml
echo "Initializing new conda environment"
/anaconda/condabin/conda init bash
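That completes the creation-time script. The second script runs each time the compute instance starts; because the environment already exists, it only needs to activate it and register the Jupyter kernel for the notebook user: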
#!/bin/bash
# Start-up script: runs on every start of the compute instance.
set -e
echo "Activating new conda environment"
export PATH="/anaconda/condabin:$PATH"
# Make `conda activate` usable in this non-interactive shell
source /anaconda/etc/profile.d/conda.sh
conda init bash
export name="python3.11_spark3.5"
conda install -p "/anaconda/envs/$name" -y ipykernel anaconda::pyspark anaconda::conda
conda activate "$name"
echo "Installing kernel"
sudo -u azureuser -i <<'EOF'
export name="python3.11_spark3.5"
export pathToPython3="/anaconda/envs/$name/bin/python3"
$pathToPython3 -m pip install pip --upgrade
$pathToPython3 -m pip install pyopenssl --upgrade
$pathToPython3 -m pip install pyspark==3.5
$pathToPython3 -m pip install snowflake-snowpark-python==1.20.0
$pathToPython3 -m pip install snowflake-connector-python==3.11.0
$pathToPython3 -m pip install azure-keyvault
$pathToPython3 -m pip install azure-identity
$pathToPython3 -m pip install ipykernel==6.29.5
$pathToPython3 -m ipykernel install --user --name "$name" --display-name "Python 3.11 - Spark 3.5 (DSS)"
echo "Conda environment set up successfully."
EOF
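With both scripts in place, the new kernel should appear in the notebook kernel picker after the compute instance restarts. A quick smoke test along the following lines can confirm the registration; the paths and names are the defaults used above, so adjust them if yours differ:
#!/bin/bash
# Optional sanity check (assumes the paths and names used above).
# The new kernel should appear among those registered for azureuser.
sudo -u azureuser -i jupyter kernelspec list
# The environment's interpreter should import pyspark at the pinned version.
/anaconda/envs/python3.11_spark3.5/bin/python3 -c "import pyspark; print(pyspark.__version__)"
If the kernel does not show up immediately, a refresh of the Jupyter page or a restart of the compute instance may be needed for the new kernelspec to be picked up.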