Saturday, August 3, 2024

When describing Azure Machine Learning Workspace deployments via IaC, along with their shortcomings and corresponding resolutions, it was hinted that the workspace and all of its infrastructure concerns can be resolved at deployment time so that data scientists are free to focus on business use cases. Part of this setup involves kernel creation, which can be scripted during the creation and assignment of compute to the data scientists. Two scripts are required: one runs when the compute instance is created and the other runs every time the compute starts. Some commands require the shell to be restarted, so splitting the work across the two stages keeps each script valid for its point in the lifecycle.
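Both scripts can be attached to the compute instance definition itself, so nothing is left for the data scientist to run manually. The following is a minimal sketch assuming the Azure ML CLI v2 compute instance YAML schema; the file names, VM size, and timeouts are illustrative assumptions, not prescribed values:

# compute-instance.yml (illustrative sketch)
$schema: https://azuremlschemas.azureedge.net/latest/computeInstance.schema.json
type: computeinstance
name: ci-dss-python311
size: Standard_DS3_v2
setup_scripts:
  creation_script:
    path: ./creation-script.sh      # the first script below
    timeout_minutes: 30
  startup_script:
    path: ./startup-script.sh       # the second script below
    timeout_minutes: 15

A definition like this is then submitted with az ml compute create -f compute-instance.yml. For example, to provision a Python 3.11 and Spark 3.5 based custom kernel, the following two scripts come in useful: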


#!/bin/bash
set -e

# Creation-time script: installs Anaconda, then creates a custom conda
# environment and kernel based on a sample yml file.
curl https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh --output Anaconda3-2024.02-1-Linux-x86_64.sh
chmod 755 Anaconda3-2024.02-1-Linux-x86_64.sh
./Anaconda3-2024.02-1-Linux-x86_64.sh -b
echo "installation complete"

# Write the environment specification; the pip section pins pyspark to 3.5.
cat <<EOF > env.yaml
name: python3.11_spark3.5
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - numpy
  - pyspark
  - pip
  - pip:
    - azureml-core
    - ipython
    - ipykernel
    - pyspark==3.5
EOF
echo "env.yaml written"

/anaconda/condabin/conda env create -f env.yaml
echo "Initializing new conda environment"
/anaconda/condabin/conda init bash
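Before the instance is handed over, the environment creation can be sanity-checked with something like the following; the paths assume the /anaconda install location used above:

/anaconda/condabin/conda env list | grep python3.11_spark3.5
/anaconda/envs/python3.11_spark3.5/bin/python3 --version

The second script runs at every compute start and registers the environment as a Jupyter kernel: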


#!/bin/bash
set -e

# Startup script: re-initializes conda for the shell and registers the
# custom environment as a Jupyter kernel.
echo "Activating new conda environment"
/anaconda/envs/azureml_py38/bin/conda init --all
/anaconda/envs/azureml_py38/bin/conda init bash
export PATH="/anaconda/condabin:$PATH"
export name="python3.11_spark3.5"
conda install -p "/anaconda/envs/$name" -y ipykernel anaconda::pyspark anaconda::conda
# conda activate needs the shell hook sourced in a non-interactive script.
source /anaconda/etc/profile.d/conda.sh
conda activate "$name"
echo "Installing kernel"

# Install the kernel as azureuser so it appears in the notebook kernel list.
sudo -u azureuser -i <<'EOF'
export name="python3.11_spark3.5"
export pathToPython3="/anaconda/envs/$name/bin/python3"
$pathToPython3 -m pip install pip --upgrade
$pathToPython3 -m pip install pyopenssl --upgrade
$pathToPython3 -m pip install pyspark==3.5
$pathToPython3 -m pip install snowflake-snowpark-python==1.20.0
$pathToPython3 -m pip install snowflake-connector-python==3.11.0
$pathToPython3 -m pip install azure-keyvault
$pathToPython3 -m pip install azure-identity
$pathToPython3 -m pip install ipykernel==6.29.5
$pathToPython3 -m ipykernel install --user --name "$name" --display-name "Python 3.11 - Spark 3.5 (DSS)"
echo "Conda environment set up successfully."
EOF
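After a start, the kernel registration can be smoke-tested from the terminal. This is an illustrative check, assuming jupyter_client is pulled in as a dependency of ipykernel:

sudo -u azureuser -i /anaconda/envs/python3.11_spark3.5/bin/python3 -m jupyter kernelspec list
/anaconda/envs/python3.11_spark3.5/bin/python3 -c "import pyspark; print(pyspark.__version__)"

If both commands succeed, the "Python 3.11 - Spark 3.5 (DSS)" kernel shows up in the notebook kernel picker.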

