This is a continuation of a series of articles on Infrastructure-as-Code (IaC), its shortcomings and their resolutions. A full-service IaC offering does not stop at the provisioning of resources. The trust that clients place in an IaC-based deployment service is that their use cases will be enabled and will remain operational without hassle. As an example, since we were discussing the Azure Machine Learning workspace, one of its use cases is to draw data from sources other than Azure-provided storage accounts, such as Snowflake. Executing Snowflake workloads on this workspace requires the PySpark library, support from the Java and Scala runtimes, and jars specific to Snowflake.
This means that the workspace deployment is only complete when the necessary prerequisites are installed. If the built-in environment does not support them, some customization is required. And in many cases, these customizations come back to the IaC configuration, with as much automation as possible through the inclusion of scripts.
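For instance, with the Azure ML CLI v2, a compute instance definition can carry a creation-time setup script directly in its yaml, keeping the customization inside the IaC configuration. A sketch follows; the resource name, VM size, and script path are assumptions, and the `setup_scripts` fields should be checked against the current schema:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/computeInstance.schema.json
name: ml-ci-snowflake
type: computeinstance
size: Standard_DS3_v2
setup_scripts:
  creation_script:
    path: setup.sh        # hypothetical initialization script
    timeout_minutes: 20
```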
In the case of the machine learning workspace, a custom kernel might be required to support Snowflake workloads. Such a kernel can be installed by passing in an initialization script that writes out a kernel specification as a yaml file, which can in turn be used to initialize and activate the kernel. Additionally, the jars specific to Snowflake can be downloaded; these include their common library, support for Spark code execution, and the official Scala language runtime jars.
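Those jars can be fetched from Maven Central as part of the same initialization script. A minimal sketch follows; the coordinates and versions listed are illustrative assumptions and should be pinned to match the Spark and Scala versions in use:

```python
import urllib.request
from pathlib import Path

MAVEN = "https://repo1.maven.org/maven2"

# Illustrative coordinates only: pin versions that match your Spark/Scala setup.
JARS = [
    ("net.snowflake", "snowflake-jdbc", "3.13.30"),
    ("net.snowflake", "spark-snowflake_2.12", "2.16.0-spark_3.4"),
    ("org.scala-lang", "scala-library", "2.12.18"),
]

def maven_url(group: str, artifact: str, version: str) -> str:
    """Build the Maven Central download URL for a jar."""
    return f"{MAVEN}/{group.replace('.', '/')}/{artifact}/{version}/{artifact}-{version}.jar"

def download_jars(dest: str = "/tmp/jars") -> list:
    """Fetch each jar into dest, skipping ones already present."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for group, artifact, version in JARS:
        target = out / f"{artifact}-{version}.jar"
        if not target.exists():
            urllib.request.urlretrieve(maven_url(group, artifact, version), target)
        paths.append(target)
    return paths
```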
Such a kernel specification might look something like this:
name: customkernel
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - numpy
  - pip
  - pip:
      - azureml-core
      - ipython
      - ipykernel
      - pyspark==3.5.1
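The initialization script can then create the conda environment from this specification and register it as a notebook kernel. A minimal Python sketch, assuming the specification was saved as customkernel.yml (the filename and display name are assumptions):

```python
import shutil
import subprocess

# Commands the initialization script would run. The spec filename and
# kernel/display names are assumptions matching the yaml above.
CREATE_ENV = ["conda", "env", "create", "-f", "customkernel.yml"]
REGISTER_KERNEL = [
    "conda", "run", "-n", "customkernel",
    "python", "-m", "ipykernel", "install",
    "--user", "--name", "customkernel",
    "--display-name", "Python 3.11 (Snowflake)",
]

def bootstrap_kernel() -> None:
    """Create the conda environment and expose it as a Jupyter kernel."""
    if shutil.which("conda") is None:
        raise RuntimeError("conda not found on PATH")
    subprocess.run(CREATE_ENV, check=True)
    subprocess.run(REGISTER_KERNEL, check=True)
```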
When the Spark session is started, the configuration specified can include the paths to these jars. These additional steps must be taken to go the full length of onboarding customer workloads.
Previous article references: IacResolutionsPart97.docx
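As a closing illustration, the Spark-side configuration described above can be sketched as follows. The jar paths, connector version, and connection options are all assumptions; pyspark is imported lazily so the sketch loads even where Spark is not installed:

```python
# Jar paths below are assumptions matching a hypothetical /tmp/jars download.
SNOWFLAKE_JARS = [
    "/tmp/jars/snowflake-jdbc-3.13.30.jar",
    "/tmp/jars/spark-snowflake_2.12-2.16.0-spark_3.4.jar",
]

def snowflake_options(account, user, password, database, schema, warehouse):
    """Connection options consumed by the Snowflake Spark connector."""
    return {
        "sfURL": f"{account}.snowflakecomputing.com",
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }

def read_table(table, options, jars=SNOWFLAKE_JARS):
    """Start a Spark session with the Snowflake jars and load one table."""
    # Deferred import: only needed when a session is actually created.
    from pyspark.sql import SparkSession
    spark = (
        SparkSession.builder
        .appName("snowflake-demo")
        .config("spark.jars", ",".join(jars))
        .getOrCreate()
    )
    return (
        spark.read.format("net.snowflake.spark.snowflake")
        .options(**options)
        .option("dbtable", table)
        .load()
    )
```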