Cluster computing

Friday, March 31, 2023

This is a continuation of the discussion on the Azure Data Platform.

GitHub Actions provides versatile, composable workflows that respond to events in the code pipeline. An event can be a single commit or a pull request that needs to be built and tested, or the deployment of a code change that has been pushed to production. Automation for continuous integration and continuous deployment is a well-established DevOps practice, but GitHub Actions goes beyond DevOps by also recognizing events such as the creation of a ticket or an issue. The components of GitHub Actions are the workflow, which is triggered by an event, and the jobs that the workflow comprises. Each job runs inside its own virtual machine runner or inside a container, and it has one or more steps, each of which either runs a script or runs a reusable action.
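To make the non-DevOps triggers and container-based jobs concrete, here is a minimal sketch of a workflow that responds to a newly opened issue rather than a code change, and runs its single job inside a container on a hosted runner. The workflow, job, and step names are hypothetical; the `issues` event, the `container` key, and `github.event.issue.title` are standard GitHub Actions features.

```yaml
# Hypothetical example: react to an issue instead of a commit
name: issue-triage-sketch
on:
  issues:
    types: [opened]        # fires when a new issue is created
jobs:
  log-issue:
    runs-on: ubuntu-latest
    container: node:16     # the job's steps run inside this container image
    steps:
      - name: Echo the issue title
        env:
          # pass the title through an environment variable rather than
          # inlining ${{ }} in the script, to avoid shell injection
          TITLE: ${{ github.event.issue.title }}
        run: echo "New issue: $TITLE"
```

The same structure of event, job, runner or container, and steps applies to the push-triggered sample that follows.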

A workflow is authored as a YAML file in the .github/workflows/ directory. A sample workflow looks like this:

name: initial-workflow-0001
run-name: Trying out GitHub Actions
on: [push]
jobs:
  check-variables:
    runs-on: ubuntu-latest
    environment: nonProd
    steps:
      - name: Initialize
        uses: actions/setup-node@v3
        with:
          node-version: '16'
      - name: Display Variables
        env:
          variable: '0001'    # quoted so YAML keeps the leading zeros
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: |
          echo "000-$variable"
          echo "001-${{ env.variable }}"

Whenever a commit is pushed to the repository in which the .github/workflows directory was created, the above workflow is triggered. Its runs can be viewed on the repository's Actions tab, where the named job can be selected and its steps expanded to show the activities performed. Note that the Display Variables step runs only for pushes to the main branch, because of its if condition.

With this introduction, let us look at a scenario for using GitHub Actions with the Azure Data Platform. Specifically, this scenario promotes an Azure Data Factory to higher environments with the following steps:

1. A development data factory is created and configured with GitHub.

2. A feature/working branch is used for making changes to pipelines and other resources.

3. When the changes are ready for review, a pull request is created from the feature/working branch to the main collaboration branch.

4. Once the pull request is approved and the changes are merged into the main collaboration branch, they are published to the development factory.

5. When the changes are published, the data factory saves its ARM templates to the publish branch (adf_publish by default). A file called ARMTemplateForFactory.json contains the parameter names used for resources such as Key Vault, storage, and linked services. These names are used in the GitHub Actions workflow file to pass resource names for each environment.

6. Once the GitHub Actions workflow file has been updated with the parameter values for the upper environment and the changes are pushed back to the GitHub branch, the workflow is started manually and deploys the changes to the upper environment.
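Steps 5 and 6 might be wired up roughly as follows. This is a minimal sketch under stated assumptions, not the workflow from the original scenario: it assumes main serves as the publish branch, that the ARM template sits in a folder named after the development factory, and that an Azure service principal's credentials are stored as GitHub secrets named AZURE_CREDENTIALS and AZURE_SUBSCRIPTION_ID. The names adf-dev-001, adf-prod-001, rg-prod, kv-prod-001, and the AzureKeyVault1_properties_typeProperties_baseUrl parameter are hypothetical placeholders; the real parameter names come from ARMTemplateForFactory.json, and a template with required parameters may also need its generated parameters file.

```yaml
# Hypothetical promotion workflow, started manually from the Actions tab
name: promote-adf-to-prod
on:
  workflow_dispatch:
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: prod                 # hypothetical GitHub environment holding prod secrets
    steps:
      - name: Check out the publish branch
        uses: actions/checkout@v3
        with:
          ref: main                   # publish branch in this scenario (adf_publish by default)
      - name: Log in to Azure
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Deploy the factory ARM template
        uses: azure/arm-deploy@v1
        with:
          subscriptionId: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          resourceGroupName: rg-prod                          # hypothetical
          template: ./adf-dev-001/ARMTemplateForFactory.json  # hypothetical folder name
          # inline overrides for the upper environment (placeholder names)
          parameters: >-
            factoryName=adf-prod-001
            AzureKeyVault1_properties_typeProperties_baseUrl=https://kv-prod-001.vault.azure.net/
          deploymentMode: Incremental
```

Only the vault URL is overridden for Key Vault here because the secret names stay the same across environments.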

Some observations about this setup are as follows:

GitHub is configured on the development data factory only.

Integration Runtime names and types must stay the same across all environments.

A self-hosted Integration Runtime must be online in the upper environments before deployment; otherwise, the deployment fails.

Key Vault secret names are kept the same across environments; only the vault name is parameterized.

Resource names must not contain spaces.

Secrets are stored in the GitHub Secrets section.

There are two environments in this scenario: dev and prod.

The dev branch is used as the collaboration branch.

The feature1 branch is used as a working branch.

The main branch is used as the publish branch.
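Because the deployment fails when the self-hosted Integration Runtime in the upper environment is offline, one option is a guard step that runs before the ARM deployment. The sketch below assumes it is placed in a deployment job after an Azure login step, and it relies on the Azure CLI datafactory extension. The factory, runtime, and resource group names are hypothetical; the runtime name would match the one used in development, since Integration Runtime names stay the same across environments.

```yaml
# Hypothetical pre-deployment step: stop early if the runtime is offline
- name: Verify self-hosted Integration Runtime is online
  run: |
    # the datafactory commands ship as an Azure CLI extension
    az extension add --name datafactory
    state=$(az datafactory integration-runtime get-status \
      --factory-name adf-prod-001 \
      --name SelfHostedIR \
      --resource-group rg-prod \
      --query properties.state --output tsv)
    echo "Integration Runtime state: $state"
    if [ "$state" != "Online" ]; then
      echo "Self-hosted Integration Runtime is not online; stopping before deployment." >&2
      exit 1
    fi
```

Failing fast here gives a clearer error than a partially applied ARM deployment.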
