This is a continuation of the discussion on the Azure Data Platform.
GitHub Actions provides highly versatile and composable workflows
that can respond to events in the code pipeline. These events can be
a single commit or a pull request that needs to be built and tested, or the deployment
of a code change that has been pushed to production. While automation for
continuous integration and continuous deployment is a well-established practice
in DevOps, GitHub Actions goes beyond DevOps by also recognizing events such as
the creation of a ticket or an issue. The components of GitHub
Actions are the workflow, which is triggered by an event, and the jobs that the workflow
comprises. Each job runs inside its own virtual machine runner or
inside a container, and has one or more steps, each of which either runs a script or runs an
action that can be reused.
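As an illustration of the non-DevOps events and container jobs described above, a minimal sketch of a workflow that reacts to a newly opened issue might look like the following. The workflow name, the job name, and the node:16 container image are assumptions chosen only for illustration.

```yaml
name: issue-opened-sketch          # hypothetical workflow name
on:
  issues:
    types: [opened]                # fires when a new issue is created
jobs:
  echo-issue:                      # hypothetical job name
    runs-on: ubuntu-latest         # container jobs need a Linux runner
    container: node:16             # the job's steps run inside this image
    steps:
      - name: Show the new issue
        env:
          # Passed through env rather than interpolated into the script,
          # so an issue title cannot inject shell commands
          ISSUE_NUMBER: ${{ github.event.issue.number }}
          ISSUE_TITLE: ${{ github.event.issue.title }}
        run: |
          echo "Issue #$ISSUE_NUMBER opened: $ISSUE_TITLE"
```

The event payload values are routed through environment variables because user-controlled text such as an issue title should not be spliced directly into a shell script.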
A workflow is authored in the .github/workflows/ directory
and specified as a YAML file. A sample looks like this:
name: initial-workflow-0001
run-name: Trying out GitHub Actions
on: [push]
jobs:
  check-variables:
    runs-on: ubuntu-latest
    environment: nonProd
    steps:
      - name: Initialize
        uses: actions/setup-node@v3
        with:
          node-version: '16'
      - name: Display Variables
        env:
          variable: '0001'   # quoted so the value is treated as the string 0001
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: |
          echo 000-"$variable"
          echo 001-"${{ env.variable }}"
Whenever a commit is pushed to the repository in
which the workflows directory was created, the above workflow is
triggered; the Display Variables step itself runs only for pushes to the main branch,
because of its if condition. Runs can be viewed in the Actions tab of the repository, where the
named job can be selected and its steps expanded to show the activities performed.
With this introduction, let us look at a scenario for
using GitHub Actions with the Azure Data Platform. Specifically, this scenario calls for promoting
an Azure Data Factory to higher environments with the following steps:
1. A development data factory is created and configured with GitHub.
2. A feature/working branch is used for making changes to pipelines and other factory resources.
3. When the changes are ready to be reviewed, a pull request is created from the feature/working branch to the main collaboration branch.
4. Once the pull request is approved and the changes are merged into the main collaboration branch, the changes are published to the development factory.
5. When the changes are published, Data Factory saves its ARM templates to the publish branch (adf_publish by default). A file called ARMTemplateForFactory.json contains the parameter names used for resources such as Key Vault, storage, linked services, etc. These names are used in the GitHub Actions workflow file to pass resource names for the different environments.
6. Once the GitHub Actions workflow file has been updated with the parameter values for the upper environment and the changes are pushed back to the GitHub branch, the GitHub Actions workflow is started manually and deploys the changes to the upper environment.
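Steps 5 and 6 could be sketched as a manually started workflow such as the one below. This is a sketch under several assumptions: it uses the azure/login and azure/arm-deploy actions, expects GitHub secrets named AZURE_CREDENTIALS and AZURE_SUBSCRIPTION_ID, and assumes that arm-deploy accepts space-separated key=value overrides in its parameters input. The resource group, folder, factory, Key Vault, and linked service parameter names (rg-adf-prod, my-dev-adf, my-prod-adf, my-prod-kv, AzureKeyVault1_properties_typeProperties_baseUrl) are hypothetical placeholders; the real parameter names should be copied from ARMTemplateForFactory.json.

```yaml
name: promote-adf-to-prod              # hypothetical workflow name
on:
  workflow_dispatch:                   # started manually from the Actions tab
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: prod
    steps:
      - name: Check out the ARM templates saved by Data Factory
        uses: actions/checkout@v3
        with:
          ref: adf_publish             # default publish branch; change if the factory uses another
      - name: Sign in to Azure
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}   # assumed service principal secret
      - name: Deploy the factory to the upper environment
        uses: azure/arm-deploy@v1
        with:
          subscriptionId: ${{ secrets.AZURE_SUBSCRIPTION_ID }}   # assumed secret
          resourceGroupName: rg-adf-prod                          # hypothetical
          # Templates are assumed to sit in a folder named after the dev factory
          template: ./my-dev-adf/ARMTemplateForFactory.json
          # Upper-environment overrides; names are hypothetical and must match
          # the parameters listed in ARMTemplateForFactory.json
          parameters: >-
            factoryName=my-prod-adf
            AzureKeyVault1_properties_typeProperties_baseUrl=https://my-prod-kv.vault.azure.net/
          deploymentMode: Incremental  # leave other resources in the group untouched
```

Incremental mode is the safer choice here, since Complete mode would delete any resources in the target resource group that are not described by the template.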
Some observations about this setup are as follows:
GitHub is configured on the development data factory only.
Integration runtime names and types need to stay the same
across all environments.
A self-hosted integration runtime must be online in the upper
environments before deployment, or else the deployment will fail.
Key Vault secret names are kept the same across
environments; only the vault name is parameterized.
Resource names must not contain spaces.
Secrets are stored in the GitHub Secrets section.
There are two environments in this scenario: dev and prod.
The dev branch is used as the collaboration branch.
The Feature1 branch is used as a working branch.
The main branch is used as the publish branch.
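Several of these observations (the dev and prod environments, secrets kept in GitHub, and the self-hosted integration runtime that must be online) can be encoded in the workflow itself. The following sketch assumes GitHub environments named dev and prod exist, each holding its own AZURE_CREDENTIALS secret and the hypothetical configuration variables RESOURCE_GROUP, FACTORY_NAME, and SHIR_NAME. It also assumes that the Azure CLI datafactory extension's integration-runtime get-status command reports the runtime state under properties.state. The job stops before any deployment if the runtime is not online.

```yaml
name: promote-adf                      # hypothetical workflow name
on:
  workflow_dispatch:
    inputs:
      target:
        description: Environment to deploy to
        type: choice
        options: [dev, prod]
        required: true
jobs:
  deploy:
    runs-on: ubuntu-latest
    # Secrets and vars below resolve from the GitHub environment picked at start
    environment: ${{ inputs.target }}
    steps:
      - name: Sign in to Azure
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Stop if the self-hosted integration runtime is not online
        env:
          RG: ${{ vars.RESOURCE_GROUP }}      # hypothetical variable names
          ADF: ${{ vars.FACTORY_NAME }}
          IR: ${{ vars.SHIR_NAME }}
        run: |
          az extension add --name datafactory --upgrade --yes
          state=$(az datafactory integration-runtime get-status \
            --resource-group "$RG" --factory-name "$ADF" --name "$IR" \
            --query "properties.state" --output tsv)
          if [ "$state" != "Online" ]; then
            echo "::error::Integration runtime $IR is '$state', not Online"
            exit 1
          fi
      # Checkout of the publish branch and the ARM deployment would follow here
```

Since integration runtime names stay the same across environments, SHIR_NAME could just as well be a repository-level variable; only the factory and resource group values need to differ between dev and prod.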