Saturday, June 1, 2024

Automation can also be achieved with Azure Data Factory (ADF) using a Script activity and a self-hosted integration runtime hosted on an on-premises VM. While typically associated with data transformation activities, a self-hosted integration runtime can also run arbitrary scripts, and invoking it from ADF provides both human and programmatic access from anywhere with cloud connectivity. A self-hosted integration runtime is a component that connects on-premises or Azure VM data sources to cloud services in a secure and managed way.
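The runtime can also be registered with the factory programmatically. The sketch below uses the azure-mgmt-datafactory and azure-identity packages; the runtime name my-shir is a placeholder chosen for illustration. The integration runtime software still has to be installed on the on-premises VM and registered with one of the authentication keys this call returns.

#! /usr/bin/python
# Sketch: register a self-hosted integration runtime in ADF (names are placeholders).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

subscription_id = '<subscription_id>'
resource_group = '<resource_group>'
factory_name = '<factory_name>'
ir_name = 'my-shir'  # hypothetical runtime name

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Create the integration runtime entry in the factory.
client.integration_runtimes.create_or_update(
    resource_group,
    factory_name,
    ir_name,
    IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(description="runs scripts on the on-premises VM")
    ),
)

# Fetch an authentication key; use it when installing the IR software on the VM.
keys = client.integration_runtimes.list_auth_keys(resource_group, factory_name, ir_name)
print(keys.auth_key1)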

The JSON syntax for defining a Script activity looks something like this:

   "name": "<activity name>", 

   "type": "Script", 

   "linkedServiceName": { 

      "referenceName": "<name>", 

      "type": "LinkedServiceReference" 

    }, 

   "typeProperties": { 

      "scripts" : [ 

         { 

            "text": "<Script Block>", 

            "type": "<Query> or <NonQuery>", 

            "parameters":[ 

               { 

                  "name": "<name>", 

                  "value": "<value>", 

                  "type": "<type>", 

                  "direction": "<Input> or <Output> or <InputOutput>", 

                  "size": 256 

               }, 

               ... 

            ] 

         }, 

         ... 

      ],     

         ... 

         ] 

      }, 

      "scriptBlockExecutionTimeout": "<time>",  

      "logSettings": { 

         "logDestination": "<ActivityOutput> or <ExternalStore>", 

         "logLocationSettings":{ 

            "linkedServiceName":{ 

               "referenceName": "<name>", 

               "type": "<LinkedServiceReference>" 

            }, 

            "path": "<folder path>" 

         } 

      } 

    } 

}

The output can be collected every time a script block is executed. There is a limit of 5000 rows / 4 MB on the returned result set, but this is sufficient for most purposes.
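As an illustration, the collected result sets can be read back from the activity run output through the queryActivityRuns REST endpoint. The sketch below assumes a pipeline run that has already executed (a run can be started as shown further below); the run id and the time window are placeholders.

#! /usr/bin/python
# Sketch: read the Script activity output (resultSets) for a given pipeline run.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = '<subscription_id>'
resource_group = '<resource_group>'
factory_name = '<factory_name>'
run_id = '<pipeline_run_id>'

token = DefaultAzureCredential().get_token("https://management.azure.com/.default")
headers = {"Authorization": f"Bearer {token.token}", "Content-Type": "application/json"}

url = (f"https://management.azure.com/subscriptions/{subscription_id}"
       f"/resourceGroups/{resource_group}/providers/Microsoft.DataFactory"
       f"/factories/{factory_name}/pipelineruns/{run_id}/queryActivityRuns"
       f"?api-version=2018-06-01")

# The filter window must bracket the run; these timestamps are examples.
body = {"lastUpdatedAfter": "2024-06-01T00:00:00Z", "lastUpdatedBefore": "2024-06-02T00:00:00Z"}
response = requests.post(url, headers=headers, json=body)

for activity in response.json().get("value", []):
    if activity.get("activityType") == "Script":
        # Each executed script block contributes one entry to resultSets.
        print(activity.get("output", {}).get("resultSets"))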


A sample Python call to trigger the pipeline via the ADF REST API would be something like this:

#! /usr/bin/python

import requests
from azure.identity import DefaultAzureCredential

# Set your ADF details
subscription_id = '<subscription_id>'
resource_group = '<resource_group>'
factory_name = '<factory_name>'

# Set the pipeline name you want to trigger
pipeline_name = 'your_pipeline_name'

# Construct the API URL
api_url = f"https://management.azure.com/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.DataFactory/factories/{factory_name}/pipelines/{pipeline_name}/createRun?api-version=2018-06-01"

# Acquire a bearer token for the Azure management API
token = DefaultAzureCredential().get_token("https://management.azure.com/.default")
headers = {"Authorization": f"Bearer {token.token}"}

# Make the POST request
response = requests.post(api_url, headers=headers)

# Check the response status; a successful createRun returns the run id
if response.status_code == 200:
    run_id = response.json().get("runId")
    print(f"Pipeline triggered successfully! Run id: {run_id}")
else:
    print(f"Error triggering pipeline. Status code: {response.status_code}")

## EOF
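
Since createRun only starts the pipeline, a follow-up call is usually needed to see how the run finishes. A minimal sketch, continuing the script above and assuming the createRun call succeeded so that run_id was captured:

# Continuation of the script above: poll the run until it reaches a terminal state.
import time

run_url = (f"https://management.azure.com/subscriptions/{subscription_id}"
           f"/resourceGroups/{resource_group}/providers/Microsoft.DataFactory"
           f"/factories/{factory_name}/pipelineruns/{run_id}"
           f"?api-version=2018-06-01")

while True:
    status = requests.get(run_url, headers=headers).json().get("status")
    print(f"Run status: {status}")
    if status in ("Queued", "InProgress"):
        time.sleep(30)
    else:
        break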

