# Azure Data Factory - Self-Hosted Integration Runtime Windows Container
This is a working solution for running the Azure Data Factory Self-Hosted Integration Runtime inside a Windows container.
Image Version | ADF Self-Hosted Runtime Version | Bundled Drivers |
---|---|---|
1.0.0 | 5.10.7918.2 | N/A |
1.0.1 | 5.12.7984.1 | IBM DB2 ODBC Driver 5.11.4 |
1.0.2 | 5.24.8369.1 | IBM DB2 ODBC Driver 5.11.4 |
1.0.3 (latest) | 5.36.8726.3 | IBM DB2 ODBC Driver 5.11.4 |
You can find a pre-built version of the image on our Docker Hub account: `ingeniisolutions/adf-self-hosted-integration-runtime`
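For example, you could pull the image directly; the tag below is an assumption based on the version table above:

```shell
# Pull a specific image version (tag assumed to match the version table above)
docker pull ingeniisolutions/adf-self-hosted-integration-runtime:1.0.3
```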
Below are the environment variables the image understands.
Variable | Default | Description |
---|---|---|
AUTH_KEY | None (required) | The ADF authentication key. |
NODE_NAME | container hostname | The name of the node that will be displayed in ADF. |
OFFLINE_NODE_AUTO_DELETION_TIME_IN_SECONDS | 601 (~10 minutes) | The number of seconds a node has to be offline before it is automatically cleaned up from ADF. It has to be the same for all nodes in the same runtime. |
ENABLE_HA | false | Set this to true if you are planning to use multiple containers (nodes) in a single runtime. |
HA_PORT | 8060 | The HA port used for communication between the nodes. |
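As an illustration, a node could be started directly with the Docker CLI with all of these variables set explicitly; the authentication key and node name below are placeholders:

```shell
# All environment variables set explicitly (auth key and node name are placeholders)
docker run -d -e AUTH_KEY="IR@<your-auth-key>" -e NODE_NAME="shir-node-1" -e OFFLINE_NODE_AUTO_DELETION_TIME_IN_SECONDS=601 -e ENABLE_HA=true -e HA_PORT=8060 ingeniisolutions/adf-self-hosted-integration-runtime
```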
Clone this repository and create your `.env` file from the provided template:

```shell
git clone https://github.com/ingenii-solutions/azure-data-factory-self-hosted-runtime.git
cp .env.dist .env
```
Add your Azure Data Factory Authentication Keys (connection strings) to each variable:
```env
# Azure Data Factory Connection Strings
PRODUCTION_CONNECTION_STRING="<add prod auth key here>"
TEST_CONNECTION_STRING="<add test auth key here>" # Optional
DEV_CONNECTION_STRING="<add dev auth key here>" # Optional
```
We have the following templates available:
Template | Filename | Description |
---|---|---|
Single | docker-compose.single.yml | Single container deployment. Needs only the PRODUCTION_CONNECTION_STRING to be set. |
Dev,Test,Prod (DTP) | docker-compose.dtp.yml | Multi-environment deployment. Needs all connection string variables set. |
Deploy using your chosen template:

```shell
docker-compose -f docker-compose.<template>.yml up -d
```
You can monitor the status of the container(s) by using the `docker-compose ps` command.
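For example, deploying and then checking the single-container template might look like this:

```shell
# Deploy the single-container template and check the container status
docker-compose -f docker-compose.single.yml up -d
docker-compose -f docker-compose.single.yml ps
```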
Other useful commands (see the example after this list):

- `docker-compose -f <filename> logs` - View output from containers
- `docker-compose -f <filename> restart` - Restart containers
- `docker-compose -f <filename> down` - Stop and remove resources
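For example, to view the logs of the DTP deployment and then tear it down:

```shell
# View container output, then stop and remove the DTP deployment
docker-compose -f docker-compose.dtp.yml logs
docker-compose -f docker-compose.dtp.yml down
```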
You can also use the Docker CLI directly to start the container. Here are some examples:
```shell
# Run in the foreground
docker run -e AUTH_KEY="IR@xxx" -e ENABLE_HA=true ingeniisolutions/adf-self-hosted-integration-runtime

# Run detached (in the background)
docker run -d -e AUTH_KEY="IR@xxx" -e ENABLE_HA=true ingeniisolutions/adf-self-hosted-integration-runtime
```
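If you want multiple nodes in the same runtime, a sketch of starting two containers could look like the following; the node names are placeholders, and note that every node needs `ENABLE_HA=true` and the same `OFFLINE_NODE_AUTO_DELETION_TIME_IN_SECONDS` value (see the troubleshooting note below):

```shell
# Node 1 (node name is a placeholder)
docker run -d -e AUTH_KEY="IR@xxx" -e ENABLE_HA=true -e NODE_NAME="shir-node-1" -e OFFLINE_NODE_AUTO_DELETION_TIME_IN_SECONDS=601 ingeniisolutions/adf-self-hosted-integration-runtime

# Node 2 (same auth key and auto-deletion time, different node name)
docker run -d -e AUTH_KEY="IR@xxx" -e ENABLE_HA=true -e NODE_NAME="shir-node-2" -e OFFLINE_NODE_AUTO_DELETION_TIME_IN_SECONDS=601 ingeniisolutions/adf-self-hosted-integration-runtime
```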
If you see an error like the following:

```text
A service error occurred (StatusCode: 400; ErrorCode: 1847; ActivityId: 7c596324-5649-4859-aedc-c700611339df; ErrorMessage: OfflineNodeAutoDeletionTimeInSeconds should be same among the SHIR nodes and it should be 601.).
```
All nodes in a runtime must have the same value for the `OFFLINE_NODE_AUTO_DELETION_TIME_IN_SECONDS` environment variable. Also, if you are going to have more than one node, you need to set `ENABLE_HA` to `true`.
Another error you might encounter:

```text
A service error occurred (StatusCode: 400; ErrorCode: 1500; ActivityId: 54ed9ef2-cedd-4e9c-93b4-650ee527e862; ErrorMessage: Exception of type 'Microsoft.DataTransfer.GatewayService.Client.GatewayServiceException' was thrown.)
```
You most likely have 4 registered nodes with the current runtime. Azure Data Factory supports only 4 registered nodes per integration runtime.
Applies to: IBM DB2 ODBC Driver Only
This error is generated whenever the Docker image is executed with the isolation mode set to `hyperv` instead of `process`.
This typically occurs on Windows desktop operating systems, which default to `hyperv` isolation, while Windows Server defaults to `process` isolation.
If you are going to use the IBM DB2 ODBC Driver, it is highly recommended to run the image on Windows Server 2019 or above, with the isolation mode set to `process`.
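A minimal sketch of forcing process isolation when starting the container directly on a Windows Server host (the auth key is a placeholder):

```shell
# Run with process isolation (hyperv is the default on Windows desktop editions)
docker run -d --isolation=process -e AUTH_KEY="IR@xxx" ingeniisolutions/adf-self-hosted-integration-runtime
```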
This repository was heavily inspired by the work already done by @wxygeek.
- Add a GitHub workflow to automatically build and publish new versions
- Add self-termination logic that automatically terminates the Docker host instance when running in Azure or AWS. This would help in scenarios where the self-hosted integration runtime is deployed via Azure Functions/Lambda only when a job needs it, and is then automatically terminated when no jobs are pending execution. There isn't a better way of using the Docker image natively in Azure/AWS with the benefit of VNet/VPC integration while keeping costs low than terminating the instance after every run.