Task 06: Deploy Runners
DOCUMENT CATEGORY: Runbook Step
SCOPE: CI/CD runner planning and deployment
PURPOSE: Evaluate hosting options and deploy self-hosted CI/CD runners
MASTER REFERENCE: Azure Local Toolkit — CI/CD Runner Module
Status: Active
Applies To: All Azure Local deployments
Last Updated: 2026-03-19
Objective
Select the appropriate runner hosting strategy and deploy self-hosted CI/CD runners that can execute automation pipelines against both Azure cloud resources and Azure Local on-premises clusters.
Why Self-Hosted Runners?
Azure Local automation has unique requirements that platform-hosted runners (GitHub-hosted, GitLab SaaS runners, Microsoft-hosted agents) cannot satisfy:
- On-premises access — Pipelines must reach Azure Local cluster endpoints, iDRACs, switches, and management networks that are not internet-routable.
- Hybrid targeting — A single pipeline may provision Azure cloud resources (landing zones, VPN gateways) and configure on-premises hardware (cluster registration, storage, networking).
- Long-running jobs — Terraform applies for VPN gateways, Arc registration, and cluster deployment can run 30–60+ minutes, exceeding hosted-runner timeouts.
- Persistent tooling — Runners need pre-installed tools (Terraform, Azure CLI, Ansible, PowerShell modules) to avoid downloading them on every run.
- Network security — Sensitive credentials and on-premises management traffic should not traverse shared, multi-tenant hosted-runner infrastructure.
Runner Hosting Options
Choosing where to deploy runners is a critical planning decision. The right answer depends on your network topology, connectivity between Azure and on-premises, security requirements, and existing infrastructure.
Decision Matrix
| Hosting Option | Azure Access | On-Prem Access | Connectivity Required | Best For |
|---|---|---|---|---|
| Azure VM / VMSS | ✅ Native | ⚠️ Requires VPN/ER | S2S VPN or ExpressRoute to on-prem | Cloud-first deployments where VPN/ER is already planned |
| On-Prem VM (Hyper-V / Azure Local) | ⚠️ Outbound internet | ✅ Native | Outbound HTTPS to SCM platform + Azure APIs | On-prem-first, or when Azure → on-prem connectivity is not yet established |
| Existing On-Prem Server | ⚠️ Outbound internet | ✅ Native | Outbound HTTPS to SCM platform + Azure APIs | Reuse existing infrastructure, minimal new provisioning |
| OpenGear Console Server | ⚠️ Limited | ✅ Native + OOB | Outbound HTTPS (or cellular backhaul) | Out-of-band automation, break-glass recovery, iDRAC/switch management |
| Hybrid (Azure + On-Prem) | ✅ Native | ✅ Native | Independent | Full coverage — Azure runner for cloud tasks, on-prem runner for local tasks |
Option 1: Azure VM or VMSS (Cloud-Hosted Runner)
Deploy a Linux or Windows VM (or VM Scale Set) in an Azure subscription. This is the default approach in the azurelocal-toolkit Terraform module.
Advantages:
- Autoscaling with VMSS — scale out for concurrent jobs, scale to zero when idle
- Managed infrastructure — Azure handles host OS patching, disk, and NIC
- Central placement near Azure control-plane APIs (ARM, Entra ID, Key Vault)
- Terraform module in azurelocal-toolkit handles provisioning end-to-end
Disadvantages:
- Cannot reach on-premises management networks unless VPN or ExpressRoute is established
- Adds a dependency — runners cannot reach Azure Local cluster endpoints until Part 2 (VPN Gateway) is deployed
- VPN/ExpressRoute adds cost and complexity
When to use: You already have (or plan to have) site-to-site VPN or ExpressRoute between Azure and on-prem, and you want a centrally managed runner fleet.
If you choose this option, the runner will only be able to target Azure cloud resources until the VPN/ExpressRoute is deployed in Part 2 — Phase 04: VPN Gateway. Plan for this phased capability.
Connectivity patterns for Azure-hosted runners:
| Connection | Protocol | Purpose |
|---|---|---|
| S2S VPN (IKEv2/IPsec) | Encrypted tunnel over internet | Management traffic, Ansible/SSH to cluster nodes |
| ExpressRoute | Private peering | Production workloads, high-bandwidth operations |
| Azure Arc Gateway | HTTPS outbound from on-prem | Alternative when direct inbound to on-prem is blocked |
Recommended sizing (from Discovery Checklist):
| Workload | VM Size | Max Instances | Notes |
|---|---|---|---|
| Light (< 5 concurrent jobs) | Standard_D2s_v3 | 2 | Default for most deployments |
| Medium (5–10 concurrent jobs) | Standard_D4s_v3 | 5 | Multiple clusters or frequent pipelines |
| Heavy (10+ concurrent jobs) | Standard_D4s_v3 | 10 | Large-scale multi-site deployments |
Option 2: On-Premises VM
Deploy a Linux or Windows VM directly on a Hyper-V host, an existing Azure Local cluster, or bare-metal server at the site.
Advantages:
- Direct access to on-premises management networks — no VPN required
- Can reach cluster nodes, iDRACs, switches, and storage endpoints immediately
- Works even when Azure ↔ on-prem connectivity is not yet established
- Lower latency for on-prem operations (Ansible playbooks, PowerShell remoting)
Disadvantages:
- Must manage the VM yourself (OS patching, disk, backups)
- Requires outbound internet access to reach your SCM platform (GitHub/GitLab/Azure DevOps) and Azure ARM APIs
- Not centrally managed if you have multiple sites
- No autoscaling — fixed capacity
When to use: You need to run automation against on-premises targets before VPN/ExpressRoute is available, or your security policy prohibits inbound connectivity from Azure to on-prem networks.
Minimum requirements:
| Resource | Specification |
|---|---|
| OS | Ubuntu 22.04 LTS (recommended) or Windows Server 2022 |
| vCPUs | 2+ |
| RAM | 4 GB+ |
| Disk | 50 GB+ |
| Network | Outbound HTTPS (443) to SCM platform + Azure APIs; access to on-prem management VLANs |
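The minimums above can be confirmed before installing the runner agent. The sketch below is a hypothetical preflight check: the meets_min helper, the RAM threshold of 3800 MB (roughly 4 GB, allowing for kernel-reserved memory), and the endpoint list are assumptions to adapt for your environment.

```shell
#!/usr/bin/env sh
# Hypothetical preflight check for a candidate on-prem runner VM.

meets_min() {
  # usage: meets_min <actual> <required> -> succeeds if actual >= required
  [ "$1" -ge "$2" ]
}

cpus=$(nproc)
ram_mb=$(free -m | awk '/^Mem:/ {print $2}')
disk_gb=$(df -BG --output=avail / | tail -1 | tr -dc '0-9')

meets_min "$cpus" 2      && echo "vCPUs: $cpus (ok)"       || echo "vCPUs: $cpus (below 2)"
meets_min "$ram_mb" 3800 && echo "RAM: ${ram_mb}MB (ok)"   || echo "RAM: ${ram_mb}MB (below ~4GB)"
meets_min "$disk_gb" 50  && echo "Disk: ${disk_gb}GB (ok)" || echo "Disk: ${disk_gb}GB (below 50GB)"

# Outbound HTTPS reachability (SCM platform + Azure control plane)
for host in github.com management.azure.com login.microsoftonline.com; do
  if curl -sI --connect-timeout 5 "https://$host" >/dev/null 2>&1; then
    echo "$host: reachable"
  else
    echo "$host: UNREACHABLE"
  fi
done
```

Run it once on the freshly provisioned VM; any "below" or "UNREACHABLE" line must be resolved before Step 2.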
Option 3: Existing On-Premises Server
Install the runner agent directly on an existing server that is already on the management network. This avoids provisioning new infrastructure entirely.
Advantages:
- Zero new infrastructure — reuse what you have
- Already on the correct network segments
- Fastest path to a working runner
Disadvantages:
- Shared workload — runner competes with other services for CPU/RAM
- Security risk if the server runs other sensitive services
- Harder to isolate runner dependencies (Terraform versions, CLI tools)
- Not reproducible — if the server dies, runner config is lost
When to use: Proof-of-concept, lab environments, or early-stage deployments where you want to validate pipelines before investing in dedicated runner infrastructure.
If you use an existing server, consider running the runner agent inside a container (Docker) to isolate its dependencies from the host OS. All three platforms support containerized runners.
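As a sketch of that containerized approach, a minimal image for the GitHub Actions runner might look like the following. The RUNNER_VERSION pin, user setup, and registration-at-start pattern are assumptions; check the actions/runner releases for current values, and note that GitLab publishes an official gitlab/gitlab-runner image that avoids building your own.

```dockerfile
# Hypothetical Dockerfile isolating a GitHub Actions runner from the host.
FROM ubuntu:22.04
ARG RUNNER_VERSION=2.321.0

RUN apt-get update \
    && apt-get install -y curl tar sudo \
    && useradd -m runner

USER runner
WORKDIR /home/runner
RUN curl -L -o runner.tar.gz \
      "https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-x64-${RUNNER_VERSION}.tar.gz" \
    && tar xzf runner.tar.gz && rm runner.tar.gz

# Install the runner's OS dependencies (requires root)
USER root
RUN ./bin/installdependencies.sh
USER runner

# Register at container start so the short-lived token is never baked into the image
ENTRYPOINT ["/bin/bash", "-c", \
  "./config.sh --url \"$RUNNER_URL\" --token \"$RUNNER_TOKEN\" --unattended && ./run.sh"]
```

Starting the container with `-e RUNNER_URL=... -e RUNNER_TOKEN=...` registers the agent and keeps its Terraform/CLI dependencies out of the host OS.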
Option 4: OpenGear Console Server
Use an OpenGear console server (e.g., OM1208-8E) as a lightweight runner for out-of-band (OOB) automation tasks.
Advantages:
- Always-on, independent of the cluster and primary network
- Direct serial and network access to iDRACs, switches, and PDUs
- Cellular backhaul provides connectivity even during network outages
- Ideal for break-glass recovery scenarios and hardware lifecycle tasks
Disadvantages:
- Limited compute resources — cannot run heavy Terraform applies or large Ansible playbooks
- Runs embedded Linux — not all runner agent versions are supported
- Best suited for lightweight, targeted tasks (firmware updates, power cycling, console access)
When to use: You have OpenGear devices deployed for OOB management and want to automate hardware-level tasks (iDRAC configuration, switch management, power operations) that don't require full Terraform/Ansible workloads.
Suitable tasks for OpenGear runners:
| Task | Example |
|---|---|
| Hardware power management | ipmitool power cycle via iDRAC |
| Firmware updates | Push firmware to cluster nodes via Redfish API |
| Switch configuration | Apply VLAN/BGP changes via serial console |
| Health checks | Verify iDRAC reachability, check hardware status |
| Break-glass recovery | Emergency cluster node reboot when primary network is down |
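The power-management row above could be wired into a pipeline that routes to the console server via a dedicated runner label. This GitHub Actions sketch is illustrative only: the "oob" label and the IDRAC_* secret names are assumptions.

```yaml
# Hypothetical workflow: break-glass power cycle via an OpenGear-hosted runner.
name: Break-Glass Power Cycle
on: workflow_dispatch

jobs:
  power-cycle:
    runs-on: [self-hosted, oob]   # routes to the OpenGear-hosted runner
    steps:
      - name: Power cycle node via iDRAC (IPMI over LAN)
        run: |
          ipmitool -I lanplus \
            -H "${{ secrets.IDRAC_IP }}" \
            -U "${{ secrets.IDRAC_USER }}" \
            -P "${{ secrets.IDRAC_PASS }}" \
            chassis power cycle
```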
Option 5: Hybrid (Azure + On-Premises)
Deploy runners in both Azure and on-premises, using labels/tags to route pipeline jobs to the appropriate runner based on target.
Advantages:
- Full coverage — cloud-targeting jobs run on Azure runners, on-prem-targeting jobs run on local runners
- No single point of failure for connectivity
- Each runner has native access to its target environment
- Can start with on-prem only, add Azure runners later when VPN/ER is ready
Disadvantages:
- Two sets of infrastructure to manage
- Runner registration and labeling must be carefully planned
- Pipeline configuration is more complex (job routing by label)
When to use: Production deployments that need to target both Azure and on-premises resources reliably, or phased deployments where connectivity comes online over time.
Label-based routing in GitHub Actions:
jobs:
deploy-azure:
runs-on: [self-hosted, azure]
steps:
- name: Deploy Azure resources
run: terraform apply -auto-approve
configure-cluster:
runs-on: [self-hosted, onprem]
needs: deploy-azure
steps:
- name: Configure Azure Local cluster
run: ansible-playbook -i inventory cluster-config.yml
Tag-based routing in GitLab CI:
deploy-azure:
tags: [azure]
script:
- terraform apply -auto-approve
configure-cluster:
tags: [onprem]
needs: [deploy-azure]
script:
- ansible-playbook -i inventory cluster-config.yml
Pool-based routing in Azure DevOps:
stages:
- stage: DeployAzure
pool: AzureRunners
jobs:
- job: Apply
steps:
- script: terraform apply -auto-approve
- stage: ConfigureCluster
dependsOn: DeployAzure
pool: OnPremRunners
jobs:
- job: Configure
steps:
- script: ansible-playbook -i inventory cluster-config.yml
Recommendation
For most Azure Local deployments, we recommend a phased approach:
- Start with an on-premises VM (Option 2) to unblock automation immediately — this runner can target both Azure APIs (outbound HTTPS) and on-prem infrastructure (direct network access).
- Add an Azure VMSS runner (Option 1) after VPN/ExpressRoute is deployed in Part 2 — this provides autoscaling, central management, and native Azure API access.
- Optionally add OpenGear (Option 4) for OOB hardware automation if console servers are deployed at the site.
This phased approach avoids blocking on VPN connectivity while building toward a production-grade hybrid runner fleet.
Prerequisites
- Runner hosting option selected (see Decision Matrix above)
- CI/CD setup complete (Tasks 01-05)
- Environment variables configured (Task 05)
- Network connectivity confirmed for chosen hosting option
- For Azure VM/VMSS: target subscription and region identified
- For on-prem VM: host server or hypervisor available, management VLAN access confirmed
- Runner registration token available from your SCM platform
Procedure
Step 1: Obtain Runner Registration Token
GitHub:
- Navigate to your repository or organization on GitHub
- Go to Settings → Actions → Runners → New self-hosted runner
- Copy the registration token
Or via CLI:
# Organization-level runner (recommended)
gh api -X POST orgs/{org}/actions/runners/registration-token --jq '.token'
# Repository-level runner
gh api -X POST repos/{owner}/{repo}/actions/runners/registration-token --jq '.token'
GitLab:
- Navigate to your project or group in GitLab
- Go to Settings → CI/CD → Runners → New project runner
- Copy the registration token
Or via CLI:
# Create a runner authentication token (GitLab 16.0+)
curl --request POST "https://gitlab.com/api/v4/user/runners" \
--header "PRIVATE-TOKEN: <your-token>" \
--form "runner_type=project_type" \
--form "project_id=<project-id>" \
--form "tag_list=azure,terraform"
Azure DevOps:
- Navigate to your Azure DevOps organization
- Go to Organization Settings → Agent Pools → select or create a pool
- Click New agent and note the instructions
- Create a PAT with Agent Pools (Read & Manage) scope
# PAT is used during agent configuration
# No separate registration token step — PAT is provided during ./config.sh
Step 2: Deploy Runner Infrastructure
Choose the deployment method that matches your selected hosting option.
Option A: Azure VM/VMSS via Terraform
Clone the deployment repository and apply the runner module:
git clone <your-deployment-repo>
cd <deployment-repo>
Example Terraform configuration:
GitHub:
module "runner" {
source = "github.com/AzureLocal/azurelocal-toolkit//terraform/modules/cicd-runner"
runner_type = "github"
runner_token = var.runner_registration_token
runner_labels = ["azure", "terraform", "prod"]
location = "northcentralus"
environment = "prod"
vm_size = "Standard_D2s_v3"
vm_count = 2
os_type = "linux"
subscription_id = data.azurerm_subscription.current.id
tags = local.tags
}
GitLab:

module "runner" {
source = "github.com/AzureLocal/azurelocal-toolkit//terraform/modules/cicd-runner"
runner_type = "gitlab"
runner_token = var.runner_registration_token
runner_tags = ["azure", "terraform", "prod"]
location = "northcentralus"
environment = "prod"
vm_size = "Standard_D2s_v3"
vm_count = 2
os_type = "linux"
subscription_id = data.azurerm_subscription.current.id
tags = local.tags
}
Azure DevOps:

module "runner" {
source = "github.com/AzureLocal/azurelocal-toolkit//terraform/modules/cicd-runner"
runner_type = "azdo"
agent_pool_name = "AzureRunners"
azdo_pat = var.azdo_pat
azdo_org_url = var.azdo_org_url
location = "northcentralus"
environment = "prod"
vm_size = "Standard_D2s_v3"
vm_count = 2
os_type = "linux"
subscription_id = data.azurerm_subscription.current.id
tags = local.tags
}
terraform init
terraform plan -out=runner.tfplan
terraform apply runner.tfplan
Option B: On-Premises VM (Manual Setup)
Provision a Linux VM on your hypervisor or existing server, then install the runner agent:
GitHub:
# Download and install GitHub Actions runner
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64.tar.gz -L \
https://github.com/actions/runner/releases/download/v2.321.0/actions-runner-linux-x64-2.321.0.tar.gz
tar xzf actions-runner-linux-x64.tar.gz
# Configure
./config.sh \
--url https://github.com/<org> \
--token <REGISTRATION_TOKEN> \
--name "onprem-runner-01" \
--labels "onprem,terraform,ansible" \
--work "_work" \
--runasservice
# Install and start as a service
sudo ./svc.sh install
sudo ./svc.sh start
GitLab:

# Install GitLab Runner
curl -L https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh | sudo bash
sudo apt-get install gitlab-runner
# Register runner
sudo gitlab-runner register \
--non-interactive \
--url "https://gitlab.com/" \
--token "<RUNNER_TOKEN>" \
--executor "shell" \
--description "onprem-runner-01" \
--tag-list "onprem,terraform,ansible"
# Start service
sudo gitlab-runner start
Azure DevOps:

# Download Azure DevOps agent
mkdir azagent && cd azagent
curl -o vsts-agent-linux-x64.tar.gz -L \
https://vstsagentpackage.azureedge.net/agent/4.248.0/vsts-agent-linux-x64-4.248.0.tar.gz
tar xzf vsts-agent-linux-x64.tar.gz
# Configure
./config.sh \
--unattended \
--url "https://dev.azure.com/<org>" \
--auth pat \
--token "<PAT>" \
--pool "OnPremRunners" \
--agent "onprem-runner-01" \
--acceptTeeEula
# Install and start as a service
sudo ./svc.sh install
sudo ./svc.sh start
Install required tools on the on-prem runner:
# Terraform
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | \
sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] \
https://apt.releases.hashicorp.com $(lsb_release -cs) main" | \
sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt-get update && sudo apt-get install terraform
# Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# Ansible (for cluster configuration)
sudo apt-get install -y python3-pip
pip3 install ansible
# PowerShell (for Azure Local modules; requires the Microsoft package
# repository to be registered on the VM first)
sudo apt-get install -y powershell
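Before registering the runner with any pipelines, confirm each tool is actually on its PATH. The check_tool helper below is a hypothetical sketch; extend the tool list to match whatever your pipelines invoke.

```shell
#!/usr/bin/env sh
# Verify the runner's toolchain is installed and report each version.

check_tool() {
  # usage: check_tool <command> -> prints the version line or MISSING
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: $("$1" --version 2>/dev/null | head -1)"
  else
    echo "$1: MISSING"
  fi
}

for tool in terraform az ansible pwsh; do
  check_tool "$tool"
done
```

Any MISSING line means a pipeline job that needs that tool will fail on this runner.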
Verification
GitHub:
- Runner deployed and online
- Runner registered in Settings → Actions → Runners (shows green "Idle" status)
- Runner labels applied correctly (e.g., azure, onprem, terraform)
- Test job completes successfully:
# .github/workflows/test-runner.yml
name: Test Self-Hosted Runner
on: workflow_dispatch
jobs:
test:
runs-on: [self-hosted]
steps:
- run: |
echo "Runner: $(hostname)"
terraform version
az version
GitLab:

- Runner deployed and online
- Runner visible in Settings → CI/CD → Runners (shows green indicator)
- Runner tags applied correctly (e.g., azure, onprem, terraform)
- Test job completes successfully:
# .gitlab-ci.yml
test-runner:
tags: [onprem]
script:
- echo "Runner: $(hostname)"
- terraform version
- az version
when: manual
Azure DevOps:

- Agent deployed and online
- Agent visible in Organization Settings → Agent Pools (shows green indicator)
- Agent pool assigned correctly
- Test pipeline completes successfully:
# azure-pipelines.yml
trigger: none
pool: OnPremRunners
steps:
- script: |
echo "Agent: $(hostname)"
terraform version
az version
displayName: Test Self-Hosted Agent
Troubleshooting
Runner cannot reach SCM platform:
- Verify outbound HTTPS (port 443) is open to your SCM platform URL
- Check DNS resolution: nslookup github.com, nslookup gitlab.com, nslookup dev.azure.com
- If behind a proxy, configure the runner to use it (see platform-specific proxy docs)
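The checks above can be rolled into one probe script. The probe helper and the host list below are assumptions; add your self-hosted SCM domain if you run one.

```shell
#!/usr/bin/env sh
# Probe outbound HTTPS from the runner to each endpoint it must reach.

probe() {
  # usage: probe <hostname> -> "ok" if an HTTPS connection succeeds
  if curl -sI --connect-timeout 5 "https://$1" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: FAIL (check DNS, firewall, and proxy settings)"
  fi
}

for host in github.com gitlab.com dev.azure.com \
            management.azure.com login.microsoftonline.com; do
  probe "$host"
done
```

Running this from the runner host narrows a registration failure to a specific endpoint rather than a generic timeout.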
Runner cannot reach Azure APIs:
- Verify outbound HTTPS (port 443) to management.azure.com, login.microsoftonline.com, and graph.microsoft.com
- Test with: az login --service-principal -u $ARM_CLIENT_ID -p $ARM_CLIENT_SECRET --tenant $ARM_TENANT_ID
On-prem runner cannot reach cluster nodes:
- Verify the runner VM is on the correct management VLAN (e.g., VLAN 2203)
- Test connectivity: ping <cluster-node-ip>, then ssh <admin>@<cluster-node-ip>
Azure runner cannot reach on-premises:
- VPN/ExpressRoute must be deployed first — see Part 2 — VPN Gateway
- Verify VPN tunnel status: az network vpn-connection show --name <connection-name> -g <rg>
Next Steps
After completing Part 1, proceed to Part 2: Azure Foundation to establish the Azure cloud infrastructure including landing zones, networking, and security resources.