Task 06: Deploy Runners
DOCUMENT CATEGORY: Runbook Step
SCOPE: CI/CD runner planning and deployment
PURPOSE: Evaluate hosting options and deploy self-hosted CI/CD runners
MASTER REFERENCE: Azure Local Toolkit — CI/CD Runner Module
Status: Active
Objective
Select the appropriate runner hosting strategy and deploy self-hosted CI/CD runners that can execute automation pipelines against both Azure cloud resources and Azure Local on-premises clusters.
Why Self-Hosted Runners?
Azure Local automation has unique requirements that platform-hosted runners (GitHub-hosted, GitLab SaaS runners, Microsoft-hosted agents) cannot satisfy:
- On-premises access — Pipelines must reach Azure Local cluster endpoints, iDRACs, switches, and management networks that are not internet-routable.
- Hybrid targeting — A single pipeline may provision Azure cloud resources (landing zones, VPN gateways) and configure on-premises hardware (cluster registration, storage, networking).
- Long-running jobs — Terraform applies for VPN gateways, Arc registration, and cluster deployment can run 30–60+ minutes, exceeding hosted-runner timeouts.
- Persistent tooling — Runners need pre-installed tools (Terraform, Azure CLI, Ansible, PowerShell modules) to avoid downloading them on every run.
- Network security — Sensitive credentials and on-premises management traffic should not traverse shared, multi-tenant hosted-runner infrastructure.
Runner Hosting Options
Choosing where to deploy runners is a critical planning decision. The right answer depends on your network topology, connectivity between Azure and on-premises, security requirements, and existing infrastructure.
Decision Matrix
| Hosting Option | Azure Access | On-Prem Access | Connectivity Required | Best For |
|---|---|---|---|---|
| Azure VM / VMSS | ✅ Native | ⚠️ Requires VPN/ER | S2S VPN or ExpressRoute to on-prem | Cloud-first deployments where VPN/ER is already planned |
| On-Prem VM (Hyper-V / Azure Local) | ⚠️ Outbound internet | ✅ Native | Outbound HTTPS to SCM platform + Azure APIs | On-prem-first, or when Azure → on-prem connectivity is not yet established |
| Existing On-Prem Server | ⚠️ Outbound internet | ✅ Native | Outbound HTTPS to SCM platform + Azure APIs | Reuse existing infrastructure, minimal new provisioning |
| OpenGear Console Server | ⚠️ Limited | ✅ Native + OOB | Outbound HTTPS (or cellular backhaul) | Out-of-band automation, break-glass recovery, iDRAC/switch management |
| Hybrid (Azure + On-Prem) | ✅ Native | ✅ Native | Independent | Full coverage — Azure runner for cloud tasks, on-prem runner for local tasks |
Option 1: Azure VM or VMSS (Cloud-Hosted Runner)
Deploy a Linux or Windows VM (or VM Scale Set) in an Azure subscription. This is the default approach in the azurelocal-toolkit Terraform module.
Advantages:
- Autoscaling with VMSS — scale out for concurrent jobs, scale to zero when idle
- Managed infrastructure — Azure handles host OS patching, disk, and NIC
- Central placement near Azure control-plane APIs (ARM, Entra ID, Key Vault)
- Terraform module in `azurelocal-toolkit` handles provisioning end-to-end
Disadvantages:
- Cannot reach on-premises management networks unless VPN or ExpressRoute is established
- Adds a dependency — runners cannot reach Azure Local cluster endpoints until Part 2 (VPN Gateway) is deployed
- VPN/ExpressRoute adds cost and complexity
When to use: You already have (or plan to have) site-to-site VPN or ExpressRoute between Azure and on-prem, and you want a centrally managed runner fleet.
If you choose this option, the runner will only be able to target Azure cloud resources until the VPN/ExpressRoute is deployed in Part 2 — Phase 04: VPN Gateway. Plan for this phased capability.
Connectivity patterns for Azure-hosted runners:
| Connection | Protocol | Purpose |
|---|---|---|
| S2S VPN (IKEv2/IPsec) | Encrypted tunnel over internet | Management traffic, Ansible/SSH to cluster nodes |
| ExpressRoute | Private peering | Production workloads, high-bandwidth operations |
| Azure Arc Gateway | HTTPS outbound from on-prem | Alternative when direct inbound to on-prem is blocked |
Recommended sizing (from Discovery Checklist):
| Workload | VM Size | Max Instances | Notes |
|---|---|---|---|
| Light (< 5 concurrent jobs) | Standard_D2s_v3 | 2 | Default for most deployments |
| Medium (5-10 concurrent jobs) | Standard_D4s_v3 | 5 | Multiple clusters or frequent pipelines |
| Heavy (10+ concurrent jobs) | Standard_D4s_v3 | 10 | Large-scale multi-site deployments |
Option 2: On-Premises VM
Deploy a Linux or Windows VM directly on a Hyper-V host, an existing Azure Local cluster, or bare-metal server at the site.
Advantages:
- Direct access to on-premises management networks — no VPN required
- Can reach cluster nodes, iDRACs, switches, and storage endpoints immediately
- Works even when Azure ↔ on-prem connectivity is not yet established
- Lower latency for on-prem operations (Ansible playbooks, PowerShell remoting)
Disadvantages:
- Must manage the VM yourself (OS patching, disk, backups)
- Requires outbound internet access to reach your SCM platform (GitHub/GitLab/Azure DevOps) and Azure ARM APIs
- Not centrally managed if you have multiple sites
- No autoscaling — fixed capacity
When to use: You need to run automation against on-premises targets before VPN/ExpressRoute is available, or your security policy prohibits inbound connectivity from Azure to on-prem networks.
Minimum requirements:
| Resource | Specification |
|---|---|
| OS | Ubuntu 22.04 LTS (recommended) or Windows Server 2022 |
| vCPUs | 2+ |
| RAM | 4 GB+ |
| Disk | 50 GB+ |
| Network | Outbound HTTPS (443) to SCM platform + Azure APIs; access to on-prem management VLANs |
Option 3: Existing On-Premises Server
Install the runner agent directly on an existing server that is already on the management network. This avoids provisioning new infrastructure entirely.
Advantages:
- Zero new infrastructure — reuse what you have
- Already on the correct network segments
- Fastest path to a working runner
Disadvantages:
- Shared workload — runner competes with other services for CPU/RAM
- Security risk if the server runs other sensitive services
- Harder to isolate runner dependencies (Terraform versions, CLI tools)
- Not reproducible — if the server dies, runner config is lost
When to use: Proof-of-concept, lab environments, or early-stage deployments where you want to validate pipelines before investing in dedicated runner infrastructure.
If you use an existing server, consider running the runner agent inside a container (Docker) to isolate its dependencies from the host OS. All three platforms support containerized runners.
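As a sketch of the containerized approach: GitLab publishes an official `gitlab/gitlab-runner` container image, and a minimal Docker-based setup might look like the following. The host paths are conventional choices, not requirements, and the Docker-socket mount is only needed if the runner will itself launch Docker-executor jobs.

```
# Run the GitLab Runner agent in a container so its dependencies stay
# isolated from the host OS. The config volume persists registration
# state across container restarts.
docker run -d --name gitlab-runner --restart always \
  -v /srv/gitlab-runner/config:/etc/gitlab-runner \
  -v /var/run/docker.sock:/var/run/docker.sock \
  gitlab/gitlab-runner:latest
```

GitHub Actions and Azure DevOps agents can be containerized similarly by baking the agent install into a custom image.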
Option 4: OpenGear Console Server
Use an OpenGear console server (e.g., OM1208-8E) as a lightweight runner for out-of-band (OOB) automation tasks.
Advantages:
- Always-on, independent of the cluster and primary network
- Direct serial and network access to iDRACs, switches, and PDUs
- Cellular backhaul provides connectivity even during network outages
- Ideal for break-glass recovery scenarios and hardware lifecycle tasks
Disadvantages:
- Limited compute resources — cannot run heavy Terraform applies or large Ansible playbooks
- Runs embedded Linux — not all runner agent versions are supported
- Best suited for lightweight, targeted tasks (firmware updates, power cycling, console access)
When to use: You have OpenGear devices deployed for OOB management and want to automate hardware-level tasks (iDRAC configuration, switch management, power operations) that don't require full Terraform/Ansible workloads.
Suitable tasks for OpenGear runners:
| Task | Example |
|---|---|
| Hardware power management | ipmitool power cycle via iDRAC |
| Firmware updates | Push firmware to cluster nodes via Redfish API |
| Switch configuration | Apply VLAN/BGP changes via serial console |
| Health checks | Verify iDRAC reachability, check hardware status |
| Break-glass recovery | Emergency cluster node reboot when primary network is down |
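As an illustration of the first row, a hardware power-cycle from a console-server runner is typically a single IPMI-over-LAN call. The host and credentials below are placeholders for your environment:

```
# Power-cycle a cluster node via its iDRAC (IPMI over LAN).
# <idrac-ip>, <user>, <password> are placeholders — supply them from a
# secret store in the pipeline, not hard-coded.
ipmitool -I lanplus -H <idrac-ip> -U <user> -P <password> power cycle
```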
Option 5: Hybrid (Azure + On-Premises)
Deploy runners in both Azure and on-premises, using labels/tags to route pipeline jobs to the appropriate runner based on target.
Advantages:
- Full coverage — cloud-targeting jobs run on Azure runners, on-prem-targeting jobs run on local runners
- No single point of failure for connectivity
- Each runner has native access to its target environment
- Can start with on-prem only, add Azure runners later when VPN/ER is ready
Disadvantages:
- Two sets of infrastructure to manage
- Runner registration and labeling must be carefully planned
- Pipeline configuration is more complex (job routing by label)
When to use: Production deployments that need to target both Azure and on-premises resources reliably, or phased deployments where connectivity comes online over time.
Label-based routing in GitHub Actions:

```yaml
jobs:
  deploy-azure:
    runs-on: [self-hosted, azure]
    steps:
      - name: Deploy Azure resources
        run: terraform apply -auto-approve

  configure-cluster:
    runs-on: [self-hosted, onprem]
    needs: deploy-azure
    steps:
      - name: Configure Azure Local cluster
        run: ansible-playbook -i inventory cluster-config.yml
```
Tag-based routing in GitLab CI:

```yaml
deploy-azure:
  tags: [azure]
  script:
    - terraform apply -auto-approve

configure-cluster:
  tags: [onprem]
  needs: [deploy-azure]
  script:
    - ansible-playbook -i inventory cluster-config.yml
```
Pool-based routing in Azure DevOps:

```yaml
stages:
  - stage: DeployAzure
    pool: AzureRunners
    jobs:
      - job: Apply
        steps:
          - script: terraform apply -auto-approve

  - stage: ConfigureCluster
    dependsOn: DeployAzure
    pool: OnPremRunners
    jobs:
      - job: Configure
        steps:
          - script: ansible-playbook -i inventory cluster-config.yml
```
Recommendation
For most Azure Local deployments, we recommend a phased approach:
- Start with an on-premises VM (Option 2) to unblock automation immediately — this runner can target both Azure APIs (outbound HTTPS) and on-prem infrastructure (direct network access).
- Add an Azure VMSS runner (Option 1) after VPN/ExpressRoute is deployed in Part 2 — this provides autoscaling, central management, and native Azure API access.
- Optionally add OpenGear (Option 4) for OOB hardware automation if console servers are deployed at the site.
This phased approach avoids blocking on VPN connectivity while building toward a production-grade hybrid runner fleet.
Prerequisites
- Runner hosting option selected (see Decision Matrix above)
- CI/CD setup complete (Tasks 01-05)
- Environment variables configured (Task 05)
- Network connectivity confirmed for chosen hosting option
- For Azure VM/VMSS: target subscription and region identified
- For on-prem VM: host server or hypervisor available, management VLAN access confirmed
- Runner registration token available from your SCM platform
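Before deploying, a quick connectivity preflight from the prospective runner host can confirm the outbound HTTPS requirement. This is an illustrative sketch using bash's `/dev/tcp`; the hostnames are the common SCM and Azure endpoints — adjust the list for your platform.

```shell
#!/usr/bin/env bash
# Preflight check: confirm outbound TCP/443 reachability from the runner host.
check_443() {
  # Returns 0 if a TCP connection to <host>:443 succeeds within 3 seconds.
  local host=$1
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/443" 2>/dev/null
}

for host in github.com management.azure.com login.microsoftonline.com; do
  if check_443 "$host"; then
    echo "OK   ${host}:443"
  else
    echo "FAIL ${host}:443"
  fi
done
```

Any `FAIL` line indicates a firewall or DNS issue to resolve before registering the runner.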
Procedure
Step 1: Obtain Runner Registration Token
GitHub:

- Navigate to your repository or organization on GitHub
- Go to Settings → Actions → Runners → New self-hosted runner
- Copy the registration token

Or via CLI:

```shell
# Organization-level runner (recommended)
gh api -X POST orgs/{org}/actions/runners/registration-token --jq '.token'

# Repository-level runner
gh api -X POST repos/{owner}/{repo}/actions/runners/registration-token --jq '.token'
```
GitLab:

- Navigate to your project or group in GitLab
- Go to Settings → CI/CD → Runners → New project runner
- Copy the registration token

Or via CLI:

```shell
# Create a runner authentication token (GitLab 16.0+)
curl --request POST "https://gitlab.com/api/v4/user/runners" \
  --header "PRIVATE-TOKEN: <your-token>" \
  --form "runner_type=project_type" \
  --form "project_id=<project-id>" \
  --form "tag_list=azure,terraform"
```
Azure DevOps:

- Navigate to your Azure DevOps organization
- Go to Organization Settings → Agent Pools → select or create a pool
- Click New agent and note the instructions
- Create a PAT with Agent Pools (Read & Manage) scope

```shell
# PAT is used during agent configuration
# No separate registration token step — PAT is provided during ./config.sh
```
Step 2: Deploy Runner Infrastructure
Choose the deployment method that matches your selected hosting option.
Option A: Azure VM/VMSS via Terraform
Clone the deployment repository and apply the runner module:

```shell
git clone <your-deployment-repo>
cd <deployment-repo>
```
Example Terraform configuration:
GitHub:

```hcl
module "runner" {
  source = "github.com/AzureLocal/azurelocal-toolkit//terraform/modules/cicd-runner"

  runner_type   = "github"
  runner_token  = var.runner_registration_token
  runner_labels = ["azure", "terraform", "prod"]

  location        = "northcentralus"
  environment     = "prod"
  vm_size         = "Standard_D2s_v3"
  vm_count        = 2
  os_type         = "linux"
  subscription_id = data.azurerm_subscription.current.id
  tags            = local.tags
}
```
GitLab:

```hcl
module "runner" {
  source = "github.com/AzureLocal/azurelocal-toolkit//terraform/modules/cicd-runner"

  runner_type = "gitlab"
  runner_token = var.runner_registration_token
  runner_tags  = ["azure", "terraform", "prod"]

  location        = "northcentralus"
  environment     = "prod"
  vm_size         = "Standard_D2s_v3"
  vm_count        = 2
  os_type         = "linux"
  subscription_id = data.azurerm_subscription.current.id
  tags            = local.tags
}
```
Azure DevOps:

```hcl
module "runner" {
  source = "github.com/AzureLocal/azurelocal-toolkit//terraform/modules/cicd-runner"

  runner_type     = "azdo"
  agent_pool_name = "AzureRunners"
  azdo_pat        = var.azdo_pat
  azdo_org_url    = var.azdo_org_url

  location        = "northcentralus"
  environment     = "prod"
  vm_size         = "Standard_D2s_v3"
  vm_count        = 2
  os_type         = "linux"
  subscription_id = data.azurerm_subscription.current.id
  tags            = local.tags
}
```
```shell
terraform init
terraform plan -out=runner.tfplan
terraform apply runner.tfplan
```
Option B: On-Premises VM (Manual Setup)
Provision a Linux VM on your hypervisor or existing server, then install the runner agent:
GitHub:

```shell
# Download and install the GitHub Actions runner (pin the version you need)
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.321.0/actions-runner-linux-x64-2.321.0.tar.gz
tar xzf actions-runner-linux-x64.tar.gz

# Configure
./config.sh \
  --url https://github.com/<org> \
  --token <REGISTRATION_TOKEN> \
  --name "onprem-runner-01" \
  --labels "onprem,terraform,ansible" \
  --work "_work" \
  --unattended

# Install and start as a service
sudo ./svc.sh install
sudo ./svc.sh start
```
GitLab:

```shell
# Install GitLab Runner
curl -L https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh | sudo bash
sudo apt-get install gitlab-runner

# Register the runner (token from Step 1)
sudo gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.com/" \
  --token "<RUNNER_TOKEN>" \
  --executor "shell" \
  --description "onprem-runner-01" \
  --tag-list "onprem,terraform,ansible"

# Start service
sudo gitlab-runner start
```
Azure DevOps:

```shell
# Download the Azure DevOps agent
mkdir azagent && cd azagent
curl -o vsts-agent-linux-x64.tar.gz -L \
  https://vstsagentpackage.azureedge.net/agent/4.248.0/vsts-agent-linux-x64-4.248.0.tar.gz
tar xzf vsts-agent-linux-x64.tar.gz

# Configure
./config.sh \
  --unattended \
  --url "https://dev.azure.com/<org>" \
  --auth pat \
  --token "<PAT>" \
  --pool "OnPremRunners" \
  --agent "onprem-runner-01" \
  --acceptTeeEula

# Install and start as a service
sudo ./svc.sh install
sudo ./svc.sh start
```
Install required tools on the on-prem runner:

```shell
# Terraform
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | \
  sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] \
  https://apt.releases.hashicorp.com $(lsb_release -cs) main" | \
  sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt-get update && sudo apt-get install -y terraform

# Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

# Ansible (for cluster configuration)
sudo apt-get install -y python3-pip
pip3 install ansible

# PowerShell (for Azure Local modules; requires the Microsoft package
# repository to be configured first)
sudo apt-get install -y powershell
```
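After the installs complete, a quick sanity check (illustrative) reports which of the required tools are actually on `PATH` before you register the runner:

```shell
# Report which required CLI tools are available on this host.
for tool in terraform az ansible-playbook pwsh; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```

Any `missing:` line means the corresponding install step above needs to be revisited.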
Verification
GitHub:

- Runner deployed and online
- Runner registered in Settings → Actions → Runners (shows green "Idle" status)
- Runner labels applied correctly (e.g., `azure`, `onprem`, `terraform`)
- Test job completes successfully:

```yaml
# .github/workflows/test-runner.yml
name: Test Self-Hosted Runner
on: workflow_dispatch

jobs:
  test:
    runs-on: [self-hosted]
    steps:
      - run: |
          echo "Runner: $(hostname)"
          terraform version
          az version
```
GitLab:

- Runner deployed and online
- Runner visible in Settings → CI/CD → Runners (shows green indicator)
- Runner tags applied correctly (e.g., `azure`, `onprem`, `terraform`)
- Test job completes successfully:

```yaml
# .gitlab-ci.yml
test-runner:
  tags: [onprem]
  script:
    - echo "Runner: $(hostname)"
    - terraform version
    - az version
  when: manual
```
Azure DevOps:

- Agent deployed and online
- Agent visible in Organization Settings → Agent Pools (shows green indicator)
- Agent pool assigned correctly
- Test pipeline completes successfully:

```yaml
# azure-pipelines.yml
trigger: none
pool: OnPremRunners

steps:
  - script: |
      echo "Agent: $(hostname)"
      terraform version
      az version
    displayName: Test Self-Hosted Agent
```
Variables from variables.yml
| Variable | Config Path | Example |
|---|---|---|
| Runner Pool Name | cicd.runners.pool_name | self-hosted-pool |
| Runner Image | cicd.runners.image | ubuntu-latest |
| Runner Count | cicd.runners.count | 2 |
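Mapped back into `variables.yml`, the rows above correspond to a fragment of roughly this shape (field names are taken from the Config Path column; the surrounding file structure is assumed):

```yaml
cicd:
  runners:
    pool_name: self-hosted-pool
    image: ubuntu-latest
    count: 2
```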
Scripts for this task are located in the azurelocal-toolkit repository under scripts/deploy/ in the appropriate task folder.
Alternatives
The procedures in this task use the scripted methods shown in the tabs above. Additional deployment methods including Azure CLI and Bash scripts are available in the azurelocal-toolkit repository under scripts/deploy/.
| Method | Description |
|---|---|
| Azure CLI | PowerShell-based Azure CLI scripts for Azure resource operations |
| Bash | Linux/macOS compatible shell scripts for pipeline environments |
Navigation
| Previous | Up | Next |
|---|---|---|
| ← Task 05: Configure Environment Variables | Phase 01: CI/CD Setup | --- |
Troubleshooting
Runner cannot reach SCM platform:
- Verify outbound HTTPS (port 443) is open to your SCM platform URL
- Check DNS resolution: `nslookup github.com`, `nslookup gitlab.com`, `nslookup dev.azure.com`
- If behind a proxy, configure the runner to use it (see platform-specific proxy docs)
Runner cannot reach Azure APIs:
- Verify outbound HTTPS (port 443) to `management.azure.com`, `login.microsoftonline.com`, `graph.microsoft.com`
- Test with: `az login --service-principal -u $ARM_CLIENT_ID -p $ARM_CLIENT_SECRET --tenant $ARM_TENANT_ID`
On-prem runner cannot reach cluster nodes:
- Verify the runner VM is on the correct management VLAN (e.g., VLAN 2203)
- Test connectivity: `ping <cluster-node-ip>`, `ssh <admin>@<cluster-node-ip>`
Azure runner cannot reach on-premises:
- VPN/ExpressRoute must be deployed first — see Part 2 — VPN Gateway
- Verify VPN tunnel status: `az network vpn-connection show --name <connection-name> -g <rg>`
Next Steps
After completing Part 1, proceed to Part 2: Azure Foundation to establish the Azure cloud infrastructure including landing zones, networking, and security resources.
References
- CI/CD Runner Module | azurelocal-toolkit
- Discovery Checklist — CI/CD Infrastructure Sizing
- GitHub Actions Self-Hosted Runners
- GitLab Runner Documentation
- Azure DevOps Self-Hosted Agents
- Azure VPN Gateway Overview
- Azure ExpressRoute Overview
- OpenGear Documentation
Version Control
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-03-25 | Azure Local Cloud | Initial release |