
Task 06: Deploy Runners


DOCUMENT CATEGORY: Runbook Step
SCOPE: CI/CD runner planning and deployment
PURPOSE: Evaluate hosting options and deploy self-hosted CI/CD runners
MASTER REFERENCE: Azure Local Toolkit — CI/CD Runner Module

Status: Active | Applies To: All Azure Local deployments | Last Updated: 2026-03-19


Objective

Select the appropriate runner hosting strategy and deploy self-hosted CI/CD runners that can execute automation pipelines against both Azure cloud resources and Azure Local on-premises clusters.


Why Self-Hosted Runners?

Azure Local automation has unique requirements that platform-hosted runners (GitHub-hosted, GitLab SaaS runners, Microsoft-hosted agents) cannot satisfy:

  • On-premises access — Pipelines must reach Azure Local cluster endpoints, iDRACs, switches, and management networks that are not internet-routable.
  • Hybrid targeting — A single pipeline may provision Azure cloud resources (landing zones, VPN gateways) and configure on-premises hardware (cluster registration, storage, networking).
  • Long-running jobs — Terraform applies for VPN gateways, Arc registration, and cluster deployment can run 30–60+ minutes, exceeding hosted-runner timeouts.
  • Persistent tooling — Runners need pre-installed tools (Terraform, Azure CLI, Ansible, PowerShell modules) to avoid downloading them on every run.
  • Network security — Sensitive credentials and on-premises management traffic should not traverse shared, multi-tenant hosted-runner infrastructure.

Runner Hosting Options

Choosing where to deploy runners is a critical planning decision. The right answer depends on your network topology, connectivity between Azure and on-premises, security requirements, and existing infrastructure.

Decision Matrix

| Hosting Option | Azure Access | On-Prem Access | Connectivity Required | Best For |
|---|---|---|---|---|
| Azure VM / VMSS | ✅ Native | ⚠️ Requires VPN/ER | S2S VPN or ExpressRoute to on-prem | Cloud-first deployments where VPN/ER is already planned |
| On-Prem VM (Hyper-V / Azure Local) | ⚠️ Outbound internet | ✅ Native | Outbound HTTPS to SCM platform + Azure APIs | On-prem-first, or when Azure → on-prem connectivity is not yet established |
| Existing On-Prem Server | ⚠️ Outbound internet | ✅ Native | Outbound HTTPS to SCM platform + Azure APIs | Reuse existing infrastructure, minimal new provisioning |
| OpenGear Console Server | ⚠️ Limited | ✅ Native + OOB | Outbound HTTPS (or cellular backhaul) | Out-of-band automation, break-glass recovery, iDRAC/switch management |
| Hybrid (Azure + On-Prem) | ✅ Native | ✅ Native | Independent | Full coverage — Azure runner for cloud tasks, on-prem runner for local tasks |

Option 1: Azure VM or VMSS (Cloud-Hosted Runner)

Deploy a Linux or Windows VM (or VM Scale Set) in an Azure subscription. This is the default approach in the azurelocal-toolkit Terraform module.

Advantages:

  • Autoscaling with VMSS — scale out for concurrent jobs, scale to zero when idle
  • Managed infrastructure — Azure handles host OS patching, disk, and NIC
  • Central placement near Azure control-plane APIs (ARM, Entra ID, Key Vault)
  • Terraform module in azurelocal-toolkit handles provisioning end-to-end

Disadvantages:

  • Cannot reach on-premises management networks unless VPN or ExpressRoute is established
  • Adds a dependency — runners cannot reach Azure Local cluster endpoints until Part 2 (VPN Gateway) is deployed
  • VPN/ExpressRoute adds cost and complexity

When to use: You already have (or plan to have) site-to-site VPN or ExpressRoute between Azure and on-prem, and you want a centrally managed runner fleet.

Connectivity Dependency

If you choose this option, the runner will only be able to target Azure cloud resources until the VPN/ExpressRoute is deployed in Part 2 — Phase 04: VPN Gateway. Plan for this phased capability.

Connectivity patterns for Azure-hosted runners:

| Connection | Protocol | Purpose |
|---|---|---|
| S2S VPN (IKEv2/IPsec) | Encrypted tunnel over internet | Management traffic, Ansible/SSH to cluster nodes |
| ExpressRoute | Private peering | Production workloads, high-bandwidth operations |
| Azure Arc Gateway | HTTPS outbound from on-prem | Alternative when direct inbound to on-prem is blocked |

Recommended sizing (from Discovery Checklist):

| Workload | VM Size | Max Instances | Notes |
|---|---|---|---|
| Light (< 5 concurrent jobs) | Standard_D2s_v3 | 2 | Default for most deployments |
| Medium (5–10 concurrent jobs) | Standard_D4s_v3 | 5 | Multiple clusters or frequent pipelines |
| Heavy (10+ concurrent jobs) | Standard_D4s_v3 | 10 | Large-scale multi-site deployments |

Option 2: On-Premises VM

Deploy a Linux or Windows VM directly on a Hyper-V host, an existing Azure Local cluster, or bare-metal server at the site.

Advantages:

  • Direct access to on-premises management networks — no VPN required
  • Can reach cluster nodes, iDRACs, switches, and storage endpoints immediately
  • Works even when Azure ↔ on-prem connectivity is not yet established
  • Lower latency for on-prem operations (Ansible playbooks, PowerShell remoting)

Disadvantages:

  • Must manage the VM yourself (OS patching, disk, backups)
  • Requires outbound internet access to reach your SCM platform (GitHub/GitLab/Azure DevOps) and Azure ARM APIs
  • Not centrally managed if you have multiple sites
  • No autoscaling — fixed capacity

When to use: You need to run automation against on-premises targets before VPN/ExpressRoute is available, or your security policy prohibits inbound connectivity from Azure to on-prem networks.

Minimum requirements:

| Resource | Specification |
|---|---|
| OS | Ubuntu 22.04 LTS (recommended) or Windows Server 2022 |
| vCPUs | 2+ |
| RAM | 4 GB+ |
| Disk | 50 GB+ |
| Network | Outbound HTTPS (443) to SCM platform + Azure APIs; access to on-prem management VLANs |

Option 3: Existing On-Premises Server

Install the runner agent directly on an existing server that is already on the management network. This avoids provisioning new infrastructure entirely.

Advantages:

  • Zero new infrastructure — reuse what you have
  • Already on the correct network segments
  • Fastest path to a working runner

Disadvantages:

  • Shared workload — runner competes with other services for CPU/RAM
  • Security risk if the server runs other sensitive services
  • Harder to isolate runner dependencies (Terraform versions, CLI tools)
  • Not reproducible — if the server dies, runner config is lost

When to use: Proof-of-concept, lab environments, or early-stage deployments where you want to validate pipelines before investing in dedicated runner infrastructure.

Isolation

If you use an existing server, consider running the runner agent inside a container (Docker) to isolate its dependencies from the host OS. All three platforms support containerized runners.
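For GitHub, one way to containerize the runner is the community-maintained `myoung34/github-runner` image. The sketch below is illustrative, not an official setup: the environment variable names follow that image's documentation and should be verified against the version you pull, and the repo URL, runner name, and token are placeholders.

```yaml
# docker-compose.yml — containerized GitHub runner on an existing server
# (community image; verify variable names against its documentation)
services:
  runner:
    image: myoung34/github-runner:latest
    restart: unless-stopped
    environment:
      REPO_URL: https://github.com/<org>/<repo>
      RUNNER_NAME: onprem-runner-docker-01
      RUNNER_TOKEN: <REGISTRATION_TOKEN>
      LABELS: onprem,terraform,ansible
    volumes:
      - runner-work:/tmp/runner/work
volumes:
  runner-work:
```

If the runner's jobs need to reach management VLANs, remember that the container inherits the host's network position; dependencies are isolated, network access is not.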

Option 4: OpenGear Console Server

Use an OpenGear console server (e.g., OM1208-8E) as a lightweight runner for out-of-band (OOB) automation tasks.

Advantages:

  • Always-on, independent of the cluster and primary network
  • Direct serial and network access to iDRACs, switches, and PDUs
  • Cellular backhaul provides connectivity even during network outages
  • Ideal for break-glass recovery scenarios and hardware lifecycle tasks

Disadvantages:

  • Limited compute resources — cannot run heavy Terraform applies or large Ansible playbooks
  • Runs embedded Linux — not all runner agent versions are supported
  • Best suited for lightweight, targeted tasks (firmware updates, power cycling, console access)

When to use: You have OpenGear devices deployed for OOB management and want to automate hardware-level tasks (iDRAC configuration, switch management, power operations) that don't require full Terraform/Ansible workloads.

Suitable tasks for OpenGear runners:

| Task | Example |
|---|---|
| Hardware power management | ipmitool power cycle via iDRAC |
| Firmware updates | Push firmware to cluster nodes via Redfish API |
| Switch configuration | Apply VLAN/BGP changes via serial console |
| Health checks | Verify iDRAC reachability, check hardware status |
| Break-glass recovery | Emergency cluster node reboot when primary network is down |
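The power-management row above can be sketched as a small wrapper function suitable for a resource-constrained OpenGear runner. This is an illustrative helper, not part of any toolkit: the host address, credential variables, and `DRY_RUN` switch are assumptions, and `ipmitool` must be installed on the device.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for iDRAC power actions from an OpenGear runner.
# IDRAC_USER / IDRAC_PASS / DRY_RUN are illustrative names, not toolkit
# conventions. Requires ipmitool.
set -euo pipefail

idrac_power() {
  local host="$1" action="$2"   # action: status | on | off | cycle
  local cmd=(ipmitool -I lanplus -H "$host" -U "$IDRAC_USER" -P "$IDRAC_PASS" chassis power "$action")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"            # print the command instead of executing it
  else
    "${cmd[@]}"
  fi
}
```

A dry run (for example `DRY_RUN=1 idrac_power 10.0.10.21 cycle` with the credential variables exported) prints the assembled command so it can be reviewed in pipeline logs before being run for real.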

Option 5: Hybrid (Azure + On-Premises)

Deploy runners in both Azure and on-premises, using labels/tags to route pipeline jobs to the appropriate runner based on target.

Advantages:

  • Full coverage — cloud-targeting jobs run on Azure runners, on-prem-targeting jobs run on local runners
  • No single point of failure for connectivity
  • Each runner has native access to its target environment
  • Can start with on-prem only, add Azure runners later when VPN/ER is ready

Disadvantages:

  • Two sets of infrastructure to manage
  • Runner registration and labeling must be carefully planned
  • Pipeline configuration is more complex (job routing by label)

When to use: Production deployments that need to target both Azure and on-premises resources reliably, or phased deployments where connectivity comes online over time.

Label-based routing in GitHub Actions:

```yaml
jobs:
  deploy-azure:
    runs-on: [self-hosted, azure]
    steps:
      - name: Deploy Azure resources
        run: terraform apply -auto-approve

  configure-cluster:
    runs-on: [self-hosted, onprem]
    needs: deploy-azure
    steps:
      - name: Configure Azure Local cluster
        run: ansible-playbook -i inventory cluster-config.yml
```
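The same routing can be expressed in GitLab CI with runner tags. This is a sketch that assumes the runners were registered with `azure` and `onprem` tags matching the labels used elsewhere in this document:

```yaml
# .gitlab-ci.yml — equivalent routing via runner tags (sketch; assumes
# runners registered with matching tags)
deploy-azure:
  tags: [azure]
  script:
    - terraform apply -auto-approve

configure-cluster:
  tags: [onprem]
  needs: [deploy-azure]
  script:
    - ansible-playbook -i inventory cluster-config.yml
```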

Recommendation

For most Azure Local deployments, we recommend a phased approach:

  1. Start with an on-premises VM (Option 2) to unblock automation immediately — this runner can target both Azure APIs (outbound HTTPS) and on-prem infrastructure (direct network access).
  2. Add an Azure VMSS runner (Option 1) after VPN/ExpressRoute is deployed in Part 2 — this provides autoscaling, central management, and native Azure API access.
  3. Optionally add OpenGear (Option 4) for OOB hardware automation if console servers are deployed at the site.

This phased approach avoids blocking on VPN connectivity while building toward a production-grade hybrid runner fleet.


Prerequisites

  • Runner hosting option selected (see Decision Matrix above)
  • CI/CD setup complete (Tasks 01-05)
  • Environment variables configured (Task 05)
  • Network connectivity confirmed for chosen hosting option
  • For Azure VM/VMSS: target subscription and region identified
  • For on-prem VM: host server or hypervisor available, management VLAN access confirmed
  • Runner registration token available from your SCM platform

Procedure

Step 1: Obtain Runner Registration Token

  1. Navigate to your repository or organization on GitHub
  2. Go to Settings → Actions → Runners → New self-hosted runner
  3. Copy the registration token

Or via CLI:

```shell
# Organization-level runner (recommended)
gh api -X POST orgs/{org}/actions/runners/registration-token --jq '.token'

# Repository-level runner
gh api -X POST repos/{owner}/{repo}/actions/runners/registration-token --jq '.token'
```

Step 2: Deploy Runner Infrastructure

Choose the deployment method that matches your selected hosting option.

Option A: Azure VM/VMSS via Terraform

Clone the deployment repository and apply the runner module:

```shell
git clone <your-deployment-repo>
cd <deployment-repo>
```

Example Terraform configuration:

```hcl
module "runner" {
  source = "github.com/AzureLocal/azurelocal-toolkit//terraform/modules/cicd-runner"

  runner_type     = "github"
  runner_token    = var.runner_registration_token
  runner_labels   = ["azure", "terraform", "prod"]
  location        = "northcentralus"
  environment     = "prod"
  vm_size         = "Standard_D2s_v3"
  vm_count        = 2
  os_type         = "linux"
  subscription_id = data.azurerm_subscription.current.id
  tags            = local.tags
}
```

```shell
terraform init
terraform plan -out=runner.tfplan
terraform apply runner.tfplan
```

Option B: On-Premises VM (Manual Setup)

Provision a Linux VM on your hypervisor or existing server, then install the runner agent:

```shell
# Download and install the GitHub Actions runner (pinned version; a
# "latest" URL cannot carry a versioned filename, so the release tag is
# spelled out)
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.321.0/actions-runner-linux-x64-2.321.0.tar.gz
tar xzf actions-runner-linux-x64.tar.gz

# Configure (--runasservice is Windows-only; on Linux the service is
# installed with svc.sh below)
./config.sh \
  --url https://github.com/<org> \
  --token <REGISTRATION_TOKEN> \
  --name "onprem-runner-01" \
  --labels "onprem,terraform,ansible" \
  --work "_work" \
  --unattended

# Install and start as a systemd service
sudo ./svc.sh install
sudo ./svc.sh start
```

Install required tools on the on-prem runner:

# Terraform
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | \
sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] \
https://apt.releases.hashicorp.com $(lsb_release -cs) main" | \
sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt-get update && sudo apt-get install terraform

# Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

# Ansible (for cluster configuration)
sudo apt-get install -y python3-pip
pip3 install ansible

# PowerShell (for Azure Local modules)
sudo apt-get install -y powershell
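After installing, it is worth confirming every tool actually landed on the PATH before registering the runner for real jobs. The helper below is a generic sketch, not part of azurelocal-toolkit:

```shell
# check_tools: report which of the given commands are missing from PATH.
# Generic helper (not part of azurelocal-toolkit); returns nonzero if any
# tool is absent, so it can gate a pipeline step.
check_tools() {
  local status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "OK: $tool"
    else
      echo "MISSING: $tool"
      status=1
    fi
  done
  return "$status"
}
```

On the runner you would call, for example, `check_tools terraform az ansible-playbook pwsh`.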

Verification

  • Runner deployed and online
  • Runner registered in Settings → Actions → Runners (shows green "Idle" status)
  • Runner labels applied correctly (e.g., azure, onprem, terraform)
  • Test job completes successfully:
```yaml
# .github/workflows/test-runner.yml
name: Test Self-Hosted Runner
on: workflow_dispatch
jobs:
  test:
    runs-on: [self-hosted]
    steps:
      - run: |
          echo "Runner: $(hostname)"
          terraform version
          az version
```

Troubleshooting

Runner cannot reach SCM platform:

  • Verify outbound HTTPS (port 443) is open to your SCM platform URL
  • Check DNS resolution: nslookup github.com / nslookup gitlab.com / nslookup dev.azure.com
  • If behind a proxy, configure the runner to use it (see platform-specific proxy docs)

Runner cannot reach Azure APIs:

  • Verify outbound HTTPS (port 443) to management.azure.com, login.microsoftonline.com, graph.microsoft.com
  • Test with: az login --service-principal -u $ARM_CLIENT_ID -p $ARM_CLIENT_SECRET --tenant $ARM_TENANT_ID

On-prem runner cannot reach cluster nodes:

  • Verify the runner VM is on the correct management VLAN (e.g., VLAN 2203)
  • Test connectivity: ping <cluster-node-ip>, ssh <admin>@<cluster-node-ip>

Azure runner cannot reach on-premises:

  • VPN/ExpressRoute must be deployed first — see Part 2 — VPN Gateway
  • Verify VPN tunnel status: az network vpn-connection show --name <connection-name> -g <rg>

Next Steps

After completing Part 1, proceed to Part 2: Azure Foundation to establish the Azure cloud infrastructure including landing zones, networking, and security resources.

