Scaling Plans¶

This guide explains exactly how AVD scaling plans work — what Azure resource gets deployed, how the autoscaler evaluates capacity every 15 minutes, what happens in each of the four schedule phases, the difference between the two load-balancing algorithms, and how each IaC tool implements it.

Why Scaling Plans Exist¶

AVD session hosts are VMs. VMs cost money when they're running, even if nobody is using them. A scaling plan tells the AVD service: "At 7 AM, start turning on hosts. At 9 AM, make sure there's plenty of capacity. At 5 PM, start draining users and shutting down hosts. Overnight, keep just a handful running."

Without a scaling plan, you either leave all hosts running 24/7 (expensive), or manually turn them on and off (error-prone, somebody forgets on Friday).

Figure: Scaling Plan Phases & Logic (an editable draw.io source is available).

Pooled vs. Personal

Pooled host pools support full capacity-based autoscaling (this guide's focus). Personal host pools now also support scaling plans, but with a different model — personal scaling plans deallocate VMs based on user session state (signed out / disconnected), not capacity thresholds. This repo's IaC deploys scaling plans only for Pooled pools: count = var.scaling_enabled && var.host_pool_type == "Pooled" ? 1 : 0.

Azure Local Considerations¶

Scaling plans work on Azure Local session hosts — Microsoft explicitly supports both power management autoscaling and Start VM on Connect for session hosts on Azure and Azure Local. However, running on-premises introduces important differences compared to Azure-hosted VMs:

Azure Local-Specific Caveats

Fixed hardware capacity — Unlike Azure, you cannot burst beyond your physical cluster's compute capacity. Design minimum_hosts_pct and capacity thresholds conservatively. If all physical cores are committed, the autoscaler cannot power on additional VMs.
Azure connectivity required — The AVD autoscaler is a cloud-hosted service. It sends power commands through Azure Arc to the on-premises cluster. If the Azure Local cluster loses connectivity to Azure, scaling actions will not execute until the connection is restored.
Power action latency — Starting an Arc-enabled VM on Azure Local involves Azure API → Arc agent → Azure Local cluster → Hyper-V. This may add a few seconds of latency compared to starting an Azure VM directly. Factor this into ramp-up minimum_hosts_pct to pre-warm hosts before peak.
Dynamic autoscaling (preview) is not confirmed for Azure Local — Dynamic autoscaling (which creates/deletes VMs, not just power on/off) is currently only available in Azure. Azure Local VM provisioning requires Arc VM creation with specific logical networks and on-premises images, which the dynamic autoscaler does not support.
Host pool isolation — Azure and Azure Local session hosts cannot be mixed in the same host pool. Create separate host pools per environment.

Feature	Azure VMs	Azure Local (Arc VMs)
Power management autoscale	Supported	Supported
Start VM on Connect	Supported	Supported
Dynamic autoscale (create/delete VMs)	Preview	Not supported
Resource type	`Microsoft.Compute/virtualMachines`	`Microsoft.HybridCompute/machines`
RBAC role	Desktop Virtualization Power On Off Contributor	Same role, same subscription-level assignment

What Gets Deployed¶

Azure Resource Type	Resource Name	What It Is
`Microsoft.DesktopVirtualization/scalingPlans`	`${host_pool_name}-scaling`	The scaling plan itself — contains schedule definitions, algorithm settings, capacity thresholds, and the link to the host pool.
`Microsoft.DesktopVirtualization/scalingPlans/pooledSchedules`	One per schedule entry	Child resource of the scaling plan — defines the four phase times, algorithms, and thresholds for a specific day pattern (e.g., weekdays).
`Microsoft.Authorization/roleAssignments`	Auto-created	The AVD service principal (first-party app `9cdead84-a844-4324-93f2-b2e6bb768d07`) must hold Desktop Virtualization Power On Off Contributor at subscription scope so it can actually start and stop VMs. The scaling plan does not create this assignment; see Prerequisites.

How the Autoscaler Works¶

The AVD autoscaler is a Microsoft-managed service — you don't deploy it. Once you create a scaling plan and associate it with a host pool, the service evaluates every 15 minutes (approximately) whether to turn hosts on or off.

Evaluation Logic¶

Every 15 minutes, the autoscaler:

Reads the current phase from the schedule (based on the current time in the configured time zone)
Counts total sessions across all running session hosts
Calculates used capacity percentage: (total sessions / total max sessions across running hosts) × 100
Compares to the capacity threshold for the current phase
If used capacity > threshold: turns on additional hosts (one at a time, wait for it to report healthy, then re-evaluate)
If used capacity < threshold (and in ramp-down or off-peak): drains a host and powers it off
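
The evaluation steps above can be sketched in Python. This is an illustrative model, not the service's implementation: `Host`, `used_capacity_pct`, and `decide` are hypothetical names, treating a pool with no running hosts as 0% used is an assumption, and the minimum-host floor is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Host:
    running: bool
    sessions: int
    max_sessions: int


def used_capacity_pct(hosts):
    """(total sessions / total max sessions across running hosts) x 100."""
    running = [h for h in hosts if h.running]
    total_max = sum(h.max_sessions for h in running)
    if total_max == 0:
        return 0.0  # assumption: nothing running means nothing is in use
    return 100.0 * sum(h.sessions for h in running) / total_max


def decide(hosts, threshold_pct, phase):
    """Return the action one evaluation cycle would take (floor not modelled)."""
    used = used_capacity_pct(hosts)
    if used > threshold_pct:
        return "power_on"
    if used < threshold_pct and phase in ("ramp_down", "off_peak"):
        return "drain_and_power_off"
    return "no_action"


pool = [Host(True, 7, 10), Host(True, 6, 10), Host(False, 0, 10)]
print(used_capacity_pct(pool))       # 13 of 20 seats on running hosts
print(decide(pool, 60, "ramp_up"))   # over the 60% threshold
```

Only running hosts count toward the denominator, which is why powering on one more host lowers the used-capacity percentage on the next cycle.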

"Drain mode" Explained¶

When the autoscaler decides to power off a host:

It sets the host to Drain mode — no new sessions are assigned to it
Existing users continue working — they're NOT kicked off immediately
If force_logoff is true, remaining users are shown the notification message and are forcefully signed out once wait_time_minutes has elapsed
If force_logoff is false, the autoscaler waits for all users to sign out naturally — the host stays running until it's empty
Once all sessions are gone, the VM is deallocated (stopped and not billed for compute)
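
As a rough illustration of the drain sequence above, the next action for a draining host depends on three inputs. The function name `drain_action` and its return values are hypothetical; only `force_logoff` and `wait_time_minutes` come from the configuration described in this guide.

```python
def drain_action(active_sessions, force_logoff, minutes_draining, wait_time_minutes):
    """Return the next step for a host that is already in drain mode."""
    if active_sessions == 0:
        return "deallocate"        # empty host: stop it and release compute
    if force_logoff and minutes_draining >= wait_time_minutes:
        return "force_sign_out"    # wait time has elapsed, sign users out
    return "wait"                  # keep the host running until users leave


print(drain_action(3, False, 120, 30))  # force_logoff off: waits indefinitely
print(drain_action(3, True, 30, 30))    # force_logoff on and timer elapsed
```

With `force_logoff` set to false, a single forgotten session keeps the host running, which is the trade-off to weigh against interrupting users.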

Schedule Phases — Deep Dive¶

Ramp-Up Phase¶

Purpose: Get hosts ready before the workday starts so users don't wait for VMs to boot.

How it works:

The autoscaler starts at the configured ramp_up.start_time (e.g., 07:00)
It calculates how many hosts are needed to meet minimum_hosts_pct — example: if you have 20 session hosts and minimum_hosts_pct: 25, at least 5 hosts must be running at 7:00 AM
It powers on hosts one by one until the minimum is met
As early-bird users log in, if the capacity_threshold_pct is exceeded, additional hosts are powered on
Load-balancing algorithm: typically BreadthFirst (spread users across hosts for best experience while they trickle in)

Configuration fields:

Field	Type	What It Controls	Example
`start_time`	String (HH:MM)	When this phase begins	`"07:00"`
`algorithm`	`BreadthFirst` or `DepthFirst`	How sessions are assigned to running hosts	`BreadthFirst`
`minimum_hosts_pct`	Integer (0-100)	Percentage of total hosts that must be running during this phase	`25`
`capacity_threshold_pct`	Integer (0-100)	When used capacity hits this %, turn on another host	`60`
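
The minimum-host arithmetic from the example above (20 hosts at 25% gives 5) can be checked with a few lines of Python. Rounding fractional results up is an assumption made for this sketch, and `minimum_running_hosts` is a hypothetical helper name.

```python
def minimum_running_hosts(total_hosts, minimum_hosts_pct):
    """Hosts that must be running to satisfy minimum_hosts_pct.

    Uses integer ceiling division; rounding up is an assumption here.
    """
    return (total_hosts * minimum_hosts_pct + 99) // 100


print(minimum_running_hosts(20, 25))  # 5, matching the example above
print(minimum_running_hosts(10, 25))  # 3 under the round-up assumption
```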

Peak Phase¶

Purpose: Maximum availability. All needed hosts are running.

How it works:

Starts at peak.start_time (e.g., 09:00)
No minimum host percentage — the autoscaler only turns on more hosts if capacity_threshold_pct is exceeded
No hosts are turned off during peak, even if utilization drops
Load-balancing algorithm: typically BreadthFirst for best user experience

Configuration fields:

Field	Type	What It Controls	Example
`start_time`	String (HH:MM)	When peak begins	`"09:00"`
`algorithm`	`BreadthFirst` or `DepthFirst`	Session distribution strategy	`BreadthFirst`

Ramp-Down Phase¶

Purpose: Gracefully reduce capacity as users leave for the day.

How it works:

Starts at ramp_down.start_time (e.g., 17:00)
While used capacity across the running hosts stays below capacity_threshold_pct, the autoscaler puts hosts into drain mode
Sessions are consolidated onto fewer hosts (using DepthFirst to pack users tightly)
Once a host is empty, it's powered off
If force_logoff: true, users who are still signed in see notification_message and are forcefully signed out once wait_time_minutes has elapsed
The autoscaler won't go below minimum_hosts_pct — ensures some capacity stays online while late workers finish

Configuration fields:

Field	Type	What It Controls	Example
`start_time`	String (HH:MM)	When ramp-down begins	`"17:00"`
`algorithm`	`BreadthFirst` or `DepthFirst`	Usually DepthFirst to consolidate	`DepthFirst`
`minimum_hosts_pct`	Integer (0-100)	Floor — don't go below this many hosts	`10`
`capacity_threshold_pct`	Integer (0-100)	Drain threshold — higher = more aggressive draining	`90`
`force_logoff`	Boolean	Whether to forcefully sign out users after the wait time	`false`
`wait_time_minutes`	Integer	Minutes to wait before forcing logoff (0 = immediate)	`30`
`notification_message`	String	Message shown to users before force logoff	`"Your session will be logged off in 30 minutes."`
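
To make the ramp-down floor concrete, the sketch below picks which running hosts could be drained without dropping below `minimum_hosts_pct`. It is a simplified model under stated assumptions: draining the emptiest hosts first and rounding the floor up are both assumptions, the capacity threshold check is omitted, and `hosts_to_drain` is a hypothetical name.

```python
def hosts_to_drain(sessions_by_host, total_hosts, minimum_hosts_pct):
    """Pick running hosts to drain while keeping the minimum_hosts_pct floor.

    sessions_by_host maps each *running* host name to its session count.
    """
    floor = (total_hosts * minimum_hosts_pct + 99) // 100  # round up (assumption)
    removable = max(0, len(sessions_by_host) - floor)
    # Assumption: emptiest hosts go first, name as a deterministic tie-break.
    ordered = sorted(sessions_by_host, key=lambda name: (sessions_by_host[name], name))
    return ordered[:removable]


running = {"vm-01": 4, "vm-02": 0, "vm-03": 1, "vm-04": 7}
print(hosts_to_drain(running, total_hosts=20, minimum_hosts_pct=10))
```

With 20 hosts in the pool and a 10% floor, two hosts must stay up, so at most two of the four running hosts are candidates.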

Off-Peak Phase¶

Purpose: Minimum capacity overnight. Only keep a skeleton crew of hosts for after-hours users.

How it works:

Starts at off_peak.start_time (e.g., 19:00)
Continues draining and powering off hosts until only minimum_hosts_pct remain
Uses DepthFirst to consolidate any remaining sessions
If a late-night user logs in and capacity threshold is hit, the autoscaler will power on one host

Configuration fields:

Field	Type	What It Controls	Example
`start_time`	String (HH:MM)	When off-peak begins	`"19:00"`
`algorithm`	`BreadthFirst` or `DepthFirst`	Session distribution	`DepthFirst`
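
The four start times partition the day, so the active phase at any moment is the one with the latest start time that has already passed. A minimal Python sketch follows, assuming the time has already been converted to the plan's time zone and ignoring `days_of_week`; `current_phase` is a hypothetical helper.

```python
from datetime import time


def current_phase(now, starts):
    """Return the phase whose start time most recently passed.

    Before the first start of the day, the last phase (normally off-peak)
    is still active from the previous evening.
    """
    ordered = sorted(starts.items(), key=lambda item: item[1])
    active = ordered[-1][0]
    for name, start in ordered:
        if start <= now:
            active = name
    return active


STARTS = {
    "ramp_up": time(7, 0),
    "peak": time(9, 0),
    "ramp_down": time(17, 0),
    "off_peak": time(19, 0),
}
print(current_phase(time(8, 30), STARTS))  # ramp_up
print(current_phase(time(6, 0), STARTS))   # off_peak
```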

Load Balancing Algorithms — When to Use Which¶

BreadthFirst¶

What it does: Distributes sessions evenly across all available hosts. If Host A has 5 sessions and Host B has 3 sessions, the next user goes to Host B.

When to use: During ramp-up and peak. Users get more CPU/memory per user because the load is spread. Better user experience but more hosts running.

DepthFirst¶

What it does: Fills each host to capacity before using the next one. If Host A has room, the next user goes to Host A, even if Host B is empty.

When to use: During ramp-down and off-peak. Consolidates users onto fewer hosts so the empty hosts can be powered off. Worse per-user experience (more users per VM) but significant cost savings.
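
The difference between the two algorithms comes down to which available host is chosen for the next session. The sketch below models that choice only; `pick_host` is a hypothetical function, a single shared `max_sessions` limit is assumed for all hosts, and the name-based tie-break is an assumption.

```python
from typing import Optional


def pick_host(sessions, max_sessions, algorithm) -> Optional[str]:
    """Choose the host that receives the next session, or None if all are full."""
    available = {name: count for name, count in sessions.items() if count < max_sessions}
    if not available:
        return None
    if algorithm == "BreadthFirst":
        # Fewest sessions first: spreads load evenly.
        return min(available, key=lambda name: (available[name], name))
    if algorithm == "DepthFirst":
        # Most sessions first (among hosts with room): packs hosts tightly.
        return min(available, key=lambda name: (-available[name], name))
    raise ValueError(f"unknown algorithm: {algorithm}")


pool = {"host-a": 5, "host-b": 3}
print(pick_host(pool, 10, "BreadthFirst"))  # host-b
print(pick_host(pool, 10, "DepthFirst"))    # host-a
```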

Prerequisites — Service Principal Role¶

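The AVD scaling service runs under a Microsoft first-party application (not your own service principal). It needs RBAC permission to start and stop the session host VMs. Microsoft requires this role to be assigned at subscription scope; an assignment at resource group level or lower prevents autoscale from working properly.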

Service Principal Details:

Property	Value
Application Name	Windows Virtual Desktop
Application ID (Client ID)	`9cdead84-a844-4324-93f2-b2e6bb768d07`
Required Role	`Desktop Virtualization Power On Off Contributor`
Scope	Subscription containing session host VMs

This role assignment is not created by the scaling plan itself — your identity deployment (or a separate step) must create it. If this role is missing, the scaling plan will evaluate correctly but fail to actually start or stop VMs, and you'll see errors in the scaling plan diagnostics log.

Start VM on Connect — Separate RBAC¶

Start VM on Connect is a complementary feature (configured on the host pool, not the scaling plan) that powers on a session host when a user tries to connect and no running host is available. It is explicitly supported on both Azure and Azure Local.

Property	Value
Feature	Start VM on Connect (host pool property)
Required Role	`Desktop Virtualization Power On Contributor`
Scope	Subscription containing session host VMs
Config field	`control_plane.start_vm_on_connect: true`

Tip

Use Start VM on Connect together with scaling plans. During ramp-up, the scaling plan pre-warms hosts. During off-peak, if the scaling plan has powered off most hosts, Start VM on Connect ensures an after-hours user can still trigger a host to start automatically.

Note

The RBAC roles are different: scaling plans need Power On/Off Contributor (can start AND stop), while Start VM on Connect needs Power On Contributor (can only start). Both must be assigned at the subscription level for Azure Local to work correctly.

Configuration — Every Field Explained¶

scaling:
  enabled: true                        # Master toggle. If false, no scaling plan is deployed.
                                       # This repo deploys Pooled-type scaling plans only.
  time_zone: "Eastern Standard Time"   # Windows time zone name (NOT IANA/Linux format).
                                       # The autoscaler evaluates phase start times in this zone.
                                       # Common values: "Eastern Standard Time", "Pacific Standard Time",
                                       # "UTC", "W. Europe Standard Time", "AUS Eastern Standard Time"
  schedules:
    - name: weekday-schedule           # Display name in the Azure portal. Use descriptive names.
      days_of_week:                    # Which days this schedule applies to.
        - Monday                       # You can create multiple schedules — e.g., a weekday schedule
        - Tuesday                      # and a weekend schedule with different thresholds.
        - Wednesday
        - Thursday
        - Friday
      ramp_up:
        start_time: "07:00"           # Phase start. Format: HH:MM (24-hour, in the configured time zone).
        algorithm: BreadthFirst        # Spread users across hosts for best experience during ramp.
        minimum_hosts_pct: 25          # Pre-warm 25% of hosts before users arrive.
        capacity_threshold_pct: 60     # When 60% of running host capacity is used, power on another host.
      peak:
        start_time: "09:00"           # Workday begins. All hosts are already warm from ramp-up.
        algorithm: BreadthFirst       # Keep spreading for performance.
      ramp_down:
        start_time: "17:00"           # End of day — start consolidating.
        algorithm: DepthFirst          # Pack users tightly so we can power off empty hosts.
        minimum_hosts_pct: 10          # Don't go below 10% of hosts during ramp-down.
        capacity_threshold_pct: 90     # Only add a host if 90% of existing capacity is used (aggressive drain).
        force_logoff: false            # Don't forcefully sign out users.
        wait_time_minutes: 30          # (Only applies if force_logoff is true) Wait 30 min before force logoff.
        notification_message: "Your session will be logged off in 30 minutes."
      off_peak:
        start_time: "19:00"           # Overnight — skeleton crew.
        algorithm: DepthFirst         # Consolidate remaining sessions.
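
Before deploying, a schedule entry can be sanity-checked against the field rules described above. The following Python sketch assumes the YAML has already been loaded into a plain dictionary; `validate_schedule` is a hypothetical helper and is not part of this repo.

```python
import re

PHASES = ("ramp_up", "peak", "ramp_down", "off_peak")
ALGORITHMS = {"BreadthFirst", "DepthFirst"}
TIME_RE = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")  # HH:MM, 24-hour


def validate_schedule(schedule):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for phase in PHASES:
        cfg = schedule.get(phase)
        if cfg is None:
            errors.append(f"{phase}: phase is missing")
            continue
        start = cfg.get("start_time")
        if not isinstance(start, str) or not TIME_RE.match(start):
            errors.append(f"{phase}.start_time must be a quoted HH:MM string")
        if cfg.get("algorithm") not in ALGORITHMS:
            errors.append(f"{phase}.algorithm must be BreadthFirst or DepthFirst")
        for field in ("minimum_hosts_pct", "capacity_threshold_pct"):
            value = cfg.get(field)
            if value is not None and not (isinstance(value, int) and 0 <= value <= 100):
                errors.append(f"{phase}.{field} must be an integer from 0 to 100")
    return errors


SAMPLE = {
    "name": "weekday-schedule",
    "ramp_up": {"start_time": "07:00", "algorithm": "BreadthFirst",
                "minimum_hosts_pct": 25, "capacity_threshold_pct": 60},
    "peak": {"start_time": "09:00", "algorithm": "BreadthFirst"},
    "ramp_down": {"start_time": "17:00", "algorithm": "DepthFirst",
                  "minimum_hosts_pct": 10, "capacity_threshold_pct": 90},
    "off_peak": {"start_time": "19:00", "algorithm": "DepthFirst"},
}
print(validate_schedule(SAMPLE))  # []
```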

What Each IaC Tool Deploys — Resource by Resource¶

Terraform (`src/terraform/scaling.tf`)¶

Terraform Resource	Azure Resource Created	What It Does
`azurerm_virtual_desktop_scaling_plan.scaling[0]`	`Microsoft.DesktopVirtualization/scalingPlans`	Creates the scaling plan. Conditionally deployed: `count = var.scaling_enabled && var.host_pool_type == "Pooled" ? 1 : 0`. Links to the host pool via `host_pool` block.
(inline) `dynamic "schedule"` block	`pooledSchedules` child resource	Iterates over `var.scaling_schedules` list. Each entry creates one schedule with all four phases.

Terraform variable type for schedules:

variable "scaling_schedules" {
  type = list(object({
    name                                 = string
    days_of_week                         = list(string)
    ramp_up_start_time                   = string
    ramp_up_load_balancing_algorithm     = string
    ramp_up_minimum_hosts_percent        = number
    ramp_up_capacity_threshold_percent   = number
    peak_start_time                      = string
    peak_load_balancing_algorithm        = string
    ramp_down_start_time                 = string
    ramp_down_load_balancing_algorithm   = string
    ramp_down_minimum_hosts_percent      = number
    ramp_down_capacity_threshold_percent = number
    ramp_down_force_logoff_users         = bool
    ramp_down_wait_time_minutes          = number
    ramp_down_notification_message       = string
    off_peak_start_time                  = string
    off_peak_load_balancing_algorithm    = string
  }))
}
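
The nested YAML schedule and the flat Terraform object carry the same information under different key names. One way to picture the correspondence is the Python sketch below. The mapping is inferred from the two field lists in this guide and is an assumption; how the repo actually converts `config/variables.yml` into `var.scaling_schedules` is not shown here, and `to_terraform_schedule` is a hypothetical name.

```python
def to_terraform_schedule(schedule):
    """Flatten one nested schedule entry into the scaling_schedules object shape."""
    ru, pk = schedule["ramp_up"], schedule["peak"]
    rd, op = schedule["ramp_down"], schedule["off_peak"]
    return {
        "name": schedule["name"],
        "days_of_week": list(schedule["days_of_week"]),
        "ramp_up_start_time": ru["start_time"],
        "ramp_up_load_balancing_algorithm": ru["algorithm"],
        "ramp_up_minimum_hosts_percent": ru["minimum_hosts_pct"],
        "ramp_up_capacity_threshold_percent": ru["capacity_threshold_pct"],
        "peak_start_time": pk["start_time"],
        "peak_load_balancing_algorithm": pk["algorithm"],
        "ramp_down_start_time": rd["start_time"],
        "ramp_down_load_balancing_algorithm": rd["algorithm"],
        "ramp_down_minimum_hosts_percent": rd["minimum_hosts_pct"],
        "ramp_down_capacity_threshold_percent": rd["capacity_threshold_pct"],
        "ramp_down_force_logoff_users": rd["force_logoff"],
        "ramp_down_wait_time_minutes": rd["wait_time_minutes"],
        "ramp_down_notification_message": rd["notification_message"],
        "off_peak_start_time": op["start_time"],
        "off_peak_load_balancing_algorithm": op["algorithm"],
    }
```

Every attribute in the Terraform object type is required, so a schedule that omits a ramp-down field such as `notification_message` fails here with a `KeyError` rather than later during `terraform plan`.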

Bicep (`src/bicep/scaling.bicep`)¶

The Bicep module creates the same resource using individual parameters rather than a complex object. Each phase gets its own parameter set:

Parameter Pattern	Example	Maps To
`rampUp*`	`rampUpStartTime`, `rampUpAlgorithm`, `rampUpMinHostsPct`, `rampUpCapacityThresholdPct`	Ramp-up phase definition
`peak*`	`peakStartTime`, `peakAlgorithm`	Peak phase definition
`rampDown*`	`rampDownStartTime`, `rampDownAlgorithm`, `rampDownMinHostsPct`, `rampDownCapacityThresholdPct`, `rampDownForceLogoff`, `rampDownWaitTime`, `rampDownMessage`	Ramp-down phase definition
`offPeak*`	`offPeakStartTime`, `offPeakAlgorithm`	Off-peak phase definition

az deployment group create \
  --resource-group rg-avd-prod \
  --template-file src/bicep/scaling.bicep \
  --parameters scalingPlanName='hp-pool01-scaling' \
               hostPoolId='<resource-id>' \
               timeZone='Eastern Standard Time' \
               rampUpStartTime='07:00' \
               rampUpAlgorithm='BreadthFirst' \
               rampUpMinHostsPct=25 \
               rampUpCapacityThresholdPct=60 \
               peakStartTime='09:00' \
               peakAlgorithm='BreadthFirst' \
               rampDownStartTime='17:00' \
               rampDownAlgorithm='DepthFirst' \
               rampDownMinHostsPct=10 \
               rampDownCapacityThresholdPct=90 \
               rampDownForceLogoff=false \
               rampDownWaitTime=30 \
               offPeakStartTime='19:00' \
               offPeakAlgorithm='DepthFirst'

PowerShell (`src/powershell/Deploy-AVDScaling.ps1`)¶

.\src\powershell\Deploy-AVDScaling.ps1 -ConfigPath config/variables.yml

Ansible (`src/ansible/roles/avd-scaling/tasks/main.yml`)¶

Uses azure_rm_resource to create the scaling plan. Tagged as scaling.

ansible-playbook src/ansible/playbooks/site.yml -i inventory.yml --tags scaling

Troubleshooting¶

Symptom	Root Cause	Resolution
Scaling plan shows "Enabled" in portal but VMs never start	The AVD service principal (`9cdead84-...`) doesn't have `Desktop Virtualization Power On Off Contributor` at subscription scope	Assign the role to the first-party app on the subscription that contains the session host VMs; a resource-group-level assignment is not sufficient
VMs start but users can't connect	Scaling plan starts VMs but they take 2-5 min to register as available in the host pool	This is normal — Windows boot + AVD agent registration takes time. Increase `minimum_hosts_pct` in ramp-up to have more hosts pre-warmed.
Users are forcefully logged off at 5 PM	`force_logoff: true` and `wait_time_minutes: 0`	Set `wait_time_minutes` to a reasonable value (15-30) and set a `notification_message` so users are warned first
"This scaling plan is not supported for this type of host pool"	Pooled scaling plan assigned to a Personal host pool (or vice versa)	Scaling plan type must match host pool type. Pooled plans use capacity thresholds; Personal plans use session-state-based deallocation. This repo deploys Pooled-type scaling plans only. For Personal host pools, use Start VM on Connect and/or create a Personal-type scaling plan.
Scaling plan diagnostics show evaluation failures	Incorrect `time_zone` value — must be a Windows time zone name, not IANA	Use `[System.TimeZoneInfo]::GetSystemTimeZones()` in PowerShell to list valid names
Hosts oscillate, turning on and off repeatedly	`capacity_threshold_pct` too close to actual utilization, e.g., threshold is 60% and usage bounces between 55-65%	Increase the gap: raise the ramp-up `capacity_threshold_pct` to 75% or lower the ramp-down `capacity_threshold_pct` to 50% (peak and off-peak have no threshold field of their own in this config)
