Scaling Plans¶
This guide explains exactly how AVD scaling plans work — what Azure resource gets deployed, how the autoscaler evaluates capacity every 15 minutes, what happens in each of the four schedule phases, the difference between the two load-balancing algorithms, and how each IaC tool implements it.
Why Scaling Plans Exist¶
AVD session hosts are VMs. VMs cost money when they're running, even if nobody is using them. A scaling plan tells the AVD service: "At 7 AM, start turning on hosts. At 9 AM, make sure there's plenty of capacity. At 5 PM, start draining users and shutting down hosts. Overnight, keep just a handful running."
Without a scaling plan, you either leave all hosts running 24/7 (expensive), or manually turn them on and off (error-prone, somebody forgets on Friday).

Open the draw.io source for an editable version.
Pooled vs. Personal
Pooled host pools support full capacity-based autoscaling (this guide's focus). Personal host pools now also support scaling plans, but with a different model — personal scaling plans deallocate VMs based on user session state (signed out / disconnected), not capacity thresholds. This repo's IaC deploys scaling plans only for Pooled pools: count = var.scaling_enabled && var.host_pool_type == "Pooled" ? 1 : 0.
Azure Local Considerations¶
Scaling plans work on Azure Local session hosts — Microsoft explicitly supports both power management autoscaling and Start VM on Connect for session hosts on Azure and Azure Local. However, running on-premises introduces important differences compared to Azure-hosted VMs:
Azure Local-Specific Caveats
- Fixed hardware capacity — Unlike Azure, you cannot burst beyond your physical cluster's compute capacity. Design
minimum_hosts_pctand capacity thresholds conservatively. If all physical cores are committed, the autoscaler cannot power on additional VMs. - Azure connectivity required — The AVD autoscaler is a cloud-hosted service. It sends power commands through Azure Arc to the on-premises cluster. If the Azure Local cluster loses connectivity to Azure, scaling actions will not execute until the connection is restored.
- Power action latency — Starting an Arc-enabled VM on Azure Local involves Azure API → Arc agent → Azure Local cluster → Hyper-V. This may add a few seconds of latency compared to starting an Azure VM directly. Factor this into ramp-up
minimum_hosts_pctto pre-warm hosts before peak. - Dynamic autoscaling (preview) is not confirmed for Azure Local — Dynamic autoscaling (which creates/deletes VMs, not just power on/off) is currently only available in Azure. Azure Local VM provisioning requires Arc VM creation with specific logical networks and on-premises images, which the dynamic autoscaler does not support.
- Host pool isolation — Azure and Azure Local session hosts cannot be mixed in the same host pool. Create separate host pools per environment.
| Feature | Azure VMs | Azure Local (Arc VMs) |
|---|---|---|
| Power management autoscale | Supported | Supported |
| Start VM on Connect | Supported | Supported |
| Dynamic autoscale (create/delete VMs) | Preview | Not supported |
| Resource type | Microsoft.Compute/virtualMachines |
Microsoft.HybridCompute/machines |
| RBAC role | DVU Power On/Off Contributor | Same role, same subscription-level assignment |
What Gets Deployed¶
| Azure Resource Type | Resource Name | What It Is |
|---|---|---|
Microsoft.DesktopVirtualization/scalingPlans |
${host_pool_name}-scaling |
The scaling plan itself — contains schedule definitions, algorithm settings, capacity thresholds, and the link to the host pool. |
Microsoft.DesktopVirtualization/scalingPlans/pooledSchedules |
One per schedule entry | Child resource of the scaling plan — defines the four phase times, algorithms, and thresholds for a specific day pattern (e.g., weekdays). |
Microsoft.Authorization/roleAssignments |
Auto-created | The AVD service principal (first-party app 9cdead84-a844-4324-93f2-b2e6bb768d07) must have Desktop Virtualization Power On/Off Contributor on the RG so it can actually start and stop VMs. |
How the Autoscaler Works¶
The AVD autoscaler is a Microsoft-managed service — you don't deploy it. Once you create a scaling plan and associate it with a host pool, the service evaluates every 15 minutes (approximately) whether to turn hosts on or off.
Evaluation Logic¶
Every 15 minutes, the autoscaler:
- Reads the current phase from the schedule (based on the current time in the configured time zone)
- Counts total sessions across all running session hosts
- Calculates used capacity percentage:
(total sessions / total max sessions across running hosts) × 100 - Compares to the capacity threshold for the current phase
- If used capacity > threshold: turns on additional hosts (one at a time, wait for it to report healthy, then re-evaluate)
- If used capacity < threshold (and in ramp-down or off-peak): drains a host and powers it off
"Drain mode" Explained¶
When the autoscaler decides to power off a host:
- It sets the host to Drain mode — no new sessions are assigned to it
- Existing users continue working — they're NOT kicked off immediately
- If
force_logoffistrueandwait_time_minuteshas elapsed, users get the notification message and are forcefully signed out - If
force_logoffisfalse, the autoscaler waits for all users to sign out naturally — the host stays running until it's empty - Once all sessions are gone, the VM is deallocated (stopped and not billed for compute)
Schedule Phases — Deep Dive¶
Ramp-Up Phase¶
Purpose: Get hosts ready before the workday starts so users don't wait for VMs to boot.
How it works:
- The autoscaler starts at the configured
ramp_up.start_time(e.g., 07:00) - It calculates how many hosts are needed to meet
minimum_hosts_pct— example: if you have 20 session hosts andminimum_hosts_pct: 25, at least 5 hosts must be running at 7:00 AM - It powers on hosts one by one until the minimum is met
- As early-bird users log in, if the
capacity_threshold_pctis exceeded, additional hosts are powered on - Load-balancing algorithm: typically
BreadthFirst(spread users across hosts for best experience while they trickle in)
Configuration fields:
| Field | Type | What It Controls | Example |
|---|---|---|---|
start_time |
String (HH:MM) | When this phase begins | "07:00" |
algorithm |
BreadthFirst or DepthFirst |
How sessions are assigned to running hosts | BreadthFirst |
minimum_hosts_pct |
Integer (0-100) | Percentage of total hosts that must be running during this phase | 25 |
capacity_threshold_pct |
Integer (0-100) | When used capacity hits this %, turn on another host | 60 |
Peak Phase¶
Purpose: Maximum availability. All needed hosts are running.
How it works:
- Starts at
peak.start_time(e.g., 09:00) - No minimum host percentage — the autoscaler only turns on more hosts if
capacity_threshold_pctis exceeded - No hosts are turned off during peak, even if utilization drops
- Load-balancing algorithm: typically
BreadthFirstfor best user experience
Configuration fields:
| Field | Type | What It Controls | Example |
|---|---|---|---|
start_time |
String (HH:MM) | When peak begins | "09:00" |
algorithm |
BreadthFirst or DepthFirst |
Session distribution strategy | BreadthFirst |
Ramp-Down Phase¶
Purpose: Gracefully reduce capacity as users leave for the day.
How it works:
- Starts at
ramp_down.start_time(e.g., 17:00) - The autoscaler begins draining hosts that are below the
capacity_threshold_pct - Sessions are consolidated onto fewer hosts (using
DepthFirstto pack users tightly) - Once a host is empty, it's powered off
- If
force_logoff: true, users who haven't signed out afterwait_time_minuteswill seenotification_messageand be forcefully logged off after the timer expires - The autoscaler won't go below
minimum_hosts_pct— ensures some capacity stays online while late workers finish
Configuration fields:
| Field | Type | What It Controls | Example |
|---|---|---|---|
start_time |
String (HH:MM) | When ramp-down begins | "17:00" |
algorithm |
BreadthFirst or DepthFirst |
Usually DepthFirst to consolidate | DepthFirst |
minimum_hosts_pct |
Integer (0-100) | Floor — don't go below this many hosts | 10 |
capacity_threshold_pct |
Integer (0-100) | Drain threshold — higher = more aggressive draining | 90 |
force_logoff |
Boolean | Whether to forcefully sign out users after the wait time | false |
wait_time_minutes |
Integer | Minutes to wait before forcing logoff (0 = immediate) | 30 |
notification_message |
String | Message shown to users before force logoff | "Your session will be logged off in 30 minutes." |
Off-Peak Phase¶
Purpose: Minimum capacity overnight. Only keep a skeleton crew of hosts for after-hours users.
How it works:
- Starts at
off_peak.start_time(e.g., 19:00) - Continues draining and powering off hosts until only
minimum_hosts_pctremain - Uses
DepthFirstto consolidate any remaining sessions - If a late-night user logs in and capacity threshold is hit, the autoscaler will power on one host
Configuration fields:
| Field | Type | What It Controls | Example |
|---|---|---|---|
start_time |
String (HH:MM) | When off-peak begins | "19:00" |
algorithm |
BreadthFirst or DepthFirst |
Session distribution | DepthFirst |
Load Balancing Algorithms — When to Use Which¶
BreadthFirst¶
What it does: Distributes sessions evenly across all available hosts. If Host A has 5 sessions and Host B has 3 sessions, the next user goes to Host B.
When to use: During ramp-up and peak. Users get more CPU/memory per user because the load is spread. Better user experience but more hosts running.
DepthFirst¶
What it does: Fills each host to capacity before using the next one. If Host A has room, the next user goes to Host A, even if Host B is empty.
When to use: During ramp-down and off-peak. Consolidates users onto fewer hosts so the empty hosts can be powered off. Worse per-user experience (more users per VM) but significant cost savings.
Prerequisites — Service Principal Role¶
The AVD scaling service runs under a Microsoft first-party application (not your own service principal). It needs RBAC permission to start/stop VMs in your resource group.
Service Principal Details:
| Property | Value |
|---|---|
| Application Name | Windows Virtual Desktop |
| Application ID (Client ID) | 9cdead84-a844-4324-93f2-b2e6bb768d07 |
| Required Role | Desktop Virtualization Power On/Off Contributor |
| Scope | Resource Group containing session host VMs |
This role assignment is not created by the scaling plan itself — your identity deployment (or a separate step) must create it. If this role is missing, the scaling plan will evaluate correctly but fail to actually start or stop VMs, and you'll see errors in the scaling plan diagnostics log.
Start VM on Connect — Separate RBAC¶
Start VM on Connect is a complementary feature (configured on the host pool, not the scaling plan) that powers on a session host when a user tries to connect and no running host is available. It is explicitly supported on both Azure and Azure Local.
| Property | Value |
|---|---|
| Feature | Start VM on Connect (host pool property) |
| Required Role | Desktop Virtualization Power On Contributor |
| Scope | Subscription containing session host VMs |
| Config field | control_plane.start_vm_on_connect: true |
Tip
Use Start VM on Connect together with scaling plans. During ramp-up, the scaling plan pre-warms hosts. During off-peak, if the scaling plan has powered off most hosts, Start VM on Connect ensures an after-hours user can still trigger a host to start automatically.
Note
The RBAC roles are different: scaling plans need Power On/Off Contributor (can start AND stop), while Start VM on Connect needs Power On Contributor (can only start). Both must be assigned at the subscription level for Azure Local to work correctly.
Configuration — Every Field Explained¶
scaling:
enabled: true # Master toggle. If false, no scaling plan is deployed.
# This repo deploys Pooled-type scaling plans only.
time_zone: "Eastern Standard Time" # Windows time zone name (NOT IANA/Linux format).
# The autoscaler evaluates phase start times in this zone.
# Common values: "Eastern Standard Time", "Pacific Standard Time",
# "UTC", "W. Europe Standard Time", "AUS Eastern Standard Time"
schedules:
- name: weekday-schedule # Display name in the Azure portal. Use descriptive names.
days_of_week: # Which days this schedule applies to.
- Monday # You can create multiple schedules — e.g., a weekday schedule
- Tuesday # and a weekend schedule with different thresholds.
- Wednesday
- Thursday
- Friday
ramp_up:
start_time: "07:00" # Phase start. Format: HH:MM (24-hour, in the configured time zone).
algorithm: BreadthFirst # Spread users across hosts for best experience during ramp.
minimum_hosts_pct: 25 # Pre-warm 25% of hosts before users arrive.
capacity_threshold_pct: 60 # When 60% of running host capacity is used, power on another host.
peak:
start_time: "09:00" # Workday begins. All hosts are already warm from ramp-up.
algorithm: BreadthFirst # Keep spreading for performance.
ramp_down:
start_time: "17:00" # End of day — start consolidating.
algorithm: DepthFirst # Pack users tightly so we can power off empty hosts.
minimum_hosts_pct: 10 # Don't go below 10% of hosts during ramp-down.
capacity_threshold_pct: 90 # Only add a host if 90% of existing capacity is used (aggressive drain).
force_logoff: false # Don't forcefully sign out users.
wait_time_minutes: 30 # (Only applies if force_logoff is true) Wait 30 min before force logoff.
notification_message: "Your session will be logged off in 30 minutes."
off_peak:
start_time: "19:00" # Overnight — skeleton crew.
algorithm: DepthFirst # Consolidate remaining sessions.
What Each IaC Tool Deploys — Resource by Resource¶
Terraform (src/terraform/scaling.tf)¶
| Terraform Resource | Azure Resource Created | What It Does |
|---|---|---|
azurerm_virtual_desktop_scaling_plan.scaling[0] |
Microsoft.DesktopVirtualization/scalingPlans |
Creates the scaling plan. Conditionally deployed: count = var.scaling_enabled && var.host_pool_type == "Pooled" ? 1 : 0. Links to the host pool via host_pool block. |
(inline) dynamic "schedule" block |
pooledSchedules child resource |
Iterates over var.scaling_schedules list. Each entry creates one schedule with all four phases. |
Terraform variable type for schedules:
variable "scaling_schedules" {
type = list(object({
name = string
days_of_week = list(string)
ramp_up_start_time = string
ramp_up_load_balancing_algorithm = string
ramp_up_minimum_hosts_percent = number
ramp_up_capacity_threshold_percent = number
peak_start_time = string
peak_load_balancing_algorithm = string
ramp_down_start_time = string
ramp_down_load_balancing_algorithm = string
ramp_down_minimum_hosts_percent = number
ramp_down_capacity_threshold_percent = number
ramp_down_force_logoff_users = bool
ramp_down_wait_time_minutes = number
ramp_down_notification_message = string
off_peak_start_time = string
off_peak_load_balancing_algorithm = string
}))
}
Bicep (src/bicep/scaling.bicep)¶
The Bicep module creates the same resource using individual parameters rather than a complex object. Each phase gets its own parameter set:
| Parameter Pattern | Example | Maps To |
|---|---|---|
rampUp* |
rampUpStartTime, rampUpAlgorithm, rampUpMinHostsPct, rampUpCapacityThresholdPct |
Ramp-up phase definition |
peak* |
peakStartTime, peakAlgorithm |
Peak phase definition |
rampDown* |
rampDownStartTime, rampDownAlgorithm, rampDownMinHostsPct, rampDownCapacityThresholdPct, rampDownForceLogoff, rampDownWaitTime, rampDownMessage |
Ramp-down phase definition |
offPeak* |
offPeakStartTime, offPeakAlgorithm |
Off-peak phase definition |
az deployment group create \
--resource-group rg-avd-prod \
--template-file src/bicep/scaling.bicep \
--parameters scalingPlanName='hp-pool01-scaling' \
hostPoolId='<resource-id>' \
timeZone='Eastern Standard Time' \
rampUpStartTime='07:00' \
rampUpAlgorithm='BreadthFirst' \
rampUpMinHostsPct=25 \
rampUpCapacityThresholdPct=60 \
peakStartTime='09:00' \
peakAlgorithm='BreadthFirst' \
rampDownStartTime='17:00' \
rampDownAlgorithm='DepthFirst' \
rampDownMinHostsPct=10 \
rampDownCapacityThresholdPct=90 \
rampDownForceLogoff=false \
rampDownWaitTime=30 \
offPeakStartTime='19:00' \
offPeakAlgorithm='DepthFirst'
PowerShell (src/powershell/Deploy-AVDScaling.ps1)¶
Ansible (src/ansible/roles/avd-scaling/tasks/main.yml)¶
Uses azure_rm_resource to create the scaling plan. Tagged as scaling.
Troubleshooting¶
| Symptom | Root Cause | Resolution |
|---|---|---|
| Scaling plan shows "Enabled" in portal but VMs never start | The AVD service principal (9cdead84-...) doesn't have Desktop Virtualization Power On/Off Contributor on the RG |
Assign the role to the first-party app on the resource group scope |
| VMs start but users can't connect | Scaling plan starts VMs but they take 2-5 min to register as available in the host pool | This is normal — Windows boot + AVD agent registration takes time. Increase minimum_hosts_pct in ramp-up to have more hosts pre-warmed. |
| Users are forcefully logged off at 5 PM | force_logoff: true and wait_time_minutes: 0 |
Set wait_time_minutes to a reasonable value (15-30) and enable notification_message |
| "This scaling plan is not supported for this type of host pool" | Pooled scaling plan assigned to a Personal host pool (or vice versa) | Scaling plan type must match host pool type. Pooled plans use capacity thresholds; Personal plans use session-state-based deallocation. This repo deploys Pooled-type scaling plans only. For Personal host pools, use Start VM on Connect and/or create a Personal-type scaling plan. |
| Scaling plan diagnostics show evaluation failures | Incorrect time_zone value — must be a Windows time zone name, not IANA |
Use [System.TimeZoneInfo]::GetSystemTimeZones() in PowerShell to list valid names |
| Hosts oscillate — turning on and off repeatedly | capacity_threshold_pct too close to actual utilization — e.g., threshold is 60% and usage bounces between 55-65% |
Increase the gap: raise peak threshold to 75% or lower ramp-down threshold to 50% |