
Troubleshooting

Overview

This page covers common issues encountered during SOFS deployment and operation, organized by deployment phase. For each issue, the symptom, likely cause, and resolution are provided.


Azure Resource Provisioning (Phase 1)

VM Creation Fails

| Symptom | Cause | Resolution |
| --- | --- | --- |
| `ResourceNotFound` for the custom location | Custom location ID is incorrect, or the Azure Local cluster is not registered | Verify that `custom_location_id` in `variables.yml` matches the `id` returned by `az customlocation show` |
| `StoragePathNotFound` | Storage path ID doesn't exist or is in a different resource group | Verify `storage_path_id` / `storage_path_ids` in `variables.yml` |
| NIC creation fails with an IP conflict | Static IP is already in use on the network | Verify that the IP addresses in `variables.yml` are available: use `Test-Connection` or check DHCP leases |
| Gallery image not found | Image name or resource group is wrong | Run `az azurestackhci galleryimage list` to list the available images |
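
Before deployment, the planned static IPs can be compared against addresses already seen on the network. The following is a minimal sketch, assuming you collect in-use addresses from DHCP leases and/or a ping sweep; `Find-IpConflict`, the DHCP server `dhcp01`, the scope, and the example addresses are hypothetical placeholders. A missing ping reply does not prove an address is free, so a clean result is a first pass only.

```powershell
# Sketch: report planned static IPs that already appear to be in use.
# Find-IpConflict and all addresses below are hypothetical placeholders.
function Find-IpConflict {
    param(
        [Parameter(Mandatory)] [string[]] $PlannedIp,
        [string[]] $InUseIp = @()
    )
    # Normalise to trimmed strings so IPAddress objects and plain strings compare equally
    $inUse = @($InUseIp | ForEach-Object { "$_".Trim() })
    $PlannedIp | Where-Object { $inUse -contains $_.Trim() }
}

# Example usage (needs network access and the DhcpServer RSAT module):
# $planned   = "10.0.0.11", "10.0.0.12", "10.0.0.13"
# $leases    = Get-DhcpServerv4Lease -ComputerName "dhcp01" -ScopeId "10.0.0.0" |
#                  ForEach-Object { $_.IPAddress.ToString() }
# $answering = $planned | Where-Object { Test-Connection $_ -Count 1 -Quiet }
# Find-IpConflict -PlannedIp $planned -InUseIp (@($leases) + @($answering))
```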

Data Disk Attachment Fails

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Disk size exceeds available capacity | Azure Local volume doesn't have enough free space | Check volume capacity with `Get-Volume` on the Azure Local host, then reduce `data_disk_size_gb` or expand the host volume |
| Dynamic provisioning not reducing usage | Disks are dynamically provisioned, but the host volume must reserve the full allocation | This is expected: the host volume size must accommodate the maximum allocation even though actual usage grows dynamically |
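
The full allocation the host volume must accommodate is VM count × data disks per VM × disk size. A minimal sketch, assuming three SOFS VMs with the default four data disks each; the function name, the 256 GB disk size, and the volume label `UserStorage_1` are hypothetical. If the disks are spread across several storage paths, run the comparison per path.

```powershell
# Sketch: maximum data-disk allocation the Azure Local host volume must be able to hold.
# Get-SofsDataDiskAllocationGb is a hypothetical helper name.
function Get-SofsDataDiskAllocationGb {
    param(
        [int] $VmCount = 3,                          # SOFS VMs
        [int] $DisksPerVm = 4,                       # default data disks per VM
        [Parameter(Mandatory)] [int] $DiskSizeGb     # data_disk_size_gb
    )
    $VmCount * $DisksPerVm * $DiskSizeGb
}

$requiredGb = Get-SofsDataDiskAllocationGb -DiskSizeGb 256   # 3 x 4 x 256 = 3072 GB

# On the Azure Local host, compare with free space on the target volume
# ("UserStorage_1" is a hypothetical label):
# $freeGb = (Get-Volume -FileSystemLabel "UserStorage_1").SizeRemaining / 1GB
# if ($freeGb -lt $requiredGb) { Write-Warning "Host volume cannot hold the full allocation" }
```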

Anti-Affinity (Phases 3–4)

VMs on Same Physical Node

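| Symptom | Cause | Resolution |
| --- | --- | --- |
| Two or more SOFS VMs on the same Azure Local node | Anti-affinity rule not created or not enforced | Create the rule and add the VM groups to it: `New-ClusterAffinityRule -Name "SOFS-AntiAffinity" -RuleType DifferentNode -Cluster "azl-cluster"`, then `Add-ClusterGroupToAffinityRule -Groups "sofs-01","sofs-02","sofs-03" -Name "SOFS-AntiAffinity" -Cluster "azl-cluster"`. Verify with `Get-ClusterAffinityRule -Cluster "azl-cluster"` |
| Rule exists but VMs are not separated | Rule was created after the VMs were placed | Live migrate the VMs manually, for example `Move-ClusterVirtualMachineRole -Name "sofs-02" -Node "<target-node>" -MigrationType Live -Cluster "azl-cluster"` |
| `Get-ClusterAffinityRule` not found | Azure Local OS version doesn't include the affinity-rule cmdlets | Use the legacy method instead: set the same `AntiAffinityClassNames` value on each SOFS VM's cluster group (`(Get-ClusterGroup -Name "sofs-01" -Cluster "azl-cluster").AntiAffinityClassNames`) |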
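
To spot co-located VMs quickly, the owner nodes of the SOFS cluster groups can be grouped and any node hosting more than one of them reported. A minimal sketch; `Find-CoLocatedSofsVm` is a hypothetical helper, and the `*sofs*` name filter mirrors the diagnostic command at the end of this page.

```powershell
# Sketch: list Azure Local nodes that currently own more than one SOFS VM.
# Find-CoLocatedSofsVm is a hypothetical helper name.
function Find-CoLocatedSofsVm {
    param([Parameter(Mandatory)] [object[]] $ClusterGroup)
    $ClusterGroup |
        Group-Object -Property { "$($_.OwnerNode)" } |   # OwnerNode may be a ClusterNode object
        Where-Object Count -gt 1 |
        ForEach-Object {
            [pscustomobject]@{ Node = $_.Name; Vms = ($_.Group.Name -join ', ') }
        }
}

# Example usage against the Azure Local cluster:
# $sofs = Get-ClusterGroup -Cluster "azl-cluster" | Where-Object { $_.Name -like "*sofs*" }
# Find-CoLocatedSofsVm -ClusterGroup $sofs
# Then move one of the reported VMs, for example:
# Move-ClusterVirtualMachineRole -Name "sofs-02" -Node "<target-node>" -MigrationType Live -Cluster "azl-cluster"
```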

Failover Clustering (Phase 5)

Cluster Creation Fails

| Symptom | Cause | Resolution |
| --- | --- | --- |
| DNS registration fails | Cluster CNO can't register its DNS A record | Pre-create the DNS A record for the cluster name, or grant the cluster computer account permission to update its record dynamically |
| Access denied creating cluster | Service account lacks permission to create AD computer objects | Pre-stage the CNO in the target OU and grant the service account Full Control over it |
| Cluster validation warnings about network | Only one network detected | Expected in single-NIC deployments: proceed if the single network is sufficient for your environment |

Cloud Witness (Phase 6)

Quorum Issues

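| Symptom | Cause | Resolution |
| --- | --- | --- |
| Cloud witness unreachable | Storage account firewall blocking access, or incorrect key | Verify that the storage account firewall allows the public IP address the SOFS VMs use for outbound HTTPS (port 443) to the blob endpoint; storage account IP rules accept only public addresses, not the VMs' private range. Regenerate and re-apply the access key |
| Quorum lost with one node down | Cloud witness not configured, so the cluster falls back to node majority only | Run `Set-ClusterQuorum -CloudWitness -AccountName "<name>" -AccessKey "<key>"` |
| `Set-ClusterQuorum` fails | Storage account name or key is wrong | Verify with `az storage account keys list --account-name "<name>" --resource-group "<rg>"` |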
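
To confirm the witness after applying a fix, the cluster resources can be checked for a Cloud Witness resource and its state. A minimal sketch; `Get-CloudWitnessStatus` is a hypothetical helper, and it assumes the witness resource reports the resource type `Cloud Witness`.

```powershell
# Sketch: report whether a cloud witness resource exists and its state.
# Get-CloudWitnessStatus is a hypothetical helper name.
function Get-CloudWitnessStatus {
    param([object[]] $ClusterResource = @())
    $witness = @($ClusterResource | Where-Object { "$($_.ResourceType)" -eq 'Cloud Witness' })
    if ($witness.Count -eq 0) { return 'Missing' }
    "$($witness[0].State)"     # expected: Online
}

# Example usage from a management workstation:
# Get-ClusterQuorum -Cluster "sofs-cluster"
# Get-CloudWitnessStatus -ClusterResource (Get-ClusterResource -Cluster "sofs-cluster")
```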

Storage Spaces Direct (Phase 7)

S2D Enable Fails

| Symptom | Cause | Resolution |
| --- | --- | --- |
| `Enable-ClusterS2D` fails with "no eligible disks" | Data disks not visible inside the VMs | Verify that the disks are attached with `Get-Disk` on each VM. Check that they are Online and not initialized, and that `Get-PhysicalDisk` reports `CanPool` as True |
| "S2D requires at minimum 3 physical disks" | Fewer than 3 data disks in total across the cluster | Ensure each VM has at least one data disk (3 VMs × 1 disk minimum). The default is 4 per VM |
| Pool creation fails | Virtual disk bus type not supported | Ensure the disks are SCSI, not IDE (check `BusType` in `Get-PhysicalDisk`). Azure Local Arc VMs use SCSI by default |
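
Before running `Enable-ClusterS2D`, a per-VM summary of poolable SCSI disks makes the first and third rows above easy to check. A minimal sketch; `Get-S2DDiskSummary` is a hypothetical helper, and it assumes the disks were collected with `Invoke-Command` (which adds `PSComputerName`) and that `BusType` renders as a name such as `SCSI`.

```powershell
# Sketch: count S2D-eligible data disks per SOFS VM.
# Get-S2DDiskSummary is a hypothetical helper name.
function Get-S2DDiskSummary {
    param([Parameter(Mandatory)] [object[]] $PhysicalDisk)   # needs PSComputerName, CanPool, BusType
    $PhysicalDisk | Group-Object -Property PSComputerName | ForEach-Object {
        [pscustomobject]@{
            Vm            = $_.Name
            EligibleDisks = @($_.Group | Where-Object { $_.CanPool -and "$($_.BusType)" -eq 'SCSI' }).Count
            CannotPool    = @($_.Group | Where-Object { -not $_.CanPool }).Count
            NonScsi       = @($_.Group | Where-Object { "$($_.BusType)" -ne 'SCSI' }).Count
        }
    }
}

# Example usage from a management workstation:
# $disks = Invoke-Command -ComputerName "sofs-01", "sofs-02", "sofs-03" {
#     Get-PhysicalDisk | Select-Object FriendlyName, CanPool, CannotPoolReason,
#         @{ Name = 'BusType'; Expression = { "$($_.BusType)" } }
# }
# Get-S2DDiskSummary -PhysicalDisk $disks
```

The OS disk normally appears under `CannotPool`; each VM should show its data disks under `EligibleDisks`.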

S2D Tuning

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Performance degradation in a nested environment | Default S2D tuning is not optimized for guest VMs | Apply guest tuning: `Set-StoragePool -AutoRebalanceFrequency 1440` and `Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\spaceport\Parameters" -Name HwTimeout -Value 0x00002710 -Force` |

Volume Creation Fails

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Not enough capacity for the requested volume size | Volume size (× data copies) exceeds pool capacity | Reduce the volume size or add data disks. Two-way mirror: usable = pool_size / 2. Three-way mirror: usable = pool_size / 3 |
| Wrong number of data copies | `-NumberOfDataCopies 2` not specified on a 3-node cluster | Three-way mirror is the default on 3+ nodes. Set `-NumberOfDataCopies 2` explicitly for a two-way mirror |
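
The mirror arithmetic above can be sketched as a small helper. This is an approximation under stated assumptions: `Get-MirrorUsableCapacity` is a hypothetical name, and it ignores any reserve capacity and metadata overhead, so real usable space is somewhat lower.

```powershell
# Sketch: approximate usable mirror capacity for a given raw pool size.
# Ignores reserve capacity and metadata overhead.
function Get-MirrorUsableCapacity {
    param(
        [Parameter(Mandatory)] [double] $PoolSizeTb,
        [ValidateRange(2, 3)] [int] $DataCopies = 3   # 3 = three-way (default on 3+ nodes), 2 = two-way
    )
    [math]::Round($PoolSizeTb / $DataCopies, 2)
}

# Example: 3 VMs x 4 disks x 0.5 TB = 6 TB raw pool
# Get-MirrorUsableCapacity -PoolSizeTb 6 -DataCopies 2   # returns 3
# Get-MirrorUsableCapacity -PoolSizeTb 6                 # returns 2
```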

SOFS Role (Phase 8)

SOFS Access Point Creation Fails

| Symptom | Cause | Resolution |
| --- | --- | --- |
| `Add-ClusterScaleOutFileServerRole` fails with an AD error | Cluster CNO lacks permission to create computer objects | Pre-stage the SOFS access point computer object in AD and grant the cluster CNO Full Control over it |
| DNS A record not created | Dynamic DNS updates restricted | Manually create the A record for the SOFS access point name |
| Share creation fails: scope name not found | SOFS role not yet online | Verify with `Get-ClusterGroup \| Where-Object { $_.GroupType -eq "ScaleOutFileServer" }` and ensure the group shows Online |

SMB Shares and Permissions (Phase 9)

Share Access Issues

| Symptom | Cause | Resolution |
| --- | --- | --- |
| `Test-Path \\SOFSName\Share` returns `$false` | Share not created, DNS not resolving, or firewall blocking SMB | Check that the share exists (`Get-SmbShare`), verify DNS resolution (`Resolve-DnsName`), and confirm TCP port 445 is reachable (`Test-NetConnection -ComputerName "<SOFSName>" -Port 445`) |
| Users can see other users' folders | Access-Based Enumeration (ABE) not enabled | Enable ABE on the existing share with `Set-SmbShare -Name "<share>" -FolderEnumerationMode AccessBased`, or recreate the share with `-FolderEnumerationMode AccessBased` |
| FSLogix "access denied" at first logon | NTFS permissions incorrect: Domain Users lacks Modify on the share root | Re-run `Set-FSLogixNTFS` (see Permissions) |
| Profile loads read-only | SMB Change access not granted to Domain Users | Verify with `Get-SmbShareAccess -Name "<share>"`: Domain Users needs Change |
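
The share-permission check in the last row can be scripted against the output of `Get-SmbShareAccess`. A minimal sketch; `Test-ProfileShareAccess` is a hypothetical helper, and it treats an Allow entry with Change or Full for the group as sufficient.

```powershell
# Sketch: check that a share grants the FSLogix users group Change (or Full) access.
# Test-ProfileShareAccess is a hypothetical helper name.
function Test-ProfileShareAccess {
    param(
        [Parameter(Mandatory)] [object[]] $ShareAccess,   # output of Get-SmbShareAccess
        [string] $Group = 'Domain Users'
    )
    $match = $ShareAccess | Where-Object {
        "$($_.AccountName)" -like "*\$Group" -and           # e.g. CONTOSO\Domain Users
        "$($_.AccessControlType)" -eq 'Allow' -and
        "$($_.AccessRight)" -in @('Change', 'Full')
    }
    [bool]$match
}

# Example usage:
# Test-ProfileShareAccess -ShareAccess (Get-SmbShareAccess -Name "FSLogix" -CimSession "sofs-01")
# Test-NetConnection -ComputerName "<SOFSName>" -Port 445 | Select-Object TcpTestSucceeded
```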

FSLogix Profile Issues

Profile Mount Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| FSLogix event 25: "Failed to attach VHD" | AV scanning the VHDX during mount | Apply antivirus exclusions on the session hosts |
| FSLogix event 33: "VHDX full" | Profile container exceeded `SizeInMBs` | Increase `SizeInMBs` in the registry (it applies only to newly created containers), or expand the existing VHDX with `Resize-VHD` and then extend the partition inside it |
| Profile loads locally instead of from FSLogix | `Enabled` not set to 1, or the `VHDLocations` path is wrong | Verify the registry: `Get-ItemProperty HKLM:\SOFTWARE\FSLogix\Profiles` |
| Slow logons (>30 seconds for profile load) | AV scanning, network latency, or oversized profiles | Apply AV exclusions, verify that the session hosts are on the same network as the SOFS, and review profile sizes |
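
The registry check in the third row can be turned into a quick sanity test on a session host. A minimal sketch; `Test-FSLogixProfileConfig` is a hypothetical helper that only checks the `Enabled` and `VHDLocations`/`CCDLocations` values named on this page.

```powershell
# Sketch: sanity-check FSLogix profile container settings read from the registry.
# Test-FSLogixProfileConfig is a hypothetical helper; it returns a list of problems (empty = OK).
function Test-FSLogixProfileConfig {
    param([Parameter(Mandatory)] [object] $Settings)   # e.g. output of Get-ItemProperty
    $problems = @()
    if ($Settings.Enabled -ne 1) { $problems += 'Enabled is not set to 1' }
    if (-not $Settings.VHDLocations -and -not $Settings.CCDLocations) {
        $problems += 'Neither VHDLocations nor CCDLocations is set'
    }
    $problems
}

# On a session host:
# $s = Get-ItemProperty -Path HKLM:\SOFTWARE\FSLogix\Profiles
# Test-FSLogixProfileConfig -Settings $s
# Test-Path -Path ($s.VHDLocations | Select-Object -First 1)   # the path should be reachable
```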

Cloud Cache Issues

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Cloud Cache not syncing to Azure | `CCDLocations` connection string malformed | Verify the pipe-delimited Key Vault reference format: `"\|fslogix/<key-name>\|"` |
| Both `VHDLocations` and `CCDLocations` set | The two are mutually exclusive, and `CCDLocations` silently overrides | Remove `VHDLocations` when using Cloud Cache |
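
Both Cloud Cache rows can be checked from the same registry values. A minimal sketch; `Test-FSLogixCloudCacheConfig` is a hypothetical helper, and the pattern check assumes this deployment's convention that the Azure provider entry uses the `|fslogix/<key-name>|` reference shown above.

```powershell
# Sketch: detect the VHDLocations/CCDLocations conflict and check the key-reference shape.
# Test-FSLogixCloudCacheConfig is a hypothetical helper; it returns a list of problems.
function Test-FSLogixCloudCacheConfig {
    param([Parameter(Mandatory)] [object] $Settings)   # e.g. output of Get-ItemProperty
    $problems = @()
    if ($Settings.CCDLocations -and $Settings.VHDLocations) {
        $problems += 'Both VHDLocations and CCDLocations are set; remove VHDLocations'
    }
    $ccd = "$($Settings.CCDLocations)"   # multi-string values are joined with spaces
    if ($ccd -match 'type=azure' -and $ccd -notmatch '\|fslogix/[^|]+\|') {
        $problems += 'Azure provider entry does not use a |fslogix/<key-name>| reference'
    }
    $problems
}

# On a session host:
# Test-FSLogixCloudCacheConfig -Settings (Get-ItemProperty -Path HKLM:\SOFTWARE\FSLogix\Profiles)
```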

Network Issues

SMB Connectivity

| Symptom | Cause | Resolution |
| --- | --- | --- |
| `Test-Path` works intermittently | DNS round-robin, or SOFS role failover in progress | Wait about 30 seconds and retry; SMB3 transparent failover reconnects clients automatically |
| High latency to SOFS | Session hosts and SOFS VMs on different VLANs/subnets | Place all machines on the same compute network for optimal performance |
| SMB connections dropping | SMB2 negotiated instead of SMB3 | Verify with `Get-SmbConnection` that the dialect is 3.x. SMB3 is required for Continuous Availability (CA) |
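
The dialect check in the last row can be filtered rather than read by eye. A minimal sketch; `Find-PreSmb3Connection` is a hypothetical helper that parses the `Dialect` string (for example `3.1.1` or `2.1`) as a version.

```powershell
# Sketch: list SMB connections that negotiated a dialect below 3.0.
# Find-PreSmb3Connection is a hypothetical helper name.
function Find-PreSmb3Connection {
    param([object[]] $Connection = @())   # e.g. output of Get-SmbConnection
    $Connection | Where-Object {
        $null -ne $_ -and [version]"$($_.Dialect)" -lt [version]'3.0'
    }
}

# On a session host:
# Find-PreSmb3Connection -Connection (Get-SmbConnection -ServerName "<SOFSName>")
# Confirm the share itself is continuously available:
# Get-SmbShare -Name "FSLogix" -CimSession "sofs-01" | Select-Object Name, ContinuouslyAvailable
```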

Capacity Issues

Storage Full

| Symptom | Cause | Resolution |
| --- | --- | --- |
| S2D volume full | Profile growth exceeded planned capacity | Expand the host volumes, then the VM data disks, then the S2D volume (`Resize-VirtualDisk` followed by `Resize-Partition`) |
| VHDX files larger than expected | Users storing large files in their profiles | Review profile content; consider ODFC containers (Triple layout) to separate Office data |
| Dynamic disks consuming the full allocation | All profile space has been written to at some point | Dynamic provisioning only helps at initial deployment: once data is written, the space stays consumed. Plan for steady state, not day one |
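
The final expansion step can be sketched as follows, after first checking that the pool has room for the extra mirrored copies. This is a sketch under assumptions: both function names are hypothetical, it assumes a volume without storage tiers (tiered volumes are resized per tier with `Resize-StorageTier`), it should be run on one of the SOFS VMs, and it follows Microsoft's S2D extend-volume example in selecting partition number 2.

```powershell
# Sketch: check pool headroom, then grow an S2D volume (virtual disk first, then partition).
# Test-MirrorGrowthFits and Expand-SofsVolume are hypothetical helper names.
function Test-MirrorGrowthFits {
    param(
        [Parameter(Mandatory)] [double] $CurrentSizeGb,
        [Parameter(Mandatory)] [double] $TargetSizeGb,
        [Parameter(Mandatory)] [double] $PoolFreeGb,
        [int] $DataCopies = 2
    )
    # Each additional GB of volume consumes DataCopies GB of pool capacity
    (($TargetSizeGb - $CurrentSizeGb) * $DataCopies) -le $PoolFreeGb
}

function Expand-SofsVolume {
    param(
        [Parameter(Mandatory)] [string] $FriendlyName,   # virtual disk name of the profile volume
        [Parameter(Mandatory)] [uint64] $Size            # new size, e.g. 4TB
    )
    $vd = Get-VirtualDisk -FriendlyName $FriendlyName
    $vd | Resize-VirtualDisk -Size $Size
    # Data partition; Microsoft's S2D extend-volume example selects PartitionNumber 2
    $partition = $vd | Get-Disk | Get-Partition | Where-Object PartitionNumber -eq 2
    $partition | Resize-Partition -Size ($partition | Get-PartitionSupportedSize).SizeMax
}
```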

Antivirus Issues

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Profile corruption after an AV update | The AV product update reset or overrode the configured exclusions | Re-verify exclusions after every AV product update (for Microsoft Defender, check `Get-MpPreference`) |
| S2D performance degraded | AV scanning `C:\ClusterStorage` | Apply SOFS VM exclusions (see Antivirus Exclusions) |
| Session host CPU spike during logon/logoff | AV scanning VHDX mount/dismount | Apply session host exclusions for `frxsvc.exe`, `frxdrv.sys`, and VHDX extensions |
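
For Microsoft Defender, the re-verification in the first row can compare `Get-MpPreference` against the exclusions you expect. A minimal sketch; `Find-MissingExclusion` is a hypothetical helper, and the comparison is an exact (case-insensitive) string match, so use the same form as your exclusion policy (for example, extensions with or without a leading dot).

```powershell
# Sketch: report expected Microsoft Defender exclusions that are missing.
# Find-MissingExclusion is a hypothetical helper name.
function Find-MissingExclusion {
    param(
        [Parameter(Mandatory)] [object] $Preference,   # e.g. output of Get-MpPreference
        [string[]] $Path = @(),
        [string[]] $Extension = @(),
        [string[]] $Process = @()
    )
    $checks = @(
        @{ Kind = 'Path';      Wanted = $Path;      Have = @($Preference.ExclusionPath) }
        @{ Kind = 'Extension'; Wanted = $Extension; Have = @($Preference.ExclusionExtension) }
        @{ Kind = 'Process';   Wanted = $Process;   Have = @($Preference.ExclusionProcess) }
    )
    foreach ($c in $checks) {
        foreach ($w in $c.Wanted) {
            if ($c.Have -notcontains $w) { [pscustomobject]@{ Kind = $c.Kind; Value = $w } }
        }
    }
}

# Session host (values from the table above; adjust to your baseline):
# Find-MissingExclusion -Preference (Get-MpPreference) -Extension 'vhd', 'vhdx' -Process 'frxsvc.exe'
# SOFS VM:
# Find-MissingExclusion -Preference (Get-MpPreference) -Path 'C:\ClusterStorage'
```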

Diagnostic Commands

Quick reference for diagnostic commands run from a management workstation:

```powershell
# Cluster cmdlets need the FailoverClusters RSAT module on the workstation.

# Cluster health
Get-ClusterNode -Cluster "sofs-cluster" | Select-Object Name, State
Get-ClusterGroup -Cluster "sofs-cluster" | Select-Object Name, State, OwnerNode

# S2D health (-IsPrimordial $false hides the primordial pool)
Get-StoragePool -CimSession "sofs-cluster" -IsPrimordial $false | Select-Object FriendlyName, HealthStatus
Get-VirtualDisk -CimSession "sofs-cluster" | Select-Object FriendlyName, HealthStatus, OperationalStatus
Get-PhysicalDisk -CimSession "sofs-cluster" | Select-Object FriendlyName, HealthStatus, MediaType

# SMB shares (ScopeName "*" marks the node's own, non-clustered shares)
Get-SmbShare -CimSession "sofs-01" | Where-Object { $_.ScopeName -ne "*" }
Get-SmbShareAccess -Name "FSLogix" -CimSession "sofs-01"

# Anti-affinity (each SOFS VM should report a different OwnerNode)
Get-ClusterGroup -Cluster "azl-cluster" | Where-Object { $_.Name -like "*sofs*" } | Select-Object Name, OwnerNode
Get-ClusterAffinityRule -Cluster "azl-cluster"

# FSLogix (run on a session host)
Get-ItemProperty HKLM:\SOFTWARE\FSLogix\Profiles
Get-WinEvent -LogName "Microsoft-FSLogix-Apps/Operational" -MaxEvents 20
```