Skip to content

Get-S2DHealthStatus

Runs all 11 S2D health checks and returns pass/warn/fail results with severity levels and remediation guidance.


Syntax

Get-S2DHealthStatus [[-CheckName] <string[]>]

Parameters

Parameter Type Description
-CheckName string[] Limit results to one or more specific check names.

Prerequisites

Get-S2DHealthStatus depends on all four primary collectors. It runs them automatically if their data is not already cached:

  • Get-S2DPhysicalDiskInventory — disk health, symmetry, wear, firmware
  • Get-S2DStoragePoolInfo — pool free space, overcommit ratio
  • Get-S2DVolumeMap — volume health, infrastructure volume
  • Get-S2DCacheTierInfo — cache state
  • Get-S2DCapacityWaterfall — reserve status

Output

Returns S2DHealthCheck[] — one object per health check.

Property Type Description
CheckName string Check identifier
Severity string Critical, Warning, or Info
Status string Pass, Warn, or Fail
Details string What was found — values from live cluster data
Remediation string What to do when Status is not Pass

Overall health rollup

After each run, Get-S2DHealthStatus writes an overall health string to $Script:S2DSession.CollectedData['OverallHealth']:

Overall Condition
Critical Any Critical-severity check has Status = Fail
Warning No Critical failures, but at least one check has Status = Warn or Fail
Healthy All checks passed

The 11 Health Checks

1 — ReserveAdequacy

Severity: Critical

Compares actual pool free space against the S2D-recommended rebuild reserve.

Status Condition
Pass ReserveActual ≥ ReserveRecommended
Warn ReserveActual ≥ 50% of ReserveRecommended
Fail ReserveActual < 50% of ReserveRecommended

Reserve formula: min(NodeCount, 4) × LargestCapacityDriveSize

Remediation (Warn/Fail): Free pool space by deleting or shrinking volumes, or add capacity drives to the pool.

Why this is Critical

If the reserve is insufficient and a drive fails, the storage pool cannot complete a full rebuild. A second failure during an in-progress rebuild risks data loss.


2 — DiskSymmetry

Severity: Warning

Checks that all cluster nodes have the same number of physical disks.

Status Condition
Pass All nodes report the same disk count
Warn Disk count differs across one or more nodes

Remediation: Investigate missing or additional disks. S2D requires symmetric disk configurations for balanced performance and correct reserve calculations.


3 — VolumeHealth

Severity: Critical

Checks that all virtual disks are in a healthy operational state.

Status Condition
Pass All volumes have HealthStatus = Healthy and OperationalStatus in {OK, InService, Online}
Fail One or more volumes are degraded, detached, or in error

Remediation: Run Get-VirtualDisk and check cluster event logs for storage health reports.


4 — DiskHealth

Severity: Critical

Checks that all physical disks are in Healthy state.

Status Condition
Pass All physical disks report HealthStatus = Healthy
Fail One or more disks are Warning or Unhealthy

Remediation: Replace failed or degraded disks promptly. Check Get-PhysicalDisk -HasMediaFailure.


5 — NVMeWear

Severity: Warning

Checks that no NVMe drive exceeds 80% wear percentage.

Status Condition
Pass No NVMe drive has WearPercentage > 80
Warn One or more NVMe drives exceed the 80% threshold

Remediation: Plan replacement for high-wear NVMe drives before they reach 100% (end of rated write endurance). Use Get-S2DPhysicalDiskInventory to monitor ongoing wear.

Wear data availability

WearPercentage comes from Get-StorageReliabilityCounter. Some drivers do not expose this counter — if WearPercentage is $null for all drives, the check passes (no evidence of excess wear). Use -Verbose to see which disks have null reliability data.


6 — ThinOvercommit

Severity: Warning

Evaluates maximum potential pool footprint for all thin-provisioned volumes against pool total capacity. Unlike the OvercommitRatio on the pool object (which only reflects data already written), this check projects the worst-case scenario: what happens if every thin volume is written completely full.

Maximum potential footprint = Size × NumberOfDataCopies per thin volume, summed across all thin workload volumes. This is the pool space that would be consumed if all provisioned capacity were actually written under the current resiliency configuration.

Status Condition
Pass No thin volumes present, or maxPotentialFootprint ÷ poolTotal ≤ 80%
Warn maxPotentialFootprint ÷ poolTotal > 80% (approaching danger)
Warn pool.OvercommitRatio > 1.0 (already overcommitted based on written data)
Fail maxPotentialFootprint ÷ poolTotal > 100% (pool exhaustion guaranteed if volumes fill up)

Details field: Lists the number of thin volumes, current pool overcommit ratio, max potential footprint, and the resulting risk percentage.

Remediation (Warn/Fail): Add capacity drives to the pool, reduce provisioned volume sizes, or convert high-risk volumes to fixed provisioning. Use Get-S2DVolumeMap to inspect MaxPotentialFootprint and ThinGrowthHeadroom per volume.

Why this fires before overcommit occurs

The old check fired only when OvercommitRatio > 1.0 — after the pool was already overcommitted. This check fires at 80% and 100% of maximum potential footprint, giving time to act before volumes fill up and pool exhaustion becomes inevitable.


7 — FirmwareConsistency

Severity: Info

Checks that all disks of the same model are running the same firmware version.

Status Condition
Pass No model has more than one firmware version across all nodes
Warn At least one model has mixed firmware versions

Remediation: Update all drives of the same model to the latest firmware using the vendor update tool or Dell/HPE/Lenovo HCI management utilities.


8 — RebuildCapacity

Severity: Critical

Checks whether free pool space is sufficient to absorb the loss of the largest single node's disk capacity.

Status Condition
Pass PoolFreeSpace ≥ LargestNodeDiskCapacity
Warn PoolFreeSpace < LargestNodeDiskCapacity

Remediation: Free pool space by removing or shrinking volumes. Consider adding capacity drives.

Relationship to ReserveAdequacy

ReserveAdequacy checks against the recommended reserve formula (min(NodeCount,4) × largest drive). RebuildCapacity checks against the practical rebuild requirement (largest node's total disk capacity). Both can fail independently.


9 — InfrastructureVolume

Severity: Info

Verifies that the Azure Local infrastructure volume is present and healthy.

Status Condition
Pass One or more infrastructure volumes detected and all are Healthy
Warn Infrastructure volume present but not fully healthy, or no infrastructure volume detected

Remediation: On Azure Local, the infrastructure volume hosts cluster metadata and CSV cache. If missing or degraded, investigate with Get-VirtualDisk. A missing infrastructure volume may indicate a deployment issue.

Windows Server S2D

On Windows Server S2D (not Azure Local), an infrastructure volume is not always present. A Warn status for this check on Windows Server may be expected — use context to determine if action is needed.


10 — CacheTierHealth

Severity: Warning

Checks cache tier health across both physical and software cache configurations.

Status Condition
Pass (all-flash) IsAllFlash = $true and SoftwareCacheEnabled = $true
Pass (hybrid) CacheState = Active
Warn CacheState = Degraded, or cache tier data unavailable

Remediation (Degraded): Check cache disk health with Get-S2DPhysicalDiskInventory. Replace failed cache drives promptly — a degraded cache tier significantly reduces write performance.


11 — ThinReserveRisk

Severity: Critical

Checks whether the maximum uncommitted growth of thin-provisioned volumes would consume the recommended rebuild reserve space. A cluster can survive a drive failure only if the pool has enough free space to complete a rebuild; thin volume growth that erodes that reserve creates a latent risk that normal pool utilisation monitoring does not catch.

Uncommitted growth bytes = max(0, maxPotentialFootprint − currentThinFootprint) — the additional pool space thin volumes could consume if written to full.

Free space after max growth = poolFree − uncommittedGrowthBytes

Status Condition
Pass No thin volumes present, or freeAfterMaxGrowth ≥ reserveRecommended
Warn freeAfterMaxGrowth < reserveRecommended (growth could consume reserve)
Fail freeAfterMaxGrowth < 0 (growth would exhaust the entire pool)

Remediation (Warn/Fail): Add capacity drives to increase pool free space, reduce provisioned volume sizes, or convert high-risk volumes to fixed provisioning. Use Invoke-S2DCapacityWhatIf to model how additional drives would affect the reserve margin.

Relationship to Check 1 (ReserveAdequacy)

ReserveAdequacy compares current pool free space against the recommended reserve. ThinReserveRisk asks a forward-looking question: if all thin volumes fill up, will the reserve still be intact? Both can be passing today while ThinReserveRisk warns about future risk.


Examples

# All checks
Get-S2DHealthStatus | Format-Table CheckName, Severity, Status, Details

# Non-passing checks only
Get-S2DHealthStatus | Where-Object Status -ne 'Pass' | Format-List

# Specific checks
Get-S2DHealthStatus -CheckName 'ReserveAdequacy', 'NVMeWear', 'DiskHealth'

# Overall health rollup
Get-S2DHealthStatus | Out-Null
$Script:S2DSession  # read directly; or use Invoke-S2DCartographer -PassThru

# Remediation report
Get-S2DHealthStatus |
    Where-Object Status -ne 'Pass' |
    Select-Object CheckName, Severity, Status, Remediation |
    Format-List

Troubleshooting

Check results reflect cached data

Get-S2DHealthStatus uses cached collector data when available. If you suspect stale results, clear the cache by disconnecting and reconnecting:

Disconnect-S2DCluster
Connect-S2DCluster -ClusterName "c01-prd-bal" -Credential $cred
Get-S2DHealthStatus

Running specific checks for monitoring

Use -CheckName for lightweight monitoring scripts that only need to evaluate a subset of checks. This does not skip prerequisite data collection — all collectors are still run if their data is not cached.

# Fast reserve-only check
Get-S2DPhysicalDiskInventory | Out-Null
Get-S2DStoragePoolInfo       | Out-Null
Get-S2DCapacityWaterfall     | Out-Null
Get-S2DHealthStatus -CheckName 'ReserveAdequacy'