Task 03: Network & RDMA Validation
DOCUMENT CATEGORY: Runbook
SCOPE: Network and RDMA validation
PURPOSE: Validate network stack, RDMA, and DCB configuration
MASTER REFERENCE: Microsoft Learn - Validate-DCB
Status: Active
Overview
This step validates the complete network stack including RDMA configuration, DCB (Data Center Bridging) settings, VLAN connectivity, and core network services. A successful network validation is critical for storage performance and cluster communication.
Prerequisites
- Infrastructure health validation completed (Step 1)
- Administrative access to all cluster nodes
- Physical network switches configured for RDMA
- VLAN IDs documented and configured
Report Output
All validation results are saved to:
\\<ClusterName>\ClusterStorage$\Collect\validation-reports\03-network-rdma-validation-YYYYMMDD.txt
Variables from variables.yml
| Variable Path | Type | Description |
|---|---|---|
| networking.management.vlan_id | Integer | Management VLAN ID for connectivity tests |
| networking.management.subnet | String | Management subnet CIDR for IP validation |
| networking.management.gateway | String | Default gateway for ping tests |
| networking.management.dns_servers | Array | DNS server IPs for resolution tests |
| compute.nodes[].name | String | Node hostnames for per-node RDMA/DCB validation |
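The subnet variable is described as being used for IP validation. One way such a check could look is sketched below, assuming the subnet is given as an IPv4 CIDR string. `Test-IPv4InSubnet` is a hypothetical helper name introduced here for illustration; it is not part of the toolkit or of any Windows module.

```powershell
# Hypothetical helper (not part of any module): checks whether an IPv4
# address belongs to a CIDR subnet such as networking.management.subnet.
function Test-IPv4InSubnet {
    param(
        [Parameter(Mandatory)][string]$IPAddress,
        [Parameter(Mandatory)][string]$Cidr   # for example "10.0.0.0/24"
    )

    $network, $prefix = $Cidr -split '/', 2
    if ([string]::IsNullOrEmpty($prefix)) { throw "Missing prefix length in '$Cidr'" }
    $prefixLength = [int]$prefix
    if ($prefixLength -lt 0 -or $prefixLength -gt 32) {
        throw "Invalid prefix length in '$Cidr'"
    }

    # Convert a dotted-quad address to a 64-bit integer, most significant octet first
    $toInt64 = {
        param([string]$Address)
        [long]$value = 0
        foreach ($octet in [System.Net.IPAddress]::Parse($Address).GetAddressBytes()) {
            $value = $value * 256 + $octet
        }
        $value
    }

    # Two addresses share a subnet when their leading $prefixLength bits match
    $shift = 32 - $prefixLength
    ((& $toInt64 $IPAddress) -shr $shift) -eq ((& $toInt64 $network) -shr $shift)
}

# Example: confirm a gateway value sits inside the management subnet
Test-IPv4InSubnet -IPAddress '10.0.0.1' -Cidr '10.0.0.0/24'
```

A check like this could run before the ping tests so that a mistyped gateway or subnet in variables.yml is caught early.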
Part 1: Initialize Validation Environment
1.1 Create Report Directory
# Run from any cluster node
$ClusterName = (Get-Cluster).Name
$DateStamp = Get-Date -Format "yyyyMMdd"
$ReportPath = "C:\ClusterStorage\Collect\validation-reports"
$ReportFile = "$ReportPath\03-network-rdma-validation-$DateStamp.txt"
# Create directory if not exists
if (-not (Test-Path $ReportPath)) {
New-Item -Path $ReportPath -ItemType Directory -Force
}
# Initialize report
$ReportHeader = @"
================================================================================
NETWORK & RDMA VALIDATION REPORT
================================================================================
Cluster: $ClusterName
Date: $(Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Generated By: $(whoami)
================================================================================
"@
$ReportHeader | Out-File -FilePath $ReportFile -Encoding UTF8
1.2 Install Required Modules
# Install Validate-DCB module if not present
if (-not (Get-Module -ListAvailable -Name Validate-DCB)) {
Install-Module -Name Validate-DCB -Force -Scope AllUsers
}
Import-Module Validate-DCB
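# Test-NetStack is published as a PowerShell Gallery module; install it if not present
if (-not (Get-Module -ListAvailable -Name Test-NetStack)) {
Install-Module -Name Test-NetStack -Force -Scope AllUsers
}
Get-Command Test-NetStack -ErrorAction SilentlyContinue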
Part 2: RDMA Adapter Validation
2.1 Verify RDMA Adapters
# Check RDMA adapter status on all nodes
$Nodes = (Get-ClusterNode).Name
$RdmaResults = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-NetAdapterRdma | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
Name, InterfaceDescription, Enabled,
@{N='OperationalState';E={if($_.Enabled){"Operational"}else{"Disabled"}}}
}
}
# Display and log results
$RdmaResults | Format-Table -AutoSize
$RdmaResults | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
# Verify all RDMA adapters are enabled
$DisabledRdma = $RdmaResults | Where-Object { -not $_.Enabled }
if ($DisabledRdma) {
"WARNING: RDMA disabled on adapters:" | Add-Content $ReportFile
$DisabledRdma | Format-Table | Out-String | Add-Content $ReportFile
}
2.2 Verify RDMA Mode (RoCE v2 vs iWARP)
# Check RDMA protocol type
$RdmaProtocol = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-NetAdapterAdvancedProperty -Name "Storage*" -RegistryKeyword "*NetworkDirect*" -ErrorAction SilentlyContinue |
Select-Object @{N='Node';E={$env:COMPUTERNAME}}, Name, RegistryKeyword, RegistryValue
}
}
"RDMA Protocol Configuration:" | Add-Content $ReportFile
$RdmaProtocol | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
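The query above returns raw registry values, which are hard to read in a report. The sketch below translates the `*NetworkDirectTechnology` value into a protocol name. `ConvertTo-RdmaProtocolName` is a hypothetical helper, and the value mapping is an assumption based on the standardized keyword; confirm it against the NIC vendor's documentation before relying on it.

```powershell
# Hypothetical helper: map a *NetworkDirectTechnology registry value to a name.
# Assumed mapping: 1 = iWARP, 2 = InfiniBand, 3 = RoCE, 4 = RoCEv2.
function ConvertTo-RdmaProtocolName {
    param([Parameter(Mandatory)][string]$RegistryValue)
    switch ($RegistryValue) {
        '1'     { 'iWARP' }
        '2'     { 'InfiniBand' }
        '3'     { 'RoCE' }
        '4'     { 'RoCEv2' }
        default { "Unknown ($RegistryValue)" }
    }
}

# Sample rows shaped like the $RdmaProtocol output (illustrative data only)
$sample = @(
    [PSCustomObject]@{ Node = 'NODE01'; Name = 'Storage1'; RegistryKeyword = '*NetworkDirectTechnology'; RegistryValue = '4' }
    [PSCustomObject]@{ Node = 'NODE02'; Name = 'Storage1'; RegistryKeyword = '*NetworkDirectTechnology'; RegistryValue = '1' }
)

$sample |
    Where-Object RegistryKeyword -eq '*NetworkDirectTechnology' |
    Select-Object Node, Name, @{N = 'Protocol'; E = { ConvertTo-RdmaProtocolName $_.RegistryValue }}
```

All nodes would normally be expected to report the same protocol; a mix of names in the Protocol column is worth investigating.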
2.3 SMB Direct Status
# Verify SMB Direct is enabled
$SmbDirect = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-SmbClientNetworkInterface | Where-Object RdmaCapable -eq $true |
Select-Object @{N='Node';E={$env:COMPUTERNAME}}, InterfaceIndex,
FriendlyName, RdmaCapable
}
}
"`nSMB Direct (RDMA) Capable Interfaces:" | Add-Content $ReportFile
$SmbDirect | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
Part 3: DCB Validation
3.1 Run Validate-DCB
# Run comprehensive DCB validation
# This validates Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS)
$DcbResults = Validate-DCB -Verbose
# Log results
"`n" + "="*80 | Add-Content $ReportFile
"DCB VALIDATION RESULTS" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$DcbResults | Out-String | Add-Content $ReportFile
3.2 Verify PFC Configuration
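# Check DCBX willing mode and Priority Flow Control settings
# Collected separately: the two cmdlets return different object types, and
# Format-Table only renders the columns of the first type it receives
$DcbxSettings = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-NetQosDcbxSetting | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
InterfaceAlias, Willing
}
}
$PfcSettings = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-NetQosFlowControl | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
Priority, Enabled
}
}
"`nDCBX Settings:" | Add-Content $ReportFile
$DcbxSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
"`nPriority Flow Control Settings:" | Add-Content $ReportFile
$PfcSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile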
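The checklist expects PFC to be enabled on priority 3 only. The sketch below turns that expectation into a pass/fail check over rows that carry `Priority` and `Enabled` properties. `Test-PfcPriority` is a hypothetical helper, and the default of priority 3 is taken from the checklist in this runbook, not from any cmdlet default.

```powershell
# Hypothetical helper: returns $true when the set of PFC-enabled priorities
# matches the expected set exactly (no extra and no missing priorities).
function Test-PfcPriority {
    param(
        [Parameter(Mandatory)][object[]]$FlowControl,   # rows with Priority and Enabled
        [int[]]$ExpectedPriority = @(3)                 # assumption: SMB Direct uses priority 3
    )
    $enabled = @($FlowControl |
        Where-Object { $_.Enabled -eq $true } |
        ForEach-Object { [int]$_.Priority } |
        Sort-Object -Unique)
    $expected = @($ExpectedPriority | Sort-Object -Unique)
    ($enabled -join ',') -eq ($expected -join ',')
}

# Illustrative data for one node: only priority 3 has flow control enabled
$rows = 0..7 | ForEach-Object { [PSCustomObject]@{ Priority = $_; Enabled = ($_ -eq 3) } }
Test-PfcPriority -FlowControl $rows

# Possible per-node use against collected results (rows without a Priority are skipped):
# $PfcSettings | Where-Object { $null -ne $_.Priority } | Group-Object Node |
#     ForEach-Object { "{0}: {1}" -f $_.Name, (Test-PfcPriority -FlowControl $_.Group) }
```

Comparing the full set, not just checking that priority 3 is on, also catches the case where flow control was left enabled on additional priorities.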
3.3 Verify ETS Configuration
# Check Enhanced Transmission Selection
$EtsSettings = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-NetQosTrafficClass | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
Name, Priority, BandwidthPercentage, Algorithm
}
}
"`nETS Traffic Class Settings:" | Add-Content $ReportFile
$EtsSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
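The checklist asks for at least 50% of bandwidth reserved for SMB Direct. A minimal sketch of that check follows, assuming the SMB Direct traffic class is the one that carries priority 3. `Test-EtsSmbBandwidth` is a hypothetical helper; the threshold and priority defaults come from the checklist in this runbook.

```powershell
# Hypothetical helper: returns $true when the traffic class carrying the
# SMB Direct priority reserves at least the minimum bandwidth percentage.
function Test-EtsSmbBandwidth {
    param(
        [Parameter(Mandatory)][object[]]$TrafficClass,  # rows with Priority and BandwidthPercentage
        [int]$SmbPriority = 3,                          # assumption: SMB Direct uses priority 3
        [int]$MinimumPercent = 50
    )
    # Priority may hold several values per class, so wrap it as an array
    $smbClass = $TrafficClass |
        Where-Object { @($_.Priority) -contains $SmbPriority } |
        Select-Object -First 1
    if (-not $smbClass) { return $false }
    [int]$smbClass.BandwidthPercentage -ge $MinimumPercent
}

# Illustrative rows shaped like the $EtsSettings output for one node
$classes = @(
    [PSCustomObject]@{ Name = '[Default]'; Priority = @(0, 1, 2, 4, 5, 6); BandwidthPercentage = 49 }
    [PSCustomObject]@{ Name = 'SMB';       Priority = @(3);                BandwidthPercentage = 50 }
    [PSCustomObject]@{ Name = 'Cluster';   Priority = @(7);                BandwidthPercentage = 1 }
)
Test-EtsSmbBandwidth -TrafficClass $classes
```

When results from several nodes are collected together, grouping by the Node column first keeps one node's classes from masking another's.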
Part 4: Test-NetStack Validation
4.1 Run Network Stack Tests
# Test-NetStack validates the entire network stack
# Run between storage network adapters
# Get storage adapter IPs
$StorageAdapters = Get-NetAdapter -Name "Storage*" | Get-NetIPAddress -AddressFamily IPv4
"`n" + "="*80 | Add-Content $ReportFile
"NETWORK STACK TEST RESULTS (Test-NetStack)" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
# Run Test-NetStack between first two nodes
$Node1 = $Nodes[0]
$Node2 = $Nodes[1]
# Get storage IPs for each node
$Node1StorageIP = (Invoke-Command -ComputerName $Node1 -ScriptBlock {
(Get-NetIPAddress -InterfaceAlias "Storage*" -AddressFamily IPv4)[0].IPAddress
})
$Node2StorageIP = (Invoke-Command -ComputerName $Node2 -ScriptBlock {
(Get-NetIPAddress -InterfaceAlias "Storage*" -AddressFamily IPv4)[0].IPAddress
})
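# Run Test-NetStack stages 1-2 (ICMP connectivity/MTU and TCP stress) between the two storage IPs
# Test-NetStack opens its own remote sessions, so call it directly instead of inside Invoke-Command
# Parameter names follow the PowerShell Gallery module; confirm with: Get-Command Test-NetStack -Syntax
Test-NetStack -IPTarget $Node1StorageIP, $Node2StorageIP -Stage 1, 2 -EnableFirewallRules |
Out-String | Add-Content $ReportFile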
4.2 RDMA Traffic Test
# Test RDMA connectivity between nodes
"`nRDMA Traffic Test:" | Add-Content $ReportFile
# Stages 3-4 of Test-NetStack cover RDMA connectivity (NDK ping) and 1:1 RDMA stress (NDK perf)
# Parameter names follow the PowerShell Gallery module; confirm with: Get-Command Test-NetStack -Syntax
Test-NetStack -IPTarget $Node1StorageIP, $Node2StorageIP -Stage 3, 4 |
Out-String | Add-Content $ReportFile
Part 5: VLAN Connectivity Validation
5.1 Verify VLAN Configuration
# Check VLAN assignments on virtual adapters
$VlanConfig = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-VMNetworkAdapterVlan -ManagementOS |
Select-Object @{N='Node';E={$env:COMPUTERNAME}},
ParentAdapter, AccessVlanId, NativeVlanId, OperationMode
}
}
"`n" + "="*80 | Add-Content $ReportFile
"VLAN CONFIGURATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$VlanConfig | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
5.2 Test VLAN Connectivity
# Ping test across VLANs
$VlanTests = @(
@{Name="Management"; VLAN=711; TestIP="10.X.X.1"},
@{Name="Storage1"; VLAN=712; TestIP="10.X.X.1"},
@{Name="Storage2"; VLAN=713; TestIP="10.X.X.1"},
@{Name="VM Traffic"; VLAN=714; TestIP="10.X.X.1"}
)
"`nVLAN Connectivity Tests:" | Add-Content $ReportFile
foreach ($Vlan in $VlanTests) {
# Replace TestIP with actual gateway/target for each VLAN
$Result = Test-Connection -ComputerName $Vlan.TestIP -Count 2 -Quiet -ErrorAction SilentlyContinue
$Status = if ($Result) { "PASS" } else { "FAIL" }
"VLAN $($Vlan.VLAN) ($($Vlan.Name)): $Status" | Add-Content $ReportFile
}
Part 6: Core Network Services Validation
6.1 DNS Resolution
"`n" + "="*80 | Add-Content $ReportFile
"CORE NETWORK SERVICES" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
# Test DNS resolution from the node running this script
$DnsTests = @(
"management.azure.com",
"login.microsoftonline.com",
"$ClusterName",
"dc01.domain.local" # Replace with actual DC
)
"`nDNS Resolution Tests:" | Add-Content $ReportFile
foreach ($Target in $DnsTests) {
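$Result = Resolve-DnsName -Name $Target -ErrorAction SilentlyContinue
# Report the first record that carries an address; CNAME records are listed first and have no IPAddress
$Address = ($Result | Where-Object { $_.IPAddress } | Select-Object -First 1).IPAddress
$Status = if ($Result) { "PASS - $Address" } else { "FAIL" }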
"$Target : $Status" | Add-Content $ReportFile
}
6.2 NTP Synchronization
# Verify time sync across all nodes
$TimeSync = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
$w32tm = w32tm /query /status 2>&1
[PSCustomObject]@{
Node = $env:COMPUTERNAME
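# Strip the label up to the first colon; this also tolerates missing lines and
# avoids Split(": "), which splits on every colon and space in Windows PowerShell 5.1
Source = ($w32tm | Select-String "Source:").Line -replace '^[^:]*:\s*', ''
Stratum = ($w32tm | Select-String "Stratum:").Line -replace '^[^:]*:\s*', ''
LastSync = ($w32tm | Select-String "Last Successful").Line -replace '^[^:]*:\s*', ''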
}
}
}
"`nNTP Time Synchronization:" | Add-Content $ReportFile
$TimeSync | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
# Check for time skew between nodes
$TimeDiff = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
[PSCustomObject]@{
Node = $env:COMPUTERNAME
CurrentTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss.fff"
}
}
}
"`nNode Time Comparison:" | Add-Content $ReportFile
$TimeDiff | Format-Table | Out-String | Add-Content $ReportFile
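The comparison above lists each node's clock but leaves the skew calculation to the reader, and because nodes are queried one after another the readings also include remoting delay. The sketch below computes the spread between the earliest and latest reading. `Get-MaxTimeSkew` is a hypothetical helper, and it assumes each sample carries a `Node` name and a `UtcTime` value of type `[datetime]`.

```powershell
# Hypothetical helper: report the largest clock difference across samples.
function Get-MaxTimeSkew {
    param([Parameter(Mandatory)][object[]]$Sample)   # rows with Node and UtcTime ([datetime])
    $sorted   = @($Sample | Sort-Object UtcTime)
    $earliest = $sorted[0]
    $latest   = $sorted[-1]
    [PSCustomObject]@{
        EarliestNode = $earliest.Node
        LatestNode   = $latest.Node
        SkewSeconds  = ($latest.UtcTime - $earliest.UtcTime).TotalSeconds
    }
}

# Illustrative data: NODE02 is 2.5 seconds ahead of NODE01
$base = [datetime]'2026-01-01 00:00:00'
$samples = @(
    [PSCustomObject]@{ Node = 'NODE01'; UtcTime = $base }
    [PSCustomObject]@{ Node = 'NODE02'; UtcTime = $base.AddSeconds(2.5) }
)
Get-MaxTimeSkew -Sample $samples

# Possible collection step; a single fan-out call samples the nodes closer together:
# $samples = Invoke-Command -ComputerName $Nodes -ScriptBlock {
#     [PSCustomObject]@{ Node = $env:COMPUTERNAME; UtcTime = (Get-Date).ToUniversalTime() }
# }
```

The result still includes some network and session latency, so treat small values as noise and investigate only differences that are large relative to that latency.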
6.3 Azure Connectivity
# Test connectivity to Azure endpoints
$AzureEndpoints = @(
"management.azure.com",
"login.microsoftonline.com",
"graph.microsoft.com",
"azurestackr01.azurestack.hci.microsoft.com"
)
"`nAzure Endpoint Connectivity:" | Add-Content $ReportFile
foreach ($Endpoint in $AzureEndpoints) {
$Result = Test-NetConnection -ComputerName $Endpoint -Port 443 -WarningAction SilentlyContinue
$Status = if ($Result.TcpTestSucceeded) { "PASS" } else { "FAIL" }
"$Endpoint : $Status (Latency: $($Result.PingReplyDetails.RoundtripTime)ms)" | Add-Content $ReportFile
}
Part 7: Generate Summary
7.1 Create Validation Summary
# Summary section
"`n" + "="*80 | Add-Content $ReportFile
"NETWORK VALIDATION SUMMARY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$Summary = @"
Validation Category Status
------------------------------- --------
RDMA Adapters Enabled $(if($DisabledRdma){"FAIL"}else{"PASS"})
DCB Configuration $(if($DcbResults -match "FAIL"){"FAIL"}else{"PASS"})
SMB Direct Operational $(if($SmbDirect){"PASS"}else{"FAIL"})
VLAN Connectivity REVIEW ABOVE
DNS Resolution REVIEW ABOVE
NTP Synchronization $(if($TimeSync.Count -eq $Nodes.Count){"PASS"}else{"FAIL"})
Azure Connectivity REVIEW ABOVE
"@
$Summary | Add-Content $ReportFile
# Report location
"`nReport saved to: $ReportFile" | Add-Content $ReportFile
Write-Host "`nNetwork validation complete. Report: $ReportFile" -ForegroundColor Green
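Three summary rows say REVIEW ABOVE because the script does not track those results. One possible way to derive a status is to read the report back and scan each section for PASS and FAIL lines, as sketched below. `Get-SectionStatus` is a hypothetical helper; it assumes the section header text and the `: PASS` / `: FAIL` line format written by the earlier steps, and that a blank line ends each section.

```powershell
# Hypothetical helper: derive PASS / FAIL / NOT RUN for one report section.
function Get-SectionStatus {
    param(
        [string[]]$Lines,                          # report content, one element per line
        [Parameter(Mandatory)][string]$Header      # exact section header text
    )
    $inSection = $false
    $results = foreach ($line in $Lines) {
        if (-not $inSection) {
            if ($line.Trim() -eq $Header) { $inSection = $true }
            continue
        }
        if ([string]::IsNullOrWhiteSpace($line)) { break }   # blank line ends the section
        $line
    }
    if ($results -match ': FAIL') { return 'FAIL' }
    if ($results -match ': PASS') { return 'PASS' }
    'NOT RUN'
}

# Illustrative report fragment
$lines = @(
    ''
    'VLAN Connectivity Tests:'
    'VLAN 711 (Management): PASS'
    'VLAN 712 (Storage1): FAIL'
    ''
    'DNS Resolution Tests:'
    'management.azure.com : PASS - 203.0.113.10'
)
Get-SectionStatus -Lines $lines -Header 'VLAN Connectivity Tests:'
Get-SectionStatus -Lines $lines -Header 'DNS Resolution Tests:'

# Possible use against the real report:
# Get-SectionStatus -Lines (Get-Content $ReportFile) -Header 'Azure Endpoint Connectivity:'
```

Parsing the report keeps the earlier steps unchanged; collecting results in variables as each test runs would be a sturdier design if the steps are ever refactored.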
Validation Checklist
| Category | Test | Expected Result | Status |
|---|---|---|---|
| RDMA | All adapters enabled | Enabled = True | ☐ |
| RDMA | SMB Direct operational | RDMA-capable interfaces listed | ☐ |
| DCB | Validate-DCB passes | No FAIL results | ☐ |
| DCB | PFC enabled on correct priority | Priority 3 enabled | ☐ |
| DCB | ETS bandwidth allocation | SMB Direct ≥ 50% | ☐ |
| VLAN | All VLANs accessible | Ping succeeds | ☐ |
| DNS | Name resolution works | All targets resolve | ☐ |
| NTP | Time synchronized | Stratum ≤ 4, all nodes sync | ☐ |
| Azure | Endpoint connectivity | All endpoints reachable on 443 | ☐ |
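The NTP row expects a stratum of 4 or lower. The sketch below applies that threshold to the stratum text reported by `w32tm /query /status`, which typically looks like `3 (secondary reference - syncd by (S)NTP)`. `Test-NtpStratum` is a hypothetical helper, and treating stratum 0 as unsynchronized is an assumption about how w32tm reports an unspecified source.

```powershell
# Hypothetical helper: check a w32tm stratum string against a maximum value.
function Test-NtpStratum {
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string]$Stratum,
        [int]$MaximumStratum = 4      # threshold taken from the checklist
    )
    if ($Stratum -match '^\s*(\d+)') {
        $value = [int]$Matches[1]
        # Assumption: stratum 0 means the clock is not synchronized
        return ($value -ge 1 -and $value -le $MaximumStratum)
    }
    $false
}

Test-NtpStratum -Stratum '3 (secondary reference - syncd by (S)NTP)'
Test-NtpStratum -Stratum '0 (unspecified)'

# Possible use against the collected results:
# $TimeSync | Select-Object Node, Stratum, @{N = 'StratumOk'; E = { Test-NtpStratum $_.Stratum }}
```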
Common Issues
RDMA Not Operational
# Re-enable RDMA on adapter
Enable-NetAdapterRdma -Name "Storage1"
# Verify RDMA is operational
Get-NetAdapterRdma -Name "Storage1"
DCB Misconfiguration
# Reset DCB to defaults and reconfigure
# WARNING: May disrupt storage traffic
# Remove existing policies
Get-NetQosPolicy | Remove-NetQosPolicy -Confirm:$false
# Recreate SMB Direct policy
New-NetQosPolicy -Name "SMB Direct" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7
Time Skew Between Nodes
# Force time sync
w32tm /resync /force
# Verify sync source
w32tm /query /source
Troubleshooting
| Issue | Cause | Resolution |
|---|---|---|
| RDMA validation shows Not Operational | RDMA disabled on the adapter, or the NIC driver lacks RDMA support | Verify driver: Get-NetAdapterRdma; enable: Enable-NetAdapterRdma -Name <adapter>; update NIC firmware if needed |
| DCB/PFC misconfiguration detected | QoS policies conflicting or missing | Reset DCB: Get-NetQosPolicy \| Remove-NetQosPolicy -Confirm:$false; recreate SMB Direct policy with correct priority class |
| RDMA throughput below expected baseline | Incorrect MTU or flow control settings | Verify jumbo frames: Get-NetAdapterAdvancedProperty -Name <adapter> -RegistryKeyword *JumboPacket; set MTU to 9014 |
Next Step
Proceed to Task 4: High Availability Testing once network validation is complete.
- Manual: Use this option for manual step-by-step execution. See the procedure steps above for guidance.
- Orchestrated Script: Use this option when deploying across multiple nodes from a management server using variables.yml. See azurelocal-toolkit for the orchestrated script for this task.
- Standalone Script: Use this option for a self-contained deployment without a shared configuration file. See azurelocal-toolkit for the standalone script for this task.
Scripts for this task are located in the azurelocal-toolkit repository under scripts/deploy/ in the appropriate task folder.
Alternatives
The procedures in this task use the scripted methods listed in the options above. Additional deployment methods, including Azure CLI and Bash scripts, are available in the azurelocal-toolkit repository under scripts/deploy/.
| Method | Description |
|---|---|
| Azure CLI | PowerShell-based Azure CLI scripts for Azure resource operations |
| Bash | Linux/macOS compatible shell scripts for pipeline environments |
Navigation
| Previous | Up | Next |
|---|---|---|
| ← Task 2: VMFleet Storage Testing | Testing & Validation | Task 4: High Availability Testing → |
Version Control
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-03-24 | Azure Local Cloud | Initial release |