Task 03: Network & RDMA Validation
DOCUMENT CATEGORY: Runbook
SCOPE: Network and RDMA validation
PURPOSE: Validate network stack, RDMA, and DCB configuration
MASTER REFERENCE: Microsoft Learn - Validate-DCB
Status: Active
Overview
This task validates the complete network stack: RDMA configuration, Data Center Bridging (DCB) settings, VLAN connectivity, and core network services (DNS, NTP, and Azure endpoint reachability). Storage performance and cluster communication depend on all of these, so resolve any failure here before moving on.
Prerequisites
- Infrastructure health validation completed (Step 1)
- Administrative access to all cluster nodes
- Physical network switches configured for RDMA
- VLAN IDs documented and configured
Report Output
All validation results are saved to:
\\<ClusterName>\ClusterStorage$\Collect\validation-reports\03-network-rdma-validation-YYYYMMDD.txt
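The path pattern above can also be built in code, which avoids typos in the date stamp. This is a minimal sketch: `Get-ValidationReportPath` is a hypothetical helper name and `CLUSTER01` is a made-up cluster name; only the path layout comes from this runbook.

```powershell
# Hypothetical helper: builds the report path from the pattern documented above.
function Get-ValidationReportPath {
    param(
        [Parameter(Mandatory)][string]$ClusterName,
        [datetime]$Date = (Get-Date)
    )
    # Invariant culture keeps the stamp in Gregorian yyyyMMdd form on every locale.
    $stamp = $Date.ToString('yyyyMMdd', [cultureinfo]::InvariantCulture)
    # Single quotes stop PowerShell from treating "$\" in ClusterStorage$ as a variable.
    '\\{0}\ClusterStorage$\Collect\validation-reports\03-network-rdma-validation-{1}.txt' -f $ClusterName, $stamp
}

# Example call with a made-up cluster name
Get-ValidationReportPath -ClusterName 'CLUSTER01' -Date ([datetime]'2026-03-24')
```

Passing the date explicitly makes the helper deterministic; omit `-Date` to use the current date.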
Variables from variables.yml
| Variable Path | Type | Description |
|---|---|---|
| networking.management.vlan_id | Integer | Management VLAN ID for connectivity tests |
| networking.management.subnet | String | Management subnet CIDR for IP validation |
| networking.management.gateway | String | Default gateway for ping tests |
| networking.management.dns_servers | Array | DNS server IPs for resolution tests |
| compute.nodes[].name | String | Node hostnames for per-node RDMA/DCB validation |
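The management values in the table can be sanity-checked before any test runs. The sketch below is illustrative and assumes variables.yml has already been parsed into a hashtable (PowerShell has no built-in YAML parser, so that step is out of scope here). The sample values and the helper names `ConvertTo-IPv4Number` and `Test-IPv4InSubnet` are hypothetical.

```powershell
# Illustrative values only; in practice these come from variables.yml.
$Networking = @{
    management = @{
        vlan_id     = 711
        subnet      = '10.0.0.0/24'
        gateway     = '10.0.0.1'
        dns_servers = @('10.0.0.10', '10.0.0.11')
    }
}

# Convert a dotted IPv4 address to a single number for range comparisons.
function ConvertTo-IPv4Number {
    param([Parameter(Mandatory)][string]$Address)
    $b = [System.Net.IPAddress]::Parse($Address).GetAddressBytes()
    if ($b.Length -ne 4) { throw "Not an IPv4 address: $Address" }
    ([long]$b[0] * 16777216) + ([long]$b[1] * 65536) + ([long]$b[2] * 256) + [long]$b[3]
}

# True when $Address falls inside the CIDR block (for example 10.0.0.0/24).
function Test-IPv4InSubnet {
    param(
        [Parameter(Mandatory)][string]$Address,
        [Parameter(Mandatory)][string]$Cidr
    )
    $network, $prefix = $Cidr -split '/'
    $hostBits = 32 - [int]$prefix
    # Two addresses share a subnet when they match after dropping the host bits.
    ((ConvertTo-IPv4Number $Address) -shr $hostBits) -eq ((ConvertTo-IPv4Number $network) -shr $hostBits)
}

$mgmt = $Networking.management
$checks = [ordered]@{
    'VLAN ID in 1-4094'       = ($mgmt.vlan_id -is [int]) -and ($mgmt.vlan_id -ge 1) -and ($mgmt.vlan_id -le 4094)
    'Gateway inside subnet'   = Test-IPv4InSubnet -Address $mgmt.gateway -Cidr $mgmt.subnet
    'At least one DNS server' = @($mgmt.dns_servers).Count -ge 1
}
$checks
```

A gateway outside the management subnet is a common cause of misleading ping failures later on, so it is worth catching here.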
Part 1: Initialize Validation Environment
1.1 Create Report Directory
# Run from any cluster node
$ClusterName = (Get-Cluster).Name
$DateStamp = Get-Date -Format "yyyyMMdd"
$ReportPath = "C:\ClusterStorage\Collect\validation-reports"
$ReportFile = "$ReportPath\03-network-rdma-validation-$DateStamp.txt"
# Create directory if not exists
if (-not (Test-Path $ReportPath)) {
New-Item -Path $ReportPath -ItemType Directory -Force
}
# Initialize report
$ReportHeader = @"
================================================================================
NETWORK & RDMA VALIDATION REPORT
================================================================================
Cluster: $ClusterName
Date: $(Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Generated By: $(whoami)
================================================================================
"@
$ReportHeader | Out-File -FilePath $ReportFile -Encoding UTF8
1.2 Install Required Modules
# Install Validate-DCB module if not present
if (-not (Get-Module -ListAvailable -Name Validate-DCB)) {
Install-Module -Name Validate-DCB -Force -Scope AllUsers
}
Import-Module Validate-DCB
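# Test-NetStack is a separate PowerShell Gallery module, not an in-box Windows Server feature.
# Install it if it is not present, then confirm the command resolves.
if (-not (Get-Module -ListAvailable -Name Test-NetStack)) {
Install-Module -Name Test-NetStack -Force -Scope AllUsers
}
Get-Command Test-NetStack -ErrorAction SilentlyContinue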
Part 2: RDMA Adapter Validation
2.1 Verify RDMA Adapters
# Check RDMA adapter status on all nodes
$Nodes = (Get-ClusterNode).Name
$RdmaResults = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-NetAdapterRdma | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
Name, InterfaceDescription, Enabled,
@{N='OperationalState';E={if($_.Enabled){"Operational"}else{"Disabled"}}}
}
}
# Display and log results
$RdmaResults | Format-Table -AutoSize
$RdmaResults | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
# Verify all RDMA adapters are enabled
$DisabledRdma = $RdmaResults | Where-Object { -not $_.Enabled }
if ($DisabledRdma) {
"WARNING: RDMA disabled on adapters:" | Add-Content $ReportFile
$DisabledRdma | Format-Table | Out-String | Add-Content $ReportFile
}
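The per-adapter rows collected above can be rolled up into one line per node, which is easier to scan on larger clusters. This is a sketch under stated assumptions: `Get-RdmaNodeSummary` is a hypothetical helper, and the sample data only mimics the shape of `$RdmaResults` (node and adapter names are made up).

```powershell
# Sample data shaped like the $RdmaResults objects; names are made up.
$SampleRdma = @(
    [PSCustomObject]@{ Node = 'NODE01'; Name = 'Storage1'; Enabled = $true }
    [PSCustomObject]@{ Node = 'NODE01'; Name = 'Storage2'; Enabled = $true }
    [PSCustomObject]@{ Node = 'NODE02'; Name = 'Storage1'; Enabled = $true }
    [PSCustomObject]@{ Node = 'NODE02'; Name = 'Storage2'; Enabled = $false }
)

# Hypothetical helper: one summary row per node with a PASS/FAIL status.
function Get-RdmaNodeSummary {
    param([Parameter(Mandatory)][object[]]$RdmaResults)
    $RdmaResults | Group-Object -Property Node | ForEach-Object {
        $disabled = @($_.Group | Where-Object { -not $_.Enabled })
        [PSCustomObject]@{
            Node             = $_.Name
            AdapterCount     = $_.Count
            DisabledAdapters = ($disabled.Name -join ', ')
            Status           = if ($disabled.Count -eq 0) { 'PASS' } else { 'FAIL' }
        }
    }
}

$RdmaSummary = Get-RdmaNodeSummary -RdmaResults $SampleRdma
$RdmaSummary | Format-Table -AutoSize
```

A node that returns no RDMA adapters at all does not appear in the grouped output, so compare the summary against the node list as well.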
2.2 Verify RDMA Mode (RoCE v2 vs iWARP)
# Check RDMA protocol type
$RdmaProtocol = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-NetAdapterAdvancedProperty -Name "Storage*" -RegistryKeyword "*NetworkDirect*" -ErrorAction SilentlyContinue |
Select-Object @{N='Node';E={$env:COMPUTERNAME}}, Name, RegistryKeyword, RegistryValue
}
}
"RDMA Protocol Configuration:" | Add-Content $ReportFile
$RdmaProtocol | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
2.3 SMB Direct Status
# Verify SMB Direct is enabled
$SmbDirect = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-SmbClientNetworkInterface | Where-Object RdmaCapable -eq $true |
Select-Object @{N='Node';E={$env:COMPUTERNAME}}, InterfaceIndex,
FriendlyName, RdmaCapable
}
}
"`nSMB Direct (RDMA) Capable Interfaces:" | Add-Content $ReportFile
$SmbDirect | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
Part 3: DCB Validation
3.1 Run Validate-DCB
# Run comprehensive DCB validation
# This validates Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS)
$DcbResults = Validate-DCB -Verbose
# Log results
"`n" + "="*80 | Add-Content $ReportFile
"DCB VALIDATION RESULTS" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$DcbResults | Out-String | Add-Content $ReportFile
3.2 Verify PFC Configuration
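# Check DCBX willing mode and Priority Flow Control settings.
# The two cmdlets return different object shapes, so collect them separately:
# Format-Table only shows the columns of the first object type it receives.
$DcbxSettings = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetQosDcbxSetting | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
            InterfaceAlias, Willing
    }
}
$PfcSettings = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetQosFlowControl | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
            Priority, Enabled
    }
}
"`nDCBX Willing Settings:" | Add-Content $ReportFile
$DcbxSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
"`nPriority Flow Control Settings:" | Add-Content $ReportFile
$PfcSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile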
3.3 Verify ETS Configuration
# Check Enhanced Transmission Selection
$EtsSettings = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-NetQosTrafficClass | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
Name, Priority, BandwidthPercentage, Algorithm
}
}
"`nETS Traffic Class Settings:" | Add-Content $ReportFile
$EtsSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
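The PFC and ETS output can be compared against the expectations in the Validation Checklist (PFC on priority 3, SMB Direct at 50% or more). The following is a minimal sketch, not a replacement for Validate-DCB: the helper names are hypothetical, the sample objects only mimic the `Priority`, `Enabled`, and `BandwidthPercentage` properties selected above, and "PFC enabled on priority 3 only" is an assumption taken from the Common Issues section of this runbook.

```powershell
# Hypothetical helper: true when PFC is enabled on exactly one priority, the expected one.
function Test-PfcOnlyOnPriority {
    param(
        [Parameter(Mandatory)][object[]]$FlowControl,
        [int]$ExpectedPriority = 3
    )
    $enabled = @($FlowControl | Where-Object Enabled | ForEach-Object { [int]$_.Priority })
    ($enabled.Count -eq 1) -and ($enabled[0] -eq $ExpectedPriority)
}

# Hypothetical helper: true when the traffic class for a priority meets a bandwidth floor.
function Test-EtsMinimumBandwidth {
    param(
        [Parameter(Mandatory)][object[]]$TrafficClass,
        [int]$Priority = 3,
        [int]$MinimumPercent = 50
    )
    $class = $TrafficClass | Where-Object { $_.Priority -contains $Priority } | Select-Object -First 1
    [bool]$class -and ($class.BandwidthPercentage -ge $MinimumPercent)
}

# Sample data for a single node; values are made up.
$SampleFlowControl = 0..7 | ForEach-Object {
    [PSCustomObject]@{ Priority = $_; Enabled = ($_ -eq 3) }
}
$SampleTrafficClass = @(
    [PSCustomObject]@{ Name = 'SMB Direct'; Priority = @(3); BandwidthPercentage = 50 }
    [PSCustomObject]@{ Name = 'Cluster';    Priority = @(7); BandwidthPercentage = 2 }
)

Test-PfcOnlyOnPriority   -FlowControl  $SampleFlowControl
Test-EtsMinimumBandwidth -TrafficClass $SampleTrafficClass
```

Both helpers evaluate one node at a time, so filter the collected results by `Node` before calling them.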
Part 4: Test-NetStack Validation
4.1 Run Network Stack Tests
# Test-NetStack validates the entire network stack
# Run between storage network adapters
# Get storage adapter IPs
$StorageAdapters = Get-NetAdapter -Name "Storage*" | Get-NetIPAddress -AddressFamily IPv4
"`n" + "="*80 | Add-Content $ReportFile
"NETWORK STACK TEST RESULTS (Test-NetStack)" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
# Run Test-NetStack between first two nodes
$Node1 = $Nodes[0]
$Node2 = $Nodes[1]
# Get storage IPs for each node
$Node1StorageIP = (Invoke-Command -ComputerName $Node1 -ScriptBlock {
(Get-NetIPAddress -InterfaceAlias "Storage*" -AddressFamily IPv4)[0].IPAddress
})
$Node2StorageIP = (Invoke-Command -ComputerName $Node2 -ScriptBlock {
(Get-NetIPAddress -InterfaceAlias "Storage*" -AddressFamily IPv4)[0].IPAddress
})
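# Run Test-NetStack against both storage IPs.
# The module connects to the nodes itself, so call it directly rather than through
# Invoke-Command, which would add a second authentication hop.
# Stages 1 and 2 cover ICMP connectivity and TCP throughput. Parameter and stage names
# follow the published Test-NetStack module; confirm with Get-Help Test-NetStack -Full.
Test-NetStack -IPTarget $Node1StorageIP, $Node2StorageIP -Stage 1, 2 -EnableFirewallRules |
Out-String | Add-Content $ReportFile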
4.2 RDMA Traffic Test
# Test RDMA connectivity and throughput between nodes.
# Test-NetStack exercises RDMA with NDK Ping and NDK Perf (stages 3 and 4), not NTttcp.
# Stage numbers follow the published Test-NetStack module; confirm with Get-Help Test-NetStack -Full.
"`nRDMA Traffic Test:" | Add-Content $ReportFile
Test-NetStack -IPTarget $Node1StorageIP, $Node2StorageIP -Stage 3, 4 -EnableFirewallRules |
Out-String | Add-Content $ReportFile
Part 5: VLAN Connectivity Validation
5.1 Verify VLAN Configuration
# Check VLAN assignments on virtual adapters
$VlanConfig = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
Get-VMNetworkAdapterVlan -ManagementOS |
Select-Object @{N='Node';E={$env:COMPUTERNAME}},
ParentAdapter, AccessVlanId, NativeVlanId, OperationMode
}
}
"`n" + "="*80 | Add-Content $ReportFile
"VLAN CONFIGURATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$VlanConfig | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
5.2 Test VLAN Connectivity
# Ping test across VLANs
$VlanTests = @(
@{Name="Management"; VLAN=711; TestIP="10.X.X.1"},
@{Name="Storage1"; VLAN=712; TestIP="10.X.X.1"},
@{Name="Storage2"; VLAN=713; TestIP="10.X.X.1"},
@{Name="VM Traffic"; VLAN=714; TestIP="10.X.X.1"}
)
"`nVLAN Connectivity Tests:" | Add-Content $ReportFile
foreach ($Vlan in $VlanTests) {
# Replace TestIP with actual gateway/target for each VLAN
$Result = Test-Connection -ComputerName $Vlan.TestIP -Count 2 -Quiet -ErrorAction SilentlyContinue
$Status = if ($Result) { "PASS" } else { "FAIL" }
"VLAN $($Vlan.VLAN) ($($Vlan.Name)): $Status" | Add-Content $ReportFile
}
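The loop above writes each result straight to the report, so the summary in Part 7 can only say "REVIEW ABOVE" for VLAN connectivity. One option is to return the results as objects and derive an overall status from them. This is a sketch: `Test-VlanTargets` is a hypothetical helper, the probe is injectable so the logic can be tried without a live network, and the 192.0.2.x addresses are documentation-range placeholders.

```powershell
# Hypothetical helper: returns one result object per VLAN target.
function Test-VlanTargets {
    param(
        [Parameter(Mandatory)][hashtable[]]$VlanTests,
        # Default probe pings the target; pass a different script block to simulate results.
        [scriptblock]$Probe = {
            param($Ip)
            Test-Connection -ComputerName $Ip -Count 2 -Quiet -ErrorAction SilentlyContinue
        }
    )
    foreach ($Vlan in $VlanTests) {
        [PSCustomObject]@{
            Name   = $Vlan.Name
            VLAN   = $Vlan.VLAN
            TestIP = $Vlan.TestIP
            Passed = [bool](& $Probe $Vlan.TestIP)
        }
    }
}

# Simulated run: the fake probe "reaches" only the first target.
$VlanResults = Test-VlanTargets -VlanTests @(
    @{ Name = 'Management'; VLAN = 711; TestIP = '192.0.2.1' },
    @{ Name = 'Storage1';   VLAN = 712; TestIP = '192.0.2.2' }
) -Probe { param($Ip) $Ip -eq '192.0.2.1' }

$VlanStatus = if (@($VlanResults | Where-Object { -not $_.Passed }).Count -eq 0) { 'PASS' } else { 'FAIL' }
$VlanResults | Format-Table -AutoSize
"VLAN Connectivity: $VlanStatus"
```

Omitting `-Probe` runs the real ping, and `$VlanStatus` can then feed a summary line directly.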
Part 6: Core Network Services Validation
6.1 DNS Resolution
"`n" + "="*80 | Add-Content $ReportFile
"CORE NETWORK SERVICES" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
# Test DNS resolution on all nodes
$DnsTests = @(
"management.azure.com",
"login.microsoftonline.com",
"$ClusterName",
"dc01.domain.local" # Replace with actual DC
)
"`nDNS Resolution Tests:" | Add-Content $ReportFile
foreach ($Target in $DnsTests) {
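$Result = Resolve-DnsName -Name $Target -ErrorAction SilentlyContinue
# CNAME answers carry no IPAddress, so report the first record that has one
$FirstIP = ($Result | Where-Object { $_.IPAddress } | Select-Object -First 1).IPAddress
$Status = if ($Result) { "PASS - $FirstIP" } else { "FAIL" }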
"$Target : $Status" | Add-Content $ReportFile
}
6.2 NTP Synchronization
# Verify time sync across all nodes
$TimeSync = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        $w32tm = w32tm /query /status 2>&1
        # Return the text after the first "Label:" on the matching line.
        # -replace is used instead of String.Split(": "), which Windows PowerShell 5.1
        # treats as a split on either character and so returns the wrong field.
        $GetField = {
            param($Label)
            ($w32tm | Select-String $Label | Select-Object -First 1).Line -replace '^[^:]+:\s*', ''
        }
        [PSCustomObject]@{
            Node     = $env:COMPUTERNAME
            Source   = & $GetField 'Source:'
            Stratum  = & $GetField 'Stratum:'
            LastSync = & $GetField 'Last Successful'
        }
    }
}
"`nNTP Time Synchronization:" | Add-Content $ReportFile
$TimeSync | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
# Check for time skew between nodes
$TimeDiff = foreach ($Node in $Nodes) {
Invoke-Command -ComputerName $Node -ScriptBlock {
[PSCustomObject]@{
Node = $env:COMPUTERNAME
CurrentTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss.fff"
}
}
}
"`nNode Time Comparison:" | Add-Content $ReportFile
$TimeDiff | Format-Table | Out-String | Add-Content $ReportFile
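The comparison table lists each node's clock but leaves the skew calculation to the reader. A rough figure can be derived from the collected timestamps, as sketched below. `Get-MaxTimeSkewMs` is a hypothetical helper and the sample timestamps are made up. It assumes all nodes use the same time zone, and because the nodes are queried one after another, the result also includes remoting delay, so treat it as an upper estimate rather than a precise measurement.

```powershell
# Hypothetical helper: spread between the earliest and latest reported node time, in ms.
function Get-MaxTimeSkewMs {
    param([Parameter(Mandatory)][object[]]$TimeDiff)
    $times = foreach ($t in $TimeDiff) {
        # Same format string that the collection step uses for CurrentTime
        [datetime]::ParseExact($t.CurrentTime, 'yyyy-MM-dd HH:mm:ss.fff',
            [cultureinfo]::InvariantCulture)
    }
    $sorted = @($times | Sort-Object)
    ($sorted[-1] - $sorted[0]).TotalMilliseconds
}

# Sample data shaped like $TimeDiff; values are made up.
$SampleTimes = @(
    [PSCustomObject]@{ Node = 'NODE01'; CurrentTime = '2026-03-24 10:00:00.100' }
    [PSCustomObject]@{ Node = 'NODE02'; CurrentTime = '2026-03-24 10:00:00.350' }
    [PSCustomObject]@{ Node = 'NODE03'; CurrentTime = '2026-03-24 10:00:00.220' }
)

$SkewMs = Get-MaxTimeSkewMs -TimeDiff $SampleTimes
"Maximum observed spread: $SkewMs ms"
```

For a tighter measurement, `w32tm /stripchart` against a common time source avoids the sequential-collection error.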
6.3 Azure Connectivity
# Test connectivity to Azure endpoints
$AzureEndpoints = @(
"management.azure.com",
"login.microsoftonline.com",
"graph.microsoft.com",
"azurestackr01.azurestack.hci.microsoft.com"
)
"`nAzure Endpoint Connectivity:" | Add-Content $ReportFile
foreach ($Endpoint in $AzureEndpoints) {
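# PingReplyDetails is typically empty when the TCP test succeeds (ping only runs as a
# fallback), so time the TCP test itself. The figure includes DNS resolution.
$Timer = [System.Diagnostics.Stopwatch]::StartNew()
$Result = Test-NetConnection -ComputerName $Endpoint -Port 443 -WarningAction SilentlyContinue
$Timer.Stop()
$Status = if ($Result.TcpTestSucceeded) { "PASS" } else { "FAIL" }
"$Endpoint : $Status (Elapsed: $($Timer.ElapsedMilliseconds)ms)" | Add-Content $ReportFile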
}
Part 7: Generate Summary
7.1 Create Validation Summary
# Summary section
"`n" + "="*80 | Add-Content $ReportFile
"NETWORK VALIDATION SUMMARY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$Summary = @"
Validation Category              Status
-------------------------------  --------
RDMA Adapters Enabled            $(if($DisabledRdma){"FAIL"}else{"PASS"})
DCB Configuration                $(if($DcbResults -match "FAIL"){"FAIL"}else{"PASS"})
SMB Direct Operational           $(if($SmbDirect){"PASS"}else{"FAIL"})
VLAN Connectivity                REVIEW ABOVE
DNS Resolution                   REVIEW ABOVE
NTP Synchronization              $(if($TimeSync.Count -eq $Nodes.Count){"PASS"}else{"FAIL"})
Azure Connectivity               REVIEW ABOVE
"@
$Summary | Add-Content $ReportFile
# Report location
"`nReport saved to: $ReportFile" | Add-Content $ReportFile
Write-Host "`nNetwork validation complete. Report: $ReportFile" -ForegroundColor Green
Validation Checklist
| Category | Test | Expected Result | Status |
|---|---|---|---|
| RDMA | All adapters enabled | Enabled = True | ☐ |
| RDMA | SMB Direct operational | RDMA-capable interfaces listed | ☐ |
| DCB | Validate-DCB passes | No FAIL results | ☐ |
| DCB | PFC enabled on correct priority | Priority 3 enabled | ☐ |
| DCB | ETS bandwidth allocation | SMB Direct ≥ 50% | ☐ |
| VLAN | All VLANs accessible | Ping succeeds | ☐ |
| DNS | Name resolution works | All targets resolve | ☐ |
| NTP | Time synchronized | Stratum ≤ 4, all nodes sync | ☐ |
| Azure | Endpoint connectivity | All endpoints reachable on 443 | ☐ |
Common Issues
RDMA Not Operational
# Re-enable RDMA on adapter
Enable-NetAdapterRdma -Name "Storage1"
# Verify RDMA is operational
Get-NetAdapterRdma -Name "Storage1"
DCB Misconfiguration
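# Reset DCB to defaults and reconfigure
# WARNING: May disrupt storage traffic
# Remove existing policies. This removes every QoS policy on the node, not only SMB Direct,
# so recreate any others (for example a cluster heartbeat policy) afterwards.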
Get-NetQosPolicy | Remove-NetQosPolicy -Confirm:$false
# Recreate SMB Direct policy
New-NetQosPolicy -Name "SMB Direct" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7
Time Skew Between Nodes
# Force time sync
w32tm /resync /force
# Verify sync source
w32tm /query /source
Troubleshooting
| Issue | Cause | Resolution |
|---|---|---|
| RDMA validation shows Not Operational | Network adapter driver lacks RDMA support, or RDMA is disabled | Check status with `Get-NetAdapterRdma`; enable with `Enable-NetAdapterRdma -Name <adapter>`; update the NIC driver and firmware if needed |
| DCB/PFC misconfiguration detected | QoS policies conflicting or missing | Reset DCB with `Get-NetQosPolicy \| Remove-NetQosPolicy -Confirm:$false`, then recreate the SMB Direct policy with the correct priority |
| RDMA throughput below expected baseline | Incorrect MTU or flow control settings | Verify jumbo frames with `Get-NetAdapterAdvancedProperty -Name <adapter> -RegistryKeyword *JumboPacket`; set MTU to 9014 |
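Mismatched jumbo frame settings between nodes are easy to miss when checking one adapter at a time. The sketch below flags any adapter whose value differs from the 9014 figure in the table. `Get-JumboPacketMismatch` is a hypothetical helper, and the sample objects only mimic the `Name` and `RegistryValue` properties that `Get-NetAdapterAdvancedProperty` returns, with a `Node` column added in the same way as the collection steps earlier in this runbook.

```powershell
# Hypothetical helper: returns the adapters whose jumbo packet value is not the expected one.
function Get-JumboPacketMismatch {
    param(
        [Parameter(Mandatory)][object[]]$Settings,
        [string]$Expected = '9014'
    )
    # RegistryValue is a string array on real output, so flatten it before comparing.
    $Settings | Where-Object { ($_.RegistryValue -join ',') -ne $Expected }
}

# Sample data; node names, adapter names, and values are made up.
$SampleJumbo = @(
    [PSCustomObject]@{ Node = 'NODE01'; Name = 'Storage1'; RegistryValue = @('9014') }
    [PSCustomObject]@{ Node = 'NODE01'; Name = 'Storage2'; RegistryValue = @('9014') }
    [PSCustomObject]@{ Node = 'NODE02'; Name = 'Storage1'; RegistryValue = @('1514') }
)

$Mismatch = @(Get-JumboPacketMismatch -Settings $SampleJumbo)
$Mismatch | Format-Table -AutoSize
```

An empty result means every collected adapter matches. The physical switch ports need a matching MTU as well, which this check cannot see.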
Next Step
Proceed to Task 4: High Availability Testing once network validation is complete.
Navigation
| Previous | Up | Next |
|---|---|---|
| ← Task 2: VMFleet Storage Testing | Testing & Validation | Task 4: High Availability Testing → |
Version Control
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-03-24 | Azure Local Cloudnology Team | Initial release |