
Task 03: Network & RDMA Validation

Runbook Azure

DOCUMENT CATEGORY: Runbook
SCOPE: Network and RDMA validation
PURPOSE: Validate network stack, RDMA, and DCB configuration
MASTER REFERENCE: Microsoft Learn - Validate-DCB

Status: Active


Overview

This task validates the complete network stack: RDMA configuration, DCB (Data Center Bridging) settings, VLAN connectivity, and core network services (DNS, NTP, and Azure endpoint reachability). Storage performance and cluster communication both depend on these settings, so resolve any failure reported here before proceeding.

Prerequisites

  • Infrastructure health validation completed (Step 1)
  • Administrative access to all cluster nodes
  • Physical network switches configured for RDMA
  • VLAN IDs documented and configured

Report Output

All validation results are saved to:

\\<ClusterName>\ClusterStorage$\Collect\validation-reports\03-network-rdma-validation-YYYYMMDD.txt

Variables from variables.yml

Variable Path | Type | Description
--- | --- | ---
networking.management.vlan_id | Integer | Management VLAN ID for connectivity tests
networking.management.subnet | String | Management subnet CIDR for IP validation
networking.management.gateway | String | Default gateway for ping tests
networking.management.dns_servers | Array | DNS server IPs for resolution tests
compute.nodes[].name | String | Node hostnames for per-node RDMA/DCB validation
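
The management subnet variable is described as being used for IP validation. A minimal sketch of such a check, assuming IPv4 only; the function name Test-IPv4InSubnet and the sample addresses are illustrative and not part of the runbook:

```powershell
function Test-IPv4InSubnet {
    param(
        [Parameter(Mandatory)][string]$IPAddress,
        [Parameter(Mandatory)][string]$Cidr   # e.g. the value of networking.management.subnet
    )

    $network, $prefix = $Cidr -split '/', 2
    if (-not $prefix) { throw "'$Cidr' is not in CIDR notation" }
    $prefixLength = [int]$prefix
    if ($prefixLength -lt 0 -or $prefixLength -gt 32) {
        throw "Invalid prefix length in '$Cidr'"
    }

    # Convert a dotted-quad address to its 32-bit value, held in a long to avoid sign issues
    $toNumber = {
        param([string]$Address)
        $parsed = [System.Net.IPAddress]::Parse($Address)
        if ($parsed.AddressFamily -ne [System.Net.Sockets.AddressFamily]::InterNetwork) {
            throw "'$Address' is not an IPv4 address"
        }
        [long]$value = 0
        foreach ($byte in $parsed.GetAddressBytes()) { $value = $value * 256 + $byte }
        $value
    }

    # Two addresses share a subnet when they fall in the same block of 2^(32 - prefix) addresses
    [long]$blockSize = [math]::Pow(2, 32 - $prefixLength)
    $ipValue  = & $toNumber $IPAddress
    $netValue = & $toNumber $network
    ($ipValue - ($ipValue % $blockSize)) -eq ($netValue - ($netValue % $blockSize))
}

# Example with placeholder values
Test-IPv4InSubnet -IPAddress '10.0.11.25' -Cidr '10.0.11.0/24'
```

The function returns a Boolean, so it can gate a PASS/FAIL line in the report in the same way as the other checks in this task.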

Part 1: Initialize Validation Environment

1.1 Create Report Directory

# Run from any cluster node
$ClusterName = (Get-Cluster).Name
$DateStamp = Get-Date -Format "yyyyMMdd"
$ReportPath = "C:\ClusterStorage\Collect\validation-reports"
$ReportFile = "$ReportPath\03-network-rdma-validation-$DateStamp.txt"

# Create the report directory if it does not exist
if (-not (Test-Path $ReportPath)) {
    New-Item -Path $ReportPath -ItemType Directory -Force | Out-Null
}

# Initialize report
$ReportHeader = @"
================================================================================
NETWORK & RDMA VALIDATION REPORT
================================================================================
Cluster: $ClusterName
Date: $(Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Generated By: $(whoami)
================================================================================

"@
$ReportHeader | Out-File -FilePath $ReportFile -Encoding UTF8

1.2 Install Required Modules

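# Install the Validate-DCB module if not present
if (-not (Get-Module -ListAvailable -Name Validate-DCB)) {
    Install-Module -Name Validate-DCB -Force -Scope AllUsers
}
Import-Module Validate-DCB

# Test-NetStack is distributed through the PowerShell Gallery rather than shipped with Windows Server.
# Install it if not present, then confirm the command resolves.
if (-not (Get-Module -ListAvailable -Name Test-NetStack)) {
    Install-Module -Name Test-NetStack -Force -Scope AllUsers
}
Get-Command Test-NetStack -ErrorAction SilentlyContinue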

Part 2: RDMA Adapter Validation

2.1 Verify RDMA Adapters

# Check RDMA adapter status on all nodes
$Nodes = (Get-ClusterNode).Name

$RdmaResults = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetAdapterRdma | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
            Name, InterfaceDescription, Enabled,
            @{N='OperationalState';E={if($_.Enabled){"Operational"}else{"Disabled"}}}
    }
}

# Display and log results
$RdmaResults | Format-Table -AutoSize
$RdmaResults | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

# Verify all RDMA adapters are enabled
$DisabledRdma = $RdmaResults | Where-Object { -not $_.Enabled }
if ($DisabledRdma) {
    "WARNING: RDMA disabled on adapters:" | Add-Content $ReportFile
    $DisabledRdma | Format-Table | Out-String | Add-Content $ReportFile
}

2.2 Verify RDMA Mode (RoCE v2 vs iWARP)

# Check RDMA protocol type
$RdmaProtocol = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetAdapterAdvancedProperty -Name "Storage*" -RegistryKeyword "*NetworkDirect*" -ErrorAction SilentlyContinue |
            Select-Object @{N='Node';E={$env:COMPUTERNAME}}, Name, RegistryKeyword, RegistryValue
    }
}

"RDMA Protocol Configuration:" | Add-Content $ReportFile
$RdmaProtocol | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
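
The report above lists raw registry values, which do not say at a glance whether an adapter runs RoCE v2 or iWARP. A small decoder sketch follows, assuming the standardized *NetworkDirectTechnology keyword values (0 = device default, 1 = iWARP, 2 = InfiniBand, 3 = RoCE, 4 = RoCEv2); the function name ConvertTo-RdmaTechnologyName is hypothetical, and the mapping should be checked against the NIC vendor's documentation:

```powershell
function ConvertTo-RdmaTechnologyName {
    param(
        # RegistryValue as returned by Get-NetAdapterAdvancedProperty, usually a one-element array such as {4}
        $RegistryValue
    )
    $raw = @($RegistryValue)[0]
    switch ([string]$raw) {
        '0'     { 'Device default' }
        '1'     { 'iWARP' }
        '2'     { 'InfiniBand' }
        '3'     { 'RoCE' }
        '4'     { 'RoCEv2' }
        default { "Unknown ($raw)" }
    }
}

# Example with a literal value
ConvertTo-RdmaTechnologyName -RegistryValue @('4')
```

Applied to the collected data, this could become a calculated column, for example by filtering $RdmaProtocol to rows whose RegistryKeyword is *NetworkDirectTechnology and passing each RegistryValue through the function.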

2.3 SMB Direct Status

# Verify SMB Direct is enabled
$SmbDirect = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-SmbClientNetworkInterface | Where-Object RdmaCapable -eq $true |
            Select-Object @{N='Node';E={$env:COMPUTERNAME}}, InterfaceIndex,
                FriendlyName, RdmaCapable
    }
}

"`nSMB Direct (RDMA) Capable Interfaces:" | Add-Content $ReportFile
$SmbDirect | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

Part 3: DCB Validation

3.1 Run Validate-DCB

# Run comprehensive DCB validation
# This validates Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS)

$DcbResults = Validate-DCB -Verbose

# Log results
"`n" + "="*80 | Add-Content $ReportFile
"DCB VALIDATION RESULTS" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$DcbResults | Out-String | Add-Content $ReportFile

3.2 Verify PFC Configuration

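# Check DCBX willing mode and Priority Flow Control settings.
# The two cmdlets return different object types, so collect and log them separately;
# a single Format-Table over both would show only the first type's columns.
$DcbxSettings = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetQosDcbxSetting | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
            InterfaceAlias, Willing
    }
}
$PfcSettings = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetQosFlowControl | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
            Priority, Enabled
    }
}

"`nDCBX Settings:" | Add-Content $ReportFile
$DcbxSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
"`nPriority Flow Control Settings:" | Add-Content $ReportFile
$PfcSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile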

3.3 Verify ETS Configuration

# Check Enhanced Transmission Selection
$EtsSettings = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetQosTrafficClass | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
            Name, Priority, BandwidthPercentage, Algorithm
    }
}

"`nETS Traffic Class Settings:" | Add-Content $ReportFile
$EtsSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

Part 4: Test-NetStack Validation

4.1 Run Network Stack Tests

# Test-NetStack validates the entire network stack
# Run between storage network adapters

# Get storage adapter IPs
$StorageAdapters = Get-NetAdapter -Name "Storage*" | Get-NetIPAddress -AddressFamily IPv4

"`n" + "="*80 | Add-Content $ReportFile
"NETWORK STACK TEST RESULTS (Test-NetStack)" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

# Run Test-NetStack between first two nodes
$Node1 = $Nodes[0]
$Node2 = $Nodes[1]

# Get the first storage IP on each node
$Node1StorageIP = Invoke-Command -ComputerName $Node1 -ScriptBlock {
    (Get-NetIPAddress -InterfaceAlias "Storage*" -AddressFamily IPv4)[0].IPAddress
}
$Node2StorageIP = Invoke-Command -ComputerName $Node2 -ScriptBlock {
    (Get-NetIPAddress -InterfaceAlias "Storage*" -AddressFamily IPv4)[0].IPAddress
}

# Run Test-NetStack from this node; the module connects to the targets itself.
# -IPTarget lists the addresses to test against each other. Parameter names follow the
# Test-NetStack module documentation; confirm with Get-Help Test-NetStack for the installed version.
Test-NetStack -IPTarget $Node1StorageIP, $Node2StorageIP -EnableFirewallRules |
    Out-String | Add-Content $ReportFile

4.2 RDMA Traffic Test

# Test RDMA connectivity between nodes
"`nRDMA Traffic Test:" | Add-Content $ReportFile

# Limit the run to the RDMA (NDK) stages. Stage numbering follows the Test-NetStack module
# documentation (3 = NDK ping, 4 = NDK perf 1:1); confirm against the installed version.
Test-NetStack -IPTarget $Node1StorageIP, $Node2StorageIP -Stage 3, 4 -EnableFirewallRules |
    Out-String | Add-Content $ReportFile

Part 5: VLAN Connectivity Validation

5.1 Verify VLAN Configuration

# Check VLAN assignments on virtual adapters
$VlanConfig = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-VMNetworkAdapterVlan -ManagementOS |
            Select-Object @{N='Node';E={$env:COMPUTERNAME}},
                ParentAdapter, AccessVlanId, NativeVlanId, OperationMode
    }
}

"`n" + "="*80 | Add-Content $ReportFile
"VLAN CONFIGURATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$VlanConfig | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

5.2 Test VLAN Connectivity

# Ping test across VLANs
$VlanTests = @(
    @{Name="Management"; VLAN=711; TestIP="10.X.X.1"},
    @{Name="Storage1"; VLAN=712; TestIP="10.X.X.1"},
    @{Name="Storage2"; VLAN=713; TestIP="10.X.X.1"},
    @{Name="VM Traffic"; VLAN=714; TestIP="10.X.X.1"}
)

"`nVLAN Connectivity Tests:" | Add-Content $ReportFile
foreach ($Vlan in $VlanTests) {
    # Replace TestIP with actual gateway/target for each VLAN
    $Result = Test-Connection -ComputerName $Vlan.TestIP -Count 2 -Quiet -ErrorAction SilentlyContinue
    $Status = if ($Result) { "PASS" } else { "FAIL" }
    "VLAN $($Vlan.VLAN) ($($Vlan.Name)): $Status" | Add-Content $ReportFile
}

Part 6: Core Network Services Validation

6.1 DNS Resolution

"`n" + "="*80 | Add-Content $ReportFile
"CORE NETWORK SERVICES" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

# Test DNS resolution from the node running this script
$DnsTests = @(
    "management.azure.com",
    "login.microsoftonline.com",
    "$ClusterName",
    "dc01.domain.local" # Replace with actual DC
)

"`nDNS Resolution Tests:" | Add-Content $ReportFile
foreach ($Target in $DnsTests) {
    $Result = Resolve-DnsName -Name $Target -ErrorAction SilentlyContinue
    # CNAME answers carry no IPAddress, so report the first record that has one
    $Address = ($Result | Where-Object { $_.IPAddress } | Select-Object -First 1).IPAddress
    $Status = if ($Address) { "PASS - $Address" } else { "FAIL" }
    "$Target : $Status" | Add-Content $ReportFile
}

6.2 NTP Synchronization

# Verify time sync across all nodes
$TimeSync = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        $w32tm = w32tm /query /status 2>&1
        # Return the text after the first colon on the first line matching $Label.
        # A regex behaves the same in Windows PowerShell 5.1 and PowerShell 7,
        # whereas Split(": ") splits on each character in 5.1.
        $GetField = {
            param($Label)
            ($w32tm | Select-String $Label | Select-Object -First 1).Line -replace '^[^:]*:\s*', ''
        }
        [PSCustomObject]@{
            Node     = $env:COMPUTERNAME
            Source   = & $GetField "Source:"
            Stratum  = & $GetField "Stratum:"
            LastSync = & $GetField "Last Successful"
        }
    }
}

"`nNTP Time Synchronization:" | Add-Content $ReportFile
$TimeSync | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

# Check for time skew between nodes
$TimeDiff = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        [PSCustomObject]@{
            Node = $env:COMPUTERNAME
            CurrentTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss.fff"
        }
    }
}
"`nNode Time Comparison:" | Add-Content $ReportFile
$TimeDiff | Format-Table | Out-String | Add-Content $ReportFile
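
The comparison above leaves the skew calculation to the reader. A minimal sketch of computing the spread between node clocks, assuming each sample carries a UtcTime property; the function name Get-MaxClockSkewSeconds, the property name, and the node names are hypothetical:

```powershell
function Get-MaxClockSkewSeconds {
    param(
        # Objects with a UtcTime property that is, or converts to, a DateTime
        [Parameter(Mandatory)][object[]]$Samples
    )
    $sorted = @($Samples | ForEach-Object { [datetime]$_.UtcTime } | Sort-Object)
    # Spread between the earliest and latest clock reading, in seconds
    ($sorted[-1] - $sorted[0]).TotalSeconds
}

# Example with literal sample data (hypothetical node names)
$base = [datetime]::new(2026, 3, 24, 10, 0, 0, [System.DateTimeKind]::Utc)
$samples = @(
    [PSCustomObject]@{ Node = 'node01'; UtcTime = $base }
    [PSCustomObject]@{ Node = 'node02'; UtcTime = $base.AddSeconds(1.5) }
    [PSCustomObject]@{ Node = 'node03'; UtcTime = $base.AddSeconds(0.25) }
)
Get-MaxClockSkewSeconds -Samples $samples
```

Against a live cluster, the samples would have to be gathered from all nodes in a single fan-out call rather than one node at a time, and remoting latency would still inflate the figure, so the result is best read as an upper bound on the real skew.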

6.3 Azure Connectivity

# Test connectivity to Azure endpoints
$AzureEndpoints = @(
    "management.azure.com",
    "login.microsoftonline.com",
    "graph.microsoft.com",
    "azurestackr01.azurestack.hci.microsoft.com"
)

"`nAzure Endpoint Connectivity:" | Add-Content $ReportFile
foreach ($Endpoint in $AzureEndpoints) {
    # With -Port, ping details are not populated when the TCP test succeeds,
    # so time the whole test (name resolution plus TCP connect) instead
    $Timer = [System.Diagnostics.Stopwatch]::StartNew()
    $Result = Test-NetConnection -ComputerName $Endpoint -Port 443 -WarningAction SilentlyContinue
    $Timer.Stop()
    $Status = if ($Result.TcpTestSucceeded) { "PASS" } else { "FAIL" }
    "$Endpoint : $Status (Test duration: $($Timer.ElapsedMilliseconds)ms)" | Add-Content $ReportFile
}

Part 7: Generate Summary

7.1 Create Validation Summary

# Summary section
"`n" + "="*80 | Add-Content $ReportFile
"NETWORK VALIDATION SUMMARY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

$Summary = @"

Validation Category             Status
------------------------------- --------
RDMA Adapters Enabled           $(if($DisabledRdma){"FAIL"}else{"PASS"})
DCB Configuration               $(if($DcbResults -match "FAIL"){"FAIL"}else{"PASS"})
SMB Direct Operational          $(if($SmbDirect){"PASS"}else{"FAIL"})
VLAN Connectivity               REVIEW ABOVE
DNS Resolution                  REVIEW ABOVE
NTP Synchronization             $(if(@($TimeSync).Count -eq @($Nodes).Count){"PASS"}else{"FAIL"})
Azure Connectivity              REVIEW ABOVE

"@

$Summary | Add-Content $ReportFile

# Report location
"`nReport saved to: $ReportFile" | Add-Content $ReportFile
Write-Host "`nNetwork validation complete. Report: $ReportFile" -ForegroundColor Green
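
The summary marks three categories as REVIEW ABOVE. One way to derive a status for them is to scan the lines the earlier steps already wrote to the report. The sketch below assumes each section starts with its header line (for example "DNS Resolution Tests:") and ends at the next blank line, as the steps above write them; the function name Get-ReportSectionStatus and the NOT RUN value are hypothetical:

```powershell
function Get-ReportSectionStatus {
    param(
        [string[]]$Lines,                        # report content, one element per line
        [Parameter(Mandatory)][string]$Header    # exact header line that opens the section
    )
    $start = [Array]::IndexOf($Lines, $Header)
    if ($start -lt 0) { return 'NOT RUN' }

    # Collect result lines until the first blank line after the header
    $results = @()
    for ($i = $start + 1; $i -lt $Lines.Count; $i++) {
        if ([string]::IsNullOrWhiteSpace($Lines[$i])) { break }
        $results += $Lines[$i]
    }
    if ($results.Count -eq 0) { return 'NOT RUN' }

    # Any result line reporting FAIL after its colon fails the whole section
    if ($results -match ':\s*FAIL') { return 'FAIL' }
    return 'PASS'
}

# Example with literal report lines
$sample = @('', 'DNS Resolution Tests:', 'a.example : PASS - 192.0.2.10', 'b.example : FAIL', '')
Get-ReportSectionStatus -Lines $sample -Header 'DNS Resolution Tests:'
```

To use it in the runbook, the report would be read with Get-Content before the summary is built, and the function called once each for the VLAN, DNS, and Azure headers.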

Validation Checklist

Category | Test | Expected Result | Status
--- | --- | --- | ---
RDMA | All adapters enabled | Enabled = True |
RDMA | SMB Direct operational | RDMA-capable interfaces listed |
DCB | Validate-DCB passes | No FAIL results |
DCB | PFC enabled on correct priority | Priority 3 enabled |
DCB | ETS bandwidth allocation | SMB Direct ≥ 50% |
VLAN | All VLANs accessible | Ping succeeds |
DNS | Name resolution works | All targets resolve |
NTP | Time synchronized | Stratum ≤ 4, all nodes sync |
Azure | Endpoint connectivity | All endpoints reachable on 443 |
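
The two DCB rows of the checklist can be evaluated from the data collected in Part 3. The following is a sketch under stated assumptions: PFC should be enabled on priority 3 only, and the traffic class carrying priority 3 is taken to be the SMB Direct class. The function name Test-DcbExpectation and its parameters are hypothetical:

```powershell
function Test-DcbExpectation {
    param(
        [object[]]$FlowControl,    # objects with Priority and Enabled (Get-NetQosFlowControl shape)
        [object[]]$TrafficClass,   # objects with Priority and BandwidthPercentage (Get-NetQosTrafficClass shape)
        [int]$SmbPriority = 3,
        [int]$MinBandwidthPercent = 50
    )
    # PFC check: exactly one priority enabled, and it is the SMB priority
    $enabled = @($FlowControl | Where-Object { $_.Enabled } | ForEach-Object { [int]$_.Priority })
    $pfcOk = ($enabled.Count -eq 1) -and ($enabled[0] -eq $SmbPriority)

    # ETS check: the class that carries the SMB priority has at least the minimum bandwidth share
    $smbClass = @($TrafficClass | Where-Object { @($_.Priority) -contains $SmbPriority })
    $etsOk = ($smbClass.Count -ge 1) -and ([int]$smbClass[0].BandwidthPercentage -ge $MinBandwidthPercent)

    [PSCustomObject]@{ PfcOk = $pfcOk; EtsOk = $etsOk }
}

# Example with literal data for a single node
$flow = 0..7 | ForEach-Object { [PSCustomObject]@{ Priority = $_; Enabled = ($_ -eq 3) } }
$classes = @(
    [PSCustomObject]@{ Name = 'SMB';     Priority = @(3); BandwidthPercentage = 50 }
    [PSCustomObject]@{ Name = 'Cluster'; Priority = @(7); BandwidthPercentage = 1 }
)
Test-DcbExpectation -FlowControl $flow -TrafficClass $classes
```

Because the collected settings carry a Node column, the function would be called once per node on the rows filtered to that node, so that one node's correct configuration cannot mask another's.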

Common Issues

RDMA Not Operational

# Re-enable RDMA on adapter
Enable-NetAdapterRdma -Name "Storage1"

# Verify RDMA is operational
Get-NetAdapterRdma -Name "Storage1"

DCB Misconfiguration

# Reset DCB to defaults and reconfigure
# WARNING: May disrupt storage traffic

# Remove existing policies
Get-NetQosPolicy | Remove-NetQosPolicy -Confirm:$false

# Recreate SMB Direct policy
New-NetQosPolicy -Name "SMB Direct" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

Time Skew Between Nodes

# Force time sync
w32tm /resync /force

# Verify sync source
w32tm /query /source

Troubleshooting

Issue: RDMA validation shows Not Operational
Cause: Network adapter driver missing RDMA support or disabled
Resolution: Verify driver: Get-NetAdapterRdma; enable: Enable-NetAdapterRdma -Name <adapter>; update NIC firmware if needed

Issue: DCB/PFC misconfiguration detected
Cause: QoS policies conflicting or missing
Resolution: Reset DCB: Get-NetQosPolicy | Remove-NetQosPolicy -Confirm:$false; recreate SMB Direct policy with correct priority class

Issue: RDMA throughput below expected baseline
Cause: Incorrect MTU or flow control settings
Resolution: Verify jumbo frames: Get-NetAdapterAdvancedProperty -Name <adapter> -RegistryKeyword "*JumboPacket"; set MTU to 9014

Next Step

Proceed to Task 4: High Availability Testing once network validation is complete.



Version Control

Version | Date | Author | Changes
--- | --- | --- | ---
1.0.0 | 2026-03-24 | Azure Local Cloudnology Team | Initial release