Version: 1.0.0

Task 03: Network & RDMA Validation

Runbook Azure

DOCUMENT CATEGORY: Runbook
SCOPE: Network and RDMA validation
PURPOSE: Validate network stack, RDMA, and DCB configuration
MASTER REFERENCE: Microsoft Learn - Validate-DCB

Status: Active

Overview

This task validates the complete network stack: RDMA configuration, DCB (Data Center Bridging) settings, VLAN connectivity, and core network services (DNS, NTP, and Azure endpoint connectivity). Storage performance and cluster communication both depend on this configuration, so resolve any failure before moving on.

Prerequisites

  • Infrastructure health validation completed (Step 1)
  • Administrative access to all cluster nodes
  • Physical network switches configured for RDMA
  • VLAN IDs documented and configured

Report Output

All validation results are saved to:

\\<ClusterName>\ClusterStorage$\Collect\validation-reports\03-network-rdma-validation-YYYYMMDD.txt
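
The report path follows a fixed pattern, so it can be composed from the cluster name and the run date. The helper below is a minimal sketch of that composition; the function name `Get-ValidationReportPath` is hypothetical and is not part of the toolkit.

```powershell
# Hypothetical helper: builds the UNC report path described above.
function Get-ValidationReportPath {
    param(
        [Parameter(Mandatory)][string]$ClusterName,
        [datetime]$Date = (Get-Date)
    )
    # Single quotes keep the literal "$" in "ClusterStorage$" from being parsed as a variable.
    '\\{0}\ClusterStorage$\Collect\validation-reports\03-network-rdma-validation-{1}.txt' -f
        $ClusterName, $Date.ToString('yyyyMMdd')
}

# Example with a made-up cluster name
Get-ValidationReportPath -ClusterName 'clus01' -Date ([datetime]'2026-03-24')
```

Passing the date explicitly keeps the file name stable when a validation run crosses midnight.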

Variables from variables.yml

| Variable Path | Type | Description |
|---|---|---|
| networking.management.vlan_id | Integer | Management VLAN ID for connectivity tests |
| networking.management.subnet | String | Management subnet CIDR for IP validation |
| networking.management.gateway | String | Default gateway for ping tests |
| networking.management.dns_servers | Array | DNS server IPs for resolution tests |
| compute.nodes[].name | String | Node hostnames for per-node RDMA/DCB validation |
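
As an illustration of the "IP validation" use of the subnet variable, the sketch below checks that the gateway lies inside the management subnet. All values are hypothetical stand-ins for variables.yml, and the function names `ConvertTo-IPv4Number` and `Test-IPv4InSubnet` are assumptions introduced for this example only.

```powershell
# Hypothetical values mirroring the variables.yml paths above; substitute your own.
$Networking = @{
    management = @{
        vlan_id     = 711
        subnet      = '10.0.0.0/24'
        gateway     = '10.0.0.1'
        dns_servers = @('10.0.0.10', '10.0.0.11')
    }
}

# Convert a dotted IPv4 address to its 32-bit value (held in a [long] to avoid sign issues).
function ConvertTo-IPv4Number {
    param([Parameter(Mandatory)][string]$Address)
    $b = [System.Net.IPAddress]::Parse($Address).GetAddressBytes()
    ([long]$b[0] * 16777216) + ([long]$b[1] * 65536) + ([long]$b[2] * 256) + [long]$b[3]
}

# Return $true when $Address falls inside the IPv4 block given in CIDR notation.
function Test-IPv4InSubnet {
    param(
        [Parameter(Mandatory)][string]$Address,
        [Parameter(Mandatory)][string]$Cidr
    )
    $network, $prefix = $Cidr -split '/'
    $blockSize = [long][math]::Pow(2, 32 - [int]$prefix)
    $ip  = ConvertTo-IPv4Number -Address $Address
    $net = ConvertTo-IPv4Number -Address $network
    # Two addresses share a subnet when they round down to the same block start.
    ($ip - ($ip % $blockSize)) -eq ($net - ($net % $blockSize))
}

$Mgmt = $Networking.management
$GatewayOk = Test-IPv4InSubnet -Address $Mgmt.gateway -Cidr $Mgmt.subnet
"Gateway $($Mgmt.gateway) in $($Mgmt.subnet): $GatewayOk"
```

A gateway outside the management subnet usually points to a typo in variables.yml, which is cheaper to catch here than during the ping tests.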

Part 1: Initialize Validation Environment

1.1 Create Report Directory

# Run from any cluster node
$ClusterName = (Get-Cluster).Name
$DateStamp = Get-Date -Format "yyyyMMdd"
$ReportPath = "C:\ClusterStorage\Collect\validation-reports"
$ReportFile = "$ReportPath\03-network-rdma-validation-$DateStamp.txt"

# Create the directory if it does not exist
if (-not (Test-Path $ReportPath)) {
    New-Item -Path $ReportPath -ItemType Directory -Force
}

# Initialize report
$ReportHeader = @"
================================================================================
NETWORK & RDMA VALIDATION REPORT
================================================================================
Cluster: $ClusterName
Date: $(Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Generated By: $(whoami)
================================================================================

"@
$ReportHeader | Out-File -FilePath $ReportFile -Encoding UTF8

1.2 Install Required Modules

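# Install the Validate-DCB module if not present
if (-not (Get-Module -ListAvailable -Name Validate-DCB)) {
    Install-Module -Name Validate-DCB -Force -Scope AllUsers
}
Import-Module Validate-DCB

# Test-NetStack is not built into Windows Server; it is published as a
# PowerShell Gallery module, so install it the same way if it is missing
if (-not (Get-Module -ListAvailable -Name Test-NetStack)) {
    Install-Module -Name Test-NetStack -Force -Scope AllUsers
}
Get-Command Test-NetStack -ErrorAction SilentlyContinue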

Part 2: RDMA Adapter Validation

2.1 Verify RDMA Adapters

# Check RDMA adapter status on all nodes
$Nodes = (Get-ClusterNode).Name

$RdmaResults = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetAdapterRdma | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
            Name, InterfaceDescription, Enabled,
            @{N='OperationalState';E={if($_.Enabled){"Operational"}else{"Disabled"}}}
    }
}

# Display and log results
$RdmaResults | Format-Table -AutoSize
$RdmaResults | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

# Verify all RDMA adapters are enabled
$DisabledRdma = $RdmaResults | Where-Object { -not $_.Enabled }
if ($DisabledRdma) {
    "WARNING: RDMA disabled on adapters:" | Add-Content $ReportFile
    $DisabledRdma | Format-Table | Out-String | Add-Content $ReportFile
}

2.2 Verify RDMA Mode (RoCE v2 vs iWARP)

# Check RDMA protocol type
$RdmaProtocol = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetAdapterAdvancedProperty -Name "Storage*" -RegistryKeyword "*NetworkDirect*" -ErrorAction SilentlyContinue |
            Select-Object @{N='Node';E={$env:COMPUTERNAME}}, Name, RegistryKeyword, RegistryValue
    }
}

"RDMA Protocol Configuration:" | Add-Content $ReportFile
$RdmaProtocol | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

2.3 SMB Direct Status

# Verify SMB Direct is enabled
$SmbDirect = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-SmbClientNetworkInterface | Where-Object RdmaCapable -eq $true |
            Select-Object @{N='Node';E={$env:COMPUTERNAME}}, InterfaceIndex,
                FriendlyName, RdmaCapable
    }
}

"`nSMB Direct (RDMA) Capable Interfaces:" | Add-Content $ReportFile
$SmbDirect | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

Part 3: DCB Validation

3.1 Run Validate-DCB

# Run comprehensive DCB validation
# This validates Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS)

$DcbResults = Validate-DCB -Verbose

# Log results
"`n" + "="*80 | Add-Content $ReportFile
"DCB VALIDATION RESULTS" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$DcbResults | Out-String | Add-Content $ReportFile

3.2 Verify PFC Configuration

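# Check DCBX willing mode and Priority Flow Control settings.
# The two cmdlets return different object types, so collect them separately:
# Format-Table takes its columns from the first object and would leave the
# PFC columns blank if both were mixed in one collection.
$DcbxSettings = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetQosDcbxSetting | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
            InterfaceAlias, Willing
    }
}
$PfcSettings = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetQosFlowControl | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
            Priority, Enabled
    }
}

"`nDCBX Settings:" | Add-Content $ReportFile
$DcbxSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
"`nPriority Flow Control Settings:" | Add-Content $ReportFile
$PfcSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile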

3.3 Verify ETS Configuration

# Check Enhanced Transmission Selection
$EtsSettings = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        Get-NetQosTrafficClass | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
            Name, Priority, BandwidthPercentage, Algorithm
    }
}

"`nETS Traffic Class Settings:" | Add-Content $ReportFile
$EtsSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

Part 4: Test-NetStack Validation

4.1 Run Network Stack Tests

# Test-NetStack validates the network stack end to end (connectivity, TCP
# throughput, and RDMA). It remotes into the target nodes itself, so run it
# directly from this session rather than wrapping it in Invoke-Command.

"`n" + "="*80 | Add-Content $ReportFile
"NETWORK STACK TEST RESULTS (Test-NetStack)" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

# Test between the first two nodes
$Node1 = $Nodes[0]
$Node2 = $Nodes[1]

# Record the first storage IP of each node for reference in the report
$Node1StorageIP = Invoke-Command -ComputerName $Node1 -ScriptBlock {
    (Get-NetIPAddress -InterfaceAlias "Storage*" -AddressFamily IPv4)[0].IPAddress
}
$Node2StorageIP = Invoke-Command -ComputerName $Node2 -ScriptBlock {
    (Get-NetIPAddress -InterfaceAlias "Storage*" -AddressFamily IPv4)[0].IPAddress
}
"Storage IPs: $Node1 = $Node1StorageIP, $Node2 = $Node2StorageIP" | Add-Content $ReportFile

# NOTE: -Nodes and -EnableFirewallRules are the parameter names given in the
# Test-NetStack module README; treat them as an assumption and confirm with
# Get-Help Test-NetStack for the installed version.
Test-NetStack -Nodes $Node1, $Node2 -EnableFirewallRules |
    Out-String | Add-Content $ReportFile

4.2 RDMA Traffic Test

# Test RDMA connectivity between nodes
"`nRDMA Traffic Test:" | Add-Content $ReportFile

# Run only the RDMA stages of Test-NetStack. The stage numbers below
# (3 = NDK ping, 4 = NDK perf 1:1) follow the module README and are an
# assumption here; confirm with Get-Help Test-NetStack -Parameter Stage.
Test-NetStack -Nodes $Node1, $Node2 -Stage 3, 4 -EnableFirewallRules |
    Out-String | Add-Content $ReportFile

Part 5: VLAN Connectivity Validation

5.1 Verify VLAN Configuration

# Check VLAN assignments on virtual adapters
$VlanConfig = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        # ParentAdapter is an object; project its Name so the report shows the vNIC name
        Get-VMNetworkAdapterVlan -ManagementOS |
            Select-Object @{N='Node';E={$env:COMPUTERNAME}},
                @{N='Adapter';E={$_.ParentAdapter.Name}},
                AccessVlanId, NativeVlanId, OperationMode
    }
}

"`n" + "="*80 | Add-Content $ReportFile
"VLAN CONFIGURATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$VlanConfig | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

5.2 Test VLAN Connectivity

# Ping test across VLANs
# Replace each TestIP placeholder with the actual gateway/target for that VLAN
$VlanTests = @(
    @{Name="Management"; VLAN=711; TestIP="10.X.X.1"},
    @{Name="Storage1"; VLAN=712; TestIP="10.X.X.1"},
    @{Name="Storage2"; VLAN=713; TestIP="10.X.X.1"},
    @{Name="VM Traffic"; VLAN=714; TestIP="10.X.X.1"}
)

"`nVLAN Connectivity Tests:" | Add-Content $ReportFile
foreach ($Vlan in $VlanTests) {
    $Result = Test-Connection -ComputerName $Vlan.TestIP -Count 2 -Quiet -ErrorAction SilentlyContinue
    $Status = if ($Result) { "PASS" } else { "FAIL" }
    "VLAN $($Vlan.VLAN) ($($Vlan.Name)): $Status" | Add-Content $ReportFile
}

Part 6: Core Network Services Validation

6.1 DNS Resolution

"`n" + "="*80 | Add-Content $ReportFile
"CORE NETWORK SERVICES" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

# Test DNS resolution from the current node
$DnsTests = @(
    "management.azure.com",
    "login.microsoftonline.com",
    "$ClusterName",
    "dc01.domain.local" # Replace with actual DC
)

"`nDNS Resolution Tests:" | Add-Content $ReportFile
foreach ($Target in $DnsTests) {
    $Result = Resolve-DnsName -Name $Target -ErrorAction SilentlyContinue
    # The first record is often a CNAME with no IPAddress, so use the first address record
    $Address = ($Result | Where-Object { $_.IPAddress } | Select-Object -First 1).IPAddress
    $Status = if ($Address) { "PASS - $Address" } else { "FAIL" }
    "$Target : $Status" | Add-Content $ReportFile
}

6.2 NTP Synchronization

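# Verify time sync across all nodes
$TimeSync = foreach ($Node in $Nodes) {
    Invoke-Command -ComputerName $Node -ScriptBlock {
        $w32tm = w32tm /query /status 2>&1
        # Return the text after the first colon of the matching line, or nothing if
        # the line is absent. Split(": ") is avoided because Windows PowerShell 5.1
        # treats the argument as separate characters and splits on every space.
        $GetField = {
            param($Label)
            $Line = $w32tm | Select-String -SimpleMatch $Label | Select-Object -First 1
            if ($Line) { ($Line.ToString() -replace '^[^:]*:\s*', '').Trim() }
        }
        [PSCustomObject]@{
            Node     = $env:COMPUTERNAME
            Source   = & $GetField "Source:"
            Stratum  = & $GetField "Stratum:"
            LastSync = & $GetField "Last Successful"
        }
    }
}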

"`nNTP Time Synchronization:" | Add-Content $ReportFile
$TimeSync | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

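# Check for time skew between nodes. Query all nodes in one Invoke-Command so
# the samples are taken in parallel rather than one after another, and record
# UTC so time zone differences do not look like skew.
$TimeDiff = Invoke-Command -ComputerName $Nodes -ScriptBlock {
    [PSCustomObject]@{
        Node = $env:COMPUTERNAME
        CurrentTime = (Get-Date).ToUniversalTime().ToString("yyyy-MM-dd HH:mm:ss.fff")
    }
}
"`nNode Time Comparison (UTC):" | Add-Content $ReportFile
$TimeDiff | Format-Table Node, CurrentTime | Out-String | Add-Content $ReportFile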
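
The node time comparison is easier to judge as a single number. The sketch below computes the spread between the earliest and latest sample; the function name `Get-NodeTimeSkew`, the sample data, and the 1000 ms threshold are all hypothetical. Because the samples travel over PowerShell remoting, the result includes remoting latency and is only a rough upper bound on real clock skew.

```powershell
# Hypothetical helper: spread between the earliest and latest node timestamps.
function Get-NodeTimeSkew {
    param([Parameter(Mandatory)][object[]]$Samples)
    $Format = 'yyyy-MM-dd HH:mm:ss.fff'   # same format the time comparison writes
    $Parsed = foreach ($s in $Samples) {
        [PSCustomObject]@{
            Node = $s.Node
            Time = [datetime]::ParseExact($s.CurrentTime, $Format,
                       [cultureinfo]::InvariantCulture)
        }
    }
    $Sorted = @($Parsed | Sort-Object Time)
    [PSCustomObject]@{
        EarliestNode     = $Sorted[0].Node
        LatestNode       = $Sorted[-1].Node
        SkewMilliseconds = [math]::Round(($Sorted[-1].Time - $Sorted[0].Time).TotalMilliseconds)
    }
}

# Made-up sample data shaped like $TimeDiff
$Sample = @(
    [PSCustomObject]@{ Node = 'node01'; CurrentTime = '2026-03-24 10:15:30.120' }
    [PSCustomObject]@{ Node = 'node02'; CurrentTime = '2026-03-24 10:15:30.480' }
    [PSCustomObject]@{ Node = 'node03'; CurrentTime = '2026-03-24 10:15:29.900' }
)
$Skew = Get-NodeTimeSkew -Samples $Sample
$MaxSkewMs = 1000   # hypothetical threshold; choose one that suits your environment
$SkewStatus = if ($Skew.SkewMilliseconds -le $MaxSkewMs) { 'PASS' } else { 'FAIL' }
"Max skew: $($Skew.SkewMilliseconds) ms between $($Skew.EarliestNode) and $($Skew.LatestNode): $SkewStatus"
```

In a live run, pass `$TimeDiff` instead of `$Sample`.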

6.3 Azure Connectivity

# Test connectivity to Azure endpoints
$AzureEndpoints = @(
"management.azure.com",
"login.microsoftonline.com",
"graph.microsoft.com",
"azurestackr01.azurestack.hci.microsoft.com"
)

"`nAzure Endpoint Connectivity:" | Add-Content $ReportFile
foreach ($Endpoint in $AzureEndpoints) {
    $Result = Test-NetConnection -ComputerName $Endpoint -Port 443 -WarningAction SilentlyContinue
    $Status = if ($Result.TcpTestSucceeded) { "PASS" } else { "FAIL" }
    # A successful TCP port test normally leaves PingReplyDetails empty, so log
    # the resolved remote address instead of an ICMP latency that would be blank
    "$Endpoint : $Status (Remote address: $($Result.RemoteAddress))" | Add-Content $ReportFile
}

Part 7: Generate Summary

7.1 Create Validation Summary

# Summary section
"`n" + "="*80 | Add-Content $ReportFile
"NETWORK VALIDATION SUMMARY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

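# Empty result sets are reported explicitly so a check that never ran cannot show PASS.
# @() forces array semantics so .Count is reliable on single-node results.
$Summary = @"

Validation Category             Status
------------------------------- --------
RDMA Adapters Enabled           $(if(-not $RdmaResults){"NO DATA"}elseif($DisabledRdma){"FAIL"}else{"PASS"})
DCB Configuration               $(if(-not $DcbResults){"REVIEW ABOVE"}elseif($DcbResults -match "FAIL"){"FAIL"}else{"PASS"})
SMB Direct Operational          $(if($SmbDirect){"PASS"}else{"FAIL"})
VLAN Connectivity               REVIEW ABOVE
DNS Resolution                  REVIEW ABOVE
NTP Synchronization             $(if(@($TimeSync).Count -eq @($Nodes).Count){"PASS"}else{"FAIL"})
Azure Connectivity              REVIEW ABOVE

"@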

$Summary | Add-Content $ReportFile

# Report location
"`nReport saved to: $ReportFile" | Add-Content $ReportFile
Write-Host "`nNetwork validation complete. Report: $ReportFile" -ForegroundColor Green
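
The summary marks three categories as REVIEW ABOVE. One possible way to roll those up automatically is to scan the result lines each test writes (for example `VLAN 711 (Management): PASS`). The sketch below assumes those lines have been collected into an array; the function name `Get-RollupStatus` and the sample lines are hypothetical.

```powershell
# Hypothetical helper: FAIL if any result line reports FAIL, PASS if none do,
# NO DATA when nothing was collected.
function Get-RollupStatus {
    param([string[]]$ResultLines)
    if (-not $ResultLines) { return 'NO DATA' }
    # -match on an array returns the matching elements; any match means a failure.
    if ($ResultLines -match ':\s*FAIL\b') { 'FAIL' } else { 'PASS' }
}

# Made-up lines in the same shape the VLAN and DNS tests write to the report
$VlanLines = @(
    'VLAN 711 (Management): PASS'
    'VLAN 712 (Storage1): FAIL'
)
$DnsLines = @(
    'management.azure.com : PASS - 203.0.113.10'
    'login.microsoftonline.com : PASS - 203.0.113.20'
)

"VLAN Connectivity: $(Get-RollupStatus -ResultLines $VlanLines)"
"DNS Resolution:    $(Get-RollupStatus -ResultLines $DnsLines)"
```

To use this in the report, each test loop would need to keep its output lines in a variable as well as writing them to `$ReportFile`.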

Validation Checklist

| Category | Test | Expected Result | Status |
|---|---|---|---|
| RDMA | All adapters enabled | Enabled = True | |
| RDMA | SMB Direct operational | RDMA-capable interfaces listed | |
| DCB | Validate-DCB passes | No FAIL results | |
| DCB | PFC enabled on correct priority | Priority 3 enabled | |
| DCB | ETS bandwidth allocation | SMB Direct ≥ 50% | |
| VLAN | All VLANs accessible | Ping succeeds | |
| DNS | Name resolution works | All targets resolve | |
| NTP | Time synchronized | Stratum ≤ 4, all nodes sync | |
| Azure | Endpoint connectivity | All endpoints reachable on 443 | |
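
Several of the expected results above are simple thresholds that can be checked against collected data. The sketch below shows one way to express three of them. The function names are hypothetical, the sample objects only imitate the properties selected earlier in this runbook (Priority, Enabled, BandwidthPercentage), and "only priority 3 enabled" is one reading of the PFC row, matching the reset commands under Common Issues.

```powershell
# PFC: exactly one priority has flow control enabled, and it is the expected one.
function Test-PfcPriority {
    param([object[]]$FlowControl, [int]$ExpectedPriority = 3)
    $Enabled = @($FlowControl | Where-Object { $_.Enabled } | ForEach-Object { [int]$_.Priority })
    ($Enabled.Count -eq 1) -and ($Enabled[0] -eq $ExpectedPriority)
}

# ETS: the traffic class carrying the given priority reserves at least the minimum share.
function Test-EtsBandwidth {
    param([object[]]$TrafficClass, [int]$Priority = 3, [int]$MinimumPercent = 50)
    $Class = $TrafficClass | Where-Object { $_.Priority -contains $Priority } | Select-Object -First 1
    [bool]($Class -and $Class.BandwidthPercentage -ge $MinimumPercent)
}

# NTP: the stratum text from w32tm starts with a number no greater than the maximum.
function Test-NtpStratum {
    param([string]$StratumText, [int]$Maximum = 4)
    if ($StratumText -match '^\s*(\d+)') { [int]$Matches[1] -le $Maximum } else { $false }
}

# Made-up sample data for one node
$Pfc = 0..7 | ForEach-Object { [PSCustomObject]@{ Priority = $_; Enabled = ($_ -eq 3) } }
$Ets = @(
    [PSCustomObject]@{ Name = 'SMB Direct'; Priority = @(3); BandwidthPercentage = 50 }
    [PSCustomObject]@{ Name = 'Cluster';    Priority = @(7); BandwidthPercentage = 2 }
)

"PFC priority 3 only: $(Test-PfcPriority -FlowControl $Pfc)"
"ETS SMB Direct >= 50%: $(Test-EtsBandwidth -TrafficClass $Ets)"
"NTP stratum <= 4: $(Test-NtpStratum -StratumText '3 (secondary reference - syncd by (S)NTP)')"
```

In a live run, filter `$PfcSettings` and `$EtsSettings` by node and pass each node's rows to these functions.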

Common Issues

RDMA Not Operational

# Re-enable RDMA on adapter
Enable-NetAdapterRdma -Name "Storage1"

# Verify RDMA is operational
Get-NetAdapterRdma -Name "Storage1"

DCB Misconfiguration

# Reset DCB to defaults and reconfigure
# WARNING: May disrupt storage traffic

# Remove existing policies
Get-NetQosPolicy | Remove-NetQosPolicy -Confirm:$false

# Recreate SMB Direct policy
New-NetQosPolicy -Name "SMB Direct" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

Time Skew Between Nodes

# Force time sync
w32tm /resync /force

# Verify sync source
w32tm /query /source

Troubleshooting

| Issue | Cause | Resolution |
|---|---|---|
| RDMA validation shows Not Operational | Network adapter driver missing RDMA support or disabled | Verify driver: `Get-NetAdapterRdma`; enable: `Enable-NetAdapterRdma -Name <adapter>`; update NIC firmware if needed |
| DCB/PFC misconfiguration detected | QoS policies conflicting or missing | Reset DCB: `Get-NetQosPolicy \| Remove-NetQosPolicy -Confirm:$false`; recreate SMB Direct policy with correct priority class |
| RDMA throughput below expected baseline | Incorrect MTU or flow control settings | Verify jumbo frames: `Get-NetAdapterAdvancedProperty -Name <adapter> -RegistryKeyword *JumboPacket`; set MTU to 9014 |

Next Step

Proceed to Task 4: High Availability Testing once network validation is complete.




Toolkit Reference

Scripts for this task are located in the azurelocal-toolkit repository under scripts/deploy/ in the appropriate task folder.


Alternatives

The procedures in this task use the scripted methods shown above. Additional deployment methods, including Azure CLI and Bash scripts, are available in the azurelocal-toolkit repository under scripts/deploy/.

| Method | Description |
|---|---|
| Azure CLI | PowerShell-based Azure CLI scripts for Azure resource operations |
| Bash | Linux/macOS compatible shell scripts for pipeline environments |

Version Control

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-03-24 | Azure Local Cloud | Initial release |