# iPerf3: Monitoring
During iPerf3 test runs, host-side network and compute counters are collected and evaluated against the thresholds defined in `tools/iperf/monitoring/alerts/alert-rules.yml`.
## Alert Rules Reference

| Rule Name | Counter | Threshold | Severity | Rationale |
|---|---|---|---|---|
| `iperf_low_throughput` | `\Network Interface(*)\Bytes Total/sec` | < 875 MB/s (7 Gbps) | Warning | Below 70% of 10GbE capacity indicates a network path issue |
| `iperf_interface_saturation` | `\Network Interface(*)\Current Bandwidth` | > 9.9 Gbps | Warning | Near 100% utilization causes TCP retransmits and head-of-line blocking |
| `iperf_high_retransmits` | `\TCPv4\Segments Retransmitted/sec` | > 100/s | Warning | Frequent retransmits indicate network quality or congestion issues |
| `iperf_high_cpu` | `\Processor(_Total)\% Processor Time` | > 80% | Warning | Results may be CPU-bound rather than network-bound |
| `iperf_interrupt_load` | `\Processor(_Total)\% Interrupt Time` | > 30% | Warning | High NIC interrupt rate; RSS/interrupt affinity should be tuned |
| `iperf_low_memory` | `\Memory\Available MBytes` | < 512 MB | Warning | Memory pressure affects TCP socket buffer allocation |
## Understanding iPerf3 Alerts

### Throughput Alerts
`iperf_low_throughput` (< 875 MB/s) fires when the measured bytes transferred per second fall below the equivalent of 7 Gbps on a 10GbE interface. This threshold is meaningful because iPerf3 itself adds some overhead: on a healthy 10GbE link you should see at least 9.2–9.5 Gbps (1,150+ MB/s). Values near 7 Gbps suggest:
- Duplex mismatch or auto-negotiation failure (check `Get-NetAdapter`)
- Faulty cable or SFP transceiver
- Bandwidth policy or QoS policy throttling
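As a first triage step, negotiated link speed and duplex state can be read directly from the adapters (a minimal sketch; adapter names and counts vary per host):

```powershell
# Sketch: confirm each adapter is up and negotiated full duplex at its
# rated speed before suspecting cabling or policy throttling.
Get-NetAdapter |
    Select-Object Name, Status, LinkSpeed, FullDuplex |
    Format-Table -AutoSize
```

A 10GbE adapter reporting a lower `LinkSpeed`, or `FullDuplex` set to false, points at auto-negotiation or cabling problems rather than the iPerf3 configuration.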
`iperf_interface_saturation` (> 9.9 Gbps) fires when a single NIC approaches its rated maximum. For the TCP throughput test this is expected behavior (you want to see the full 10 Gbps utilized). The alert is informational; it becomes a problem only if you see retransmits alongside saturation, indicating the link cannot absorb the offered load.
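To relate the raw byte counter to the Gbps thresholds in the table, the per-interface rate can be converted on the spot (a local, single-sample sketch using the same counter as `iperf_low_throughput`):

```powershell
# Sketch: convert Bytes Total/sec into Gbps per interface.
# 875 MB/s * 8 / 1e9 = 7 Gbps, the iperf_low_throughput threshold.
(Get-Counter '\Network Interface(*)\Bytes Total/sec').CounterSamples |
    ForEach-Object {
        [pscustomobject]@{
            Interface = $_.InstanceName
            Gbps      = [math]::Round($_.CookedValue * 8 / 1e9, 2)
        }
    } | Format-Table -AutoSize
```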
### Retransmit Alert
`iperf_high_retransmits` (> 100/s) during an iPerf3 TCP test is a strong signal of network quality issues. TCP retransmits during a controlled, cluster-internal test should be near zero on a healthy fabric. Common causes:
- Congestion at a switch port (buffer overflow)
- Mismatched MTU / jumbo frame misconfiguration (verify with `Test-NetConnection` and large datagrams)
- RDMA/RoCE priority flow control not configured
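A quick end-to-end jumbo-frame check is a do-not-fragment ping at the jumbo payload size (a sketch; the peer name `hci01-node2` and a 9000-byte MTU are assumptions, adjust for your fabric):

```powershell
# Sketch: 8972 bytes = 9000-byte MTU minus 28 bytes of ICMP/IP headers.
# -f sets Don't Fragment, so this fails immediately if any hop in the
# path does not carry jumbo frames (while a default-size ping succeeds).
ping hci01-node2 -f -l 8972

# Also confirm jumbo frames are enabled on the local adapters themselves.
Get-NetAdapterAdvancedProperty -DisplayName "Jumbo Packet" |
    Select-Object Name, DisplayValue
```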
### CPU and Interrupt Alerts
`iperf_high_cpu` (> 80%) signals that the test is measuring CPU-to-NIC throughput rather than raw network bandwidth. At high loads on 25GbE/100GbE links, a single core can become the bottleneck for network interrupt processing.
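One way to tell the two cases apart is to look at per-core load instead of `_Total` while the test runs (a sketch; one core pegged near 100% while the average is moderate indicates a single-core bottleneck):

```powershell
# Sketch: sample per-core CPU and list the busiest cores. A single
# core near 100% with a moderate overall average suggests the run is
# CPU-bound on interrupt/stack processing, not network-bound.
Get-Counter '\Processor(*)\% Processor Time' -SampleInterval 2 -MaxSamples 3 |
    ForEach-Object { $_.CounterSamples } |
    Where-Object { $_.InstanceName -ne '_total' } |
    Sort-Object CookedValue -Descending |
    Select-Object -First 5 InstanceName, CookedValue |
    Format-Table -AutoSize
```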
`iperf_interrupt_load` (> 30%) specifically captures NIC interrupt affinity issues. If interrupt time is high but overall CPU is moderate, use Receive Side Scaling (RSS) to distribute NIC interrupts across cores:
```powershell
# Check RSS state on all adapters
Get-NetAdapterRss | Select-Object Name, Enabled, NumberOfReceiveQueues

# Increase the receive-queue count on a specific adapter
Set-NetAdapterRss -Name "Storage-NIC" -NumberOfReceiveQueues 8
```
## Monitoring During a Run
```powershell
# View alerts from a completed run log
Get-Content "logs\iperf\<RunId>\iperf-test.log.jsonl" |
    ConvertFrom-Json |
    Where-Object { $_.Severity -in @('WARNING', 'CRITICAL') } |
    Select-Object Timestamp, Severity, Message
```
### Live Counter Monitoring
```powershell
# Monitor network interface counters on a node during the test
Get-Counter -Counter @(
    '\Network Interface(*)\Bytes Received/sec',
    '\Network Interface(*)\Bytes Sent/sec',
    '\TCPv4\Segments Retransmitted/sec',
    '\Processor(_Total)\% Interrupt Time'
) -ComputerName "hci01-node1" -SampleInterval 5 -MaxSamples 12 |
    ForEach-Object { $_.CounterSamples } |
    Select-Object Path, CookedValue |
    Format-Table -AutoSize
```
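The same live counters can also be checked inline against a threshold from the alert table, mirroring what the rules in `alert-rules.yml` evaluate (a local sketch; the 100/s limit is taken from `iperf_high_retransmits` above):

```powershell
# Sketch: flag a live retransmit rate above the 100/s alert threshold.
$limit  = 100
$sample = (Get-Counter '\TCPv4\Segments Retransmitted/sec').CounterSamples |
    Select-Object -First 1
if ($sample.CookedValue -gt $limit) {
    Write-Warning ("iperf_high_retransmits: {0:N0}/s exceeds {1}/s" -f `
        $sample.CookedValue, $limit)
}
```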