stress-ng: Troubleshooting
stress-ng Not Found on Target Node
Symptom: Start-StressNgTest.ps1 returns bash: stress-ng: command not found from one or more nodes.
Resolution:
# Ubuntu
sudo apt-get install -y stress-ng
# Rocky Linux / RHEL
sudo dnf install -y epel-release && sudo dnf install -y stress-ng
See Installation for the version verification procedure across all nodes.
OOM Killer Terminates Workers Mid-Run
Symptom: The run completes but the per-node YAML contains fewer stressor entries than expected, or Collect-StressNgResults.ps1 reports a node_count mismatch.
Diagnostic:
If stress-ng appears in the output, the OOM killer fired.
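The diagnostic command itself is not preserved above. A minimal sketch of one way to run the check, assuming the kernel log is readable through dmesg on the node; the helper name oom_hits is hypothetical and not part of the toolkit:

```shell
# Hypothetical helper: reads kernel log text on stdin and prints only the
# OOM-killer lines that mention a stress-ng process.
oom_hits() {
  grep -iE 'out of memory|oom-kill|killed process' | grep -i 'stress-ng'
}

# Usage (assumes the SSH user may run dmesg through sudo on the node):
#   ssh azurelocaladmin@hci01-node1 "sudo dmesg -T" | oom_hits
```

No output from the helper means the kernel log holds no OOM kill for a stress-ng process, although an older event may already have rotated out of the ring buffer.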
Resolution:
Either reduce the number of workers in memory-stress.yml (e.g. from 4 to 2) or add temporary swap on the affected node.
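Adding temporary swap could be sketched as follows. The helper name add_temp_swap, the 4096 MiB default, and the /swapfile-stressng path are all assumptions chosen for illustration. The file is created with dd rather than fallocate because a fully written file works as swap on any filesystem.

```shell
# Hypothetical helper: adds a temporary swap file. With DRY_RUN=1 it only
# prints the commands. Size (MiB) and path defaults are assumptions.
add_temp_swap() {
  size_mb="${1:-4096}"
  file="${2:-/swapfile-stressng}"
  run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else sudo "$@"; fi
  }
  # Each step runs only if the previous one succeeded
  run dd if=/dev/zero of="$file" bs=1M count="$size_mb" &&
    run chmod 600 "$file" &&
    run mkswap "$file" &&
    run swapon "$file"
}

# Remove the swap file once the test run is finished:
#   sudo swapoff /swapfile-stressng && sudo rm /swapfile-stressng
```

Setting DRY_RUN=1 before calling the helper prints the four commands so they can be reviewed before anything is changed on the node.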
Bogo-ops Count Is Zero for One Stressor
Symptom: The aggregate JSON shows total_bogo_ops: 0 for a specific stressor (e.g., hdd).
Cause: The hdd stressor writes temporary files to /tmp. If that filesystem is full, every write returns an error and no operations complete.
Diagnostic:
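The diagnostic command is missing here. A minimal sketch of a free-space check, assuming the stressor's temporary path is /tmp as stated in the cause; the helper name fs_use_pct is hypothetical:

```shell
# Hypothetical helper: prints the used-space percentage (without the % sign)
# of the filesystem that holds the given path, based on POSIX df output.
fs_use_pct() {
  df -P "${1:-/tmp}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage on a node (space and inode usage; either at 100% blocks new writes):
#   ssh azurelocaladmin@hci01-node1 "df -h /tmp; df -i /tmp"
```

A value at or near 100 is consistent with the full-filesystem cause described above.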
Resolution:
# Clean up residual test files from previous runs
ssh azurelocaladmin@hci01-node1 "rm -rf /tmp/stress-ng-results/ && df -h /tmp"
If /tmp is a separate tmpfs mount, increase its size= option in /etc/fstab, then remount or reboot for the change to take effect.
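For a tmpfs-backed /tmp, the change might look like the following sketch. The 8G figure and the helper name resize_tmp are assumptions; choose a size that fits the node's memory budget, because tmpfs contents are held in RAM and swap.

```shell
# Example /etc/fstab entry (size=8G is an assumed value):
#   tmpfs  /tmp  tmpfs  defaults,size=8G  0  0

# Hypothetical helper: applies a new size to the mounted tmpfs without a
# reboot. With DRY_RUN=1 it only prints the command.
resize_tmp() {
  size="${1:-8G}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "mount -o remount,size=$size /tmp"
  else
    sudo mount -o remount,size="$size" /tmp
  fi
}
```

The remount affects only the running system; the fstab entry is what makes the new size persist across reboots.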
CPU Throttling: Low Bogo-ops Compared to Baseline
Symptom: avg_bogo_ops_per_sec for the cpu stressor is 20%+ below a previous baseline run on identical hardware.
Investigation:
# Check CPU frequency (as a percentage of maximum) on the affected node
Get-Counter "\Processor Information(_Total)\% of Maximum Frequency" `
    -ComputerName "hci01-node1" -SampleInterval 5 -MaxSamples 6 |
    ForEach-Object { $_.CounterSamples | Select-Object CookedValue }
If the frequency is below 80% of rated, the stressng_cpu_throttling alert will have fired. Review:
- BIOS power profile (set to Maximum Performance, not Balanced)
- Chassis airflow / fan curves
- turbostat on the Linux node: sudo turbostat --quiet --show CPU,Bzy_MHz,Avg_MHz --interval 5
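To decide whether a run crosses the 20% threshold named in the symptom, the drop against the baseline can be computed directly. A small sketch; the helper name bogo_drop_pct and the sample values are made up for illustration:

```shell
# Hypothetical helper: prints the percentage by which the current value
# falls below the baseline, to one decimal place.
bogo_drop_pct() {
  awk -v base="$1" -v cur="$2" \
    'BEGIN { printf "%.1f\n", (base - cur) / base * 100 }'
}

# Made-up values: baseline 1000 bogo-ops/s, current run 780 bogo-ops/s
bogo_drop_pct 1000 780
```

With these sample values the helper prints 22.0, which would count as a regression under the 20% rule.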
YAML Parse Failure: PowerShell Key Access Error
Symptom: Collect-StressNgResults.ps1 throws The property 'stress-ng' cannot be found on this object.
Cause: PowerShell cannot access a property with a hyphen using dot notation.
Resolution: Quote the hyphenated key when accessing it, for example $data.'stress-ng' instead of $data.stress-ng (the variable name $data is illustrative).
This is documented in Reporting. If you customised the collection script and introduced dot notation, revert to quoted access.
SSH Connectivity Issues
Symptom: Start-StressNgTest.ps1 fails with Permission denied (publickey).
See the Operations Troubleshooting Guide for full SSH key resolution steps. Quick check:
ssh -i "$env:USERPROFILE\.ssh\azurelocal_rsa" -o BatchMode=yes `
"azurelocaladmin@hci01-node1" "echo ok"
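If this returns ok, the key is correct. If it returns Permission denied, the key has not been distributed to that node; re-run ssh-copy-id. Because BatchMode=yes disables interactive prompts, a hang usually points to a network or firewall problem rather than a key problem.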
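The same quick check can be repeated across several nodes. A sketch for a POSIX shell, assuming the key sits at ~/.ssh/azurelocal_rsa on the machine running it and that the extra node names are placeholders; check_nodes and the SSH_BIN override are hypothetical:

```shell
# Hypothetical helper: runs the non-interactive SSH check against each node
# and prints "<node> ok" or "<node> FAILED". SSH_BIN lets the ssh binary be
# swapped out (for example in a test); it defaults to ssh.
check_nodes() {
  key="$1"
  shift
  for node in "$@"; do
    if ${SSH_BIN:-ssh} -i "$key" -o BatchMode=yes -o ConnectTimeout=5 \
        "azurelocaladmin@$node" "echo ok" >/dev/null 2>&1; then
      echo "$node ok"
    else
      echo "$node FAILED"
    fi
  done
}

# Usage (node names other than hci01-node1 are placeholders):
#   check_nodes ~/.ssh/azurelocal_rsa hci01-node1 hci01-node2
```

ConnectTimeout=5 keeps an unreachable node from stalling the loop, so connectivity failures and key failures both surface as FAILED within a few seconds.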
Inconsistent Bogo-ops Across Nodes (Same Hardware)
Symptom: Node bogo-ops/sec values differ by more than 10% across nodes with identical CPU and RAM.
Common Causes:
| Cause | Diagnostic | Fix |
|---|---|---|
| Background process (backup, AV scan) | Get-Process -ComputerName node sorted by CPU | Schedule maintenance windows outside test runs |
| NUMA imbalance | numactl --hardware on each node | Pin workers to specific NUMA nodes: stress-ng --cpu 16 --taskset 0-15 |
| Different BIOS power profiles per node | powercfg /query | Set consistent BIOS/hypervisor policy across all nodes |
| VM density difference | One node running extra VMs | Drain excess VMs before running stress-ng tests |
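Whether a set of nodes exceeds the 10% threshold from the symptom can be checked with a short calculation. In this sketch the spread is defined as (max - min) / max, which is an assumption about what "differ by" means here; the helper name bogo_spread_pct and the sample values are made up:

```shell
# Hypothetical helper: reads one bogo-ops/sec value per line on stdin and
# prints the spread as (max - min) / max * 100, to one decimal place.
bogo_spread_pct() {
  awk 'NR == 1 { min = $1; max = $1 }
       { if ($1 < min) min = $1; if ($1 > max) max = $1 }
       END { if (NR > 0 && max > 0) printf "%.1f\n", (max - min) / max * 100 }'
}

# Made-up values for three nodes
printf '%s\n' 1000 950 880 | bogo_spread_pct
```

With these sample values the helper prints 12.0, which is above the 10% threshold and would justify working through the causes in the table.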