
fio — Monitoring

Tool: fio Category: Tool Guide

During fio test runs, host-side performance counters are collected and evaluated against the thresholds in tools/fio/monitoring/alerts/alert-rules.yml. Alerts are emitted via the MonitoringManager module to the structured log and optionally to Azure Monitor.

Alert Rules Reference

| Rule Name | Counter | Threshold | Severity | Rationale |
| --- | --- | --- | --- | --- |
| fio_high_disk_wait | \PhysicalDisk(*)\% Disk Time | > 95% | Warning | Disk fully saturated; additional IOPS headroom is exhausted |
| fio_high_disk_latency | \PhysicalDisk(*)\Avg. Disk sec/Transfer | > 50 ms | Critical | Latency this high indicates storage path issues, not just load |
| fio_high_disk_queue | \PhysicalDisk(*)\Current Disk Queue Length | > 64 | Warning | Queue depth beyond 64 signals storage bandwidth exhaustion |
| fio_high_cpu | \Processor(_Total)\% Processor Time | > 90% | Warning | High host CPU during fio may artificially depress IOPS results |
| fio_high_iowait | \System\Processor Queue Length | > 32 | Warning | Processor queue buildup indicates I/O → CPU coupling |
| fio_low_memory | \Memory\Available MBytes | < 512 MB | Warning | Memory pressure can affect I/O cache hit rates |
| fio_network_saturation | \Network Interface(*)\Bytes Total/sec | > 9 GB/s | Warning | Storage network saturation may indicate RDMA path issues |
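As a sketch, a rule in alert-rules.yml might look like the following. Only the name, threshold, and severity fields are shown elsewhere in this guide; the counter and comparison fields are assumptions about the schema, not confirmed field names:

```yaml
# Hypothetical rule entry; fields beyond name/threshold/severity are assumed
- name: fio_high_disk_queue
  counter: '\PhysicalDisk(*)\Current Disk Queue Length'   # assumed field name
  comparison: gt                                          # assumed field name
  threshold: 64
  severity: warning
```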

Understanding fio Alerts

Storage Alerts

fio_high_disk_latency (Critical) is raised when the OS-level average disk transfer latency exceeds 50ms. This threshold is intentionally lower than the fio-reported profile thresholds — if the OS latency is already this high, fio is measuring a degraded path, and results are not valid as a baseline.

fio_high_disk_queue is complementary to disk time: a disk can show 100% busy but still have low queue depth if it is processing requests quickly. A queue beyond 64 on a single disk indicates backpressure accumulating from the fio workload.
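To spot-check queue depth outside of a fio run, the built-in Get-Counter cmdlet can sample the same counter directly. This is a standalone sketch, independent of the MonitoringManager tooling:

```powershell
# Sample per-disk queue length five times, one second apart,
# and show the cooked value for each disk instance
Get-Counter -Counter '\PhysicalDisk(*)\Current Disk Queue Length' `
    -SampleInterval 1 -MaxSamples 5 |
    ForEach-Object { $_.CounterSamples } |
    Select-Object InstanceName, CookedValue
```

Sustained CookedValue readings above 64 on a single disk instance would trip the fio_high_disk_queue rule during a run.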

Compute Alerts

fio_high_cpu warns when CPU utilization exceeds 90% during a storage test. fio runs in userspace and must process every I/O completion, so a CPU-bound host can cap the measured IOPS on fast NVMe storage and SCM tiers. If this fires, consider reducing the fio numjobs or iodepth settings.
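For example, a job-file fragment along these lines reduces the host CPU cost of a run. The section name and values are illustrative, not taken from any profile shipped in this repository:

```ini
# Illustrative fio job; not a shipped profile
[randread-reduced]
rw=randread
bs=4k
direct=1
ioengine=libaio
numjobs=2      # fewer worker processes lowers CPU load
iodepth=16     # shallower per-job queue depth
runtime=60
time_based=1
```

Lowering numjobs cuts the number of worker processes competing for CPU; lowering iodepth reduces the completion-processing load per job.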

Memory Alert

fio_low_memory at the 512 MB threshold warns that available host memory is running low. Because the profiles use direct I/O (O_DIRECT) with libaio by default, the page cache is bypassed and cache effects on the measured results are minimal; this alert mainly protects the stability of the test environment itself.

Monitoring During a Run

Monitoring is started automatically by Start-FioTest.ps1 when the MonitoringManager module is available. To check alert output after the fact, review the structured log:

# Find alert entries in the run log
Get-Content "logs\fio\<RunId>\fio-test.log.jsonl" |
    ConvertFrom-Json |
    Where-Object { $_.Severity -in @('WARNING', 'CRITICAL') } |
    Select-Object Timestamp, Severity, Message
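For a quick overview rather than the full list, the same pipeline can tally entries per severity with Group-Object. This sketch uses only the fields shown above:

```powershell
# Count WARNING vs. CRITICAL entries in the run log
Get-Content "logs\fio\<RunId>\fio-test.log.jsonl" |
    ConvertFrom-Json |
    Where-Object { $_.Severity -in @('WARNING', 'CRITICAL') } |
    Group-Object Severity |
    Select-Object Name, Count
```

A run with zero CRITICAL entries and few WARNINGs is generally safe to treat as a baseline; any fio_high_disk_latency entry invalidates it (see the Storage Alerts section above).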

Alert Configuration File

Alert definitions live in tools/fio/monitoring/alerts/alert-rules.yml and are consumed by the MonitoringManager module during test execution. To adjust a threshold, edit the threshold field of the relevant rule.

# Example: lower the disk latency critical threshold to 20ms
- name: fio_high_disk_latency
  threshold: 0.020    # 20ms
  severity: critical

Changes take effect on the next test run. No restart required.