Data Flow
This page traces the three primary data flows through the framework: configuration, results, and monitoring. Understanding these paths helps when diagnosing failures, extending the framework, or integrating with external systems.
Configuration Data Flow
Key Rules
- Downstream scripts never read variables.yml directly. They only consume the generated JSON files.
- Variables are tagged by solution name in the master YAML (solutions: [fio, iperf]). ConfigManager emits only variables tagged for the target solution.
- Override chain (lowest wins): master YAML → environment variable → -Variables parameter → profile YAML
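The tag-filtering rule above can be sketched in a few lines of Python. This is an illustration only, not the ConfigManager implementation: the shape of the parsed master YAML (a mapping from variable name to a value plus a solutions list), the variable names, and the helper names variables_for and emit_json are all assumptions. The override chain is not modeled here.

```python
import json

# Hypothetical in-memory form of the master YAML after parsing. The real
# schema of variables.yml is not shown on this page, so this shape is assumed.
MASTER = {
    "block_size": {"value": "4k", "solutions": ["fio"]},
    "port": {"value": 5201, "solutions": ["iperf"]},
    "duration_seconds": {"value": 60, "solutions": ["fio", "iperf"]},
}


def variables_for(master, solution):
    """Return only the variables tagged for the given solution."""
    return {
        name: entry["value"]
        for name, entry in master.items()
        if solution in entry.get("solutions", [])
    }


def emit_json(master, solution):
    """Serialize the filtered variables as a generated file might look."""
    return json.dumps(variables_for(master, solution), indent=2, sort_keys=True)


fio_vars = variables_for(MASTER, "fio")
print(emit_json(MASTER, "fio"))
```

A downstream script in this sketch would see only block_size and duration_seconds for fio, and would never see the iperf-only port variable.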
Results Data Flow
Aggregate JSON Contract
Every tool's Collect-*.ps1 writes a JSON file conforming to the same top-level envelope:
{
  "run_id": "string",
  "tool": "string",
  "profile": "string",
  "node_count": int,
  "<tool_specific_metrics>": { ... },
  "collected_at": "ISO 8601 UTC"
}
Report templates rely on this envelope structure; adding a new tool requires a corresponding template that maps its specific metric fields.
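A consumer that wants to check an aggregate file against this envelope before rendering a report could do something like the following. It is a minimal sketch in Python, assuming the file has already been loaded with json.loads; the function name validate_envelope, the sample values, and the fio_metrics key are made up for illustration and are not part of the framework.

```python
from datetime import datetime, timedelta

# Fixed envelope keys and the Python types they map to after json.loads.
REQUIRED = {
    "run_id": str,
    "tool": str,
    "profile": str,
    "node_count": int,
    "collected_at": str,
}


def validate_envelope(doc):
    """Return a list of problems; an empty list means the envelope looks valid."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in doc:
            problems.append(f"missing key: {key}")
        elif isinstance(doc[key], bool) or not isinstance(doc[key], expected):
            # bool is excluded explicitly because it is a subclass of int.
            problems.append(f"wrong type for {key}: {type(doc[key]).__name__}")

    stamp = doc.get("collected_at")
    if isinstance(stamp, str):
        # Older Python versions reject a trailing "Z", so normalize it first.
        if stamp.endswith("Z"):
            stamp = stamp[:-1] + "+00:00"
        try:
            parsed = datetime.fromisoformat(stamp)
            if parsed.utcoffset() != timedelta(0):
                problems.append("collected_at is not UTC")
        except ValueError:
            problems.append("collected_at is not ISO 8601")

    # Any key outside the fixed set is treated as the tool-specific block.
    extra = [k for k in doc if k not in REQUIRED]
    if not any(isinstance(doc[k], dict) for k in extra):
        problems.append("no tool-specific metrics object found")
    return problems


# Made-up sample document, used only to exercise the check.
sample = {
    "run_id": "run-001",
    "tool": "fio",
    "profile": "baseline",
    "node_count": 4,
    "fio_metrics": {"iops": 12000},
    "collected_at": "2024-01-01T12:00:00Z",
}
print(validate_envelope(sample))
```

Returning a list of problems rather than raising on the first one makes it easier to report every contract violation in a single pass.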
Monitoring Data Flow
Alert Rule Evaluation
Alert rules are defined in monitoring/<tool>/alert-rules.yml. Each rule specifies:
| Field | Description |
|---|---|
| counter | Windows Performance Counter path |
| condition | <, >, or == |
| threshold | Numeric value |
| cooldown_seconds | Minimum seconds between repeated alerts for the same rule |
| severity | warning or critical |
When a rule fires, MonitoringManager appends a structured JSON line to alerts-<node>.jsonl with the rule name, counter value, node name, and UTC timestamp. The Collect-*.ps1 scripts include a threshold violation review step that surfaces any alerts recorded during the run.
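The interaction between condition, threshold, and cooldown_seconds can be sketched as follows. This is a Python illustration under stated assumptions, not MonitoringManager itself: the class name RuleEvaluator, the rule name cpu_high, the node name, and the keys of the emitted JSON record (rule, value, node, timestamp) are hypothetical, since this page names the record's contents but not its exact field names.

```python
import json
import operator
from datetime import datetime, timezone

# The three comparison operators the rule format allows.
OPERATORS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}


class RuleEvaluator:
    """Evaluates one rule and suppresses repeats inside its cooldown window."""

    def __init__(self, name, rule):
        self.name = name
        self.rule = rule
        self.last_fired = None  # epoch seconds of the most recent alert

    def check(self, value, node, now):
        """Return a JSON line if the rule fires at time `now`, else None."""
        compare = OPERATORS[self.rule["condition"]]
        if not compare(value, self.rule["threshold"]):
            return None
        cooldown = self.rule["cooldown_seconds"]
        if self.last_fired is not None and now - self.last_fired < cooldown:
            return None  # same rule fired too recently
        self.last_fired = now
        record = {
            "rule": self.name,
            "value": value,
            "node": node,
            "timestamp": datetime.fromtimestamp(now, tz=timezone.utc).isoformat(),
        }
        return json.dumps(record)


# Example rule using the fields from the table above; the values are made up.
cpu_rule = {
    "counter": r"\Processor(_Total)\% Processor Time",
    "condition": ">",
    "threshold": 90,
    "cooldown_seconds": 60,
    "severity": "warning",
}
evaluator = RuleEvaluator("cpu_high", cpu_rule)
first = evaluator.check(95.0, "node01", now=1000.0)      # fires
repeat = evaluator.check(97.0, "node01", now=1030.0)     # inside cooldown
later = evaluator.check(96.0, "node01", now=1060.0)      # cooldown elapsed
```

Passing the current time in as an argument, rather than reading the clock inside check, keeps the cooldown logic easy to test.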
Correlation IDs
Every log line written by the Logger module includes a correlation_id field set to the RunId passed to Start-*.ps1. This allows correlating entries across:
- monitor-<node>.jsonl: PerfMon samples
- alerts-<node>.jsonl: Alert triggers
- <RunId>-aggregate.json: Parsed results
- state/<RunId>.json: Checkpoint state
When investigating a failed or anomalous run, filter all log files by "correlation_id": "<RunId>" to reconstruct the full timeline.
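For the line-oriented .jsonl files, that filtering step might look like the sketch below. It is written in Python for illustration and rests on assumptions: the function name timeline is hypothetical, and the time field is assumed to be called timestamp and to hold ISO 8601 UTC strings in one consistent format, which this page does not specify.

```python
import json


def timeline(sources, run_id, time_key="timestamp"):
    """Collect JSON-lines entries for one run and order them by time.

    `sources` maps a file name to an iterable of its lines.
    """
    entries = []
    for name, lines in sources.items():
        for line in lines:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or non-JSON lines
            if isinstance(entry, dict) and entry.get("correlation_id") == run_id:
                # Record which file the entry came from.
                entries.append({**entry, "source": name})
    # ISO 8601 UTC strings in one consistent format sort chronologically as text.
    return sorted(entries, key=lambda e: str(e.get(time_key, "")))
```

To run it against real files, build the mapping from each file's name to its lines (for example with pathlib's read_text().splitlines()) and pass the RunId of the run under investigation.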