Version: 1.0.0

Task 04: Setup Alerting

Runbook Azure

DOCUMENT CATEGORY: Runbook
SCOPE: Azure alerting configuration
PURPOSE: Configure proactive alerts for cluster health, performance, and hardware issues
MASTER REFERENCE: Microsoft Learn - Azure Alerts

Status: Active

Azure Monitor alerts provide proactive notification when cluster health degrades, performance thresholds are exceeded, or critical events occur. This step configures action groups for notifications and alert rules for common Azure Local scenarios.

Prerequisites

| Requirement | Description | Validation |
|---|---|---|
| Log Analytics Workspace | Data collection active | Queries return data |
| HCI Insights | Enabled (Step 3) | Workbook shows data |
| Notification Targets | Email addresses, Teams webhooks, etc. | Contact list prepared |
| RBAC Permissions | Monitoring Contributor | Role assignment verified |

Variables from variables.yml

| Variable | Config Path | Example |
|---|---|---|
| AZURE_SUBSCRIPTION_ID | azure.subscription.id | 00000000-0000-0000-0000-000000000000 |
| AZURE_SUBSCRIPTION_NAME | azure.subscription.name | Azure Local Production |
| AZURE_RESOURCE_GROUP | azure.resource_group.name | rg-azurelocal-prod-eus2 |
| SITE_CODE | site.code | DAL |
| NOC_EMAIL | alerting.noc_email | noc@contoso.com |
| CUSTOMER_EMAIL | alerting.customer_email | ops@customer.com |
| TEAMS_WEBHOOK_URL | alerting.teams_webhook_url | https://outlook.office.com/webhook/... |
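To avoid retyping these values in each command, they can be loaded from `variables.yml` at the start of a session. This is a minimal sketch, assuming the YAML structure shown above and the third-party `powershell-yaml` module (`Install-Module powershell-yaml`); adjust paths and key names to match your actual file.

```powershell
# Sketch: load runbook variables from variables.yml.
# Assumes the powershell-yaml module and the config paths listed above.
Import-Module powershell-yaml

$vars = Get-Content -Raw -Path ".\variables.yml" | ConvertFrom-Yaml

$resourceGroup = $vars.azure.resource_group.name
$siteCode      = $vars.site.code
$nocEmail      = $vars.alerting.noc_email

Write-Host "Resource Group: $resourceGroup"
Write-Host "Action group name: ag-azl-$siteCode-critical"
```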

Action Groups

Action groups define who is notified, and how, when an alert fires. Create them before creating alert rules.

1. Navigate to Azure Monitor → Alerts → Action groups
2. Click + Create
3. Configure Basics:

   | Setting | Value |
   |---|---|
   | Subscription | {{AZURE_SUBSCRIPTION_NAME}} |
   | Resource Group | {{AZURE_RESOURCE_GROUP}} |
   | Action group name | ag-azl-{{SITE_CODE}}-critical |
   | Display name | AZL Critical |

4. Configure Notifications:

   | Notification Type | Name | Target |
   |---|---|---|
   | Email/SMS/Push/Voice | Azure Local Cloud NOC | {{NOC_EMAIL}} |
   | Email/SMS/Push/Voice | Customer Contact | {{CUSTOMER_EMAIL}} |

5. Configure Actions (optional):

   | Action Type | Name | Configuration |
   |---|---|---|
   | Webhook | Teams Channel | {{TEAMS_WEBHOOK_URL}} |
   | Azure Function | Auto-Remediation | Function App URL |

6. Click Review + create → Create
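The same action group can also be created from PowerShell. This is a hedged sketch using the `New-AzActionGroupReceiver` and `Set-AzActionGroup` cmdlets from Az.Monitor; newer module versions replace these with `New-AzActionGroup`, so check which parameter set your installed version supports before running it.

```powershell
# Sketch: create the critical action group with two email receivers.
# Assumes Az.Monitor's classic Set-AzActionGroup/New-AzActionGroupReceiver
# cmdlets; newer Az.Monitor versions use New-AzActionGroup instead.
$nocReceiver = New-AzActionGroupReceiver `
    -Name "Azure Local Cloud NOC" `
    -EmailReceiver `
    -EmailAddress "{{NOC_EMAIL}}"

$customerReceiver = New-AzActionGroupReceiver `
    -Name "Customer Contact" `
    -EmailReceiver `
    -EmailAddress "{{CUSTOMER_EMAIL}}"

Set-AzActionGroup `
    -Name "ag-azl-{{SITE_CODE}}-critical" `
    -ResourceGroupName "{{AZURE_RESOURCE_GROUP}}" `
    -ShortName "AZLCrit" `
    -Receiver $nocReceiver, $customerReceiver
```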

Cluster Health Alerts

Create alerts based on HCI health events and metrics:

Alert 1: Node Health Critical

```kusto
// Alert when a cluster node reports unhealthy
Event
| where Source == "Microsoft-Windows-SDDC-Management"
| where EventID == 3000
| extend HealthData = parse_json(RenderedDescription)
| where HealthData.HealthState != "Healthy"
| project TimeGenerated, Computer, HealthState = tostring(HealthData.HealthState)
```

| Setting | Value |
|---|---|
| Alert Rule Name | Node Health - Critical |
| Severity | Sev 1 - Critical |
| Evaluation Frequency | Every 5 minutes |
| Lookback Period | 5 minutes |
| Threshold | Greater than 0 |
| Action Group | ag-azl-{{SITE_CODE}}-critical |

Alert 2: Storage Volume Unhealthy

```kusto
// Alert on storage volume health issues
Event
| where Source == "Microsoft-Windows-SDDC-Management"
| where EventID == 3002
| extend VolumeData = parse_json(RenderedDescription)
| where VolumeData.HealthStatus != "Healthy"
| project TimeGenerated, VolumeName = tostring(VolumeData.Name),
    HealthStatus = tostring(VolumeData.HealthStatus)
```

Alert 3: High CPU Utilization

```kusto
// Alert when CPU exceeds threshold
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where InstanceName == "_Total"
| summarize AvgCPU = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
| where AvgCPU > 90
```

| Setting | Value |
|---|---|
| Alert Rule Name | High CPU Utilization |
| Severity | Sev 2 - Warning |
| Threshold | CPU > 90% for 15 minutes |

Alert 4: Low Available Memory

```kusto
// Alert when available memory drops below threshold
Perf
| where ObjectName == "Memory" and CounterName == "Available Bytes"
| extend AvailableGB = CounterValue / 1073741824
| summarize AvgAvailableGB = avg(AvailableGB) by Computer, bin(TimeGenerated, 5m)
| where AvgAvailableGB < 16 // Alert if less than 16 GB available
```
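Before wiring any of these queries into an alert rule, it can help to run them against the workspace and confirm they return the rows you expect. This is a hedged sketch using `Invoke-AzOperationalInsightsQuery` from the Az.OperationalInsights module; the workspace name placeholder follows this runbook's variable convention.

```powershell
# Sketch: dry-run an alert query against the Log Analytics workspace
# before attaching it to an alert rule (Az.OperationalInsights module).
$workspace = Get-AzOperationalInsightsWorkspace `
    -ResourceGroupName "{{AZURE_RESOURCE_GROUP}}" `
    -Name "{{LOG_ANALYTICS_WORKSPACE_NAME}}"

$query = @"
Perf
| where ObjectName == "Memory" and CounterName == "Available Bytes"
| extend AvailableGB = CounterValue / 1073741824
| summarize AvgAvailableGB = avg(AvailableGB) by Computer, bin(TimeGenerated, 5m)
| where AvgAvailableGB < 16
"@

$result = Invoke-AzOperationalInsightsQuery `
    -WorkspaceId $workspace.CustomerId `
    -Query $query `
    -Timespan (New-TimeSpan -Hours 1)

$result.Results | Format-Table -AutoSize
```

An empty result here usually means the alert will never fire; test in Log Analytics first, as noted in Troubleshooting below.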

Create Alert Rule (Portal)

1. Navigate to Azure Monitor → Alerts → Alert rules
2. Click + Create
3. Select Scope: your Log Analytics workspace
4. Configure Condition:
   - Signal type: Custom log search
   - Paste the KQL query
   - Set threshold logic
5. Configure Actions: select the action group
6. Configure Details:
   - Alert rule name
   - Severity
   - Resource group
7. Click Review + create → Create

PowerShell: Create Log Alert Rule

```powershell
# Create a scheduled query rule for node health
$workspace = Get-AzOperationalInsightsWorkspace `
    -ResourceGroupName "{{AZURE_RESOURCE_GROUP}}" `
    -Name "{{LOG_ANALYTICS_WORKSPACE_NAME}}"

$actionGroup = Get-AzActionGroup `
    -ResourceGroupName "{{AZURE_RESOURCE_GROUP}}" `
    -Name "ag-azl-{{SITE_CODE}}-critical"

$query = @"
Event
| where Source == "Microsoft-Windows-SDDC-Management"
| where EventID == 3000
| extend HealthData = parse_json(RenderedDescription)
| where HealthData.HealthState != "Healthy"
| summarize Count = count() by Computer
"@

# Note: Use New-AzScheduledQueryRule for the full implementation
Write-Host "Alert rule configuration prepared" -ForegroundColor Cyan
Write-Host "Query: $query"
Write-Host "Action Group: $($actionGroup.Id)"
```
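Building on the preparation above, the rule itself can be created with `New-AzScheduledQueryRule`. This is a hedged sketch against the current Az.Monitor scheduled-query cmdlets; parameter names have changed across module versions, so verify with `Get-Help New-AzScheduledQueryRule` before use.

```powershell
# Sketch: create the node-health alert rule (Az.Monitor >= 3.x parameter set;
# older versions differ). Reuses $workspace, $actionGroup, and $query from
# the preparation block above.
$condition = New-AzScheduledQueryRuleCriterionObject `
    -Query $query `
    -TimeAggregation "Count" `
    -Operator "GreaterThan" `
    -Threshold 0

New-AzScheduledQueryRule `
    -Name "Node Health - Critical" `
    -ResourceGroupName "{{AZURE_RESOURCE_GROUP}}" `
    -Location $workspace.Location `
    -Scope $workspace.ResourceId `
    -Severity 1 `
    -WindowSize ([System.TimeSpan]::FromMinutes(5)) `
    -EvaluationFrequency ([System.TimeSpan]::FromMinutes(5)) `
    -CriterionAllOf $condition `
    -ActionGroupResourceId $actionGroup.Id
```

The window size, frequency, and severity match the Node Health - Critical settings table above.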
Recommended alert rules for Azure Local:

| Alert Name | Severity | Condition | Action |
|---|---|---|---|
| Node Health Critical | Sev 1 | Node unhealthy event | Email + Teams |
| Storage Volume Unhealthy | Sev 1 | Volume status != Healthy | Email + Teams |
| High CPU Utilization | Sev 2 | CPU > 90% for 15 min | Email |
| Low Available Memory | Sev 2 | Memory < 16 GB | Email |
| VM Failed | Sev 2 | VM state = Failed | Email |
| Storage Latency High | Sev 3 | Write latency > 100 ms | Email |
| Arc Connectivity Lost | Sev 1 | No heartbeat > 30 min | Email + Teams |

Validation

Test Alert Rule

  1. Navigate to Azure Monitor → Alerts → Alert rules
  2. Select your alert rule
  3. Click Test action group to verify notifications work
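There is no dedicated Az cmdlet for the portal's Test action group button; if the check needs to be scripted, the action groups REST API exposes a test-notification operation. This is a hedged sketch using `Invoke-AzRestMethod`; the api-version and payload shape are assumptions and should be verified against the current Action Groups REST reference.

```powershell
# Sketch: trigger a test notification for the action group via REST
# (Invoke-AzRestMethod, Az.Accounts). The api-version and alertType value
# are assumptions; check the Action Groups REST API reference before use.
$path = "/subscriptions/{{AZURE_SUBSCRIPTION_ID}}/resourceGroups/{{AZURE_RESOURCE_GROUP}}" +
        "/providers/Microsoft.Insights/actionGroups/ag-azl-{{SITE_CODE}}-critical" +
        "/createNotifications?api-version=2023-01-01"

$body = @{ alertType = "servicehealth" } | ConvertTo-Json

Invoke-AzRestMethod -Path $path -Method POST -Payload $body
```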

Check Alert History

```powershell
# Get alerts from the last 24 hours (Az.AlertsManagement module)
Get-AzAlert -TargetResourceGroup "{{AZURE_RESOURCE_GROUP}}" -TimeRange "1d" |
    Select-Object Name, Severity, MonitorCondition, ResolvedTime |
    Format-Table -AutoSize
```

Verify Action Group

```kusto
// Check for alert notifications in the Activity Log
AzureActivity
| where CategoryValue == "Alert"
| where TimeGenerated > ago(24h)
| project TimeGenerated, OperationName, Status, Description
| order by TimeGenerated desc
```

Troubleshooting

| Issue | Possible Cause | Resolution |
|---|---|---|
| No alerts firing | Query returns no results | Test query in Log Analytics |
| Email not received | Email in spam | Check spam folder, add to safe senders |
| Action group error | Invalid webhook URL | Verify Teams/webhook URL format |
| Alert fires repeatedly | Threshold too sensitive | Adjust threshold or aggregation period |

Variables Reference

| Variable | Description | Example |
|---|---|---|
| {{NOC_EMAIL}} | Azure Local Cloud NOC email | noc@azurelocal.cloud |
| {{CUSTOMER_EMAIL}} | Customer contact | admin@customer.com |
| {{TEAMS_WEBHOOK_URL}} | Teams channel webhook | https://outlook.office.com/webhook/... |

Next Steps

After setting up alerting:

  1. ➡️ Task 5: Deploy OMIMSWAC Monitoring — Hardware monitoring
  2. Review alert rules weekly and adjust thresholds as needed
  3. Create runbooks for common alert responses
  4. Document escalation procedures for critical alerts

Toolkit Reference

Scripts for this task are located in the azurelocal-toolkit repository under scripts/deploy/ in the appropriate task folder.


Alternatives

The procedures in this task use the scripted methods shown above. Additional deployment methods, including Azure CLI and Bash scripts, are available in the azurelocal-toolkit repository under scripts/deploy/.

| Method | Description |
|---|---|
| Azure CLI | PowerShell-based Azure CLI scripts for Azure resource operations |
| Bash | Linux/macOS-compatible shell scripts for pipeline environments |

Version Control

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-03-25 | Azure Local Cloud | Initial release |