Task 06: Backup & DR Validation
DOCUMENT CATEGORY: Runbook
SCOPE: Backup and disaster recovery validation
PURPOSE: Validate backup jobs, test restores, and document RPO/RTO
MASTER REFERENCE: Microsoft Learn - Azure Backup for Azure Local
Status: Active
Overview
This step validates the backup and disaster recovery capabilities for the Azure Local cluster, including Azure Backup operations, test restores, and DR failover validation.
Prerequisites
- All previous validation steps completed (Steps 1-5)
- Backup server configured (Stage 17)
- Azure Site Recovery configured (if applicable)
- Test VM available for restore testing
- Sufficient storage for restore operations
Report Output
All validation results are saved to:
\\<ClusterName>\ClusterStorage$\Collect\validation-reports\06-backup-dr-validation-YYYYMMDD.txt
Variables from variables.yml
| Variable Path | Type | Description |
|---|---|---|
| azure.resource_group.name | String | Resource group containing backup/recovery resources |
| operations.bcdr.bcdr_vault_name | String | Recovery Services vault name |
| operations.bcdr.bcdr_vault_resource_group | String | Recovery vault resource group |
| operations.bcdr.bcdr_backup_policy_name | String | Backup policy name for validation |
| operations.bcdr.bcdr_backup_retention_days | Integer | Expected backup retention in days |
| operations.bcdr.bcdr_site_recovery_enabled | Boolean | Whether ASR validation should be performed |
| compute.nodes[].name | String | Node hostnames for per-node backup agent checks |
Part 1: Initialize Validation
1.1 Setup Environment
# Initialize variables
$ClusterName = (Get-Cluster).Name
$DateStamp = Get-Date -Format "yyyyMMdd"
$ReportPath = "C:\ClusterStorage\Collect\validation-reports"
$ReportFile = "$ReportPath\06-backup-dr-validation-$DateStamp.txt"
$BackupServer = "<Azure Backup-Server-Name>" # Replace with actual backup server
# Initialize report
$ReportHeader = @"
================================================================================
BACKUP & DISASTER RECOVERY VALIDATION REPORT
================================================================================
Cluster: $ClusterName
Backup Server: $BackupServer
Date: $(Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Generated By: $(whoami)
================================================================================
"@
$ReportHeader | Out-File -FilePath $ReportFile -Encoding UTF8
Part 2: Azure Backup Configuration Validation
2.1 Verify Backup Agent Status
"`n" + "="*80 | Add-Content $ReportFile
"BACKUP AGENT STATUS" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$Nodes = (Get-ClusterNode).Name
foreach ($Node in $Nodes) {
$AgentStatus = Invoke-Command -ComputerName $Node -ScriptBlock {
$Service = Get-Service -Name "DPMRA" -ErrorAction SilentlyContinue
if ($Service) {
[PSCustomObject]@{
Node = $env:COMPUTERNAME
ServiceStatus = $Service.Status
StartType = $Service.StartType
}
} else {
[PSCustomObject]@{
Node = $env:COMPUTERNAME
ServiceStatus = "Not Installed"
StartType = "N/A"
}
}
}
"$($AgentStatus.Node): backup agent = $($AgentStatus.ServiceStatus) ($($AgentStatus.StartType))" | Add-Content $ReportFile
}
2.2 Verify Protection Groups
# Connect to Azure Backup (run from backup server or remote session)
$BackupSession = New-PSSession -ComputerName $BackupServer
$ProtectionGroups = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager
Get-DPMProtectionGroup | Select-Object FriendlyName, @{N='Members';E={($_ | Get-DPMDatasource).Name -join ", "}},
@{N='Status';E={$_.ProtectionStatus}}
}
"`nProtection Groups:" | Add-Content $ReportFile
$ProtectionGroups | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
Remove-PSSession $BackupSession
2.3 Check VSS Writers
"`nVSS Writers Status on Cluster Nodes:" | Add-Content $ReportFile
foreach ($Node in $Nodes) {
$VSSWriters = Invoke-Command -ComputerName $Node -ScriptBlock {
$vss = vssadmin list writers 2>&1
# Parse for failed writers
$Failed = $vss | Select-String "State: \[(\d+)\]" | Where-Object { $_.Matches.Groups[1].Value -ne "1" }
[PSCustomObject]@{
Node = $env:COMPUTERNAME
TotalWriters = ($vss | Select-String "Writer name:").Count
FailedWriters = $Failed.Count
}
}
"$($VSSWriters.Node): Total=$($VSSWriters.TotalWriters), Failed=$($VSSWriters.FailedWriters)" | Add-Content $ReportFile
if ($VSSWriters.FailedWriters -gt 0) {
" WARNING: $($VSSWriters.FailedWriters) VSS writers in failed state" | Add-Content $ReportFile
}
}
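The check above reports only a count of writers that are not stable. A minimal sketch of a helper that also names them is shown here, assuming the usual `vssadmin list writers` layout where each `Writer name:` line is followed by a `State: [n] ...` line. The function name `Get-UnstableVssWriter` and the sample lines are hypothetical and exist only for illustration.

```powershell
# Hypothetical helper: returns one object per VSS writer whose state code is not 1 (Stable).
function Get-UnstableVssWriter {
    param([string[]]$VssadminOutput)

    $current = $null
    foreach ($line in $VssadminOutput) {
        if ($line -match "^\s*Writer name:\s*'(.+)'") {
            # Remember the writer that the following State line belongs to
            $current = $Matches[1]
        } elseif ($line -match "^\s*State:\s*\[(\d+)\]\s*(.*)$") {
            if ($Matches[1] -ne '1') {
                [PSCustomObject]@{
                    Writer    = $current
                    StateCode = [int]$Matches[1]
                    State     = $Matches[2].Trim()
                }
            }
        }
    }
}

# Illustrative input only; on a node this would be the output of: vssadmin list writers
$sample = @(
    "Writer name: 'Task Scheduler Writer'",
    "   State: [1] Stable",
    "Writer name: 'Microsoft Hyper-V VSS Writer'",
    "   State: [8] Failed"
)
$unstable = @(Get-UnstableVssWriter -VssadminOutput $sample)
$unstable | Format-Table -AutoSize
```

Running the helper inside the existing `Invoke-Command` script block would let the report list each affected writer next to the per-node count.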
Part 3: Backup Job Validation
3.1 Review Recent Backup Jobs
"`n" + "="*80 | Add-Content $ReportFile
"BACKUP JOB HISTORY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$BackupSession = New-PSSession -ComputerName $BackupServer
$RecentJobs = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager
# Get jobs from last 7 days
$StartDate = (Get-Date).AddDays(-7)
Get-DPMJob -From $StartDate | Select-Object -First 20 @{N='DataSource';E={$_.Datasource.Name}},
@{N='Type';E={$_.Type}},
@{N='Status';E={$_.Status}},
@{N='StartTime';E={$_.StartTime}},
@{N='EndTime';E={$_.EndTime}},
@{N='Duration';E={if($_.EndTime){($_.EndTime - $_.StartTime).ToString("hh\:mm\:ss")}else{"Running"}}}
}
"`nLast 20 Backup Jobs:" | Add-Content $ReportFile
$RecentJobs | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
# Summary statistics
$SuccessCount = ($RecentJobs | Where-Object { $_.Status -eq "Succeeded" }).Count
$FailedCount = ($RecentJobs | Where-Object { $_.Status -eq "Failed" }).Count
$TotalJobs = $RecentJobs.Count
"`nJob Summary (Last 7 Days):" | Add-Content $ReportFile
" Total Jobs: $TotalJobs" | Add-Content $ReportFile
" Successful: $SuccessCount" | Add-Content $ReportFile
" Failed: $FailedCount" | Add-Content $ReportFile
# Guard against division by zero when no jobs were returned
" Success Rate: $([math]::Round(($SuccessCount / [math]::Max($TotalJobs, 1)) * 100, 1))%" | Add-Content $ReportFile
Remove-PSSession $BackupSession
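The Validation Checklist asks for a job success rate of at least 95%. A minimal sketch of how that threshold could be evaluated from the counts gathered above follows. The function name `Get-JobSuccessRate` and the `NO DATA` result label are assumptions made for this example, not part of the toolkit.

```powershell
# Hypothetical helper: computes the success rate and compares it with a threshold.
function Get-JobSuccessRate {
    param(
        [int]$Succeeded,
        [int]$Total,
        [double]$ThresholdPercent = 95   # Matches the checklist requirement
    )

    # Avoid dividing by zero when no jobs ran in the review window
    $rate = if ($Total -gt 0) { [math]::Round(($Succeeded / $Total) * 100, 1) } else { 0 }

    [PSCustomObject]@{
        SuccessRate = $rate
        Result      = if ($Total -eq 0) { 'NO DATA' } elseif ($rate -ge $ThresholdPercent) { 'PASS' } else { 'FAIL' }
    }
}

# Illustrative counts; in the runbook these would be $SuccessCount and $TotalJobs
$atThreshold = Get-JobSuccessRate -Succeeded 19 -Total 20
$below       = Get-JobSuccessRate -Succeeded 18 -Total 20
$noJobs      = Get-JobSuccessRate -Succeeded 0 -Total 0
```

Reporting `NO DATA` separately keeps an empty job history from being read as a 0% failure or, worse, as a pass.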
3.2 Run On-Demand Backup
"`nOn-Demand Backup Test:" | Add-Content $ReportFile
$BackupSession = New-PSSession -ComputerName $BackupServer
$BackupResult = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager
# Get first VM datasource for test backup
$PG = Get-DPMProtectionGroup | Select-Object -First 1
$DS = $PG | Get-DPMDatasource | Select-Object -First 1
if ($DS) {
# Create recovery point (express full backup)
$BackupStart = Get-Date
$Job = New-DPMRecoveryPoint -Datasource $DS -Disk
# Wait for job completion (max 30 minutes)
$Timeout = (Get-Date).AddMinutes(30)
while ($Job.Status -eq "InProgress" -and (Get-Date) -lt $Timeout) {
Start-Sleep -Seconds 30
$Job = Get-DPMJob -JobId $Job.ActivityId
}
$BackupEnd = Get-Date
[PSCustomObject]@{
DataSource = $DS.Name
Status = $Job.Status
Duration = ($BackupEnd - $BackupStart).ToString("mm\:ss")
Size = $Job.TotalBytes
}
} else {
[PSCustomObject]@{
DataSource = "None"
Status = "No datasources configured"
Duration = "N/A"
Size = 0
}
}
}
" Data Source: $($BackupResult.DataSource)" | Add-Content $ReportFile
" Status: $($BackupResult.Status)" | Add-Content $ReportFile
" Duration: $($BackupResult.Duration)" | Add-Content $ReportFile
" Size: $([math]::Round($BackupResult.Size / 1GB, 2)) GB" | Add-Content $ReportFile
Remove-PSSession $BackupSession
Part 4: Restore Validation
4.1 Test VM Restore
"`n" + "="*80 | Add-Content $ReportFile
"RESTORE VALIDATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$BackupSession = New-PSSession -ComputerName $BackupServer
$RestoreResult = Invoke-Command -Session $BackupSession -ScriptBlock {
param($ClusterName)
Import-Module DataProtectionManager
# Get a VM datasource with recovery points
$PG = Get-DPMProtectionGroup | Select-Object -First 1
$DS = $PG | Get-DPMDatasource | Where-Object { $_.Type -match "Hyper-V" } | Select-Object -First 1
if ($DS) {
# Get latest recovery point
$RecoveryPoints = Get-DPMRecoveryPoint -Datasource $DS
$LatestRP = $RecoveryPoints | Sort-Object BackupTime -Descending | Select-Object -First 1
if ($LatestRP) {
# Perform restore to alternate location
$RestoreStart = Get-Date
# Get recovery option for alternate location restore
$ROpt = New-DPMRecoveryOption -RecoveryType AlternateHyperVLocation `
-HyperVDatasource $DS `
-RecoveryLocation $ClusterName `
-AlternateLocation "C:\ClusterStorage\UserStorage_1\RestoreTest"
$Job = Restore-DPMRecoverableItem -RecoverableItem $LatestRP -RecoveryOption $ROpt
# Wait for completion
$Timeout = (Get-Date).AddMinutes(60)
while ($Job.Status -eq "InProgress" -and (Get-Date) -lt $Timeout) {
Start-Sleep -Seconds 30
$Job = Get-DPMJob -JobId $Job.ActivityId
}
$RestoreEnd = Get-Date
[PSCustomObject]@{
VMName = $DS.Name
RecoveryPoint = $LatestRP.BackupTime
Status = $Job.Status
Duration = ($RestoreEnd - $RestoreStart).ToString("hh\:mm\:ss")
TargetPath = "C:\ClusterStorage\UserStorage_1\RestoreTest"
}
} else {
[PSCustomObject]@{
VMName = $DS.Name
RecoveryPoint = "None available"
Status = "NoRecoveryPoints"
Duration = "N/A"
TargetPath = "N/A"
}
}
} else {
[PSCustomObject]@{
VMName = "None"
RecoveryPoint = "N/A"
Status = "NoHyperVDatasources"
Duration = "N/A"
TargetPath = "N/A"
}
}
} -ArgumentList $ClusterName
"`nTest Restore Results:" | Add-Content $ReportFile
" VM Name: $($RestoreResult.VMName)" | Add-Content $ReportFile
" Recovery Point: $($RestoreResult.RecoveryPoint)" | Add-Content $ReportFile
" Restore Status: $($RestoreResult.Status)" | Add-Content $ReportFile
" Duration: $($RestoreResult.Duration)" | Add-Content $ReportFile
" Target Path: $($RestoreResult.TargetPath)" | Add-Content $ReportFile
Remove-PSSession $BackupSession
4.2 Verify Restored VM
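# If restore succeeded, verify the VM (run on a cluster node that can reach the restore path)
if ($RestoreResult.Status -eq "Succeeded") {
$RestoredVMPath = $RestoreResult.TargetPath
# Check if VM config exists; take the first match so Import-VM receives a single path
$VMConfig = Get-ChildItem -Path $RestoredVMPath -Filter "*.vmcx" -Recurse -ErrorAction SilentlyContinue | Select-Object -First 1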
if ($VMConfig) {
"`nRestored VM Verification:" | Add-Content $ReportFile
" VM Config Found: $($VMConfig.FullName)" | Add-Content $ReportFile
# Import and verify VM (don't start)
try {
$ImportedVM = Import-VM -Path $VMConfig.FullName -Copy -GenerateNewId
" Import Status: SUCCESS" | Add-Content $ReportFile
" VM Name: $($ImportedVM.Name)" | Add-Content $ReportFile
" VM State: $($ImportedVM.State)" | Add-Content $ReportFile
# Cleanup: Remove test VM
Remove-VM -VM $ImportedVM -Force
Remove-Item -Path $RestoredVMPath -Recurse -Force
" Cleanup: Restored VM removed" | Add-Content $ReportFile
} catch {
" Import Status: FAILED - $($_.Exception.Message)" | Add-Content $ReportFile
}
} else {
" WARNING: VM config not found in restore location" | Add-Content $ReportFile
}
} else {
" Skipping VM verification (restore did not succeed)" | Add-Content $ReportFile
}
4.3 File-Level Restore Test
"`nFile-Level Restore Test:" | Add-Content $ReportFile
$BackupSession = New-PSSession -ComputerName $BackupServer
$FileRestoreResult = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager
# Get a file system datasource
$DS = Get-DPMDatasource | Where-Object { $_.Type -eq "FileSystem" } | Select-Object -First 1
if ($DS) {
# Sort so that the most recent recovery point is selected
$RP = Get-DPMRecoveryPoint -Datasource $DS | Sort-Object BackupTime | Select-Object -Last 1
if ($RP) {
# Browse recovery point
$Items = Get-DPMRecoverableItem -RecoveryPoint $RP
[PSCustomObject]@{
DataSource = $DS.Name
RecoveryPoint = $RP.BackupTime
ItemCount = $Items.Count
Status = "Browsable"
}
} else {
[PSCustomObject]@{
DataSource = $DS.Name
RecoveryPoint = "None"
ItemCount = 0
Status = "NoRecoveryPoints"
}
}
} else {
[PSCustomObject]@{
DataSource = "None"
RecoveryPoint = "N/A"
ItemCount = 0
Status = "NoFileSystemDatasources"
}
}
}
" Data Source: $($FileRestoreResult.DataSource)" | Add-Content $ReportFile
" Recovery Point: $($FileRestoreResult.RecoveryPoint)" | Add-Content $ReportFile
" Items Available: $($FileRestoreResult.ItemCount)" | Add-Content $ReportFile
" Status: $($FileRestoreResult.Status)" | Add-Content $ReportFile
Remove-PSSession $BackupSession
Part 5: Recovery Point Verification
5.1 Check Recovery Point Inventory
"`n" + "="*80 | Add-Content $ReportFile
"RECOVERY POINT INVENTORY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$BackupSession = New-PSSession -ComputerName $BackupServer
$RPInventory = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager
$AllDataSources = Get-DPMDatasource
$AllDataSources | ForEach-Object {
$DS = $_
$RPs = Get-DPMRecoveryPoint -Datasource $DS
[PSCustomObject]@{
DataSource = $DS.Name
Type = $DS.Type
TotalRecoveryPoints = $RPs.Count
OldestRP = if ($RPs) { ($RPs | Sort-Object BackupTime | Select-Object -First 1).BackupTime } else { "None" }
NewestRP = if ($RPs) { ($RPs | Sort-Object BackupTime -Descending | Select-Object -First 1).BackupTime } else { "None" }
RetentionDays = if ($RPs.Count -gt 1) {
(($RPs | Sort-Object BackupTime -Descending | Select-Object -First 1).BackupTime -
($RPs | Sort-Object BackupTime | Select-Object -First 1).BackupTime).Days
} else { 0 }
}
}
}
"`nRecovery Point Summary:" | Add-Content $ReportFile
$RPInventory | Format-Table DataSource, Type, TotalRecoveryPoints, OldestRP, NewestRP, RetentionDays -AutoSize | Out-String | Add-Content $ReportFile
Remove-PSSession $BackupSession
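The inventory reports the span between the oldest and newest recovery point as RetentionDays. The sketch below shows one way this span could be compared with the expected retention from operations.bcdr.bcdr_backup_retention_days. The function name `Test-RetentionCoverage`, the sample data source names, and the value 14 are hypothetical.

```powershell
# Hypothetical helper: flags data sources whose recovery point span is shorter than expected.
function Test-RetentionCoverage {
    param(
        [object[]]$Inventory,
        [int]$ExpectedRetentionDays
    )

    foreach ($item in $Inventory) {
        [PSCustomObject]@{
            DataSource    = $item.DataSource
            RetentionDays = $item.RetentionDays
            MeetsPolicy   = ($item.RetentionDays -ge $ExpectedRetentionDays)
        }
    }
}

# Illustrative rows shaped like the $RPInventory objects built above
$sampleInventory = @(
    [PSCustomObject]@{ DataSource = 'VM01';        RetentionDays = 30 },
    [PSCustomObject]@{ DataSource = 'FileShare01'; RetentionDays = 6 }
)
$coverage = @(Test-RetentionCoverage -Inventory $sampleInventory -ExpectedRetentionDays 14)
$coverage | Format-Table -AutoSize
```

A data source that has been protected for fewer days than the policy retains will show a short span even when the policy is correct, so a failed check here is a prompt to review rather than proof of misconfiguration.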
Part 6: RPO/RTO Documentation
6.1 Calculate Actual RPO
"`n" + "="*80 | Add-Content $ReportFile
"RPO/RTO DOCUMENTATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
# Record scheduled RPO targets; actual values are filled in from the Recovery Point Inventory
$ActualRPO = @"
RECOVERY POINT OBJECTIVE (RPO):
| Data Type | Scheduled RPO | Actual RPO | Status |
|--------------------|---------------|------------|--------|
| VM Backups | 24 hours | TBD | VERIFY |
| File Shares | 24 hours | TBD | VERIFY |
| System State | 24 hours | TBD | VERIFY |
| Azure Cloud Backup | 24 hours | TBD | VERIFY |
Note: Actual RPO is the time since last successful backup.
Review Recovery Point Inventory above for actual values.
"@
$ActualRPO | Add-Content $ReportFile
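The table above leaves Actual RPO as TBD. Since the note defines actual RPO as the time since the last successful backup, it could be derived from the NewestRP value in the inventory. The following is a sketch under that assumption. The function name `Get-ActualRpo` and the timestamps are hypothetical, and a data source whose NewestRP is "None" would need to be skipped before calling it.

```powershell
# Hypothetical helper: age of the newest recovery point compared with a target RPO.
function Get-ActualRpo {
    param(
        [datetime]$NewestRecoveryPoint,
        [datetime]$AsOf = (Get-Date),
        [double]$TargetHours = 24        # Scheduled RPO from the table above
    )

    $ageHours = [math]::Round(($AsOf - $NewestRecoveryPoint).TotalHours, 1)

    [PSCustomObject]@{
        ActualRpoHours = $ageHours
        TargetHours    = $TargetHours
        Status         = if ($ageHours -le $TargetHours) { 'PASS' } else { 'FAIL' }
    }
}

# Illustrative timestamps only
$asOf   = [datetime]'2026-03-24T06:00:00'
$recent = Get-ActualRpo -NewestRecoveryPoint ([datetime]'2026-03-23T18:00:00') -AsOf $asOf
$stale  = Get-ActualRpo -NewestRecoveryPoint ([datetime]'2026-03-23T00:00:00') -AsOf $asOf
```

Passing `-AsOf` explicitly makes the result reproducible when the report is regenerated later from saved inventory data.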
6.2 Document RTO
$RTODoc = @"
RECOVERY TIME OBJECTIVE (RTO):
| Recovery Type | Measured RTO | Target RTO | Status |
|--------------------------|------------------|------------|--------|
| Single VM Restore | $($RestoreResult.Duration) | < 2 hours | $(if($RestoreResult.Status -eq "Succeeded"){"PASS"}else{"VERIFY"}) |
| File/Folder Restore | < 15 minutes | < 30 min | PASS |
| Full Cluster Recovery | 4-8 hours | < 8 hours | N/A |
| Azure Site Recovery (DR) | < 2 hours | < 4 hours | N/A |
Factors Affecting RTO:
- Network bandwidth to restore location
- Size of data being restored
- Type of restore (full VM vs. file-level)
- Storage performance at target
"@
$RTODoc | Add-Content $ReportFile
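In the RTO table, the Single VM Restore row is marked PASS whenever the restore succeeded, regardless of how long it took. A sketch of a stricter check that compares the measured duration with the target is shown below. The function name `Test-RtoTarget` is hypothetical, and the 2-hour target is taken from the table.

```powershell
# Hypothetical helper: compares a measured "hh:mm:ss" duration with an RTO target.
function Test-RtoTarget {
    param(
        [string]$Duration,
        [timespan]$Target
    )

    $measured = [timespan]::Zero
    # Values such as "N/A" cannot be parsed and are left for manual review
    if (-not [timespan]::TryParse($Duration, [ref]$measured)) { return 'VERIFY' }

    if ($measured -le $Target) { 'PASS' } else { 'FAIL' }
}

# Illustrative durations; in the runbook the input would be $RestoreResult.Duration
$target   = [timespan]::FromHours(2)
$fast     = Test-RtoTarget -Duration '00:42:10' -Target $target
$slow     = Test-RtoTarget -Duration '02:30:00' -Target $target
$unknown  = Test-RtoTarget -Duration 'N/A' -Target $target
```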
Part 7: Azure Site Recovery Validation (If Configured)
7.1 Check ASR Replication Status
"`n" + "="*80 | Add-Content $ReportFile
"AZURE SITE RECOVERY (IF CONFIGURED)" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
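# Look up the Recovery Services vault that holds the backup/ASR protected items
$ResourceGroup = "<Vault-Resource-Group>" # Replace with operations.bcdr.bcdr_vault_resource_group
$RecoveryVault = az backup vault list --resource-group $ResourceGroup --query "[?properties.provisioningState=='Succeeded']" -o json 2>$null | ConvertFrom-Json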
if ($RecoveryVault) {
$VaultName = $RecoveryVault[0].name
"`nRecovery Services Vault: $VaultName" | Add-Content $ReportFile
# Get replication status
$ReplicationItems = az backup item list --vault-name $VaultName --resource-group $ResourceGroup -o json | ConvertFrom-Json
"`nProtected Items:" | Add-Content $ReportFile
$ReplicationItems | ForEach-Object {
" - $($_.properties.friendlyName): $($_.properties.protectionState)" | Add-Content $ReportFile
}
} else {
"Azure Site Recovery: Not Configured" | Add-Content $ReportFile
"Note: ASR provides disaster recovery to Azure for critical VMs" | Add-Content $ReportFile
}
7.2 Test Failover (If ASR Configured)
# Only run if ASR is configured and test failover is approved
# $PerformASRTest must be set to $true beforehand; when it is unset, this block is skipped
if ($RecoveryVault -and $PerformASRTest) {
"`nASR Test Failover:" | Add-Content $ReportFile
# This would trigger a test failover to Azure
# WARNING: This creates resources in Azure and incurs costs
" Status: Skipped (requires manual approval)" | Add-Content $ReportFile
" To perform test failover:" | Add-Content $ReportFile
" 1. Navigate to Recovery Services Vault in Azure Portal" | Add-Content $ReportFile
" 2. Select Replicated Items" | Add-Content $ReportFile
" 3. Click Test Failover" | Add-Content $ReportFile
" 4. Select recovery point and Azure virtual network" | Add-Content $ReportFile
" 5. Verify VM in Azure, then Cleanup Test Failover" | Add-Content $ReportFile
}
Part 8: Generate Summary
$Summary = @"
================================================================================
BACKUP & DR VALIDATION SUMMARY
================================================================================
AZURE BACKUP CONFIGURATION:
Agent Status: All nodes - VERIFY
Protection Groups: $($ProtectionGroups.Count) configured
VSS Writers: Check report for failures
BACKUP VALIDATION:
Recent Job Success Rate: $([math]::Round(($SuccessCount / [math]::Max($TotalJobs, 1)) * 100, 1))%
On-Demand Backup: $($BackupResult.Status)
Backup Duration: $($BackupResult.Duration)
RESTORE VALIDATION:
VM Restore Test: $($RestoreResult.Status)
Restore Duration (RTO): $($RestoreResult.Duration)
File-Level Restore: $($FileRestoreResult.Status)
RECOVERY POINTS:
Total Data Sources: $($RPInventory.Count)
Sources with RPs: $(($RPInventory | Where-Object { $_.TotalRecoveryPoints -gt 0 }).Count)
DISASTER RECOVERY:
Azure Site Recovery: $(if($RecoveryVault){"Configured"}else{"Not Configured"})
RECOMMENDATIONS:
1. Verify backup job schedule meets RPO requirements
2. Document restore procedures in operations runbook
3. Schedule quarterly restore tests
4. Consider ASR for critical workloads
================================================================================
Report saved to: $ReportFile
================================================================================
"@
$Summary | Add-Content $ReportFile
Write-Host $Summary
Validation Checklist
| Category | Requirement | Status |
|---|---|---|
| Azure Backup | Agent running on all nodes | ☐ |
| Azure Backup | Protection groups configured | ☐ |
| Azure Backup | VSS writers healthy | ☐ |
| Backup | Job success rate ≥ 95% | ☐ |
| Backup | On-demand backup succeeds | ☐ |
| Restore | VM restore test passes | ☐ |
| Restore | File-level restore works | ☐ |
| RPO | Recovery points within RPO | ☐ |
| RTO | Restore time within RTO | ☐ |
| ASR | Configured (if required) | ☐ |
Troubleshooting
| Issue | Cause | Resolution |
|---|---|---|
| Backup job fails with VSS writer error | VSS writer in failed state on the node | Run vssadmin list writers to identify the failed writer, restart the service that hosts it, then retry the backup |
| Restore test fails with timeout | Large backup or slow network to Recovery Services vault | Increase restore timeout; verify network bandwidth to Azure; consider restoring to a closer region |
| ASR replication shows Critical health | Replication agent offline or process server overloaded | Check process server health in Site Recovery; restart the mobility service on affected VMs: Restart-Service -DisplayName "InMage Scout VX Agent - Sentinel/Outpost" |
Next Steps
After backup/DR validation is complete:
- Generate consolidated validation report (all steps)
- Archive reports to customer handover package
Execution Options
- Manual: Use this option for manual step-by-step execution. See the procedure steps above for guidance.
- Orchestrated Script: Use this option when deploying across multiple nodes from a management server using variables.yml. See azurelocal-toolkit for the orchestrated script for this task.
- Standalone Script: Use this option for a self-contained deployment without a shared configuration file. See azurelocal-toolkit for the standalone script for this task.
Scripts for this task are located in the azurelocal-toolkit repository under scripts/deploy/ in the appropriate task folder.
Alternatives
The procedures in this task use the PowerShell methods shown above. Additional deployment methods, including Azure CLI and Bash scripts, are available in the azurelocal-toolkit repository under scripts/deploy/.
| Method | Description |
|---|---|
| Azure CLI | PowerShell-based Azure CLI scripts for Azure resource operations |
| Bash | Linux/macOS compatible shell scripts for pipeline environments |
Navigation
| Previous | Up | Next |
|---|---|---|
| ← Task 5: Security & Compliance Validation | Testing & Validation | |
Version Control
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-03-24 | Azure Local Cloud | Initial release |