Task 06: Backup & DR Validation
DOCUMENT CATEGORY: Runbook
SCOPE: Backup and disaster recovery validation
PURPOSE: Validate backup jobs, test restores, and document RPO/RTO
MASTER REFERENCE: Microsoft Learn - Azure Backup for Azure Local
Status: Active
Overview
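This task validates the backup and disaster recovery capabilities of the Azure Local cluster. It covers:
- protection agent and VSS writer health on each node
- backup job history and an on-demand backup
- test restores and the recovery point inventory
- RPO/RTO documentation
- Azure Site Recovery (ASR) readiness

Backup server steps run remotely on the Microsoft Azure Backup Server (MABS) or System Center DPM server, using its DataProtectionManager PowerShell module.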
Prerequisites
- All previous validation tasks completed (Tasks 1-5)
- Backup server (MABS or System Center DPM) configured (Stage 17)
- Azure Site Recovery configured (if applicable)
- Azure CLI installed and signed in (az login) for Part 7
- Test VM available for restore testing
- Sufficient storage for restore operations
Report Output
All validation results are saved to:
\\<ClusterName>\ClusterStorage$\Collect\validation-reports\06-backup-dr-validation-YYYYMMDD.txt
Variables from variables.yml
| Variable Path | Type | Description |
|---|---|---|
| azure.resource_group.name | String | Resource group containing backup/recovery resources |
| operations.bcdr.bcdr_vault_name | String | Recovery Services vault name |
| operations.bcdr.bcdr_vault_resource_group | String | Recovery vault resource group |
| operations.bcdr.bcdr_backup_policy_name | String | Backup policy name for validation |
| operations.bcdr.bcdr_backup_retention_days | Integer | Expected backup retention in days |
| operations.bcdr.bcdr_site_recovery_enabled | Boolean | Whether ASR validation should be performed |
| compute.nodes[].name | String | Node hostnames for per-node backup agent checks |
Part 1: Initialize Validation
1.1 Setup Environment
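# Initialize variables
$ClusterName = (Get-Cluster).Name
$DateStamp = Get-Date -Format "yyyyMMdd"
$ReportPath = "C:\ClusterStorage\Collect\validation-reports"
$ReportFile = "$ReportPath\06-backup-dr-validation-$DateStamp.txt"
$BackupServer = "<Backup-Server-Name>" # Replace with the MABS/DPM server name
$ResourceGroup = "<Vault-Resource-Group>" # operations.bcdr.bcdr_vault_resource_group (used in Part 7)
$PerformASRTest = $false # operations.bcdr.bcdr_site_recovery_enabled (used in Part 7)
# Create the report folder if it does not exist (Out-File does not create folders)
New-Item -ItemType Directory -Path $ReportPath -Force | Out-Null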
# Initialize report
$ReportHeader = @"
================================================================================
BACKUP & DISASTER RECOVERY VALIDATION REPORT
================================================================================
Cluster: $ClusterName
Backup Server: $BackupServer
Date: $(Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Generated By: $(whoami)
================================================================================
"@
$ReportHeader | Out-File -FilePath $ReportFile -Encoding UTF8
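Parts 2 to 7 repeat the same three `Add-Content` lines to write each section banner. The minimal sketch below shows a helper that produces the same layout: a blank line, a rule, the title, and a second rule. `Write-ReportSection` is a hypothetical name. It is not part of the runbook or of any module.

```powershell
# Hypothetical helper (not part of the runbook): writes a section banner to the report file.
function Write-ReportSection {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Title,
        [int] $Width = 80
    )
    $Rule = "=" * $Width
    # Blank line, rule, title, rule: each array element is written on its own line
    Add-Content -Path $Path -Value @("", $Rule, $Title, $Rule)
}
```

For example, `Write-ReportSection -Path $ReportFile -Title "BACKUP JOB HISTORY"` would stand in for the three banner lines at the start of 3.1.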
Part 2: Azure Backup Configuration Validation
2.1 Verify Backup Agent Status
"`n" + "="*80 | Add-Content $ReportFile
"BACKUP AGENT STATUS" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$Nodes = (Get-ClusterNode).Name
foreach ($Node in $Nodes) {
$AgentStatus = Invoke-Command -ComputerName $Node -ScriptBlock {
$Service = Get-Service -Name "DPMRA" -ErrorAction SilentlyContinue
if ($Service) {
[PSCustomObject]@{
Node = $env:COMPUTERNAME
ServiceStatus = $Service.Status
StartType = $Service.StartType
}
} else {
[PSCustomObject]@{
Node = $env:COMPUTERNAME
ServiceStatus = "Not Installed"
StartType = "N/A"
}
}
}
"$($AgentStatus.Node): backup agent = $($AgentStatus.ServiceStatus) ($($AgentStatus.StartType))" | Add-Content $ReportFile
}
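The loop above writes one line per node, but `$AgentStatus` only ever holds the result for the last node. Suppose the results were collected into an array instead, for example with `$AllAgentStatus += $AgentStatus` inside the loop (`$AllAgentStatus` is a hypothetical variable). A small filter can then list the nodes that fail the "agent running on all nodes" checklist item.

This is a sketch, and `Get-AgentFinding` is a hypothetical name. It compares the string form of `ServiceStatus` because that value arrives deserialized from `Invoke-Command`.

```powershell
# Hypothetical helper (not part of the runbook): lists nodes whose protection agent is not running.
function Get-AgentFinding {
    param([Parameter(Mandatory)] [object[]] $Status)
    foreach ($Item in $Status) {
        # "Not Installed", "Stopped", and any other non-Running state fail the checklist item
        if ("$($Item.ServiceStatus)" -ne "Running") {
            "{0}: agent state is '{1}'" -f $Item.Node, $Item.ServiceStatus
        }
    }
}
```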
2.2 Verify Protection Groups
# Connect to the backup server (MABS/DPM) over PowerShell remoting
$BackupSession = New-PSSession -ComputerName $BackupServer
$ProtectionGroups = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager
Get-DPMProtectionGroup | Select-Object FriendlyName, @{N='Members';E={($_ | Get-DPMDatasource).Name -join ", "}},
@{N='Status';E={$_.ProtectionStatus}}
}
"`nProtection Groups:" | Add-Content $ReportFile
$ProtectionGroups | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
Remove-PSSession $BackupSession
2.3 Check VSS Writers
"`nVSS Writers Status on Cluster Nodes:" | Add-Content $ReportFile
foreach ($Node in $Nodes) {
$VSSWriters = Invoke-Command -ComputerName $Node -ScriptBlock {
$vss = vssadmin list writers 2>&1
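# Treat any writer not in state [1] Stable as failed (this also counts transient states such as [5] Waiting for completion)
$Failed = $vss | Select-String "State: \[(\d+)\]" | Where-Object { $_.Matches[0].Groups[1].Value -ne "1" }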
[PSCustomObject]@{
Node = $env:COMPUTERNAME
TotalWriters = ($vss | Select-String "Writer name:").Count
FailedWriters = $Failed.Count
}
}
"$($VSSWriters.Node): Total=$($VSSWriters.TotalWriters), Failed=$($VSSWriters.FailedWriters)" | Add-Content $ReportFile
if ($VSSWriters.FailedWriters -gt 0) {
" WARNING: $($VSSWriters.FailedWriters) VSS writers in failed state" | Add-Content $ReportFile
}
}
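Counting `State:` lines gives a total, but it does not name the failing writers. The minimal sketch below parses the `vssadmin list writers` text into one object per writer, with its name, numeric state, state text, and last error.

It assumes the standard English output layout (`Writer name: '...'`, `State: [n] ...`, `Last error: ...`). `ConvertFrom-VssWriterList` is a hypothetical name.

```powershell
# Hypothetical parser (not part of the runbook) for "vssadmin list writers" output; English layout assumed.
function ConvertFrom-VssWriterList {
    param([Parameter(Mandatory)] [AllowEmptyString()] [string[]] $Lines)
    $Current = $null
    foreach ($Line in $Lines) {
        if ($Line -match "^\s*Writer name:\s*'(.+)'") {
            # A new writer block starts: emit the previous one first
            if ($Current) { New-Object -TypeName PSObject -Property $Current }
            $Current = [ordered]@{ Name = $Matches[1]; StateId = $null; State = $null; LastError = $null }
        } elseif ($Current -and $Line -match "^\s*State:\s*\[(\d+)\]\s*(.*)$") {
            $Current['StateId'] = [int]$Matches[1]
            $Current['State'] = $Matches[2].Trim()
        } elseif ($Current -and $Line -match "^\s*Last error:\s*(.*)$") {
            $Current['LastError'] = $Matches[1].Trim()
        }
    }
    # Emit the final writer block
    if ($Current) { New-Object -TypeName PSObject -Property $Current }
}
```

On a node, the following would list writers that are not stable or that report an error:

`ConvertFrom-VssWriterList -Lines (vssadmin list writers) | Where-Object { $_.StateId -ne 1 -or $_.LastError -ne 'No error' }`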
Part 3: Backup Job Validation
3.1 Review Recent Backup Jobs
"`n" + "="*80 | Add-Content $ReportFile
"BACKUP JOB HISTORY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$BackupSession = New-PSSession -ComputerName $BackupServer
$AllJobs = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager
# Get all jobs from the last 7 days
$StartDate = (Get-Date).AddDays(-7)
Get-DPMJob -From $StartDate | Select-Object @{N='DataSource';E={$_.Datasource.Name}},
@{N='Type';E={$_.Type}},
@{N='Status';E={$_.Status}},
@{N='StartTime';E={$_.StartTime}},
@{N='EndTime';E={$_.EndTime}},
@{N='Duration';E={if($_.EndTime){($_.EndTime - $_.StartTime).ToString("hh\:mm\:ss")}else{"Running"}}}
}
# Show the 20 most recent jobs, but calculate statistics over the whole 7-day window
$RecentJobs = $AllJobs | Sort-Object StartTime -Descending | Select-Object -First 20
"`nLast 20 Backup Jobs:" | Add-Content $ReportFile
$RecentJobs | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
# Summary statistics
$SuccessCount = @($AllJobs | Where-Object { $_.Status -eq "Succeeded" }).Count
$FailedCount = @($AllJobs | Where-Object { $_.Status -eq "Failed" }).Count
$TotalJobs = @($AllJobs).Count
"`nJob Summary (Last 7 Days):" | Add-Content $ReportFile
" Total Jobs: $TotalJobs" | Add-Content $ReportFile
" Successful: $SuccessCount" | Add-Content $ReportFile
" Failed: $FailedCount" | Add-Content $ReportFile
# Max() avoids a divide-by-zero error when no jobs ran in the window
" Success Rate: $([math]::Round(($SuccessCount / [math]::Max($TotalJobs, 1)) * 100, 1))%" | Add-Content $ReportFile
Remove-PSSession $BackupSession
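The checklist requires a job success rate of at least 95%. The minimal sketch below turns a list of job objects with a `Status` property, such as the ones returned in 3.1, into a PASS or VERIFY result. `Get-JobSuccessRate` is a hypothetical name. The 95% default comes from the Validation Checklist.

```powershell
# Hypothetical helper (not part of the runbook): success rate and checklist result for a set of jobs.
function Get-JobSuccessRate {
    param(
        [object[]] $Jobs = @(),
        [double] $Threshold = 95
    )
    $Total = @($Jobs).Count
    $Succeeded = @($Jobs | Where-Object { $_.Status -eq "Succeeded" }).Count
    # Guard against division by zero when no jobs ran in the window
    $Rate = if ($Total -gt 0) { [math]::Round(($Succeeded / $Total) * 100, 1) } else { 0 }
    [PSCustomObject]@{
        Total     = $Total
        Succeeded = $Succeeded
        Rate      = $Rate
        # No jobs at all is not a pass: it needs investigation
        Result    = if ($Total -gt 0 -and $Rate -ge $Threshold) { "PASS" } else { "VERIFY" }
    }
}
```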
3.2 Run On-Demand Backup
"`nOn-Demand Backup Test:" | Add-Content $ReportFile
$BackupSession = New-PSSession -ComputerName $BackupServer
$BackupResult = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager
# Use the first datasource of the first protection group (any workload type) for the test backup
$PG = Get-DPMProtectionGroup | Select-Object -First 1
$DS = $PG | Get-DPMDatasource | Select-Object -First 1
if ($DS) {
# Create a disk recovery point
$BackupStart = Get-Date
$Job = New-DPMRecoveryPoint -Datasource $DS -Disk
# Wait for job completion (max 30 minutes)
$Timeout = (Get-Date).AddMinutes(30)
while ($Job.Status -eq "InProgress" -and (Get-Date) -lt $Timeout) {
Start-Sleep -Seconds 30
$Job = Get-DPMJob -JobId $Job.ActivityId
}
$BackupEnd = Get-Date
[PSCustomObject]@{
DataSource = $DS.Name
Status = $Job.Status
Duration = ($BackupEnd - $BackupStart).ToString("mm\:ss")
Size = $Job.TotalBytes
}
} else {
[PSCustomObject]@{
DataSource = "None"
Status = "No datasources configured"
Duration = "N/A"
Size = 0
}
}
}
" Data Source: $($BackupResult.DataSource)" | Add-Content $ReportFile
" Status: $($BackupResult.Status)" | Add-Content $ReportFile
" Duration: $($BackupResult.Duration)" | Add-Content $ReportFile
" Size: $([math]::Round($BackupResult.Size / 1GB, 2)) GB" | Add-Content $ReportFile
Remove-PSSession $BackupSession
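Sections 3.2 and 4.1 both poll a job until it leaves the `InProgress` state or a deadline passes. That pattern can be factored into one helper. The sketch below is generic PowerShell and makes no DPM calls. `Wait-Condition` is a hypothetical name, and the condition script block would wrap whatever job-status query the backup server supports.

```powershell
# Hypothetical helper (not part of the runbook): poll a condition until it is true or the timeout expires.
function Wait-Condition {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [timespan] $Timeout = [timespan]::FromMinutes(30),
        [timespan] $Interval = [timespan]::FromSeconds(30)
    )
    $Deadline = (Get-Date) + $Timeout
    while ((Get-Date) -lt $Deadline) {
        if (& $Condition) { return $true }
        Start-Sleep -Milliseconds ([int]$Interval.TotalMilliseconds)
    }
    # Final check so a condition that became true during the last sleep is not missed
    return [bool](& $Condition)
}
```

Returning a Boolean keeps the timeout decision in one place. The caller still records the final job status in the report.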
Part 4: Restore Validation
4.1 Test VM Restore
"`n" + "="*80 | Add-Content $ReportFile
"RESTORE VALIDATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$BackupSession = New-PSSession -ComputerName $BackupServer
# Restore target: first cluster node as an FQDN (the backup server normally lists protected servers by FQDN)
$RestoreTarget = "$((Get-ClusterNode | Select-Object -First 1).Name).$((Get-CimInstance -ClassName Win32_ComputerSystem).Domain)"
$RestorePath = "C:\ClusterStorage\UserStorage_1\RestoreTest"
$RestoreResult = Invoke-Command -Session $BackupSession -ScriptBlock {
param($TargetServer, $TargetPath)
Import-Module DataProtectionManager
# Get the first Hyper-V datasource across all protection groups
$DS = Get-DPMProtectionGroup | Get-DPMDatasource | Where-Object { $_.Type -match "Hyper-V" } | Select-Object -First 1
if ($DS) {
# Get latest recovery point
$RecoveryPoints = Get-DPMRecoveryPoint -Datasource $DS
$LatestRP = $RecoveryPoints | Sort-Object BackupTime -Descending | Select-Object -First 1
if ($LatestRP) {
$RestoreStart = Get-Date
# Copy the VM files to a folder on the target node without registering the VM, so the
# production VM is not touched. Confirm these option values against your MABS/DPM version.
$ROpt = New-DPMRecoveryOption -HyperVDatasource `
-TargetServer $TargetServer `
-RecoveryLocation CopyToFolder `
-RecoveryType Restore `
-TargetLocation $TargetPath
$Job = Restore-DPMRecoverableItem -RecoverableItem $LatestRP -RecoveryOption $ROpt
# Wait for completion (max 60 minutes)
$Timeout = (Get-Date).AddMinutes(60)
while ($Job.Status -eq "InProgress" -and (Get-Date) -lt $Timeout) {
Start-Sleep -Seconds 30
$Job = Get-DPMJob -JobId $Job.ActivityId
}
$RestoreEnd = Get-Date
[PSCustomObject]@{
VMName = $DS.Name
RecoveryPoint = $LatestRP.BackupTime
Status = $Job.Status
Duration = ($RestoreEnd - $RestoreStart).ToString("hh\:mm\:ss")
TargetPath = $TargetPath
}
} else {
[PSCustomObject]@{
VMName = $DS.Name
RecoveryPoint = "None available"
Status = "NoRecoveryPoints"
Duration = "N/A"
TargetPath = "N/A"
}
}
} else {
[PSCustomObject]@{
VMName = "None"
RecoveryPoint = "N/A"
Status = "NoHyperVDatasources"
Duration = "N/A"
TargetPath = "N/A"
}
}
} -ArgumentList $RestoreTarget, $RestorePath
"`nTest Restore Results:" | Add-Content $ReportFile
" VM Name: $($RestoreResult.VMName)" | Add-Content $ReportFile
" Recovery Point: $($RestoreResult.RecoveryPoint)" | Add-Content $ReportFile
" Restore Status: $($RestoreResult.Status)" | Add-Content $ReportFile
" Duration: $($RestoreResult.Duration)" | Add-Content $ReportFile
" Target Path: $($RestoreResult.TargetPath)" | Add-Content $ReportFile
Remove-PSSession $BackupSession
4.2 Verify Restored VM
# If restore succeeded, verify the VM
if ($RestoreResult.Status -eq "Succeeded") {
$RestoredVMPath = $RestoreResult.TargetPath
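# Locate the VM configuration file (skip checkpoint configurations under Snapshots)
$VMConfig = Get-ChildItem -Path $RestoredVMPath -Filter "*.vmcx" -Recurse -ErrorAction SilentlyContinue |
Where-Object { $_.FullName -notmatch '\\Snapshots\\' } | Select-Object -First 1
if ($VMConfig) {
"`nRestored VM Verification:" | Add-Content $ReportFile
" VM Config Found: $($VMConfig.FullName)" | Add-Content $ReportFile
# Import a copy with a new VM ID and do not start it. Copied files go under the restore folder
# so the cleanup below removes them (Remove-VM does not delete virtual hard disks)
try {
$ImportPath = Join-Path $RestoredVMPath "Imported"
$ImportedVM = Import-VM -Path $VMConfig.FullName -Copy -GenerateNewId `
-VirtualMachinePath $ImportPath -VhdDestinationPath $ImportPath `
-SnapshotFilePath $ImportPath -SmartPagingFilePath $ImportPath
" Import Status: SUCCESS" | Add-Content $ReportFile
" VM Name: $($ImportedVM.Name)" | Add-Content $ReportFile
" VM State: $($ImportedVM.State)" | Add-Content $ReportFile
# Cleanup: unregister the test VM, then delete the restored and copied files
Remove-VM -VM $ImportedVM -Force
Remove-Item -Path $RestoredVMPath -Recurse -Force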
" Cleanup: Restored VM removed" | Add-Content $ReportFile
} catch {
" Import Status: FAILED - $($_.Exception.Message)" | Add-Content $ReportFile
}
} else {
" WARNING: VM config not found in restore location" | Add-Content $ReportFile
}
} else {
" Skipping VM verification (restore did not succeed)" | Add-Content $ReportFile
}
4.3 File-Level Recovery Point Browse Test
"`nFile-Level Recovery Point Browse Test:" | Add-Content $ReportFile
$BackupSession = New-PSSession -ComputerName $BackupServer
$FileRestoreResult = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager
# Get a file system datasource
$DS = Get-DPMDatasource | Where-Object { $_.Type -eq "FileSystem" } | Select-Object -First 1
if ($DS) {
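$RP = Get-DPMRecoveryPoint -Datasource $DS | Sort-Object BackupTime -Descending | Select-Object -First 1
if ($RP) {
# Browse the top level of the latest recovery point (nothing is restored)
$Items = Get-DPMRecoverableItem -RecoverableItem $RP -BrowseType Child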
[PSCustomObject]@{
DataSource = $DS.Name
RecoveryPoint = $RP.BackupTime
ItemCount = $Items.Count
Status = "Browsable"
}
} else {
[PSCustomObject]@{
DataSource = $DS.Name
RecoveryPoint = "None"
ItemCount = 0
Status = "NoRecoveryPoints"
}
}
} else {
[PSCustomObject]@{
DataSource = "None"
RecoveryPoint = "N/A"
ItemCount = 0
Status = "NoFileSystemDatasources"
}
}
}
" Data Source: $($FileRestoreResult.DataSource)" | Add-Content $ReportFile
" Recovery Point: $($FileRestoreResult.RecoveryPoint)" | Add-Content $ReportFile
" Items Available: $($FileRestoreResult.ItemCount)" | Add-Content $ReportFile
" Status: $($FileRestoreResult.Status)" | Add-Content $ReportFile
Remove-PSSession $BackupSession
Part 5: Recovery Point Verification
5.1 Check Recovery Point Inventory
"`n" + "="*80 | Add-Content $ReportFile
"RECOVERY POINT INVENTORY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$BackupSession = New-PSSession -ComputerName $BackupServer
$RPInventory = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager
$AllDataSources = Get-DPMDatasource
$AllDataSources | ForEach-Object {
$DS = $_
# Sort once (oldest first) and reuse the result for every column below
$RPs = @(Get-DPMRecoveryPoint -Datasource $DS | Sort-Object BackupTime)
[PSCustomObject]@{
DataSource = $DS.Name
Type = $DS.Type
TotalRecoveryPoints = $RPs.Count
OldestRP = if ($RPs.Count -gt 0) { $RPs[0].BackupTime } else { "None" }
NewestRP = if ($RPs.Count -gt 0) { $RPs[-1].BackupTime } else { "None" }
# Days between the oldest and newest recovery point (the retention range actually held)
RetentionDays = if ($RPs.Count -gt 1) { ($RPs[-1].BackupTime - $RPs[0].BackupTime).Days } else { 0 }
}
}
}
"`nRecovery Point Summary:" | Add-Content $ReportFile
$RPInventory | Format-Table DataSource, Type, TotalRecoveryPoints, OldestRP, NewestRP, RetentionDays -AutoSize | Out-String | Add-Content $ReportFile
Remove-PSSession $BackupSession
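The `RetentionDays` column is the span between the oldest and newest recovery point. It can therefore be compared with `operations.bcdr.bcdr_backup_retention_days` from variables.yml.

The sketch below assumes `$RPInventory` rows shaped as produced above. `Test-RetentionSpan` and its one-day tolerance are assumptions; the tolerance allows for the whole-day truncation of `.Days`. A newly created protection group will show VERIFY until enough recovery points have accumulated.

```powershell
# Hypothetical check (not part of the runbook): compare each datasource's recovery point span with expected retention.
function Test-RetentionSpan {
    param(
        [Parameter(Mandatory)] [object[]] $Inventory,
        [Parameter(Mandatory)] [int] $ExpectedDays,
        # Whole-day truncation can make a full retention window read one day short
        [int] $ToleranceDays = 1
    )
    foreach ($Row in $Inventory) {
        [PSCustomObject]@{
            DataSource   = $Row.DataSource
            SpanDays     = $Row.RetentionDays
            ExpectedDays = $ExpectedDays
            Result       = if ($Row.RetentionDays -ge ($ExpectedDays - $ToleranceDays)) { "PASS" } else { "VERIFY" }
        }
    }
}
```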
Part 6: RPO/RTO Documentation
6.1 Calculate Actual RPO
"`n" + "="*80 | Add-Content $ReportFile
"RPO/RTO DOCUMENTATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
# Record RPO targets; fill in Actual RPO from the Recovery Point Inventory (time since the newest recovery point)
$ActualRPO = @"
RECOVERY POINT OBJECTIVE (RPO):
| Data Type | Scheduled RPO | Actual RPO | Status |
|--------------------|---------------|------------|--------|
| VM Backups | 24 hours | TBD | VERIFY |
| File Shares | 24 hours | TBD | VERIFY |
| System State | 24 hours | TBD | VERIFY |
| Azure Cloud Backup | 24 hours | TBD | VERIFY |
Note: Actual RPO is the time since last successful backup.
Review Recovery Point Inventory above for actual values.
"@
$ActualRPO | Add-Content $ReportFile
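The Actual RPO column can be filled in from recovery point timestamps. The minimal sketch below assumes the `BackupTime` values for one datasource have already been collected, for example from `Get-DPMRecoveryPoint` as in Part 5. It reports two values:
- the time since the newest recovery point
- the largest gap between consecutive points, which is the worst-case data-loss window over the retained range

`Get-AchievedRpo` is a hypothetical name.

```powershell
# Hypothetical helper (not part of the runbook): achieved RPO from a set of recovery point times.
function Get-AchievedRpo {
    param(
        [Parameter(Mandatory)] [datetime[]] $BackupTimes,
        [datetime] $Now = (Get-Date)
    )
    $Sorted = @($BackupTimes | Sort-Object)
    # Largest interval between consecutive recovery points
    $MaxGap = [timespan]::Zero
    for ($i = 1; $i -lt $Sorted.Count; $i++) {
        $Gap = $Sorted[$i] - $Sorted[$i - 1]
        if ($Gap -gt $MaxGap) { $MaxGap = $Gap }
    }
    [PSCustomObject]@{
        SinceLastRecoveryPoint = $Now - $Sorted[-1]
        MaxGapBetweenPoints    = $MaxGap
    }
}
```

With a 24-hour schedule, a `MaxGapBetweenPoints` above 24 hours points to missed or failed backups within the retained range.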
6.2 Document RTO
$RTODoc = @"
RECOVERY TIME OBJECTIVE (RTO):
| Recovery Type | Measured / Estimated RTO | Target RTO | Status |
|--------------------------|------------------|------------|--------|
| Single VM Restore | $($RestoreResult.Duration) | < 2 hours | $(if($RestoreResult.Status -eq "Succeeded"){"PASS"}else{"VERIFY"}) |
| File/Folder Restore | Not measured (4.3 browses only) | < 30 min | VERIFY |
| Full Cluster Recovery | 4-8 hours (estimate) | < 8 hours | N/A |
| Azure Site Recovery (DR) | < 2 hours (estimate) | < 4 hours | N/A |
Factors Affecting RTO:
- Network bandwidth to restore location
- Size of data being restored
- Type of restore (full VM vs. file-level)
- Storage performance at target
"@
$RTODoc | Add-Content $ReportFile
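The Single VM Restore row prints the measured duration next to the target, but it sets the status from the job result alone. The sketch below compares the `hh:mm:ss` duration text written in 4.1 against a target such as the 2-hour RTO. `Test-RtoTarget` is a hypothetical name. Text that does not parse as a time span, such as `N/A`, returns VERIFY.

```powershell
# Hypothetical check (not part of the runbook): compare a measured duration string with an RTO target.
function Test-RtoTarget {
    param(
        [Parameter(Mandatory)] [string] $MeasuredDuration,
        [Parameter(Mandatory)] [timespan] $Target
    )
    $Measured = [timespan]::Zero
    # "N/A" or any other non-duration text cannot be evaluated
    if (-not [timespan]::TryParse($MeasuredDuration, [ref]$Measured)) { return "VERIFY" }
    if ($Measured -lt $Target) { "PASS" } else { "FAIL" }
}
```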
Part 7: Azure Site Recovery Validation (If Configured)
7.1 Check Recovery Services Vault and Protected Items
"`n" + "="*80 | Add-Content $ReportFile
"AZURE SITE RECOVERY (IF CONFIGURED)" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
# Look for a Recovery Services vault in $ResourceGroup (operations.bcdr.bcdr_vault_resource_group)
$RecoveryVault = az backup vault list --resource-group $ResourceGroup --query "[?properties.provisioningState=='Succeeded']" -o json 2>$null | ConvertFrom-Json
if ($RecoveryVault) {
$VaultName = $RecoveryVault[0].name
"`nRecovery Services Vault: $VaultName" | Add-Content $ReportFile
# List Azure Backup protected items (ASR replicated items appear under Replicated items in the vault)
$ReplicationItems = az backup item list --vault-name $VaultName --resource-group $ResourceGroup -o json | ConvertFrom-Json
"`nProtected Items (Azure Backup):" | Add-Content $ReportFile
$ReplicationItems | ForEach-Object {
" - $($_.properties.friendlyName): $($_.properties.protectionState)" | Add-Content $ReportFile
}
} else {
"Recovery Services Vault: Not found in resource group $ResourceGroup" | Add-Content $ReportFile
"Note: ASR provides disaster recovery to Azure for critical VMs" | Add-Content $ReportFile
}
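Listing each item is useful for the record, but a count per protection state is quicker to review. The sketch below summarises the JSON returned by `az backup item list -o json`. It assumes each item carries `properties.protectionState`, as used in 7.1, and it covers Azure Backup items only, not ASR replicated items. `Get-ProtectionStateSummary` is a hypothetical name.

```powershell
# Hypothetical helper (not part of the runbook): count protected items per protection state.
function Get-ProtectionStateSummary {
    param([Parameter(Mandatory)] [string] $Json)
    $Items = $Json | ConvertFrom-Json
    $Items |
        Group-Object -Property { $_.properties.protectionState } |
        Sort-Object -Property Name |
        ForEach-Object { [PSCustomObject]@{ State = $_.Name; ItemCount = $_.Count } }
}
```

Because `az` emits several output lines, join them before passing them in:

`Get-ProtectionStateSummary -Json (az backup item list --vault-name $VaultName --resource-group $ResourceGroup -o json | Out-String)`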
7.2 Test Failover (If ASR Configured)
# Runs only when a vault was found and $PerformASRTest is $true (operations.bcdr.bcdr_site_recovery_enabled)
if ($RecoveryVault -and $PerformASRTest) {
"`nASR Test Failover:" | Add-Content $ReportFile
# Test failover is performed manually from the portal; this block records the procedure
# WARNING: Test failover creates resources in Azure and incurs costs
" Status: Manual step (requires approval; run from the Azure portal)" | Add-Content $ReportFile
" To perform test failover:" | Add-Content $ReportFile
" 1. Navigate to the Recovery Services vault in the Azure portal" | Add-Content $ReportFile
" 2. Open Replicated items and select the VM" | Add-Content $ReportFile
" 3. Select Test failover" | Add-Content $ReportFile
" 4. Select the recovery point and Azure virtual network" | Add-Content $ReportFile
" 5. Verify the VM in Azure, then select Cleanup test failover" | Add-Content $ReportFile
}
Part 8: Generate Summary
$Summary = @"
================================================================================
BACKUP & DR VALIDATION SUMMARY
================================================================================
BACKUP SERVER CONFIGURATION:
Agent Status: All nodes - VERIFY
Protection Groups: $($ProtectionGroups.Count) configured
VSS Writers: Check report for failures
BACKUP VALIDATION:
Recent Job Success Rate: $([math]::Round(($SuccessCount / [math]::Max($TotalJobs, 1)) * 100, 1))%
On-Demand Backup: $($BackupResult.Status)
Backup Duration: $($BackupResult.Duration)
RESTORE VALIDATION:
VM Restore Test: $($RestoreResult.Status)
Restore Duration (RTO): $($RestoreResult.Duration)
File-Level Restore: $($FileRestoreResult.Status)
RECOVERY POINTS:
Total Data Sources: $($RPInventory.Count)
Sources with RPs: $(($RPInventory | Where-Object { $_.TotalRecoveryPoints -gt 0 }).Count)
DISASTER RECOVERY:
Recovery Services Vault: $(if($RecoveryVault){"Found"}else{"Not Found"})
RECOMMENDATIONS:
1. Verify backup job schedule meets RPO requirements
2. Document restore procedures in operations runbook
3. Schedule quarterly restore tests
4. Consider ASR for critical workloads
================================================================================
Report saved to: $ReportFile
================================================================================
"@
$Summary | Add-Content $ReportFile
Write-Host $Summary
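The summary lists each result but leaves the overall verdict to the reader. The minimal sketch below rolls named checks into one result for the handover package. The check names and the PASS/REVIEW wording are assumptions. Statuses reported by the backup server, such as `Succeeded`, would need to be mapped to PASS first. `Get-OverallResult` is a hypothetical name.

```powershell
# Hypothetical helper (not part of the runbook): roll individual check results into one verdict.
function Get-OverallResult {
    param([Parameter(Mandatory)] [hashtable] $Checks)
    # Names of every check whose value is not exactly "PASS", sorted for stable output
    $NotPassed = @($Checks.GetEnumerator() |
        Where-Object { $_.Value -ne "PASS" } |
        Sort-Object -Property Key |
        ForEach-Object { $_.Key })
    [PSCustomObject]@{
        Overall   = if ($NotPassed.Count -eq 0) { "PASS" } else { "REVIEW" }
        NotPassed = $NotPassed -join ", "
    }
}
```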
Validation Checklist
| Category | Requirement | Status |
|---|---|---|
| Azure Backup | Agent running on all nodes | ☐ |
| Azure Backup | Protection groups configured | ☐ |
| Azure Backup | VSS writers healthy | ☐ |
| Backup | Job success rate ≥ 95% | ☐ |
| Backup | On-demand backup succeeds | ☐ |
| Restore | VM restore test passes | ☐ |
| Restore | File-level restore works | ☐ |
| RPO | Recovery points within RPO | ☐ |
| RTO | Restore time within RTO | ☐ |
| ASR | Configured (if required) | ☐ |
Troubleshooting
| Issue | Cause | Resolution |
|---|---|---|
| Backup job fails with VSS writer error | VSS writer in failed state on the node | Identify the failed writer with vssadmin list writers; restart the service that hosts that writer; retry the backup |
| Restore test fails with timeout | Large VM, or slow network between the backup server and the target node (or to the Recovery Services vault for online recovery points) | Increase the restore timeout in 4.1; verify network bandwidth; restore from a local disk recovery point where one exists |
| ASR replication shows Critical health | Replication agent offline or process server overloaded | Check process server health in Site Recovery; restart the mobility service on affected VMs: Restart-Service -DisplayName "InMage Scout VX Agent - Sentinel/Outpost" |
Next Steps
After backup/DR validation is complete:
- Generate consolidated validation report (all steps)
- Archive reports to customer handover package
- Proceed to Part 8: Validation & Handover
Navigation
| Previous | Up | Next |
|---|---|---|
| ← Task 5: Security & Compliance Validation | Testing & Validation | Part 7: Go-Live → |
Version Control
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-03-24 | Azure Local Cloudnology Team | Initial release |