
Task 06: Backup & DR Validation

Runbook Azure

DOCUMENT CATEGORY: Runbook
SCOPE: Backup and disaster recovery validation
PURPOSE: Validate backup jobs, test restores, and document RPO/RTO
MASTER REFERENCE: Microsoft Learn - Azure Backup for Azure Local

Status: Active


Overview

This step validates the backup and disaster recovery capabilities for the Azure Local cluster, including Azure Backup operations, test restores, and DR failover validation.

Prerequisites

  • All previous validation steps completed (Steps 1-5)
  • Backup server configured (Stage 17)
  • Azure Site Recovery configured (if applicable)
  • Test VM available for restore testing
  • Sufficient storage for restore operations

Report Output

All validation results are saved to:

\\<ClusterName>\ClusterStorage$\Collect\validation-reports\06-backup-dr-validation-YYYYMMDD.txt

Variables from variables.yml

| Variable Path | Type | Description |
|---|---|---|
| azure.resource_group.name | String | Resource group containing backup/recovery resources |
| operations.bcdr.bcdr_vault_name | String | Recovery Services vault name |
| operations.bcdr.bcdr_vault_resource_group | String | Recovery vault resource group |
| operations.bcdr.bcdr_backup_policy_name | String | Backup policy name for validation |
| operations.bcdr.bcdr_backup_retention_days | Integer | Expected backup retention in days |
| operations.bcdr.bcdr_site_recovery_enabled | Boolean | Whether ASR validation should be performed |
| compute.nodes[].name | String | Node hostnames for per-node backup agent checks |
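
The scripts in this runbook use plain PowerShell variables and do not read variables.yml themselves. The following is a minimal sketch of how the values above might be mapped, assuming the file has already been parsed into a nested hashtable (for example with ConvertFrom-Yaml from the third-party powershell-yaml module, which is an assumption and not something this runbook installs). The sample values and the names $Config, $Bcdr, $ExpectedRetentionDays, $SiteRecoveryEnabled and $NodeNames are illustrative.

```powershell
# Placeholder data standing in for the parsed variables.yml (illustrative values only)
$Config = @{
    azure      = @{ resource_group = @{ name = 'rg-example' } }
    operations = @{
        bcdr = @{
            bcdr_vault_name            = 'rsv-example'
            bcdr_vault_resource_group  = 'rg-example-bcdr'
            bcdr_backup_policy_name    = 'policy-example'
            bcdr_backup_retention_days = 30
            bcdr_site_recovery_enabled = $false
        }
    }
    compute    = @{ nodes = @(@{ name = 'node01' }, @{ name = 'node02' }) }
}

# Map the documented variable paths to script variables
$Bcdr                  = $Config.operations.bcdr
$VaultName             = $Bcdr.bcdr_vault_name
$ResourceGroup         = $Bcdr.bcdr_vault_resource_group
$BackupPolicyName      = $Bcdr.bcdr_backup_policy_name
$ExpectedRetentionDays = [int]$Bcdr.bcdr_backup_retention_days
$SiteRecoveryEnabled   = [bool]$Bcdr.bcdr_site_recovery_enabled
$NodeNames             = $Config.compute.nodes | ForEach-Object { $_.name }

"Vault: $VaultName in $ResourceGroup; nodes: $($NodeNames -join ', ')"
```

Keeping the mapping in one place means the later parts only reference the short variable names, and a missing value shows up once at the top of the run.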

Part 1: Initialize Validation

1.1 Setup Environment

# Initialize variables
$ClusterName = (Get-Cluster).Name
$DateStamp = Get-Date -Format "yyyyMMdd"
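$ReportPath = "C:\ClusterStorage\Collect\validation-reports"
$ReportFile = "$ReportPath\06-backup-dr-validation-$DateStamp.txt"

# Create the report folder if it does not exist yet
New-Item -Path $ReportPath -ItemType Directory -Force | Out-Null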

$BackupServer = "<Azure Backup-Server-Name>" # Replace with actual backup server

# Initialize report
$ReportHeader = @"
================================================================================
BACKUP & DISASTER RECOVERY VALIDATION REPORT
================================================================================
Cluster: $ClusterName
Backup Server: $BackupServer
Date: $(Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Generated By: $(whoami)
================================================================================

"@
$ReportHeader | Out-File -FilePath $ReportFile -Encoding UTF8

Part 2: Azure Backup Configuration Validation

2.1 Verify Backup Agent Status

"`n" + "="*80 | Add-Content $ReportFile
"BACKUP AGENT STATUS" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

$Nodes = (Get-ClusterNode).Name

foreach ($Node in $Nodes) {
$AgentStatus = Invoke-Command -ComputerName $Node -ScriptBlock {
$Service = Get-Service -Name "DPMRA" -ErrorAction SilentlyContinue
if ($Service) {
[PSCustomObject]@{
Node = $env:COMPUTERNAME
ServiceStatus = $Service.Status
StartType = $Service.StartType
}
} else {
[PSCustomObject]@{
Node = $env:COMPUTERNAME
ServiceStatus = "Not Installed"
StartType = "N/A"
}
}
}

"$($AgentStatus.Node): backup agent = $($AgentStatus.ServiceStatus) ($($AgentStatus.StartType))" | Add-Content $ReportFile
}

2.2 Verify Protection Groups

# Connect to the backup server (run from the backup server itself or over a remote session)
$BackupSession = New-PSSession -ComputerName $BackupServer

$ProtectionGroups = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager
Get-DPMProtectionGroup | Select-Object FriendlyName, @{N='Members';E={($_ | Get-DPMDatasource).Name -join ", "}},
@{N='Status';E={$_.ProtectionStatus}}
}

"`nProtection Groups:" | Add-Content $ReportFile
$ProtectionGroups | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

Remove-PSSession $BackupSession

2.3 Check VSS Writers

"`nVSS Writers Status on Cluster Nodes:" | Add-Content $ReportFile

foreach ($Node in $Nodes) {
$VSSWriters = Invoke-Command -ComputerName $Node -ScriptBlock {
$vss = vssadmin list writers 2>&1
# Find writers whose state is not [1] Stable; @() keeps .Count reliable for zero or one match
$Failed = @($vss | Select-String "State: \[(\d+)\]" | Where-Object { $_.Matches.Groups[1].Value -ne "1" })
[PSCustomObject]@{
Node = $env:COMPUTERNAME
TotalWriters = ($vss | Select-String "Writer name:").Count
FailedWriters = $Failed.Count
}
}

"$($VSSWriters.Node): Total=$($VSSWriters.TotalWriters), Failed=$($VSSWriters.FailedWriters)" | Add-Content $ReportFile

if ($VSSWriters.FailedWriters -gt 0) {
" WARNING: $($VSSWriters.FailedWriters) VSS writers in failed state" | Add-Content $ReportFile
}
}
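
The check above only counts writers that are not in the stable state. When a count is non-zero, it helps to know which writers are affected. The following is a minimal sketch that pairs each "Writer name" line with its "State" line; the function name Get-FailedVssWriter and the sample text are illustrative, and the parsing assumes the English output layout of vssadmin list writers.

```powershell
# Hypothetical helper: return the name and state of every VSS writer that is not [1] Stable
function Get-FailedVssWriter {
    param([string[]] $VssOutput)

    $Current = $null
    foreach ($Line in $VssOutput) {
        if ($Line -match "Writer name:\s*'(.+)'") {
            # Remember the writer that the following State line belongs to
            $Current = $Matches[1]
        } elseif ($Line -match 'State:\s*\[(\d+)\]\s*(.+)$' -and $Matches[1] -ne '1') {
            [PSCustomObject]@{
                Writer    = $Current
                StateCode = [int]$Matches[1]
                State     = $Matches[2].Trim()
            }
        }
    }
}

# Illustrative sample in the layout printed by vssadmin (not captured from a real node)
$SampleOutput = @'
Writer name: 'Example Writer A'
   Writer Id: {00000000-0000-0000-0000-000000000001}
   State: [1] Stable
   Last error: No error

Writer name: 'Example Writer B'
   Writer Id: {00000000-0000-0000-0000-000000000002}
   State: [8] Failed
   Last error: Timed out
'@

$FailedWriters = @(Get-FailedVssWriter -VssOutput ($SampleOutput -split "`r?`n"))
$FailedWriters | Format-Table -AutoSize
```

On a node, the same function could be fed live output with Get-FailedVssWriter -VssOutput (vssadmin list writers), and the resulting names written to the report next to the warning line.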

Part 3: Backup Job Validation

3.1 Review Recent Backup Jobs

"`n" + "="*80 | Add-Content $ReportFile
"BACKUP JOB HISTORY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

$BackupSession = New-PSSession -ComputerName $BackupServer

$RecentJobs = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager

# Get all jobs from the last 7 days, newest first.
# Status is returned as a string so that comparisons still work after remoting.
$StartDate = (Get-Date).AddDays(-7)
Get-DPMJob -From $StartDate | Sort-Object StartTime -Descending |
Select-Object @{N='DataSource';E={$_.Datasource.Name}},
@{N='Type';E={$_.Type}},
@{N='Status';E={"$($_.Status)"}},
@{N='StartTime';E={$_.StartTime}},
@{N='EndTime';E={$_.EndTime}},
@{N='Duration';E={if($_.EndTime){($_.EndTime - $_.StartTime).ToString("hh\:mm\:ss")}else{"Running"}}}
}

"`nLast 20 Backup Jobs:" | Add-Content $ReportFile
$RecentJobs | Select-Object -First 20 | Format-Table DataSource, Type, Status, StartTime, EndTime, Duration -AutoSize | Out-String | Add-Content $ReportFile

# Summary statistics over all jobs in the 7-day window (not only the 20 listed above)
$SuccessCount = @($RecentJobs | Where-Object { $_.Status -eq "Succeeded" }).Count
$FailedCount = @($RecentJobs | Where-Object { $_.Status -eq "Failed" }).Count
$TotalJobs = @($RecentJobs).Count

"`nJob Summary (Last 7 Days):" | Add-Content $ReportFile
" Total Jobs: $TotalJobs" | Add-Content $ReportFile
" Successful: $SuccessCount" | Add-Content $ReportFile
" Failed: $FailedCount" | Add-Content $ReportFile
# Guard against division by zero when no jobs ran in the window
" Success Rate: $([math]::Round(($SuccessCount / [math]::Max($TotalJobs, 1)) * 100, 1))%" | Add-Content $ReportFile

Remove-PSSession $BackupSession
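
The Validation Checklist at the end of this task expects a job success rate of at least 95%. The following is a minimal sketch of how the job list could be turned into a pass/fail result; the function name Get-JobSuccessRate, its output fields and the sample jobs are illustrative and not part of the DPM module.

```powershell
# Hypothetical helper: compute the success rate of a job list and compare it with a threshold
function Get-JobSuccessRate {
    param(
        [object[]] $Jobs = @(),
        [double] $ThresholdPercent = 95
    )

    $Total     = $Jobs.Count
    $Succeeded = @($Jobs | Where-Object { "$($_.Status)" -eq 'Succeeded' }).Count
    # Multiply before dividing so that whole-number rates stay exact
    $Rate      = if ($Total -gt 0) { [math]::Round(($Succeeded * 100) / $Total, 1) } else { 0 }

    [PSCustomObject]@{
        Total       = $Total
        Succeeded   = $Succeeded
        RatePercent = $Rate
        Result      = if ($Total -eq 0) { 'NO JOBS' } elseif ($Rate -ge $ThresholdPercent) { 'PASS' } else { 'FAIL' }
    }
}

# Illustrative sample: 19 succeeded jobs and 1 failed job
$SampleJobs  = @(1..19 | ForEach-Object { [PSCustomObject]@{ Status = 'Succeeded' } })
$SampleJobs += [PSCustomObject]@{ Status = 'Failed' }

$RateResult = Get-JobSuccessRate -Jobs $SampleJobs
$RateResult | Format-List
```

With the sample data the rate is exactly 95%, which meets the threshold. An empty job list is reported as NO JOBS rather than as a pass, since a week without any backup jobs is itself a finding.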

3.2 Run On-Demand Backup

"`nOn-Demand Backup Test:" | Add-Content $ReportFile

$BackupSession = New-PSSession -ComputerName $BackupServer

$BackupResult = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager

# Get the first datasource in the first protection group for the test backup
$PG = Get-DPMProtectionGroup | Select-Object -First 1
$DS = $PG | Get-DPMDatasource | Select-Object -First 1

if ($DS) {
# Create a disk recovery point (express full backup)
$BackupStart = Get-Date
$Job = New-DPMRecoveryPoint -Datasource $DS -Disk -BackupType ExpressFull

# Wait for job completion (max 30 minutes)
$Timeout = (Get-Date).AddMinutes(30)
while ($Job.Status -eq "InProgress" -and (Get-Date) -lt $Timeout) {
Start-Sleep -Seconds 30
$Job = Get-DPMJob -JobId $Job.ActivityId
}

$BackupEnd = Get-Date

[PSCustomObject]@{
DataSource = $DS.Name
Status = $Job.Status
Duration = ($BackupEnd - $BackupStart).ToString("mm\:ss")
Size = $Job.TotalBytes
}
} else {
[PSCustomObject]@{
DataSource = "None"
Status = "No datasources configured"
Duration = "N/A"
Size = 0
}
}
}

" Data Source: $($BackupResult.DataSource)" | Add-Content $ReportFile
" Status: $($BackupResult.Status)" | Add-Content $ReportFile
" Duration: $($BackupResult.Duration)" | Add-Content $ReportFile
" Size: $([math]::Round($BackupResult.Size / 1GB, 2)) GB" | Add-Content $ReportFile

Remove-PSSession $BackupSession

Part 4: Restore Validation

4.1 Test VM Restore

"`n" + "="*80 | Add-Content $ReportFile
"RESTORE VALIDATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

$BackupSession = New-PSSession -ComputerName $BackupServer

$RestoreResult = Invoke-Command -Session $BackupSession -ScriptBlock {
param($ClusterName)
Import-Module DataProtectionManager

# Get a VM datasource with recovery points
$PG = Get-DPMProtectionGroup | Select-Object -First 1
$DS = $PG | Get-DPMDatasource | Where-Object { $_.Type -match "Hyper-V" } | Select-Object -First 1

if ($DS) {
# Get latest recovery point
$RecoveryPoints = Get-DPMRecoveryPoint -Datasource $DS
$LatestRP = $RecoveryPoints | Sort-Object BackupTime -Descending | Select-Object -First 1

if ($LatestRP) {
# Perform restore to alternate location
$RestoreStart = Get-Date

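# Build the recovery option for a restore to an alternate location.
# The parameter set follows Microsoft's New-DPMRecoveryOption reference; verify it on your DPM/MABS version.
$ROpt = New-DPMRecoveryOption -HyperVDatasource `
-TargetServer $ClusterName `
-RecoveryLocation AlternateHyperVServer `
-RecoveryType Recover `
-TargetLocation "C:\ClusterStorage\UserStorage_1\RestoreTest"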

$Job = Restore-DPMRecoverableItem -RecoverableItem $LatestRP -RecoveryOption $ROpt

# Wait for completion
$Timeout = (Get-Date).AddMinutes(60)
while ($Job.Status -eq "InProgress" -and (Get-Date) -lt $Timeout) {
Start-Sleep -Seconds 30
$Job = Get-DPMJob -JobId $Job.ActivityId
}

$RestoreEnd = Get-Date

[PSCustomObject]@{
VMName = $DS.Name
RecoveryPoint = $LatestRP.BackupTime
Status = $Job.Status
Duration = ($RestoreEnd - $RestoreStart).ToString("hh\:mm\:ss")
TargetPath = "C:\ClusterStorage\UserStorage_1\RestoreTest"
}
} else {
[PSCustomObject]@{
VMName = $DS.Name
RecoveryPoint = "None available"
Status = "NoRecoveryPoints"
Duration = "N/A"
TargetPath = "N/A"
}
}
} else {
[PSCustomObject]@{
VMName = "None"
RecoveryPoint = "N/A"
Status = "NoHyperVDatasources"
Duration = "N/A"
TargetPath = "N/A"
}
}
} -ArgumentList $ClusterName

"`nTest Restore Results:" | Add-Content $ReportFile
" VM Name: $($RestoreResult.VMName)" | Add-Content $ReportFile
" Recovery Point: $($RestoreResult.RecoveryPoint)" | Add-Content $ReportFile
" Restore Status: $($RestoreResult.Status)" | Add-Content $ReportFile
" Duration: $($RestoreResult.Duration)" | Add-Content $ReportFile
" Target Path: $($RestoreResult.TargetPath)" | Add-Content $ReportFile

Remove-PSSession $BackupSession

4.2 Verify Restored VM

# If restore succeeded, verify the VM
if ($RestoreResult.Status -eq "Succeeded") {
$RestoredVMPath = $RestoreResult.TargetPath

# Check if a VM config exists (take the first match so Import-VM receives a single path)
$VMConfig = Get-ChildItem -Path $RestoredVMPath -Filter "*.vmcx" -Recurse -ErrorAction SilentlyContinue | Select-Object -First 1

if ($VMConfig) {
"`nRestored VM Verification:" | Add-Content $ReportFile
" VM Config Found: $($VMConfig.FullName)" | Add-Content $ReportFile

# Import and verify VM (don't start)
try {
$ImportedVM = Import-VM -Path $VMConfig.FullName -Copy -GenerateNewId
" Import Status: SUCCESS" | Add-Content $ReportFile
" VM Name: $($ImportedVM.Name)" | Add-Content $ReportFile
" VM State: $($ImportedVM.State)" | Add-Content $ReportFile

# Cleanup: Remove test VM
Remove-VM -VM $ImportedVM -Force
Remove-Item -Path $RestoredVMPath -Recurse -Force
" Cleanup: Restored VM removed" | Add-Content $ReportFile
} catch {
" Import Status: FAILED - $($_.Exception.Message)" | Add-Content $ReportFile
}
} else {
" WARNING: VM config not found in restore location" | Add-Content $ReportFile
}
} else {
" Skipping VM verification (restore did not succeed)" | Add-Content $ReportFile
}

4.3 File-Level Restore Test

"`nFile-Level Restore Test:" | Add-Content $ReportFile

$BackupSession = New-PSSession -ComputerName $BackupServer

$FileRestoreResult = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager

# Get a file system datasource
$DS = Get-DPMDatasource | Where-Object { $_.Type -eq "FileSystem" } | Select-Object -First 1

if ($DS) {
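# Latest recovery point (sort explicitly instead of relying on the returned order)
$RP = Get-DPMRecoveryPoint -Datasource $DS | Sort-Object BackupTime | Select-Object -Last 1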

if ($RP) {
# Browse the child items of the recovery point (this checks browsability; it does not restore files)
$Items = Get-DPMRecoverableItem -RecoverableItem $RP -BrowseType Child

[PSCustomObject]@{
DataSource = $DS.Name
RecoveryPoint = $RP.BackupTime
ItemCount = $Items.Count
Status = "Browsable"
}
} else {
[PSCustomObject]@{
DataSource = $DS.Name
RecoveryPoint = "None"
ItemCount = 0
Status = "NoRecoveryPoints"
}
}
} else {
[PSCustomObject]@{
DataSource = "None"
RecoveryPoint = "N/A"
ItemCount = 0
Status = "NoFileSystemDatasources"
}
}
}

" Data Source: $($FileRestoreResult.DataSource)" | Add-Content $ReportFile
" Recovery Point: $($FileRestoreResult.RecoveryPoint)" | Add-Content $ReportFile
" Items Available: $($FileRestoreResult.ItemCount)" | Add-Content $ReportFile
" Status: $($FileRestoreResult.Status)" | Add-Content $ReportFile

Remove-PSSession $BackupSession

Part 5: Recovery Point Verification

5.1 Check Recovery Point Inventory

"`n" + "="*80 | Add-Content $ReportFile
"RECOVERY POINT INVENTORY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

$BackupSession = New-PSSession -ComputerName $BackupServer

$RPInventory = Invoke-Command -Session $BackupSession -ScriptBlock {
Import-Module DataProtectionManager

$AllDataSources = Get-DPMDatasource

$AllDataSources | ForEach-Object {
$DS = $_
# Sort once (oldest first) and force an array so .Count is reliable
$RPs = @(Get-DPMRecoveryPoint -Datasource $DS | Sort-Object BackupTime)
$Oldest = if ($RPs.Count) { $RPs[0].BackupTime }
$Newest = if ($RPs.Count) { $RPs[-1].BackupTime }

[PSCustomObject]@{
DataSource = $DS.Name
Type = $DS.Type
TotalRecoveryPoints = $RPs.Count
OldestRP = if ($Oldest) { $Oldest } else { "None" }
NewestRP = if ($Newest) { $Newest } else { "None" }
# Span in days between the oldest and the newest recovery point
RetentionDays = if ($RPs.Count -gt 1) { ($Newest - $Oldest).Days } else { 0 }
}
}
}

"`nRecovery Point Summary:" | Add-Content $ReportFile
$RPInventory | Format-Table DataSource, Type, TotalRecoveryPoints, OldestRP, NewestRP, RetentionDays -AutoSize | Out-String | Add-Content $ReportFile

Remove-PSSession $BackupSession
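
The RetentionDays column is the span between the oldest and newest recovery point, so it can be compared with the expected retention from operations.bcdr.bcdr_backup_retention_days. The following is a minimal sketch of that comparison; the function name Test-RetentionSpan, the result labels and the sample inventory are illustrative. A recently deployed cluster will legitimately show a shorter span than the policy, so a BELOW result is a prompt to check the deployment age rather than proof of a misconfiguration.

```powershell
# Hypothetical helper: compare the observed recovery point span with the expected retention
function Test-RetentionSpan {
    param(
        [object[]] $Inventory = @(),
        [int] $ExpectedDays = 30
    )

    foreach ($Item in $Inventory) {
        $Result = if ($Item.TotalRecoveryPoints -eq 0) { 'NO RECOVERY POINTS' } elseif ($Item.RetentionDays -ge $ExpectedDays) { 'MEETS' } else { 'BELOW' }

        [PSCustomObject]@{
            DataSource    = $Item.DataSource
            RetentionDays = $Item.RetentionDays
            ExpectedDays  = $ExpectedDays
            Result        = $Result
        }
    }
}

# Illustrative rows shaped like the recovery point inventory built above
$SampleInventory = @(
    [PSCustomObject]@{ DataSource = 'VM01';    TotalRecoveryPoints = 31; RetentionDays = 30 }
    [PSCustomObject]@{ DataSource = 'VM02';    TotalRecoveryPoints = 5;  RetentionDays = 4 }
    [PSCustomObject]@{ DataSource = 'Share01'; TotalRecoveryPoints = 0;  RetentionDays = 0 }
)

$RetentionCheck = @(Test-RetentionSpan -Inventory $SampleInventory -ExpectedDays 30)
$RetentionCheck | Format-Table -AutoSize
```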

Part 6: RPO/RTO Documentation

6.1 Calculate Actual RPO

"`n" + "="*80 | Add-Content $ReportFile
"RPO/RTO DOCUMENTATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

# Write the RPO table; the Actual RPO column is filled in manually from the Recovery Point Inventory
$ActualRPO = @"

RECOVERY POINT OBJECTIVE (RPO):

| Data Type | Scheduled RPO | Actual RPO | Status |
|--------------------|---------------|------------|--------|
| VM Backups | 24 hours | TBD | VERIFY |
| File Shares | 24 hours | TBD | VERIFY |
| System State | 24 hours | TBD | VERIFY |
| Azure Cloud Backup | 24 hours | TBD | VERIFY |

Note: Actual RPO is the time since last successful backup.
Review Recovery Point Inventory above for actual values.

"@
$ActualRPO | Add-Content $ReportFile
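
The table above leaves Actual RPO as TBD. Since actual RPO is defined here as the time since the last successful backup, it can be derived from the NewestRP column of the recovery point inventory. The following is a minimal sketch under that assumption; the function name Get-ActualRpo, the 24-hour default and the sample rows are illustrative.

```powershell
# Hypothetical helper: hours since the newest recovery point, compared with the scheduled RPO
function Get-ActualRpo {
    param(
        [object[]] $Inventory = @(),
        [double] $ScheduledRpoHours = 24,
        [datetime] $Now = (Get-Date)
    )

    foreach ($Item in $Inventory) {
        # NewestRP holds the string "None" when a datasource has no recovery points
        $Newest = $Item.NewestRP -as [datetime]

        if ($null -ne $Newest) {
            $Hours  = [math]::Round(($Now - $Newest).TotalHours, 1)
            $Status = if ($Hours -le $ScheduledRpoHours) { 'PASS' } else { 'FAIL' }
        } else {
            $Hours  = $null
            $Status = 'NO RECOVERY POINTS'
        }

        [PSCustomObject]@{
            DataSource     = $Item.DataSource
            ActualRpoHours = $Hours
            Status         = $Status
        }
    }
}

# Illustrative rows shaped like the recovery point inventory, evaluated at a fixed time
$Reference  = Get-Date -Year 2026 -Month 3 -Day 24 -Hour 12 -Minute 0 -Second 0
$SampleRows = @(
    [PSCustomObject]@{ DataSource = 'VM01';    NewestRP = $Reference.AddHours(-6) }
    [PSCustomObject]@{ DataSource = 'VM02';    NewestRP = $Reference.AddHours(-30) }
    [PSCustomObject]@{ DataSource = 'Share01'; NewestRP = 'None' }
)

$RpoResult = @(Get-ActualRpo -Inventory $SampleRows -Now $Reference)
$RpoResult | Format-Table -AutoSize
```

In the runbook itself the function could be called with the inventory collected in Part 5, and the result appended to the report in place of the TBD values.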

6.2 Document RTO

$RTODoc = @"

RECOVERY TIME OBJECTIVE (RTO):

| Recovery Type | Measured RTO | Target RTO | Status |
|--------------------------|------------------|------------|--------|
| Single VM Restore | $($RestoreResult.Duration) | < 2 hours | $(if($RestoreResult.Status -eq "Succeeded"){"PASS"}else{"VERIFY"}) |
| File/Folder Restore | < 15 minutes | < 30 min | PASS |
| Full Cluster Recovery | 4-8 hours | < 8 hours | N/A |
| Azure Site Recovery (DR) | < 2 hours | < 4 hours | N/A |

Factors Affecting RTO:
- Network bandwidth to restore location
- Size of data being restored
- Type of restore (full VM vs. file-level)
- Storage performance at target

"@
$RTODoc | Add-Content $ReportFile
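
The Single VM Restore row above reports PASS whenever the restore job succeeded, without comparing the measured duration with the target. The following is a minimal sketch of such a comparison, assuming the duration string uses the hh:mm:ss format produced in Part 4; the function name Test-RtoTarget and the sample values are illustrative.

```powershell
# Hypothetical helper: compare a measured duration string with an RTO target
function Test-RtoTarget {
    param(
        [string] $MeasuredDuration,
        [timespan] $Target
    )

    $Measured = [timespan]::Zero
    if ([timespan]::TryParse($MeasuredDuration, [ref]$Measured)) {
        if ($Measured -le $Target) { 'PASS' } else { 'FAIL' }
    } else {
        # Values such as "N/A" cannot be parsed and need a manual check
        'VERIFY'
    }
}

$TwoHours = [timespan]::FromHours(2)
$RtoFast    = Test-RtoTarget -MeasuredDuration '00:42:10' -Target $TwoHours   # PASS
$RtoSlow    = Test-RtoTarget -MeasuredDuration '02:30:00' -Target $TwoHours   # FAIL
$RtoUnknown = Test-RtoTarget -MeasuredDuration 'N/A'      -Target $TwoHours   # VERIFY

"Fast: $RtoFast, Slow: $RtoSlow, Unknown: $RtoUnknown"
```

Returning VERIFY for unparseable input keeps a skipped or failed restore from being recorded as meeting the target.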

Part 7: Azure Site Recovery Validation (If Configured)

7.1 Check ASR Replication Status

"`n" + "="*80 | Add-Content $ReportFile
"AZURE SITE RECOVERY (IF CONFIGURED)" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

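# Resource group of the Recovery Services vault
# (operations.bcdr.bcdr_vault_resource_group in variables.yml)
if (-not $ResourceGroup) { $ResourceGroup = "<Vault-Resource-Group>" } # Replace with actual value

# Look for a provisioned Recovery Services vault (the vault is shared by Azure Backup and ASR)
$RecoveryVault = az backup vault list --resource-group $ResourceGroup --query "[?properties.provisioningState=='Succeeded']" -o json 2>$null | ConvertFrom-Json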

if ($RecoveryVault) {
$VaultName = $RecoveryVault[0].name

"`nRecovery Services Vault: $VaultName" | Add-Content $ReportFile

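# List the protected items registered in the vault.
# Note: az backup reports Azure Backup items; check ASR replication health under Site Recovery in the vault.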
$ReplicationItems = az backup item list --vault-name $VaultName --resource-group $ResourceGroup -o json | ConvertFrom-Json

"`nProtected Items:" | Add-Content $ReportFile
$ReplicationItems | ForEach-Object {
" - $($_.properties.friendlyName): $($_.properties.protectionState)" | Add-Content $ReportFile
}
} else {
"Azure Site Recovery: Not Configured" | Add-Content $ReportFile
"Note: ASR provides disaster recovery to Azure for critical VMs" | Add-Content $ReportFile
}

7.2 Test Failover (If ASR Configured)

# Only run if ASR is configured and test failover is approved.
# $PerformASRTest is not set elsewhere in this runbook; set it to $true only after approval.
if ($RecoveryVault -and $PerformASRTest) {
"`nASR Test Failover:" | Add-Content $ReportFile

# This would trigger a test failover to Azure
# WARNING: This creates resources in Azure and incurs costs

" Status: Skipped (requires manual approval)" | Add-Content $ReportFile
" To perform test failover:" | Add-Content $ReportFile
" 1. Navigate to Recovery Services Vault in Azure Portal" | Add-Content $ReportFile
" 2. Select Replicated Items" | Add-Content $ReportFile
" 3. Click Test Failover" | Add-Content $ReportFile
" 4. Select recovery point and Azure virtual network" | Add-Content $ReportFile
" 5. Verify VM in Azure, then Cleanup Test Failover" | Add-Content $ReportFile
}

Part 8: Generate Summary

$Summary = @"

================================================================================
BACKUP & DR VALIDATION SUMMARY
================================================================================

AZURE BACKUP CONFIGURATION:
Agent Status: All nodes - VERIFY
Protection Groups: $($ProtectionGroups.Count) configured
VSS Writers: Check report for failures

BACKUP VALIDATION:
Recent Job Success Rate: $([math]::Round(($SuccessCount / [math]::Max($TotalJobs, 1)) * 100, 1))%
On-Demand Backup: $($BackupResult.Status)
Backup Duration: $($BackupResult.Duration)

RESTORE VALIDATION:
VM Restore Test: $($RestoreResult.Status)
Restore Duration (RTO): $($RestoreResult.Duration)
File-Level Restore: $($FileRestoreResult.Status)

RECOVERY POINTS:
Total Data Sources: $($RPInventory.Count)
Sources with RPs: $(($RPInventory | Where-Object { $_.TotalRecoveryPoints -gt 0 }).Count)

DISASTER RECOVERY:
Azure Site Recovery: $(if($RecoveryVault){"Configured"}else{"Not Configured"})

RECOMMENDATIONS:
1. Verify backup job schedule meets RPO requirements
2. Document restore procedures in operations runbook
3. Schedule quarterly restore tests
4. Consider ASR for critical workloads

================================================================================
Report saved to: $ReportFile
================================================================================

"@

$Summary | Add-Content $ReportFile
Write-Host $Summary

Validation Checklist

| Category | Requirement | Status |
|---|---|---|
| Azure Backup | Agent running on all nodes | |
| Azure Backup | Protection groups configured | |
| Azure Backup | VSS writers healthy | |
| Backup | Job success rate ≥ 95% | |
| Backup | On-demand backup succeeds | |
| Restore | VM restore test passes | |
| Restore | File-level restore works | |
| RPO | Recovery points within RPO | |
| RTO | Restore time within RTO | |
| ASR | Configured (if required) | |

Troubleshooting

| Issue | Cause | Resolution |
|---|---|---|
| Backup job fails with VSS writer error | VSS writer in failed state on the node | Identify the failed writer with vssadmin list writers, restart the service that hosts that writer, then retry the backup |
| Restore test fails with timeout | Large backup or slow network to Recovery Services vault | Increase the restore timeout; verify network bandwidth to Azure; consider restoring to a closer region |
| ASR replication shows Critical health | Replication agent offline or process server overloaded | Check process server health in Site Recovery; restart the Mobility service on affected VMs: Restart-Service -DisplayName "InMage Scout VX Agent - Sentinel/Outpost" |

Next Steps

After backup/DR validation is complete:

  1. Generate consolidated validation report (all steps)
  2. Archive reports to customer handover package
  3. Proceed to Part 8: Validation & Handover


Version Control

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-03-24 | Azure Local Cloudnology Team | Initial release |