Db2 encryption key rotation fails during automated backup, causing incomplete backups

Our automated Db2 backup jobs started failing after we implemented quarterly encryption key rotation using Key Protect. The backups run nightly at 2 AM, but when Key Protect rotates the root key (scheduled for the first of each quarter), the backup job fails with ‘access denied’ errors.

The Db2 instance is configured with Key Protect integration for encryption at rest. Here’s what we’re seeing in the backup logs:


[2024-12-01 02:15:33] BACKUP DATABASE started
[2024-12-01 02:15:41] ERROR: SQL1035N Database cannot access encryption key
[2024-12-01 02:15:41] SQLSTATE: 57019
[2024-12-01 02:15:41] BACKUP terminated with errors

The Key Protect rotation happens at midnight on rotation day. Our backup automation uses a service ID with Key Protect Reader role. On rotation day, every backup attempted between midnight and about 8 AM fails, including the scheduled 2 AM run. When we trigger a backup manually after 8 AM, it succeeds.

Is there a propagation delay for rotated keys in Db2 backup automation? We need these nightly backups to complete successfully even during key rotation periods.

That explains the timing issue. Is there a way to detect when key rotation is in progress so we can delay the backup job? Or should we just add a static delay on the first of each quarter? I’d prefer a more elegant solution than hardcoding dates.

I verified the service-to-service authorization is in place with Reader role. The suggestion about checking lastRotateDate makes sense. How do I implement that check in our backup script? We’re using a simple bash script that calls ‘db2 backup database’ right now.

Here’s a comprehensive solution that addresses all three aspects: Key Protect integration timing, Db2 backup automation, and encryption key rotation handling.

Root Cause Analysis:

The issue occurs because Db2’s encryption layer caches key material for performance. When Key Protect rotates a root key, Db2 needs to:

  1. Detect the rotation event
  2. Fetch the new key version from Key Protect
  3. Re-wrap all data encryption keys (DEKs) with the new root key
  4. Update internal key metadata

This process isn’t instantaneous and varies based on database size and activity. Backup operations during this window fail because the encryption context is in a transitional state.

Automated Solution with Key Rotation Detection:

Modify your backup automation script to check Key Protect status first:

#!/bin/bash
# Enhanced backup script with key rotation detection

KEY_ID="your-key-protect-key-id"
LAST_BACKUP_FILE="/var/db2/backup/.last_successful_backup"
ROTATION_WAIT_HOURS=4

# Get key metadata from Key Protect
# (the kp plugin also needs the instance ID: the -i flag or KP_INSTANCE_ID)
KEY_METADATA=$(ibmcloud kp key show "$KEY_ID" --output json)
# "// empty" makes jq print nothing if the key has never been rotated
LAST_ROTATE=$(echo "$KEY_METADATA" | jq -r '.lastRotateDate // empty')

# Check if rotation occurred since last backup
if [ -n "$LAST_ROTATE" ] && [ -f "$LAST_BACKUP_FILE" ]; then
  # Compare epoch seconds (GNU date). A plain string comparison breaks when
  # the two timestamps use different time zone suffixes (Z vs. +01:00).
  ROTATE_EPOCH=$(date -d "$LAST_ROTATE" +%s)
  BACKUP_EPOCH=$(date -d "$(cat "$LAST_BACKUP_FILE")" +%s)
  if [ "$ROTATE_EPOCH" -gt "$BACKUP_EPOCH" ]; then
    ROTATE_AGE_HOURS=$(( ($(date +%s) - ROTATE_EPOCH) / 3600 ))
    if [ "$ROTATE_AGE_HOURS" -lt "$ROTATION_WAIT_HOURS" ]; then
      echo "Key rotation detected $ROTATE_AGE_HOURS hours ago. Skipping this run."
      exit 0
    fi
  fi
fi

# Proceed with backup; record the timestamp only on success
if db2 backup database PRODDB online to /backup/db2; then
  date -Iseconds > "$LAST_BACKUP_FILE"
fi
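
The script above skips the run entirely while the wait window is open, so that night's backup is lost unless cron fires again later. If you would rather delay the backup than skip it, one option is to compute how many seconds remain in the window and sleep for that long. This is only a sketch: `seconds_until_safe` is a hypothetical helper name, and the 4-hour window is the figure reported in this thread, not a documented guarantee.

```shell
#!/bin/bash
# seconds_until_safe ROTATE_EPOCH NOW_EPOCH WAIT_HOURS
# Prints the number of seconds left in the post-rotation wait window,
# or 0 if the window has already passed. (Hypothetical helper.)
seconds_until_safe() {
  local rotate_epoch=$1 now_epoch=$2 wait_hours=$3
  local safe_epoch=$(( rotate_epoch + wait_hours * 3600 ))
  local remaining=$(( safe_epoch - now_epoch ))
  if (( remaining < 0 )); then
    remaining=0
  fi
  echo "$remaining"
}

# Example use inside the backup script (not executed here):
#   rotate_epoch=$(date -d "$LAST_ROTATE" +%s)
#   sleep "$(seconds_until_safe "$rotate_epoch" "$(date +%s)" "$ROTATION_WAIT_HOURS")"
#   db2 backup database PRODDB online to /backup/db2
```

Sleeping inside the job keeps the cron entry unchanged, at the cost of a long-running process on rotation nights.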

IAM Authorization Verification:

Ensure proper service-to-service authorization exists:

# Check existing authorizations
ibmcloud iam authorization-policies | grep -A 5 "dashdb-for-transactions"

# If missing, create authorization (requires admin role)
ibmcloud iam authorization-policy-create dashdb-for-transactions kms Reader \
  --source-service-instance-id <db2-instance-id> \
  --target-service-instance-id <key-protect-instance-id>

Service ID Permissions:

Your service ID needs:

  1. Key Protect Reader role on the specific key or instance
  2. Db2 Operator role (minimum) to execute backups
  3. Cloud Object Storage Writer role if backing up to COS

Verify with:

ibmcloud iam service-policies <service-id>

Backup Job Scheduling Recommendations:

  1. Dynamic Scheduling: Use the script above in your cron job. It skips the run on its own while a rotation window is open.
  2. Retry Logic: Add retry attempts with an increasing delay between them:
    for attempt in 1 2 3; do
      db2 backup database PRODDB && break
      # Linear backoff: 30 min, then 1 h; no sleep after the final attempt
      [ "$attempt" -lt 3 ] && sleep $(( attempt * 1800 ))
    done
  3. Rotation Window Avoidance: If possible, schedule key rotations during low-activity periods (weekends) and offset backup times by 6+ hours
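
The retry logic in item 2 can also be wrapped in a reusable function. The following is a minimal sketch with doubling delays, which is what exponential backoff usually means. The function name `retry_with_backoff` and its argument order are my own choices, not something Db2 or the IBM Cloud CLI provides.

```shell
#!/bin/bash
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached. The delay
# doubles after each failure, and there is no sleep after the last attempt.
retry_with_backoff() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt < max_attempts )); then
      echo "Attempt $attempt failed; retrying in ${delay}s" >&2
      sleep "$delay"
      delay=$(( delay * 2 ))
    fi
  done
  echo "All $max_attempts attempts failed" >&2
  return 1
}

# Example (not executed here): three attempts, waiting 30 min and then 1 h
#   retry_with_backoff 3 1800 db2 backup database PRODDB online to /backup/db2
```

Passing the command as arguments keeps the function independent of the backup itself, so the same wrapper can be tried out with a harmless command first.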

**Key Rotation Best Practices:**

- **Notification Setup**: Configure Key Protect to send Event Notifications when keys are rotated. Subscribe your backup automation to these events.
- **Gradual Rollout**: For large Db2 instances (>1TB), the re-encryption process can take 6-8 hours. Plan accordingly.
- **Monitoring**: Add Sysdig metrics to track backup success rates around rotation dates. Alert on failures exceeding 2 consecutive attempts.
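
For the monitoring point, the "more than 2 consecutive failures" rule can be tracked locally before wiring anything into Sysdig. This is a rough sketch that keeps a counter in a state file; the function names and the state file path are hypothetical, and the alert itself is left as a plain `logger` call rather than any Sysdig-specific integration.

```shell
#!/bin/bash
# record_backup_result STATE_FILE EXIT_CODE
# Resets the counter on success, increments it on failure, and prints
# the new consecutive-failure count. (Hypothetical helper.)
record_backup_result() {
  local state_file=$1 exit_code=$2 count=0
  if [ -f "$state_file" ]; then
    count=$(cat "$state_file")
  fi
  if [ "$exit_code" -eq 0 ]; then
    count=0
  else
    count=$(( count + 1 ))
  fi
  echo "$count" > "$state_file"
  echo "$count"
}

# should_alert COUNT THRESHOLD: succeeds when COUNT exceeds THRESHOLD.
should_alert() {
  [ "$1" -gt "$2" ]
}

# Example (not executed here):
#   db2 backup database PRODDB online to /backup/db2
#   failures=$(record_backup_result /var/db2/backup/.consecutive_failures $?)
#   if should_alert "$failures" 2; then
#     logger -p user.err "Db2 backup failed $failures times in a row"
#   fi
```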

**Immediate Workaround:**

Until you implement the automated detection, modify your cron schedule:

```cron
# Months without a scheduled rotation: 2 AM daily
0 2 * 2,3,5,6,8,9,11,12 * /path/to/backup.sh

# Rotation months: skip the first 3 days of the quarter (rotation window)
0 2 4-31 1,4,7,10 * /path/to/backup.sh
```

This solution provides robust handling of encryption key rotation while maintaining backup reliability. The key rotation detection prevents failures, and the retry logic ensures backups complete even if timing is slightly off.

This is a known timing issue with Key Protect integration during active key rotation. When a root key is rotated, there’s a synchronization window where Db2 needs to re-establish the encryption context with the new key version. During this window (typically 2-6 hours), backup operations that require key access can fail. The Reader role on your service ID is correct, but the backup job needs to wait for the key rotation to fully propagate through all Db2 encryption layers before attempting the backup.

You can query the Key Protect API to check the key rotation status before running backups. Use the GET /api/v2/keys/{id}/metadata endpoint - it returns a ‘lastRotateDate’ timestamp. Compare this with your last successful backup timestamp. If the key was rotated after your last backup, add a delay. I’ve implemented this check in our automation scripts and it works reliably. The propagation delay varies, but I’ve found 4 hours to be safe across all regions.
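
If you want to call the REST endpoint directly instead of going through the CLI, the check might look roughly like this. Treat it as a sketch under assumptions: `KP_REGION`, `KP_INSTANCE_ID`, `KEY_ID`, and `IAM_TOKEN` are placeholders you supply. The headers and the `.resources[0].lastRotateDate` path reflect how I remember the Key Protect curl examples and response envelope, so verify both against a real response. The function names are hypothetical.

```shell
#!/bin/bash
# fetch_key_metadata: calls the metadata endpoint and prints the raw JSON.
# Assumes KP_REGION, KEY_ID, IAM_TOKEN and KP_INSTANCE_ID are set.
fetch_key_metadata() {
  curl --silent --fail --show-error \
    "https://${KP_REGION}.kms.cloud.ibm.com/api/v2/keys/${KEY_ID}/metadata" \
    -H "authorization: Bearer ${IAM_TOKEN}" \
    -H "bluemix-instance: ${KP_INSTANCE_ID}" \
    -H "accept: application/vnd.ibm.kms.key+json"
}

# extract_last_rotate JSON: prints lastRotateDate, or nothing if the key
# has never been rotated. The jq path is an assumption about the envelope.
extract_last_rotate() {
  jq -r '.resources[0].lastRotateDate // empty' <<< "$1"
}

# rotated_since_backup ROTATE_TS BACKUP_TS: succeeds if the rotation is
# newer than the backup. Both values are converted to epoch seconds with
# GNU date so that differing time zone suffixes do not skew the result.
rotated_since_backup() {
  local rotate_epoch backup_epoch
  rotate_epoch=$(date -d "$1" +%s) || return 2
  backup_epoch=$(date -d "$2" +%s) || return 2
  (( rotate_epoch > backup_epoch ))
}

# Example (not executed here):
#   last_rotate=$(extract_last_rotate "$(fetch_key_metadata)")
#   if [ -n "$last_rotate" ] && rotated_since_backup "$last_rotate" "$(cat "$LAST_BACKUP_FILE")"; then
#     echo "Key rotated since last backup; delaying"
#   fi
```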