RAM policy blocks ECS snapshot restore for disaster recovery testing

Attempting to restore ECS snapshots during our quarterly disaster recovery validation is failing with ‘Unauthorized’ errors. Our DR team has a RAM policy that should allow snapshot operations, but when they try to restore snapshots to test recovery procedures, the operation is denied.

Current RAM policy attached to DR team role:

{
  "Statement": [{
    "Effect": "Allow",
    "Action": ["ecs:CreateSnapshot", "ecs:DescribeSnapshots"],
    "Resource": "*"
  }],
  "Version": "1"
}

The error occurs when executing restore operations through the console or API. Snapshot creation and viewing work fine, but any restore attempt fails immediately. This is blocking our DR validation process required for compliance audits.

I believe the issue is related to RAM policy permissions not including the snapshot restore action, but I’m uncertain about the correct action name and whether the resource scope needs to be more specific. Any guidance on proper RAM policies for disaster recovery workflows?

Your RAM policy is missing the restore action. Creating snapshots and restoring from snapshots are separate permissions in Alibaba Cloud. You need to add the restore action explicitly.

The action you’re looking for is probably ecs:CreateDiskFromSnapshot or ecs:CreateInstanceFromSnapshot depending on whether you’re restoring individual disks or entire instances. Check the ECS API documentation for the exact action names.

Thanks, that makes sense. We’re trying to restore entire ECS instances from snapshots for DR testing. Should I add ecs:CreateInstanceFromSnapshot to the policy? Are there any other related permissions needed for a complete restore operation?

Don’t forget about the snapshot resource itself. Your current policy allows operations on all resources ("Resource": "*"), but depending on your organization’s security policies, you might need to explicitly grant access to the specific snapshots used for DR.

Also check if there are any deny policies at the account or resource group level that might be overriding your allow policy. Explicit denies always win in RAM policy evaluation.
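
To make the "explicit deny wins" rule concrete, here is a minimal Python sketch of the evaluation order. It is a deliberate simplification and not RAM's actual engine: it ignores Condition blocks and principals, uses shell-style wildcard matching as a stand-in for RAM's pattern matching, and the `evaluate` helper and the guardrail policy are hypothetical names made up for illustration.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    # Policy fields may be a single string or a list of strings.
    return value if isinstance(value, list) else [value]


def _matches(patterns, value):
    # Shell-style wildcards stand in for RAM's "*" matching (assumption).
    return any(fnmatchcase(value, p) for p in _as_list(patterns))


def evaluate(policies, action, resource):
    """Return 'Deny', 'Allow', or 'ImplicitDeny' for a single request."""
    allowed = False
    for policy in policies:
        for stmt in policy["Statement"]:
            if not (_matches(stmt["Action"], action)
                    and _matches(stmt["Resource"], resource)):
                continue
            if stmt["Effect"] == "Deny":
                return "Deny"  # an explicit deny ends evaluation immediately
            allowed = True
    # With no matching statement at all, the request is denied by default.
    return "Allow" if allowed else "ImplicitDeny"


# The policy from the question, plus a hypothetical account-level guardrail.
dr_policy = {
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ecs:CreateSnapshot", "ecs:DescribeSnapshots"],
        "Resource": "*",
    }],
    "Version": "1",
}
allow_restore = {
    "Statement": [{"Effect": "Allow", "Action": "ecs:RunInstances", "Resource": "*"}],
    "Version": "1",
}
guardrail = {
    "Statement": [{"Effect": "Deny", "Action": "ecs:RunInstances", "Resource": "*"}],
    "Version": "1",
}

arn = "acs:ecs:cn-shanghai:1234567890:instance/i-example"
print(evaluate([dr_policy], "ecs:RunInstances", arn))
print(evaluate([dr_policy, allow_restore], "ecs:RunInstances", arn))
print(evaluate([dr_policy, allow_restore, guardrail], "ecs:RunInstances", arn))
```

The first call illustrates the situation in the question (no statement matches, so the request is denied by default), and the third shows why adding an Allow alone may not help if a Deny exists elsewhere.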

Good point about deny policies. I’ll check with our security team if there are any account-level restrictions. For now, I need to update the RAM policy with the correct restore permissions to unblock DR testing this week.

Here’s the complete solution addressing all three areas:

RAM Policy Permissions: Your current policy only grants snapshot creation and viewing, not restoration. For full DR testing capability, update the policy to include all necessary restore actions:

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:CreateSnapshot",
        "ecs:DescribeSnapshots",
        "ecs:RunInstances",
        "ecs:CreateDisk",
        "ecs:AttachDisk",
        "ecs:DescribeInstances",
        "ecs:DescribeDisks",
        "ecs:DescribeInstanceTypes",
        "ecs:DescribeImages"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "vpc:DescribeVpcs",
        "vpc:DescribeVSwitches"
      ],
      "Resource": "*"
    }
  ],
  "Version": "1"
}

Key permissions explained:

  • ecs:RunInstances: Creates new ECS instances (required for restore)
  • ecs:CreateDisk: Creates disks from snapshots
  • ecs:AttachDisk: Attaches restored disks to instances
  • Describe actions: Required for validation and instance configuration during restore
  • VPC actions: Needed if restoring into VPC networks (typical for production DR)
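
To check which of these ECS actions a policy document still lacks before attaching it, a small script along these lines may help. It is only a sketch: `missing_actions` is a hypothetical helper, the required set simply mirrors the ECS actions in the policy above, and wildcard grants such as `ecs:*` are approximated with shell-style matching.

```python
from fnmatch import fnmatchcase

# ECS actions taken from the updated policy above.
REQUIRED_ECS_ACTIONS = {
    "ecs:CreateSnapshot",
    "ecs:DescribeSnapshots",
    "ecs:RunInstances",
    "ecs:CreateDisk",
    "ecs:AttachDisk",
    "ecs:DescribeInstances",
    "ecs:DescribeDisks",
    "ecs:DescribeInstanceTypes",
    "ecs:DescribeImages",
}


def granted_actions(policy):
    """Collect every action pattern named in an Allow statement."""
    granted = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        granted.update(actions)
    return granted


def missing_actions(policy, required=REQUIRED_ECS_ACTIONS):
    """Return the required actions that no Allow statement covers, sorted."""
    granted = granted_actions(policy)
    return sorted(
        action for action in required
        if not any(fnmatchcase(action, pattern) for pattern in granted)
    )


# The policy originally attached to the DR team role.
original_policy = {
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ecs:CreateSnapshot", "ecs:DescribeSnapshots"],
        "Resource": "*",
    }],
    "Version": "1",
}

for action in missing_actions(original_policy):
    print("missing:", action)
```

This only compares action names; it does not account for resource scoping, conditions, or Deny statements elsewhere.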

Snapshot Restore Action: The specific action for snapshot restore depends on your workflow:

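  1. Full Instance Restore (recommended for DR): Use ecs:RunInstances. Data disks can be created from snapshots in the same call through the DataDisk.N.SnapshotId parameter. The system disk always comes from an image, so restoring a system-disk snapshot generally means building a custom image from it first (ecs:CreateImage) and passing that image to RunInstances.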

  2. Disk-level Restore:

    • ecs:CreateDisk with snapshotId parameter
    • Then ecs:AttachDisk to attach to existing or new instance
    • More granular but requires multiple steps

For DR testing, ecs:RunInstances is the primary action you need. It handles creating the instance with disks restored from snapshots automatically.

Additional Required Actions:

  • If using security groups: ecs:DescribeSecurityGroups, ecs:AuthorizeSecurityGroup
  • If using EIP: ecs:AllocateEipAddress, ecs:AssociateEipAddress
  • If tagging restored instances: ecs:TagResources
  • If in resource groups: resourcemanager:ListResourceGroups

Resource Scope in Policy: Your current "Resource": "*" is overly permissive. Apply least privilege by scoping to DR resources:

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:RunInstances",
        "ecs:CreateDisk",
        "ecs:AttachDisk"
      ],
      "Resource": [
        "acs:ecs:cn-shanghai:*:instance/*",
        "acs:ecs:cn-shanghai:*:disk/*",
        "acs:ecs:cn-shanghai:*:snapshot/dr-*"
      ],
      "Condition": {
        "StringEquals": {
          "ecs:ResourceGroup": "rg-dr-testing"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeSnapshots",
        "ecs:DescribeInstances",
        "ecs:DescribeDisks"
      ],
      "Resource": "*"
    }
  ],
  "Version": "1"
}

This scopes restore actions to:

  • Specific region (cn-shanghai - adjust to your DR region)
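  • Snapshots with ‘dr-’ prefix (intended as a naming convention for DR snapshots; note that the ARN pattern is matched against the snapshot ID, which ECS assigns in the form s-..., so confirm the pattern actually matches your snapshots or rely on the resource group scope instead)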
  • Resources in ‘rg-dr-testing’ resource group
  • Read-only describe actions remain unrestricted for convenience
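
As a quick sanity check on the Resource patterns above, the following sketch tests a few sample ARNs against them. It assumes shell-style wildcard matching approximates how RAM treats "*", and that ARNs carry resource IDs; the account ID and resource IDs are made-up placeholders, and `in_scope` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Resource patterns copied from the scoped policy above.
SCOPED_RESOURCES = [
    "acs:ecs:cn-shanghai:*:instance/*",
    "acs:ecs:cn-shanghai:*:disk/*",
    "acs:ecs:cn-shanghai:*:snapshot/dr-*",
]


def in_scope(arn, patterns=SCOPED_RESOURCES):
    """Return True if the ARN matches at least one Resource pattern."""
    return any(fnmatchcase(arn, pattern) for pattern in patterns)


samples = [
    "acs:ecs:cn-shanghai:1234567890:disk/d-example",      # in region
    "acs:ecs:cn-beijing:1234567890:disk/d-example",       # wrong region
    "acs:ecs:cn-shanghai:1234567890:snapshot/dr-weekly",  # matches the prefix
    "acs:ecs:cn-shanghai:1234567890:snapshot/s-example",  # ID-style, no match
]

for arn in samples:
    print(in_scope(arn), arn)
```

Running candidate ARNs through a check like this before rollout can surface patterns that never match, such as a name prefix applied to an ID-based ARN.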

Best Practices for DR RAM Policies:

  1. Separate Policies by Environment:

    • Production restore: Highly restricted, requires approval workflow
    • DR testing: More permissive, scoped to test resource groups
    • Create separate RAM roles for each
  2. Time-based Access: Add condition to limit restore permissions to DR testing windows:

    "Condition": {
      "DateGreaterThan": {"acs:CurrentTime": "2025-01-26T00:00:00Z"},
      "DateLessThan": {"acs:CurrentTime": "2025-01-27T23:59:59Z"}
    }
    
  3. Audit Trail: Enable ActionTrail to log all snapshot restore operations for compliance.

  4. MFA Requirement: For production restores, add MFA condition:

    "Condition": {
      "Bool": {"acs:MFAPresent": "true"}
    }
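
For the time-based access condition in item 2, the window can be generated rather than typed by hand for each DR exercise. The sketch below builds the same Condition shape shown above; `dr_window_condition` is a hypothetical helper, and it assumes the timestamps should be emitted in UTC with a trailing "Z", as in the example.

```python
import json
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def dr_window_condition(start, hours):
    """Build a Condition block that is satisfied only inside the DR window."""
    if start.tzinfo is None:
        # Refuse naive datetimes so the window is never silently shifted.
        raise ValueError("start must be timezone-aware")
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(hours=hours)
    return {
        "DateGreaterThan": {"acs:CurrentTime": start_utc.strftime(TIME_FORMAT)},
        "DateLessThan": {"acs:CurrentTime": end_utc.strftime(TIME_FORMAT)},
    }


window = dr_window_condition(
    datetime(2025, 1, 26, 0, 0, tzinfo=timezone.utc), hours=48
)
print(json.dumps({"Condition": window}, indent=2))
```

The resulting block can be pasted into the restore statement of the policy; the statement then stops granting access once the window closes, without anyone having to detach the policy.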
    

Validation Steps:

  1. Update RAM policy with required actions

  2. Wait 2-3 minutes for policy propagation

  3. Test restore using RAM user/role:

    
    aliyun ecs RunInstances --RegionId cn-shanghai \
      --ImageId m-xxx \
      --InstanceType ecs.g6.large \
      --SecurityGroupId sg-xxx \
      --VSwitchId vsw-xxx \
      --DataDisk.1.SnapshotId s-xxx
    
  4. Verify instance creates successfully from snapshot

  5. Check ActionTrail logs to confirm proper authorization

Troubleshooting:

  • If still getting ‘Unauthorized’: Check for explicit Deny statements in other policies attached to the user, group, or role, and for Resource Directory control policies applied to the account
  • Verify RAM role trust policy allows your DR team to assume the role
  • Confirm snapshots exist in the same region as restore target
  • Check snapshot status is ‘accomplished’ (completed snapshots only)

The core issue is that snapshot creation and restoration are separate permission domains in RAM. Your policy granted read/create snapshot permissions but not the execute permissions needed for restore operations. Adding ecs:RunInstances and related actions, properly scoped to DR resources, will enable your team to perform disaster recovery validation while maintaining security boundaries.

Restoring an ECS instance from snapshot actually involves multiple actions behind the scenes. You need permissions for:

  • Creating the instance (ecs:CreateInstance or ecs:RunInstances)
  • Attaching the restored disk (ecs:AttachDisk)
  • Potentially network operations if creating in a VPC
  • Describing instance types and regions

The resource scope also matters. Using "Resource": "*" works but violates least privilege. You should scope it to specific regions or resource groups for DR environments.