Here’s the complete solution addressing all three areas:
RAM Policy Permissions:
Your current policy only grants snapshot creation and viewing, not restoration. For full DR testing capability, update the policy to include all necessary restore actions:
{
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecs:CreateSnapshot",
"ecs:DescribeSnapshots",
"ecs:RunInstances",
"ecs:CreateDisk",
"ecs:AttachDisk",
"ecs:DescribeInstances",
"ecs:DescribeDisks",
"ecs:DescribeInstanceTypes",
"ecs:DescribeImages"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"vpc:DescribeVpcs",
"vpc:DescribeVSwitches"
],
"Resource": "*"
}
],
"Version": "1"
}
Key permissions explained:
- ecs:RunInstances: Creates new ECS instances (required for restore)
- ecs:CreateDisk: Creates disks from snapshots
- ecs:AttachDisk: Attaches restored disks to instances
- Describe actions: Required for validation and instance configuration during restore
- VPC actions: Needed if restoring into VPC networks (typical for production DR)
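If you want to see which restore actions a policy still lacks before attaching it, a small local check can help. This is only a sketch: `missing_actions` is a hypothetical helper, it matches action names with simple wildcards, and it ignores Deny statements, Resource scoping, and conditions, so it does not replace RAM's own evaluation.

```python
import fnmatch

# Actions from the policy above that the restore workflow relies on.
RESTORE_ACTIONS = ["ecs:RunInstances", "ecs:CreateDisk", "ecs:AttachDisk"]


def allowed_patterns(policy):
    """Collect the Action patterns of every Allow statement in a policy dict."""
    patterns = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # Action may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        patterns.extend(actions)
    return patterns


def missing_actions(policy, required=RESTORE_ACTIONS):
    """Return the required actions that no Allow statement matches."""
    patterns = allowed_patterns(policy)
    return [
        action
        for action in required
        if not any(fnmatch.fnmatchcase(action, pattern) for pattern in patterns)
    ]


# A policy that only covers snapshot creation and viewing.
snapshot_only = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ecs:CreateSnapshot", "ecs:DescribeSnapshots"],
            "Resource": "*",
        }
    ],
}

print(missing_actions(snapshot_only))
```

Run against a snapshot-only policy like the sample, it lists all three restore actions as missing.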
Snapshot Restore Action:
The specific action for snapshot restore depends on your workflow:
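- Full Instance Restore (recommended for DR): Use ecs:RunInstances and pass each snapshot through the DataDisk.N.SnapshotId parameter, so the data disks are created from snapshots when the instance launches. The system disk comes from the image given in ImageId, so restoring a system disk snapshot means creating a custom image from it first (ecs:CreateImage with SnapshotId).
- Disk-level Restore:
  - ecs:CreateDisk with the SnapshotId parameter
  - Then ecs:AttachDisk to attach the disk to an existing or new instance
  - More granular, but requires multiple steps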
For DR testing, ecs:RunInstances is the primary action you need. It creates the instance and, in the same call, any data disks restored from snapshots.
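The two workflows differ mainly in which API call carries the snapshot ID. As a rough sketch, assuming the RunInstances `DataDisk.N.SnapshotId` and CreateDisk `SnapshotId` request parameters (worth confirming against the current ECS API reference), the snapshot-related parameters could be assembled like this. The helper names and all IDs are hypothetical placeholders, only a subset of parameters is shown, and nothing is sent to Alibaba Cloud.

```python
def run_instances_params(image_id, instance_type, security_group_id, snapshot_ids):
    """Parameters for a full instance restore: one data disk per snapshot."""
    params = {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "SecurityGroupId": security_group_id,
    }
    # Data disks are numbered from 1 in the flattened request format.
    for index, snapshot_id in enumerate(snapshot_ids, start=1):
        params[f"DataDisk.{index}.SnapshotId"] = snapshot_id
    return params


def disk_restore_steps(snapshot_id, instance_id, zone_id):
    """Ordered (action, parameters) pairs for a disk-level restore."""
    return [
        ("CreateDisk", {"SnapshotId": snapshot_id, "ZoneId": zone_id}),
        # The real DiskId is only known once CreateDisk has returned.
        ("AttachDisk", {"InstanceId": instance_id, "DiskId": "<DiskId from CreateDisk>"}),
    ]


print(run_instances_params("img-xxx", "ecs.g6.large", "sg-xxx", ["s-dr-xxx"]))
for action, params in disk_restore_steps("s-dr-xxx", "i-xxx", "cn-shanghai-a"):
    print(action, params)
```

The full restore is a single call, while the disk-level path has to carry the new disk's ID from the first call into the second.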
Additional Required Actions:
- If using security groups: ecs:DescribeSecurityGroups, ecs:AuthorizeSecurityGroup
- If using EIP: ecs:AllocateEipAddress, ecs:AssociateEipAddress
- If tagging restored instances: ecs:TagResources
- If in resource groups: resourcemanager:ListResourceGroups
Resource Scope in Policy:
Your current "Resource": "*" is overly permissive. Apply least privilege by scoping to DR resources:
{
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecs:RunInstances",
"ecs:CreateDisk",
"ecs:AttachDisk"
],
"Resource": [
"acs:ecs:cn-shanghai:*:instance/*",
"acs:ecs:cn-shanghai:*:disk/*",
"acs:ecs:cn-shanghai:*:snapshot/dr-*"
],
"Condition": {
"StringEquals": {
"ecs:ResourceGroup": "rg-dr-testing"
}
}
},
{
"Effect": "Allow",
"Action": [
"ecs:DescribeSnapshots",
"ecs:DescribeInstances",
"ecs:DescribeDisks"
],
"Resource": "*"
}
],
"Version": "1"
}
This scopes restore actions to:
- Specific region (cn-shanghai - adjust to your DR region)
- Snapshots with ‘dr-’ prefix (naming convention for DR snapshots)
- Resources in ‘rg-dr-testing’ resource group
- Read-only describe actions remain unrestricted for convenience
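If you maintain one such policy per DR region, generating it from a few parameters keeps the copies consistent. The sketch below reproduces the scoped policy above; `build_dr_policy` is a hypothetical helper, and the region, prefix, and resource group values are the same placeholders used in the example.

```python
import json


def build_dr_policy(region, snapshot_prefix, resource_group):
    """Return the scoped DR policy shown above as a Python dict."""
    restore_statement = {
        "Effect": "Allow",
        "Action": ["ecs:RunInstances", "ecs:CreateDisk", "ecs:AttachDisk"],
        "Resource": [
            f"acs:ecs:{region}:*:instance/*",
            f"acs:ecs:{region}:*:disk/*",
            f"acs:ecs:{region}:*:snapshot/{snapshot_prefix}*",
        ],
        "Condition": {"StringEquals": {"ecs:ResourceGroup": resource_group}},
    }
    # Read-only actions stay unrestricted, as in the policy above.
    describe_statement = {
        "Effect": "Allow",
        "Action": [
            "ecs:DescribeSnapshots",
            "ecs:DescribeInstances",
            "ecs:DescribeDisks",
        ],
        "Resource": "*",
    }
    return {"Statement": [restore_statement, describe_statement], "Version": "1"}


policy = build_dr_policy("cn-shanghai", "dr-", "rg-dr-testing")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the RAM console as a custom policy after you have replaced the placeholder values.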
Best Practices for DR RAM Policies:
- Separate Policies by Environment:
  - Production restore: Highly restricted, requires approval workflow
  - DR testing: More permissive, scoped to test resource groups
  - Create separate RAM roles for each
- Time-based Access: Add a condition that limits restore permissions to DR testing windows:
"Condition": {
"DateGreaterThan": {"acs:CurrentTime": "2025-01-26T00:00:00Z"},
"DateLessThan": {"acs:CurrentTime": "2025-01-27T23:59:59Z"}
}
- Audit Trail: Enable ActionTrail to log all snapshot restore operations for compliance.
- MFA Requirement: For production restores, add an MFA condition:
"Condition": {
"Bool": {"acs:MFAPresent": "true"}
}
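Hand-editing timestamps for every DR test window is error-prone, so the Condition block can be generated instead. This is a minimal sketch using the condition operators and keys shown above; `time_window_condition` and `with_mfa` are hypothetical helper names, and the window dates are the ones from the example.

```python
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def time_window_condition(start, duration):
    """Condition that holds only between start and start + duration (UTC)."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + duration
    return {
        "DateGreaterThan": {"acs:CurrentTime": start_utc.strftime(TIME_FORMAT)},
        "DateLessThan": {"acs:CurrentTime": end_utc.strftime(TIME_FORMAT)},
    }


def with_mfa(condition):
    """Return a copy of the condition that also requires MFA."""
    merged = dict(condition)
    merged["Bool"] = {"acs:MFAPresent": "true"}
    return merged


# Two-day window starting 2025-01-26 00:00 UTC, ending one second before midnight.
window = time_window_condition(
    datetime(2025, 1, 26, tzinfo=timezone.utc),
    timedelta(days=2) - timedelta(seconds=1),
)
print(with_mfa(window))
```

Merging both conditions into one block means a restore is allowed only when the request falls inside the window and the caller signed in with MFA.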
Validation Steps:
- Update the RAM policy with the required actions
- Wait 2-3 minutes for policy propagation
- Test the restore as the RAM user/role:
aliyun ecs RunInstances --ImageId img-xxx \
--DataDisk.1.SnapshotId s-dr-xxx \
--InstanceType ecs.g6.large \
--SecurityGroupId sg-xxx
- Verify that the instance is created successfully from the snapshot
- Check ActionTrail logs to confirm proper authorization
Troubleshooting:
- If you still get ‘Unauthorized’: Check for explicit Deny statements, including control policies applied through the Resource Directory (Alibaba Cloud's counterpart to SCPs)
- Verify RAM role trust policy allows your DR team to assume the role
- Confirm snapshots exist in the same region as restore target
- Check snapshot status is ‘accomplished’ (completed snapshots only)
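The snapshot status check can be scripted against the JSON that `aliyun ecs DescribeSnapshots` prints. The sketch below assumes the response nests the items under `Snapshots.Snapshot` with `SnapshotId` and `Status` fields; verify that shape against your own output. The helper name and the sample data are made up for illustration.

```python
import json


def not_ready_snapshots(describe_output):
    """Map SnapshotId to Status for snapshots that are not yet 'accomplished'."""
    response = json.loads(describe_output)
    snapshots = response.get("Snapshots", {}).get("Snapshot", [])
    return {
        snapshot["SnapshotId"]: snapshot.get("Status")
        for snapshot in snapshots
        if snapshot.get("Status") != "accomplished"
    }


# Made-up sample in the assumed response shape.
sample_output = json.dumps(
    {
        "Snapshots": {
            "Snapshot": [
                {"SnapshotId": "s-dr-001", "Status": "accomplished"},
                {"SnapshotId": "s-dr-002", "Status": "progressing"},
            ]
        }
    }
)

print(not_ready_snapshots(sample_output))
```

An empty result means every listed snapshot is complete and can be used as a restore source.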
The core issue is that snapshot creation and restoration are separate permission domains in RAM. Your policy granted read/create snapshot permissions but not the execute permissions needed for restore operations. Adding ecs:RunInstances and related actions, properly scoped to DR resources, will enable your team to perform disaster recovery validation while maintaining security boundaries.