You need to fix all three permission layers, in order: the crawler’s IAM role, the S3 bucket policy, and the KMS key policy.
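Glue Crawler IAM Role: The AWSGlueServiceRole managed policy isn’t sufficient for encrypted buckets: it grants no KMS permissions, and its S3 object access is scoped to buckets whose names start with aws-glue-. Create a custom policy containing these statements and attach it to your crawler role: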
{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"],
  "Resource": [
    "arn:aws:s3:::new-analytics-bucket",
    "arn:aws:s3:::new-analytics-bucket/*"
  ]
},
{
  "Effect": "Allow",
  "Action": ["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey"],
  "Resource": "arn:aws:kms:region:account:key/YOUR-KEY-ID"
}
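kms:Decrypt is what lets the crawler read SSE-KMS encrypted objects. The kms:GenerateDataKey permission is often overlooked, but it only comes into play when the role writes encrypted objects to the bucket (for example, a Glue job that reuses the same role); a crawler that only reads does not depend on it.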
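The two statements above are fragments; IAM expects them wrapped in a policy document with a Version and a Statement array. A minimal sketch of that wrapper in Python, assuming the bucket name and the placeholder key ARN from this answer (the function name and the Sid values are illustrative, not anything AWS requires):

```python
import json

BUCKET = "new-analytics-bucket"
# Placeholder ARN copied from the answer; substitute your own region, account and key ID.
KMS_KEY_ARN = "arn:aws:kms:region:account:key/YOUR-KEY-ID"


def build_crawler_policy(bucket, kms_key_arn):
    """Return a complete IAM policy document containing the two statements above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadBucket",  # illustrative Sid
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"],
                # ListBucket applies to the bucket ARN, GetObject to the object ARNs.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
            {
                "Sid": "UseKmsKey",  # illustrative Sid
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey"],
                "Resource": kms_key_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Print the document so it can be pasted into the IAM console or saved to a file.
    print(json.dumps(build_crawler_policy(BUCKET, KMS_KEY_ARN), indent=2))
```

Generating the document rather than hand-editing it keeps the bucket ARN and the object ARN (the one ending in /*) in sync, which is an easy thing to get wrong.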
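S3 Bucket Policy: If the bucket lives in a different account from the crawler role, or the bucket policy restricts who can read it, the policy must explicitly allow the Glue crawler role, not just the service principal. Add this statement: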
{
  "Sid": "AllowGlueCrawler",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::ACCOUNT:role/AWSGlueServiceRole-CrawlerName"
  },
  "Action": ["s3:GetObject", "s3:ListBucket"],
  "Resource": [
    "arn:aws:s3:::new-analytics-bucket",
    "arn:aws:s3:::new-analytics-bucket/*"
  ]
}
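An explicit Deny always overrides an Allow, so if the bucket policy contains Deny statements, make sure none of them match the crawler’s requests. Check the conditions on each one: a Deny that restricts by transport (such as the usual non-SSL rule), VPC endpoint, or source IP can block the Glue service unintentionally.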
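To review Deny statements without reading the whole policy by eye, something like the following sketch can list them along with their conditions. It assumes the bucket policy has already been saved locally as JSON; the function names are hypothetical, and the check is deliberately simple (it ignores NotResource, NotAction and wildcard patterns inside ARNs):

```python
def _as_list(value):
    """IAM allows a single value or a list in most fields; normalise to a list."""
    return value if isinstance(value, list) else [value]


def find_deny_statements(policy, bucket):
    """Return the Deny statements in `policy` that target `bucket` or its objects."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    hits = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Deny":
            continue
        resources = _as_list(stmt.get("Resource", []))
        targets_bucket = any(
            r == "*" or r == bucket_arn or r.startswith(bucket_arn + "/")
            for r in resources
        )
        if targets_bucket:
            hits.append(
                {
                    "Sid": stmt.get("Sid", "(no Sid)"),
                    "Action": _as_list(stmt.get("Action", [])),
                    "Condition": stmt.get("Condition", {}),
                }
            )
    return hits


if __name__ == "__main__":
    # Hypothetical example policy with a typical "deny non-SSL" rule.
    example = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": "arn:aws:s3:::new-analytics-bucket/*",
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    for hit in find_deny_statements(example, "new-analytics-bucket"):
        print(hit["Sid"], hit["Action"], hit["Condition"])
```

Each reported statement still needs a human read: the point is only to surface the conditions so you can judge whether the crawler’s requests would satisfy them.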
KMS Key Policy: This is the most commonly missed piece. Your KMS key policy must grant the Glue role permission to use the key. Add this statement to the key policy:
{
  "Sid": "Allow Glue to use the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::ACCOUNT:role/AWSGlueServiceRole-CrawlerName"
  },
  "Action": ["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey"],
  "Resource": "*"
}
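Without this statement (or a key policy statement that delegates access control to IAM by allowing the account root principal), the role’s IAM kms:Decrypt permission has no effect: KMS only honors IAM permissions when the key policy allows it, directly or by delegation.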
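A quick way to see which situation you are in is to inspect the key policy itself. The sketch below is a rough classifier under simplifying assumptions: it looks only at Allow statements, matches actions exactly or via "kms:*" and "*", and ignores Condition blocks and Deny statements. The function name and return labels are made up for illustration.

```python
def _as_list(value):
    """Normalise a single value or a list to a list."""
    return value if isinstance(value, list) else [value]


def _principals(stmt):
    """Return the AWS principals named in a statement."""
    principal = stmt.get("Principal", {})
    if isinstance(principal, str):  # e.g. "*"
        return [principal]
    return _as_list(principal.get("AWS", []))


def _action_matches(stmt, action):
    """Exact match, or the broad wildcards "kms:*" and "*" (no partial patterns)."""
    return any(a in ("*", "kms:*", action) for a in _as_list(stmt.get("Action", [])))


def key_policy_grant(policy, role_arn, account_id, action="kms:Decrypt"):
    """Classify how the key policy lets `role_arn` perform `action`.

    Returns "direct"  if the role is named as a principal,
            "via-iam" if the account root is allowed (IAM policies then decide),
            "none"    otherwise.
    """
    root_arn = f"arn:aws:iam::{account_id}:root"
    result = "none"
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow" or not _action_matches(stmt, action):
            continue
        principals = _principals(stmt)
        if role_arn in principals:
            return "direct"
        if root_arn in principals or account_id in principals:
            result = "via-iam"
    return result
```

A result of "none" means no IAM policy on the role can make the key usable, which matches the failure described above; "via-iam" means the role’s own IAM policy is what needs the KMS statement.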
Verification Steps:
- Test S3 access: Use the AWS CLI with the crawler role credentials to list bucket contents: `aws s3 ls s3://new-analytics-bucket/data/ --profile crawler-role`
- Test KMS access: Download a sample file using the role, which forces a decrypt of the object: `aws s3 cp s3://new-analytics-bucket/data/sample.parquet - --profile crawler-role`
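- Check CloudTrail: Look for AccessDenied events from the Glue service to see which exact permission is failing (KMS failures show up as management events; S3 object-level failures such as GetObject only appear if data events are enabled for the bucket)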
- Enable Glue crawler CloudWatch logs: In the crawler settings, enable logging to see detailed error messages
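For the CloudTrail step, a small script can make the failing permission obvious. This is a sketch assuming you have already loaded the "Records" array from one or more CloudTrail log files into a list of dicts; the function name is hypothetical, and the role name comes from the policy examples above.

```python
from collections import Counter


def summarize_access_denied(records, role_name):
    """Count AccessDenied events per (eventSource, eventName) for one assumed role.

    `records` is the list found under "Records" in a CloudTrail log file.
    """
    counts = Counter()
    for rec in records:
        # S3 and KMS report "AccessDenied"; the prefix match also catches
        # variants such as "AccessDeniedException" from other services.
        if not str(rec.get("errorCode", "")).startswith("AccessDenied"):
            continue
        # Sessions of an assumed role appear as
        # arn:aws:sts::ACCOUNT:assumed-role/ROLE_NAME/SESSION_NAME
        arn = rec.get("userIdentity", {}).get("arn", "")
        if f":assumed-role/{role_name}/" not in arn:
            continue
        counts[(rec.get("eventSource"), rec.get("eventName"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records, trimmed to the fields the function reads.
    sample = [
        {
            "eventSource": "kms.amazonaws.com",
            "eventName": "Decrypt",
            "errorCode": "AccessDenied",
            "userIdentity": {
                "arn": "arn:aws:sts::111122223333:assumed-role/AWSGlueServiceRole-CrawlerName/session"
            },
        }
    ]
    for (source, name), n in summarize_access_denied(sample, "AWSGlueServiceRole-CrawlerName").items():
        print(f"{source} {name}: {n} denied")
```

If kms.amazonaws.com / Decrypt dominates the output, the problem is in the KMS layer; if the denials are on s3.amazonaws.com, look at the IAM role and bucket policy first.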
Additional Considerations:
- If your Parquet files were written by Spark or other tools, ensure they’re using the correct KMS key for encryption
- Verify the crawler’s exclude patterns aren’t too broad: _temporary/** should be fine, but double-check
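- For large datasets, note that crawlers don’t expose a DPU allocation setting (that applies to Glue ETL jobs); to avoid long runs or timeouts, narrow the include path, add exclude patterns, or enable sampling or incremental crawls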
- If you have nested partitions (year/month/day structure), ensure the crawler is configured to detect partition keys automatically
After making these changes, test the crawler on a small subset first (use a more specific include path like s3://new-analytics-bucket/data/year=2025/month=01/) before running it on the entire dataset.
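As a sketch of that subset test, the helpers below build the narrower include path and a crawler target definition. The function names are hypothetical; the dictionary shape (S3Targets with Path and Exclusions) follows the Targets structure the Glue crawler API accepts, and the exclusion pattern is the one mentioned above.

```python
def partition_path(bucket, prefix, partitions):
    """Build an S3 include path for a Hive-style partition subset.

    `partitions` is an ordered list of (key, value) pairs.
    """
    parts = "/".join(f"{key}={value}" for key, value in partitions)
    return f"s3://{bucket}/{prefix.strip('/')}/{parts}/"


def build_targets(path, exclusions=()):
    """Return a Targets dict with a single S3 target."""
    return {"S3Targets": [{"Path": path, "Exclusions": list(exclusions)}]}


if __name__ == "__main__":
    path = partition_path(
        "new-analytics-bucket", "data", [("year", "2025"), ("month", "01")]
    )
    print(path)
    print(build_targets(path, ["_temporary/**"]))
```

Assuming boto3 is installed and configured, the resulting dictionary could be passed as the Targets argument when creating or updating the crawler; once the subset crawl succeeds, switch the path back to the full dataset prefix.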