Here’s the complete zero-downtime rotation procedure covering all three focus areas:
Service Account Key Rotation (Proper Sequence):
Step 1 - Create new key WITHOUT deleting the old one:
gcloud iam service-accounts keys create new-key.json --iam-account=SERVICE_ACCOUNT_EMAIL
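Step 2 - Verify the new key is valid. Activating the key makes the service account gcloud's active account, so switch back afterwards with `gcloud config set account YOUR_ACCOUNT`. Use `describe` rather than `list`: roles/cloudsql.client grants `cloudsql.instances.get` but not `cloudsql.instances.list`, so a client-only account can be denied a listing even though its key works:
gcloud auth activate-service-account --key-file=new-key.json
gcloud sql instances describe INSTANCE_NAME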
Step 3 - Keep both keys active during the transition period.
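Steps 1-3 can be sketched as a few bash functions. This is a hedged sketch, not an official tool: the function names (`run`, `key_id_from_file`, `create_and_verify_key`) and the `DRY_RUN` switch are illustrative assumptions, and the service account, instance and project values are placeholders you pass in. It tests the new key inside a throwaway gcloud configuration directory (via the `CLOUDSDK_CONFIG` environment variable), so your own active account is never switched, and it prints each command instead of running it unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-3. Commands are printed, not executed, unless DRY_RUN=0.
# Function names and DRY_RUN are illustrative, not part of gcloud.

# Run a command for real only when DRY_RUN=0; otherwise just print it.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Print the private_key_id field of a key file (a lightweight sed match;
# "jq -r .private_key_id FILE" is sturdier if jq is installed).
key_id_from_file() {
  sed -n 's/.*"private_key_id":[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Usage: create_and_verify_key SA_EMAIL INSTANCE PROJECT [KEY_FILE]
# Creates a second key (the old key is left untouched) and tests it in a
# temporary gcloud config dir so the operator's active account is unchanged.
create_and_verify_key() {
  local sa_email="$1" instance="$2" project="$3" key_file="${4:-new-key.json}"
  local tmp_config rc=0
  run gcloud iam service-accounts keys create "$key_file" \
    --iam-account="$sa_email" || return 1

  tmp_config="$(mktemp -d)"
  run env CLOUDSDK_CONFIG="$tmp_config" \
    gcloud auth activate-service-account --key-file="$key_file" || rc=1
  if (( rc == 0 )); then
    # "describe" needs only cloudsql.instances.get (part of roles/cloudsql.client).
    run env CLOUDSDK_CONFIG="$tmp_config" \
      gcloud sql instances describe "$instance" --project="$project" \
      --format="value(name)" || rc=1
  fi
  rm -rf "$tmp_config"

  # The key file only exists after a real run, so report its ID only then.
  if (( rc == 0 )) && [[ "${DRY_RUN:-1}" == "0" ]]; then
    echo "New key ID: $(key_id_from_file "$key_file")"
  fi
  return "$rc"
}
```

For example, `source rotate-key.sh && DRY_RUN=0 create_and_verify_key SERVICE_ACCOUNT_EMAIL INSTANCE_NAME PROJECT_ID` (the file name is hypothetical). Running it first without `DRY_RUN=0` shows exactly which commands would execute.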
Application Credential Update (Kubernetes Example):
Update the Kubernetes secret:
kubectl create secret generic cloudsql-credentials --from-file=key.json=new-key.json --dry-run=client -o yaml | kubectl apply -f -
Perform rolling restart to pick up new credentials:
kubectl rollout restart deployment/your-app-deployment
kubectl rollout status deployment/your-app-deployment
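Verify the new key from the cluster side. Querying the metadata server from a pod returns the node's (or Workload Identity's) service account, not the identity in the mounted key file, so it cannot confirm the rotation. Instead, check that the Secret holds the new key ID and that the pods connect without authentication errors:
kubectl get secret cloudsql-credentials -o jsonpath='{.data.key\.json}' | base64 --decode | grep private_key_id
kubectl logs deployment/your-app-deployment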
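The Kubernetes steps can be wrapped the same way. A minimal sketch, assuming the Secret name, the `key.json` entry and the Deployment name used above; `rotate_k8s_secret`, `secret_key_id` and the `DRY_RUN` switch are hypothetical helpers. It adds a `--timeout` to `kubectl rollout status` so a rollout that never becomes ready surfaces as an error instead of a hang.

```shell
#!/usr/bin/env bash
# Sketch: push the new key into the Secret, restart the Deployment, wait.
# Commands are printed, not executed, unless DRY_RUN=0.

# Run a command for real only when DRY_RUN=0; otherwise just print it.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Usage: rotate_k8s_secret [KEY_FILE] [SECRET] [DEPLOYMENT]
rotate_k8s_secret() {
  local key_file="${1:-new-key.json}"
  local secret="${2:-cloudsql-credentials}"
  local deployment="${3:-your-app-deployment}"

  # Render the Secret client-side and apply it, which updates it in place.
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    kubectl create secret generic "$secret" --from-file=key.json="$key_file" \
      --dry-run=client -o yaml | kubectl apply -f - || return 1
  else
    printf '+ kubectl create secret generic %s --from-file=key.json=%s --dry-run=client -o yaml | kubectl apply -f -\n' \
      "$secret" "$key_file"
  fi

  # Apps and proxies typically read the key only at startup, so restart pods.
  run kubectl rollout restart "deployment/$deployment" || return 1
  # Fail after 5 minutes instead of waiting forever on pods that never get ready.
  run kubectl rollout status "deployment/$deployment" --timeout=5m
}

# Print the private_key_id currently stored in the Secret (read-only).
secret_key_id() {
  kubectl get secret "${1:-cloudsql-credentials}" -o jsonpath='{.data.key\.json}' \
    | base64 --decode \
    | sed -n 's/.*"private_key_id":[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

After `DRY_RUN=0 rotate_k8s_secret`, the output of `secret_key_id` should match the ID of the key created in Step 1.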
IAM Role Verification:
Confirm the service account has necessary roles:
gcloud projects get-iam-policy PROJECT_ID --flatten="bindings[].members" --filter="bindings.members:serviceAccount:SERVICE_ACCOUNT_EMAIL"
Required roles for Cloud SQL:
- roles/cloudsql.client (minimum for Cloud SQL Proxy)
- roles/cloudsql.instanceUser (if using IAM database authentication)
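To turn the policy query into a pass/fail check, the sketch below compares the roles it returns against a required list. The helper names (`missing_roles`, `check_cloudsql_roles`) are hypothetical. It only sees roles bound directly to the service account on the project, not roles inherited through groups, folders or the organization, so a "missing" result means "verify by hand", not "definitely absent".

```shell
#!/usr/bin/env bash
# Sketch: report required roles that are not bound directly to the service
# account at project level. Inherited bindings are not visible to this check.

# Print each role in "$@" (after the first argument) that is absent from the
# newline-separated list of granted roles in $1.
missing_roles() {
  local granted="$1" role
  shift
  for role in "$@"; do
    if ! grep -qxF -- "$role" <<< "$granted"; then
      echo "$role"
    fi
  done
}

# Usage: check_cloudsql_roles PROJECT_ID SA_EMAIL [ROLE...]
# With no roles given, only roles/cloudsql.client is checked.
check_cloudsql_roles() {
  local project="$1" sa_email="$2" granted missing
  shift 2
  (( $# > 0 )) || set -- roles/cloudsql.client
  granted="$(gcloud projects get-iam-policy "$project" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:$sa_email" \
    --format="value(bindings.role)")" || return 1
  missing="$(missing_roles "$granted" "$@")"
  if [[ -n "$missing" ]]; then
    printf 'Missing role(s):\n%s\n' "$missing"
    return 1
  fi
  echo "All required roles are bound at project level."
}
```

For example, `check_cloudsql_roles PROJECT_ID SERVICE_ACCOUNT_EMAIL roles/cloudsql.client roles/cloudsql.instanceUser` checks both roles listed above.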
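Verify at the database level if you use IAM database authentication. For PostgreSQL, the database username is the service account email without the `.gserviceaccount.com` suffix. Check the privilege first and grant it only if it is missing (a database-level grant does not cover tables, which need their own GRANT):
SELECT has_database_privilege('SA_NAME@PROJECT_ID.iam', 'dbname', 'CONNECT');
GRANT ALL PRIVILEGES ON DATABASE dbname TO "SA_NAME@PROJECT_ID.iam";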
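For PostgreSQL IAM database authentication, the database user can also be created and checked from the shell. A sketch under stated assumptions: the instance already has the `cloudsql.iam_authentication` flag enabled, `psql` can reach the database as an admin user through the usual `PG*` environment variables, and `pg_iam_username` and `add_iam_db_user` are hypothetical helpers. Commands are printed unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch for PostgreSQL IAM database authentication.
# Assumes cloudsql.iam_authentication=on and admin psql access via PG* vars.
# Commands are printed, not executed, unless DRY_RUN=0.

# Run a command for real only when DRY_RUN=0; otherwise just print it.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# The PostgreSQL username for an IAM service account is its email
# without the ".gserviceaccount.com" suffix.
pg_iam_username() {
  printf '%s\n' "${1%.gserviceaccount.com}"
}

# Usage: add_iam_db_user SA_EMAIL INSTANCE DBNAME
add_iam_db_user() {
  local sa_email="$1" instance="$2" db="$3" db_user
  db_user="$(pg_iam_username "$sa_email")"
  # Expected to fail if the user already exists; check with
  # "gcloud sql users list --instance=INSTANCE" first.
  run gcloud sql users create "$db_user" --instance="$instance" \
    --type=cloud_iam_service_account || return 1
  # Prints "t" if the user may connect to the database, "f" otherwise.
  run psql -d "$db" -tAc \
    "SELECT has_database_privilege('$db_user', '$db', 'CONNECT');"
}
```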
Complete Zero-Downtime Rotation Process:
- Pre-rotation: Verify current connectivity and record the current key ID
- Create new key: Generate a new key via the console or gcloud; keep the old key active
- Update credentials: Update all credential stores (secrets, config files, environment variables)
- Rolling restart: Restart applications/services in controlled manner:
  - For Kubernetes: `kubectl rollout restart deployment/your-app-deployment`
  - For Compute Engine: Update the instance metadata and restart the service
  - For Cloud Run: Deploy a new revision with the updated secret
- Verify connectivity: Test database connections from all services
- Monitor: Check application logs for any authentication errors (wait 15-30 minutes)
- Delete old key: Only after confirming the new key works everywhere:
gcloud iam service-accounts keys delete KEY_ID --iam-account=SERVICE_ACCOUNT_EMAIL
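The pre-rotation and delete steps can be made safer by recording the user-managed key IDs up front and disabling old keys before deleting them. Disabling (`gcloud iam service-accounts keys disable`) is an extra step not in the list above; it is reversible with `gcloud iam service-accounts keys enable`, so a workload you missed fails loudly during the monitoring window but can be restored without creating yet another key. The helper names are hypothetical and commands are printed unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch: record user-managed key IDs, then disable old keys before deleting.
# Commands are printed, not executed, unless DRY_RUN=0.

# Run a command for real only when DRY_RUN=0; otherwise just print it.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# List user-managed key IDs for a service account, one per line.
list_user_keys() {
  gcloud iam service-accounts keys list --iam-account="$1" \
    --managed-by=user --format="value(name.basename())"
}

# Print every key ID in $1 (newline-separated) except the key to keep, $2.
keys_to_retire() {
  local all_keys="$1" keep="$2" key
  while IFS= read -r key; do
    if [[ -n "$key" && "$key" != "$keep" ]]; then
      echo "$key"
    fi
  done <<< "$all_keys"
}

# Read key IDs from stdin and apply ACTION ("disable" or "delete") to each.
retire_keys() {
  local sa_email="$1" action="$2" key
  if [[ "$action" != disable && "$action" != delete ]]; then
    echo "action must be 'disable' or 'delete'" >&2
    return 2
  fi
  while IFS= read -r key; do
    [[ -z "$key" ]] && continue
    # --quiet skips the confirmation prompt; </dev/null keeps gcloud
    # from reading the key list on this loop's stdin.
    run gcloud iam service-accounts keys "$action" "$key" \
      --iam-account="$sa_email" --quiet </dev/null || return 1
  done
}
```

For example: run `old=$(list_user_keys SERVICE_ACCOUNT_EMAIL)` before creating the new key; after verification, `keys_to_retire "$old" NEW_KEY_ID | DRY_RUN=0 retire_keys SERVICE_ACCOUNT_EMAIL disable`; once the monitoring window passes cleanly, repeat the pipeline with `delete`. Review `$old` first, since other workloads may share the same service account.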
Troubleshooting Your Specific Issue:
Your immediate problem is that the pods are still using the deleted key. To fix it:
- Create a new service account key (the old one is gone)
- Update your Kubernetes secret with the new key
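- Force a pod restart: `kubectl rollout restart deployment/your-app-deployment`
- Watch pod logs: `kubectl logs -f deployment/your-app-deployment`
- If you run the Cloud SQL Auth Proxy as a sidecar, check its logs too: `kubectl logs -f deployment/your-app-deployment -c PROXY_CONTAINER_NAME`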
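To confirm this diagnosis before changing anything, you can compare the key ID stored in the Secret with the keys that still exist on the service account. A read-only sketch, assuming the Secret name and `key.json` entry used earlier; `key_exists` and `diagnose_secret_key` are hypothetical names. Disabled keys still appear in `keys list`, so a match does not prove the key is enabled.

```shell
#!/usr/bin/env bash
# Read-only sketch: is the key in the Kubernetes Secret still one of the
# service account's keys? If not, the pods are presenting a deleted key.

# Succeeds if key ID $1 is non-empty and appears in the newline-separated list $2.
key_exists() {
  [[ -n "$1" ]] && grep -qxF -- "$1" <<< "$2"
}

# Usage: diagnose_secret_key SA_EMAIL [SECRET_NAME]
diagnose_secret_key() {
  local sa_email="$1" secret="${2:-cloudsql-credentials}" in_secret live
  in_secret="$(kubectl get secret "$secret" -o jsonpath='{.data.key\.json}' \
    | base64 --decode \
    | sed -n 's/.*"private_key_id":[[:space:]]*"\([^"]*\)".*/\1/p')"
  live="$(gcloud iam service-accounts keys list --iam-account="$sa_email" \
    --managed-by=user --format="value(name.basename())")" || return 1
  if key_exists "$in_secret" "$live"; then
    echo "Secret key ${in_secret} still exists (check it is not disabled); restart pods if they still fail."
  else
    echo "Secret key '${in_secret:-<none>}' is not a key of ${sa_email}; create a new key and update the Secret."
    return 1
  fi
}
```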
Best Practices to Prevent Future Issues:
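- Automate rotation with a scheduled job that follows the create-update-verify-delete sequence, and distribute the key with a tool such as External Secrets Operator or Berglas (these deliver secrets to workloads; they do not create or rotate service account keys themselves)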
- Use Workload Identity instead of service account keys when possible (eliminates key rotation entirely)
- Set up monitoring alerts for authentication failures
- Document rotation procedures in runbooks
- Test rotation process in non-production environments first
- Never delete old keys until new keys are verified working in all environments
- Use Secret Manager as the central source of credentials instead of managing Kubernetes secrets by hand
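Workload Identity removes the key file altogether. A sketch of the two bindings involved, assuming Workload Identity is already enabled on the cluster and node pool and that the pods run as a Kubernetes service account you choose; all identifiers are placeholders and `enable_workload_identity` is a hypothetical helper. Commands are printed unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch: map a Kubernetes service account (KSA) to a Google service
# account (GSA) so pods get credentials without a key file.
# Assumes Workload Identity is already enabled on the cluster and node pool.
# Commands are printed, not executed, unless DRY_RUN=0.

# Run a command for real only when DRY_RUN=0; otherwise just print it.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Usage: enable_workload_identity PROJECT_ID GSA_EMAIL NAMESPACE KSA_NAME
enable_workload_identity() {
  local project="$1" gsa_email="$2" namespace="$3" ksa="$4"
  # Allow the KSA to impersonate the GSA.
  run gcloud iam service-accounts add-iam-policy-binding "$gsa_email" \
    --project="$project" \
    --role=roles/iam.workloadIdentityUser \
    --member="serviceAccount:${project}.svc.id.goog[${namespace}/${ksa}]" || return 1
  # Tell GKE which GSA this KSA maps to.
  run kubectl annotate serviceaccount "$ksa" --namespace "$namespace" \
    "iam.gke.io/gcp-service-account=${gsa_email}" --overwrite
}
```

The Deployment must then set `serviceAccountName` to that Kubernetes service account. The key mount (and any credentials-file setting on the Cloud SQL Auth Proxy) can then be removed so the proxy falls back to Application Default Credentials, and the metadata-server email query from a pod returns the Google service account.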
The root cause was deleting the old key before ensuring all applications successfully transitioned to the new key, combined with not performing a rolling restart of Kubernetes pods to reload the updated secret. Always follow the create-update-verify-delete sequence for zero-downtime rotations.