Let me provide a comprehensive breakdown of our automated IAM policy enforcement implementation, including architecture, code patterns, and lessons learned.
Implementation Architecture: Our solution consists of three main components working together:
- IAM Policy Automation via Schematics: We maintain a centralized governance repository containing Terraform configurations for all IAM policies. Schematics workspaces are configured for each IBM Cloud account, and policies are deployed through automated pipelines. The original design goal was conditional IAM policies that evaluate resource tags before allowing operations; because IAM does not fully support tag conditions at creation time, we enforce tags in pipelines and immediately after creation instead (details below).
- Schematics Deployment Pipeline: Our governance repository is structured with separate directories for each policy type (tagging, access-control, resource-quotas). When changes are merged to the main branch, a GitLab CI/CD pipeline automatically triggers Schematics workspace updates. This ensures policy changes are reviewed via merge requests before deployment and maintains a full audit trail of who changed what and when.
- Activity Tracker Alerts and Remediation: We configured Activity Tracker to route all resource lifecycle events (create, update, delete) to Event Notifications. A Cloud Functions action subscribes to these notifications, evaluates resource tags against our compliance rules, and either auto-remediates (adds missing tags if it can determine appropriate values) or creates ServiceNow tickets for manual review.
The complete flow: Developer creates a resource (normally through a pipeline template that applies the required tags) → IAM access policies authorize the operation → resource is created → Activity Tracker logs the event → Event Notifications filters events for compliance checks → Cloud Functions validates tags → if issues are found, the remediation workflow triggers.
IAM Policy Automation Details: The Terraform configuration enforces tagging at multiple levels. We wanted IAM policies that deny resource creation unless required tags are present. The challenge was that IBM Cloud IAM does not natively support tag-based conditions of that kind, so we implemented the requirement through a combination of resource group policies and automated compliance checking.
Our approach: Rather than blocking at IAM level (which isn’t fully supported for tags), we enforce through deployment pipelines and catch violations immediately after creation via Activity Tracker. Resources without required tags are automatically quarantined (moved to a non-compliant resource group with restricted access) until properly tagged.
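To make the post-creation check concrete, here is a minimal sketch of a tag validator and the quarantine decision. The required tag keys, the value patterns, the "key:value" tag convention, and the group name "non-compliant" are all assumptions for illustration, not our actual rules. The function only decides where a resource should end up; it does not call any IBM Cloud API.

```python
import re

# Hypothetical required tags and value patterns (illustration only).
REQUIRED_TAG_PATTERNS = {
    "env": re.compile(r"^(dev|staging|prod)$"),
    "owner": re.compile(r"^[a-z0-9._-]+$"),
    "cost-center": re.compile(r"^cc-\d{4}$"),
}


def parse_tags(tags):
    """Split 'key:value' tags into a dict; tags without a colon are ignored."""
    parsed = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            parsed[key.strip().lower()] = value.strip().lower()
    return parsed


def find_violations(tags):
    """Return the sorted required tag keys that are missing or malformed."""
    parsed = parse_tags(tags)
    violations = []
    for key, pattern in REQUIRED_TAG_PATTERNS.items():
        value = parsed.get(key)
        if value is None or not pattern.match(value):
            violations.append(key)
    return sorted(violations)


def target_resource_group(current_group, tags, quarantine_group="non-compliant"):
    """Pick the quarantine group when any required tag is missing or invalid."""
    return quarantine_group if find_violations(tags) else current_group
```

For example, `find_violations(["env:prod"])` reports `owner` and `cost-center` as missing, so `target_resource_group` would return the quarantine group for that resource.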
Schematics Best Practices: We learned several critical lessons about managing IAM policies as code:
- Version all Terraform modules and pin to specific versions in production workspaces
- Use Schematics workspace variables to parameterize policies across environments (dev/staging/prod)
- Implement drift detection by running Terraform plan daily and alerting on any manual changes to IAM policies
- Maintain separate workspaces for each account to prevent cross-account policy contamination
- Use Terraform remote state in Cloud Object Storage to enable policy auditing and rollback
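The daily drift check can be sketched as follows. This assumes the job runs the plain Terraform CLI against the same configuration and state rather than a Schematics plan job, and it relies on the `-detailed-exitcode` flag of `terraform plan` (exit code 0 means no changes, 1 means an error, 2 means changes are pending). The function names and the injectable `runner` parameter are hypothetical, added so the logic can be exercised without Terraform installed.

```python
import subprocess


def classify_plan_exit_code(code):
    """Map `terraform plan -detailed-exitcode` exit codes to a drift status."""
    if code == 0:
        return "in_sync"  # plan succeeded, nothing to change
    if code == 2:
        return "drift"    # plan succeeded, live policies differ from the code
    return "error"        # exit code 1 or anything unexpected


def check_drift(workdir, runner=subprocess.run):
    """Run a plan in workdir and return (status, plan_output)."""
    result = runner(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode), result.stdout
```

A scheduled job could call `check_drift` once per workspace directory and raise an alert whenever the status is "drift" or "error", attaching the plan output as evidence.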
Activity Tracker Alert Configuration: The key to avoiding alert fatigue was intelligent filtering. Our Event Notifications routing rules are highly specific:
- Filter 1: Only resource.create and resource.update events
- Filter 2: Only from production accounts (exclude sandbox/dev accounts)
- Filter 3: Only resource types that require tags (exclude IAM policies themselves, network ACLs, etc.)
- Filter 4: Check if tags exist and match required pattern
This reduced our daily alert volume from ~5000 events to ~50 actionable compliance violations. We also implemented alert aggregation: if the same user creates multiple non-compliant resources within an hour, we send one consolidated notification instead of individual alerts.
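The four filters and the hourly aggregation can be illustrated with a short sketch. In practice the filtering lives in the Event Notifications routing rules, so this only mirrors the logic. The event field names, account IDs, resource types, and required tag keys are hypothetical, timestamps are assumed to be epoch seconds, and the tag check here only tests that each required key is present, not that its value matches a pattern.

```python
from collections import defaultdict

# Hypothetical values (illustration only).
WATCHED_ACTIONS = {"resource.create", "resource.update"}
PRODUCTION_ACCOUNTS = {"acct-prod-1", "acct-prod-2"}
EXEMPT_RESOURCE_TYPES = {"iam-policy", "network-acl"}
REQUIRED_TAG_KEYS = {"env", "owner", "cost-center"}


def is_actionable(event):
    """Apply filters 1-4 in order; True means a compliance violation to report."""
    if event["action"] not in WATCHED_ACTIONS:
        return False
    if event["account_id"] not in PRODUCTION_ACCOUNTS:
        return False
    if event["resource_type"] in EXEMPT_RESOURCE_TYPES:
        return False
    present = {tag.partition(":")[0] for tag in event.get("tags", [])}
    return not REQUIRED_TAG_KEYS.issubset(present)


def aggregate_alerts(events, window_seconds=3600):
    """Group each user's violations into one alert per window.

    A window starts at the user's first violation and covers the following
    window_seconds; a later violation opens a new window.
    """
    by_user = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if is_actionable(event):
            by_user[event["user"]].append(event)

    alerts = []
    for user, user_events in by_user.items():
        batch = [user_events[0]]
        for event in user_events[1:]:
            if event["timestamp"] - batch[0]["timestamp"] < window_seconds:
                batch.append(event)
            else:
                alerts.append({"user": user, "resources": [e["resource_id"] for e in batch]})
                batch = [event]
        alerts.append({"user": user, "resources": [e["resource_id"] for e in batch]})
    return alerts
```

Ordering the cheap checks (action, account, resource type) before the tag check means most events are discarded without inspecting their tags.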
Automated Remediation Workflow: Our Cloud Functions remediation logic follows this decision tree:
- Can we determine appropriate tag values automatically? (Yes: apply tags, log action, send confirmation)
- Is the resource owner identifiable? (Yes: send notification with 24-hour deadline, No: escalate to platform team)
- After 24 hours, are tags still missing? (Yes: quarantine resource and create incident ticket)
- After 7 days in quarantine, is resource still non-compliant? (Yes: schedule for deletion with final warning)
This graduated enforcement approach balances compliance requirements with developer experience. We don’t immediately break things, but we make it increasingly painful to remain non-compliant.
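The decision tree above can be sketched as a pure function that is evaluated each time a resource is rechecked. The `ResourceState` fields and the action names are hypothetical; the 24-hour and 7-day thresholds come from the steps listed above. Deduplicating repeated notifications and actually applying tags or moving resources are outside the scope of this sketch.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

NOTICE_PERIOD = timedelta(hours=24)
QUARANTINE_PERIOD = timedelta(days=7)


@dataclass
class ResourceState:
    tags_compliant: bool
    inferred_tags: Optional[dict]   # tags we could derive automatically, if any
    owner: Optional[str]            # identifiable owner, if any
    age_noncompliant: timedelta     # time since the violation was first detected
    quarantined_for: Optional[timedelta] = None  # None while not quarantined


def next_action(state):
    """Return the next enforcement step for a resource (hypothetical names)."""
    if state.tags_compliant:
        return "none"
    if state.quarantined_for is not None:
        # Already quarantined: only the 7-day deletion check applies.
        if state.quarantined_for >= QUARANTINE_PERIOD:
            return "schedule_deletion"
        return "wait_in_quarantine"
    if state.inferred_tags:
        return "apply_tags"
    if state.age_noncompliant >= NOTICE_PERIOD:
        return "quarantine_and_open_incident"
    if state.owner:
        return "notify_owner"
    return "escalate_to_platform_team"
```

Keeping the decision separate from the side effects makes each branch easy to unit test and keeps the escalation thresholds in one place.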
Results and Metrics: After six months of operation:
- Tag compliance improved from 70% to 98% (2200+ resources properly tagged)
- Audit preparation time reduced from 3 weeks to 2 days (all evidence in Activity Tracker logs)
- Cost allocation accuracy improved from ~60% to 95% (finance can now track spending by department)
- Developer complaints decreased after initial training period (automated tagging in pipelines made compliance transparent)
- Zero security incidents related to improperly tagged resources (previously had 2-3 per quarter)
Key Success Factors:
- Executive sponsorship - we had CISO backing to enforce policies even when it caused friction
- Developer enablement - we built tagging into pipeline templates so compliance was automatic for most use cases
- Clear documentation - we published comprehensive guides on tagging requirements and automation
- Gradual rollout - we piloted with one development team, refined the approach, then expanded organization-wide
- Continuous improvement - we review compliance metrics monthly and adjust policies based on feedback
The combination of IAM policy automation through Schematics and real-time enforcement via Activity Tracker alerts created a robust governance framework that scales with our cloud adoption while maintaining compliance. The key insight was that prevention (pipeline automation) plus detection (Activity Tracker) plus remediation (Cloud Functions) together create a comprehensive solution that no single component could achieve alone.