We’ve successfully implemented an automated firewall rule synchronization solution for our IBM Cloud VPC environments using Schematics and Terraform. The challenge was maintaining consistent security policies across dev, staging, and production while preventing configuration drift and ensuring compliance.
Our setup involves multiple VPCs with complex firewall rules that need to be synchronized while respecting environment-specific variations. The automation handles rule creation, updates, and drift detection through CI/CD pipeline integration. We also needed compliance reporting to track rule changes and ensure audit requirements are met.
The solution leverages Terraform workspaces in Schematics to manage environment-specific configurations while maintaining a single source of truth. Key components include automated rule validation, drift detection mechanisms, and integration with our GitLab CI/CD pipeline for continuous synchronization.
Happy to share implementation details and lessons learned from this project.
Great questions! Let me address both comprehensively.
Firewall Rule Automation & Drift Detection:
We run scheduled terraform plan operations every 2 hours via Schematics to detect drift. When drift is detected, the system creates a GitLab issue with details and sends Slack notifications to the ops team. For emergency manual changes, we have a grace period of 4 hours before auto-remediation kicks in, giving teams time to document and submit proper change requests. Critical production rules have stricter 1-hour detection with immediate alerts.
Our drift detection logic:
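# Scheduled Schematics job checks state.
# -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes pending (drift)
terraform plan -detailed-exitcode
rc=$?
if [ "$rc" -eq 2 ]; then
  log_drift_event && notify_team
elif [ "$rc" -eq 1 ]; then
  echo "terraform plan failed" >&2
fi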
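For anyone who wants the grace-period logic spelled out, here is a minimal Python sketch of how the decision could look. It is not our production code: the function names, the status strings, and the idea of storing a `first_seen` timestamp per drift event are assumptions for illustration. The only facts it relies on are the documented `terraform plan -detailed-exitcode` exit codes and the 4-hour grace period mentioned above.

```python
from datetime import datetime, timedelta, timezone

# terraform plan -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
PLAN_STATUS = {0: "in_sync", 1: "error", 2: "drift"}

# Grace period taken from the post; the constant name is hypothetical.
GRACE_PERIOD = timedelta(hours=4)


def classify_plan(exit_code: int) -> str:
    """Map a plan exit code to a status string; unknown codes count as errors."""
    return PLAN_STATUS.get(exit_code, "error")


def next_action(exit_code: int, first_seen: datetime, now: datetime) -> str:
    """Decide what to do with a plan result, given when drift was first seen."""
    status = classify_plan(exit_code)
    if status == "in_sync":
        return "none"
    if status == "error":
        return "alert"
    # Drift: notify during the grace period, remediate once it has elapsed.
    if now - first_seen >= GRACE_PERIOD:
        return "remediate"
    return "notify"


if __name__ == "__main__":
    seen = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
    print(next_action(2, seen, seen + timedelta(hours=1)))
    print(next_action(2, seen, seen + timedelta(hours=5)))
```

Keeping the decision in a pure function like this makes it easy to unit test separately from the Schematics job that actually runs the plan.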
CI/CD Pipeline Integration:
Our GitLab pipeline has three stages for firewall changes:
- Validation Stage: Runs on every commit - terraform validate, tflint for best practices, and custom Python scripts that check rules against security baselines. This includes verifying no overly permissive rules (0.0.0.0/0) exist except for documented exceptions.
- Plan Stage: Generates terraform plan for all affected environments. For dev/staging, this runs automatically. For production, it requires manual trigger by team lead.
- Apply Stage: Dev/staging auto-apply after successful plan. Production requires two-stage approval: security team member + infrastructure manager. We use GitLab’s approval rules feature with required approvers from specific groups.
The pipeline integrates with the Schematics API to trigger workspace operations and retrieve logs. We maintain a separate Schematics workspace per environment, all backed by the same Git repository with branch protection rules.
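To give a feel for the baseline check in the validation stage, here is a rough Python sketch of the "no 0.0.0.0/0 without a documented exception" rule. The rule dictionary shape (`name`, `remote`) and the exception list are made up for the example; a real script would read the rules from your Terraform plan output or variable files.

```python
import ipaddress

# Hypothetical allow-list of rule names that have a documented exception.
DOCUMENTED_EXCEPTIONS = {"public-https-ingress"}


def is_open_to_world(cidr: str) -> bool:
    """True when the CIDR covers the entire IPv4 or IPv6 address space."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def find_violations(rules, exceptions=DOCUMENTED_EXCEPTIONS):
    """Return names of rules open to the world without a documented exception."""
    return [
        rule["name"]
        for rule in rules
        if is_open_to_world(rule["remote"]) and rule["name"] not in exceptions
    ]


# Sample data, invented for illustration only.
rules = [
    {"name": "public-https-ingress", "remote": "0.0.0.0/0"},
    {"name": "ssh-from-anywhere", "remote": "0.0.0.0/0"},
    {"name": "ssh-from-bastion", "remote": "10.0.1.0/24"},
]
print(find_violations(rules))
```

With the sample data only `ssh-from-anywhere` is flagged. In a pipeline job you would exit non-zero when the returned list is non-empty so the commit fails validation.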
Compliance Reporting Implementation:
Our compliance system tracks three key metrics:
- Change Attribution: Every firewall modification is linked to Git commit, author, approvers, and JIRA ticket through commit message parsing.
- Rule Lifecycle: We maintain a JSON database tracking when each rule was created, modified, and by whom. Monthly reports show rule age, modification frequency, and compliance status.
- Policy Violations: Automated scans check for rules violating security policies (overly broad access, deprecated protocols, missing descriptions). Violations trigger immediate alerts and block deployments.
We export compliance data to IBM Cloud Object Storage for long-term retention and generate monthly PDF reports using Python with the ReportLab library. The reports include rule change summaries, drift incidents, policy violations, and remediation actions taken.
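The commit message parsing behind change attribution can be sketched roughly like this. The ticket pattern (uppercase project key, hyphen, number) and the `ChangeRecord` fields are assumptions for the example, not our exact schema.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed ticket format: uppercase project key, hyphen, number (e.g. "NET-1234").
TICKET_RE = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


@dataclass
class ChangeRecord:
    commit: str
    author: str
    ticket: Optional[str]  # None when the message carries no ticket reference


def attribute_change(commit: str, author: str, message: str) -> ChangeRecord:
    """Extract the first ticket key from a commit message, if any."""
    match = TICKET_RE.search(message)
    return ChangeRecord(commit, author, match.group(1) if match else None)


if __name__ == "__main__":
    print(attribute_change("abc123", "alice", "NET-1234: restrict SSH to bastion"))
```

Records with `ticket=None` are a natural thing to surface in the report as unattributed changes.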
Key Lessons Learned:
- Start with read-only drift detection before enabling auto-remediation
- Document all environment-specific exceptions in code comments
- Implement gradual rollout: dev → staging → canary prod → full prod
- Use Terraform modules for reusable firewall rule patterns
- Maintain a rule naming convention that includes purpose and owner
The entire solution reduced our firewall management overhead by 70% and eliminated configuration drift incidents. Compliance audit preparation time dropped from 2 weeks to 2 hours with automated reporting.
Happy to share specific code snippets or discuss integration with other IBM Cloud services like Security and Compliance Center if needed.
How do you handle the compliance reporting aspect? We need to generate audit reports showing who changed what rules and when. Does Schematics provide built-in audit logging, or did you build custom reporting on top of it?
We use Terraform workspaces with a single codebase and separate state files per environment. The key is using variable files for environment-specific overrides. Our base configuration defines common rules, and each environment has a tfvars file for exceptions. For example, dev might allow broader SSH access while prod restricts it to bastion hosts only. We also tag all resources with environment labels for tracking.
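In Terraform this lives in the base configuration plus per-environment tfvars files, but the override idea itself is simple enough to show in a few lines of Python. This is only an illustration of the merge semantics, with invented rule names and CIDRs; it is not how Terraform evaluates variables internally.

```python
def effective_rules(base, overrides):
    """Merge rules by name; an environment override replaces the base rule."""
    merged = {rule["name"]: rule for rule in base}
    merged.update({rule["name"]: rule for rule in overrides})
    return list(merged.values())


# Invented sample data: base restricts SSH to a bastion subnet, dev broadens it.
base_rules = [
    {"name": "ssh", "remote": "10.0.1.0/24"},
    {"name": "https", "remote": "10.0.0.0/8"},
]
dev_overrides = [{"name": "ssh", "remote": "10.0.0.0/8"}]

if __name__ == "__main__":
    for rule in effective_rules(base_rules, dev_overrides):
        print(rule["name"], rule["remote"])
```

Prod would pass an empty override list and get the base rules unchanged, which is what keeps the core rules synchronized across environments.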
This sounds exactly like what we need! We’re struggling with manual firewall rule updates across environments. How did you structure your Terraform code to handle environment-specific variations while keeping the core rules synchronized? Did you use separate state files for each environment?
Interested in your CI/CD integration approach. Are you running terraform plan on every commit to detect changes? How do you handle the approval workflow for production firewall changes? We need multiple approvers for prod security groups.
What about drift detection? If someone manually changes a rule in the console, how quickly does your system detect and remediate it? We’ve had issues with emergency changes bypassing automation.
Schematics provides activity tracking through IBM Cloud Activity Tracker, which logs all workspace operations. We enhanced this with custom reporting using the Schematics API to extract change history and correlate it with Git commits. Our pipeline generates weekly compliance reports in JSON format that map firewall rule changes to specific pull requests and approvers. We also implemented pre-commit hooks that validate rules against our security policies before they reach the pipeline. This caught several misconfigurations before deployment.
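As a rough idea of the correlation step, the sketch below joins a list of rule changes to merge request records on commit SHA and emits JSON. All field names (`sha`, `rule`, `id`, `approvers`) are hypothetical; the real inputs would come from whatever you extract from the Schematics API and GitLab, and their shapes will differ.

```python
import json


def build_report(changes, merge_requests):
    """Join rule changes to merge requests on commit SHA and return JSON text."""
    by_sha = {mr["sha"]: mr for mr in merge_requests}
    entries = []
    for change in changes:
        mr = by_sha.get(change["sha"])
        entries.append({
            "rule": change["rule"],
            "commit": change["sha"],
            # Changes with no matching merge request stay visible as gaps.
            "merge_request": mr["id"] if mr else None,
            "approvers": mr["approvers"] if mr else [],
        })
    return json.dumps({"changes": entries}, indent=2, sort_keys=True)


if __name__ == "__main__":
    changes = [{"sha": "abc123", "rule": "ssh-from-bastion"}]
    merge_requests = [{"sha": "abc123", "id": 42, "approvers": ["alice", "bob"]}]
    print(build_report(changes, merge_requests))
```

Entries whose `merge_request` is null are the ones worth a closer look, since they suggest a change that did not go through the normal review path.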