Network segmentation for ERP microservices: Security Groups vs NACLs for compliance isolation

We’re architecting network segmentation for an ERP microservices deployment on AWS and need guidance on Security Groups versus NACLs for compliance isolation. Our architecture has 15+ microservices across different compliance zones (PCI for payment processing, PHI for employee health data, general financial data).

Security Groups seem more manageable with stateful rules and instance-level control, but compliance auditors are asking about subnet-level isolation, which NACLs provide. Some architects argue for layered security using both, but that adds significant operational complexity.

Looking for real-world experiences with ERP segmentation strategies. How do you balance granularity, auditability, and operational overhead? Do auditors accept Security Groups alone, or do they require the additional subnet isolation that NACLs provide for true defense-in-depth?

Having designed network segmentation for multiple ERP deployments under various compliance frameworks, here’s a comprehensive approach that balances granularity, auditability, and layered security:

Segmentation Strategy:

Use a layered approach where subnet design aligns with compliance boundaries, not microservice boundaries:

  1. Subnet-Level Isolation (NACLs):

    • Create separate subnets for each compliance zone:

      • PCI subnet for payment processing services
      • PHI subnet for health data services
      • Financial subnet for general ERP data
      • Management subnet for ops/monitoring tools
    • NACLs provide the compliance-visible network boundary. Auditors can easily verify that PCI data cannot flow to non-PCI subnets at the network layer.

    • Keep NACL rules simple and broad:

      • Allow traffic between subnets that need to communicate (e.g., Financial → PCI for payment initiation)
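      • Deny all other inter-subnet traffic (a custom NACL denies anything not explicitly allowed; the VPC's default NACL allows everything, so don't rely on it)
      • Allow ephemeral ports for return traffic, because NACLs are stateless. 32768-65535 covers typical Linux clients; widen to 1024-65535 if Elastic Load Balancing, NAT gateways, or Lambda functions originate the connections
      • Use explicit deny rules for known-bad sources, numbered lower than the allow rules, since NACL rules are evaluated in ascending order and the first match wins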
  2. Instance-Level Control (Security Groups):

    • Security Groups handle the detailed, stateful access control:

      • Define groups by service function (payment-processor-sg, invoice-service-sg, etc.)
      • Use source security groups in rules (allow payment-processor-sg to access database-sg on port 5432)
      • This provides granular, manageable control without maintaining IP lists
    • Security Groups are where your operational agility lives. You can modify these frequently as services evolve without touching subnet-level NACLs.
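
Because NACLs are stateless, every cross-subnet flow allowed in item 1 needs entries in both directions on both subnets. A minimal sketch of that bookkeeping, assuming hypothetical subnet names and CIDRs; the `NaclEntry` type and `entries_for_flow` helper are illustrative and do not call any AWS API:

```python
from dataclasses import dataclass

# Assumption: the wide range, which covers most client types.
EPHEMERAL = (1024, 65535)


@dataclass(frozen=True)
class NaclEntry:
    subnet: str   # which subnet's NACL the entry belongs to
    egress: bool  # True = outbound rule, False = inbound rule
    cidr: str     # CIDR of the peer subnet
    ports: tuple  # (from_port, to_port), inclusive
    action: str = "allow"


def entries_for_flow(src, src_cidr, dst, dst_cidr, port):
    """Return the four stateless entries needed for one TCP flow src -> dst:port."""
    return [
        # The request leaves the source subnet and enters the destination subnet.
        NaclEntry(src, True, dst_cidr, (port, port)),
        NaclEntry(dst, False, src_cidr, (port, port)),
        # The reply leaves the destination subnet and re-enters the source
        # subnet on the client's ephemeral port, so both need the wide range.
        NaclEntry(dst, True, src_cidr, EPHEMERAL),
        NaclEntry(src, False, dst_cidr, EPHEMERAL),
    ]


# Hypothetical CIDRs for the Financial -> PCI payment initiation flow.
flow = entries_for_flow("financial", "10.0.3.0/24", "pci", "10.0.1.0/24", 443)
for e in flow:
    direction = "egress" if e.egress else "ingress"
    print(f"{e.subnet:<10} {direction:<8} {e.cidr:<12} {e.ports[0]}-{e.ports[1]} {e.action}")
```

A Security Group rule for the same flow would be a single inbound entry on the destination, which is why the detailed rules are better kept there.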

Granularity Considerations:

Don’t over-segment at the subnet level. Each compliance zone gets its own subnet(s), but multiple microservices within the same compliance tier can share a subnet. Use Security Groups to control inter-service communication within the subnet.
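
To illustrate how Security Group referencing controls traffic between services that share a subnet, here is a small model of the evaluation logic. It is a sketch only: the `SecurityGroup` class and `is_allowed` function are hypothetical and simplify the real behaviour to inbound rules keyed by source group and port.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityGroup:
    name: str
    # Inbound rules as (source_group_name, port) pairs; anything unlisted is denied.
    ingress: set = field(default_factory=set)


def is_allowed(src_group_names, dst_groups, port):
    """True if any destination group admits any of the caller's groups on this port."""
    return any((s, port) in g.ingress for g in dst_groups for s in src_group_names)


# Group names taken from the examples above; the rule set itself is illustrative.
database = SecurityGroup("database-sg", {("payment-processor-sg", 5432)})
payment = SecurityGroup("payment-processor-sg")
invoice = SecurityGroup("invoice-service-sg")

print(is_allowed([payment.name], [database], 5432))  # True
print(is_allowed([invoice.name], [database], 5432))  # False
```

No IP addresses appear anywhere in the rules, so instances can be replaced or rescaled without touching them.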

Example architecture for your 15 microservices:

  • PCI Subnet: 3 payment-related microservices (controlled by separate Security Groups)
  • PHI Subnet: 2 health data services
  • Financial Subnet: 8 general ERP services
  • Management Subnet: 2 ops/monitoring services

This gives you 4 sets of NACLs to manage instead of 15, while maintaining fine-grained control via Security Groups.
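
The split above can be expressed as a simple service-to-zone map. The sketch below assumes one subnet per zone and uses placeholder service names (only the per-zone counts come from the example); in a multi-AZ layout, traffic between a zone's own subnets would also cross the NACL.

```python
# Per-zone service counts from the example architecture.
ZONE_SIZES = {"pci": 3, "phi": 2, "financial": 8, "management": 2}

# Placeholder service names such as "pci-svc-1"; real names would differ.
SERVICE_ZONE = {
    f"{zone}-svc-{i}": zone
    for zone, count in ZONE_SIZES.items()
    for i in range(1, count + 1)
}


def controls_for(src, dst):
    """Return the layers that evaluate traffic between two services."""
    if SERVICE_ZONE[src] == SERVICE_ZONE[dst]:
        # Same subnet: the NACL is never crossed, so only Security Groups apply.
        return ["security-group"]
    return ["nacl", "security-group"]


print(len(SERVICE_ZONE), "services,", len(set(SERVICE_ZONE.values())), "NACLs")
print(controls_for("pci-svc-1", "pci-svc-2"))
print(controls_for("financial-svc-1", "pci-svc-1"))
```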

Auditability Benefits:

Auditors appreciate the layered approach because:

  1. NACLs provide visible, subnet-level network segmentation that maps to compliance requirements
  2. Security Groups provide fine-grained, per-service rules, with rule changes tracked in CloudTrail and traffic recorded in VPC Flow Logs
  3. The combination demonstrates defense-in-depth

For PCI DSS specifically, requirement 1.2.1 in v3.2.1 (requirements 1.3.1 and 1.3.2 in v4.0) calls for restricting inbound and outbound traffic for the cardholder data environment to that which is necessary. Both Security Groups and NACLs contribute to this requirement, but at different layers.

Operational Overhead Management:

The key to manageable layered security:

  1. Standardize NACL templates for each compliance zone - rarely changed
  2. Automate Security Group management via Infrastructure as Code (Terraform, CloudFormation)
  3. Use Security Group referencing instead of IP addresses - reduces brittleness
  4. Implement tagging strategy to track which resources belong to which compliance zone
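  5. Centralize logging - VPC Flow Logs record accepted and rejected traffic, reflecting the combined effect of NACL and Security Group rules (a record does not say which of the two made the decision)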
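
As one way to turn centralized flow logs into audit evidence, the sketch below counts rejected traffic between compliance zones. It assumes records in the default flow log format and uses made-up zone CIDRs and sample records; `zone_of` and `rejected_cross_zone` are hypothetical helper names.

```python
import ipaddress
from collections import Counter

# Assumed zone CIDRs, for illustration only.
ZONE_CIDRS = {
    "pci": ipaddress.ip_network("10.0.1.0/24"),
    "phi": ipaddress.ip_network("10.0.2.0/24"),
    "financial": ipaddress.ip_network("10.0.3.0/24"),
    "management": ipaddress.ip_network("10.0.4.0/24"),
}

# Field order of a default-format flow log record.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()


def zone_of(addr):
    """Map an IP address to its compliance zone, or 'external' if none matches."""
    ip = ipaddress.ip_address(addr)
    return next((z for z, net in ZONE_CIDRS.items() if ip in net), "external")


def rejected_cross_zone(lines):
    """Count REJECT records whose source and destination zones differ."""
    counts = Counter()
    for line in lines:
        rec = dict(zip(FIELDS, line.split()))
        if rec.get("action") != "REJECT":
            continue  # also skips NODATA/SKIPDATA records, whose action is "-"
        src, dst = zone_of(rec["srcaddr"]), zone_of(rec["dstaddr"])
        if src != dst:
            counts[(src, dst)] += 1
    return counts


# Made-up sample records.
sample = [
    "2 123456789012 eni-0a1b2c3d 10.0.2.15 10.0.1.20 49152 5432 6 1 40 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.3.8 10.0.1.20 49153 443 6 10 5200 1700000000 1700000060 ACCEPT OK",
]
print(rejected_cross_zone(sample))
```

A non-zero count for a pair such as PHI to PCI shows the boundary being exercised and enforced, which is often more persuasive to an auditor than the rule listing alone.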

Practical Implementation:

If you have to phase the rollout, start with Security Groups only, then add NACLs as a compliance layer. This prevents over-engineering early while ensuring you can add subnet isolation when auditors require it. Many organizations run successfully on Security Groups alone for months before compliance audits drive NACL implementation.

The reality: In my experience, auditors will accept Security Groups alone if you can demonstrate effective isolation and have comprehensive logging. However, the layered approach with NACLs makes audits smoother because it provides the “network segmentation” evidence they’re looking for in a format they understand (subnet boundaries).

For your ERP microservices, I recommend the layered approach from the start. The operational overhead is manageable if you keep NACLs simple and aligned with compliance zones, not microservice boundaries. This architecture scales better as you add services and satisfies both operational needs (Security Groups) and compliance requirements (NACLs).

The ephemeral port issue is real but manageable. We use NACLs primarily for broad deny rules (block entire CIDR ranges, deny specific ports) and let Security Groups handle the detailed allow rules. This minimizes NACL complexity while still providing subnet-level segmentation for auditors. Document your approach clearly - auditors care more about demonstrable isolation than the specific mechanism.
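
To make the "broad deny, simple allow" pattern concrete, here is a sketch of how a NACL's ordered rules resolve a packet: rules are checked in ascending rule number, the first match wins, and anything unmatched is denied. The rule set, CIDRs, and the `evaluate` helper are hypothetical.

```python
import ipaddress

# Hypothetical inbound rules for the PCI subnet's NACL:
# (rule_number, action, source_cidr, (from_port, to_port))
RULES = [
    (100, "allow", "10.0.3.0/24", (443, 443)),  # Financial -> PCI payment initiation
    (50, "deny", "10.0.2.0/24", (0, 65535)),    # broad deny: PHI subnet never reaches PCI
]


def evaluate(rules, src_ip, port):
    """Return the action of the lowest-numbered matching rule, else the implicit deny."""
    ip = ipaddress.ip_address(src_ip)
    for _number, action, cidr, (lo, hi) in sorted(rules, key=lambda r: r[0]):
        if ip in ipaddress.ip_network(cidr) and lo <= port <= hi:
            return action
    return "deny"  # the final catch-all rule that every NACL has


print(evaluate(RULES, "10.0.2.15", 443))  # deny  (rule 50)
print(evaluate(RULES, "10.0.3.8", 443))   # allow (rule 100)
print(evaluate(RULES, "10.0.3.8", 22))    # deny  (no rule matches)
```

Leaving gaps between rule numbers, as here, keeps room to insert a new deny ahead of existing allows without renumbering.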

We implemented both for a healthcare ERP system with similar compliance needs. Security Groups handle 90% of the access control - they’re stateful, easier to manage, and provide fine-grained control. NACLs act as a secondary boundary at the subnet level, mainly for deny rules and compliance documentation. The operational overhead isn’t as bad as you’d think if you keep NACL rules simple and use Security Groups for detailed policies.