Code Engine app fails to connect to ERP batch processing endpoint due to DNS resolution errors

We’re running ERP batch jobs on IBM Cloud Code Engine that need to connect to internal services in our VPC. The jobs fail intermittently with connection timeouts. We configured a VPC connector for the Code Engine application, but DNS resolution seems inconsistent - sometimes the internal hostname resolves, sometimes it doesn’t. Our ERP batch processing endpoint is at erp-batch.internal.vpc and the Code Engine app logs show:


Error: getaddrinfo ENOTFOUND erp-batch.internal.vpc
at GetAddrInfoReqWrap.onlookup [as oncomplete]
Connection attempt failed after 30s timeout

The VPC connector is attached to the same subnet as our ERP services. Is there a specific DNS resolver configuration needed for internal service discovery in Code Engine? The batch jobs run every 4 hours and fail about 40% of the time, causing significant processing delays.

Here’s a solution covering the three areas involved: VPC connector configuration, DNS resolver configuration, and internal service discovery.

1. VPC Connector Configuration: First, verify that your VPC connector is properly configured for internal service discovery. The connector must be in the same VPC as your ERP services and attached to a subnet with a route to your target service subnet. Use the IBM Cloud CLI to check:


ibmcloud ce project select --name your-project
ibmcloud ce application get --name your-app

Look for the VPC connector details and confirm the subnet ID matches your expected configuration.

2. DNS Resolver Configuration: The core issue is DNS resolver precedence. Code Engine applications need explicit DNS configuration when using private DNS zones. Here’s what you need to do:

  • Ensure your IBM Cloud DNS private zone (internal.vpc) is enabled for the VPC containing your Code Engine connector
  • Verify the DNS resolver location is set to the same region/zone as your Code Engine application
  • In your private DNS zone settings, confirm the permitted networks include your VPC
  • Add a custom resolver configuration in your VPC by navigating to VPC > DNS Resolvers and creating a DNS resolver with type “Manual” pointing to your internal DNS servers

3. Internal Service Discovery Implementation: For reliable service discovery, set the resolver and endpoint explicitly in your application’s environment:


# In your Code Engine application configuration
export DNS_RESOLVER="10.240.0.4"
export INTERNAL_DOMAIN="internal.vpc"
export ERP_ENDPOINT="erp-batch.${INTERNAL_DOMAIN}"

Alternatively, use IP addresses directly in your application configuration as environment variables until DNS is fully stable, then transition to hostnames.
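The hostname-first, IP-fallback approach described above can be sketched roughly as follows. This is a minimal illustration in Python (the log suggests the app itself runs on Node.js, where the same pattern applies). `ERP_ENDPOINT` comes from the configuration above, while `ERP_FALLBACK_IP` is a hypothetical variable name introduced here, not something Code Engine defines.

```python
import os
import socket


def resolve_erp_host(env=None, resolver=socket.gethostbyname):
    """Return an address for the ERP endpoint, preferring DNS over a pinned IP.

    `resolver` is injectable so the function can be exercised without a network.
    """
    env = os.environ if env is None else env
    hostname = env.get("ERP_ENDPOINT", "erp-batch.internal.vpc")
    # Hypothetical variable: a pinned IP to use only while DNS is unreliable.
    fallback_ip = env.get("ERP_FALLBACK_IP")
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        if fallback_ip:
            return fallback_ip
        raise
```

Once DNS has been stable for a while, removing `ERP_FALLBACK_IP` from the environment makes resolution failures surface again instead of being masked.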

Additional Troubleshooting Steps:

  1. Test DNS resolution from within a Code Engine job by deploying a debug container:

    • Create a simple job that runs `nslookup erp-batch.internal.vpc`
    • Check the job logs to see which DNS server responds
  2. Verify network ACLs and security groups allow DNS traffic (UDP/TCP port 53) from your Code Engine subnet to your VPC’s DNS resolver

  3. Check if your ERP service subnet has proper routing back to the Code Engine connector subnet

  4. Consider using IBM Cloud Service Endpoints if your ERP service supports it - this bypasses VPC networking entirely
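If `nslookup` is not available in the debug image, step 1 can be approximated with a short script that resolves the hostname several times and records how often it succeeds. This is only a sketch in Python: the function names are made up for illustration, and it reports what the container’s own resolver returns rather than which DNS server answered.

```python
import socket
import time


def probe_dns(hostname, attempts=10, delay=1.0,
              resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `hostname` repeatedly; return a list of (ok, detail) tuples.

    On success, detail is the sorted list of resolved addresses.
    On failure, detail is the error message.
    """
    results = []
    for attempt in range(attempts):
        try:
            infos = resolver(hostname, None)
            addresses = sorted({info[4][0] for info in infos})
            results.append((True, addresses))
        except socket.gaierror as exc:
            results.append((False, str(exc)))
        if attempt < attempts - 1:
            sleep(delay)
    return results


def success_rate(results):
    """Fraction of successful lookups in a probe_dns() result list."""
    return sum(1 for ok, _ in results if ok) / len(results)
```

Calling `probe_dns("erp-batch.internal.vpc")` from inside the job and printing `success_rate(...)` gives a number to compare against the reported 40% failure rate, both before and after any DNS changes.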

Why This Happens: Code Engine containers start with default DNS configuration pointing to IBM Cloud’s public DNS resolvers. When a VPC connector is attached, DNS resolution should automatically use the VPC’s DNS configuration, but there’s a timing window during container startup where DNS queries might go to public resolvers first. Private DNS zones must be explicitly permitted for the VPC to ensure proper resolution.

The 40% failure rate suggests DNS queries are timing out before falling back to the correct resolver. Setting up the VPC DNS resolver properly eliminates this race condition. After implementing these changes, monitor your batch jobs for 24-48 hours. The connection timeouts should disappear once DNS resolution is consistent.
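While the resolver configuration is being verified, one possible stopgap is to retry the lookup with exponential backoff, so that a job starting inside the suspected timing window does not fail on the first `ENOTFOUND`. The sketch below is in Python and assumes the failures are transient; the retry count and delays are arbitrary illustrative values, and it does not replace fixing the DNS setup.

```python
import socket
import time


def resolve_with_retry(hostname, retries=5, base_delay=0.5,
                       resolver=socket.gethostbyname, sleep=time.sleep):
    """Resolve `hostname`, retrying failed lookups with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error if every attempt fails.
    """
    for attempt in range(retries):
        try:
            return resolver(hostname)
        except OSError:  # covers socket.gaierror
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

If the retries regularly succeed on a later attempt, that is consistent with a startup race. If they never succeed within a given run, the cause is more likely a missing permitted network or resolver misconfiguration.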

The VPC connector itself doesn’t have direct DNS configuration options. What you need is to ensure your VPC’s DNS resolver is properly configured at the VPC level, and Code Engine will inherit those settings through the connector. However, there’s a catch - if you’re using the default IBM Cloud DNS service, it takes precedence. You may need to set up DNS forwarding rules or use a private DNS zone. Can you share how your VPC DNS is currently configured? Also, what’s the subnet configuration for your VPC connector?

DNS propagation shouldn’t cause a 40% failure rate; that sounds like a routing or resolution-order issue. Private DNS zones in IBM Cloud should resolve as soon as they are configured. I suspect the problem is that Code Engine might be trying to resolve through public DNS first. Check your private DNS zone configuration and make sure it’s bound to the VPC where your Code Engine connector resides. Also verify that the DNS resolver locations are set correctly in your VPC settings.

I’ve seen similar DNS issues with Code Engine VPC connectors. The problem is usually that Code Engine uses IBM Cloud DNS by default, which doesn’t know about your VPC’s internal DNS zones. You need to configure custom DNS resolvers in your VPC connector settings. Check if your VPC has a custom resolver configured for the .internal.vpc domain. Also verify that the DNS resolver IP addresses are accessible from the subnet your VPC connector uses.
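To check whether a resolver IP is reachable from the connector subnet, a small TCP connectivity test run inside a Code Engine job may help. The Python sketch below only tests TCP port 53. DNS normally uses UDP as well, which this does not cover, and the function name and the resolver address in the usage note are placeholders.

```python
import socket


def can_reach_tcp(host, port=53, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `can_reach_tcp("10.240.0.4")` returning `False` from inside the job would point toward a security group, network ACL, or routing problem rather than a DNS zone setting.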