Compute instance database connections reset after OS patching: troubleshooting persistent disconnects

After applying OS patches to our compute instances running Oracle Database, we’re experiencing connection resets causing transaction disruptions. Users see ORA-03113 and ORA-03114 errors randomly throughout the day.

Listener log shows:


TNS-12537: TNS:connection closed
TNS-12560: TNS:protocol adapter error

The database itself is stable with no crashes. Connections work fine for 10-15 minutes then suddenly drop. This started immediately after OS patching. We’re running Oracle Linux 7.9 with Database 19c. Has anyone seen OS TCP keepalive configuration issues after patching that affect database connections?

Your issue is a classic post-patching TCP keepalive misconfiguration combined with connection pool tuning needs. Here’s the complete solution addressing all three focus areas:

OS TCP Keepalive Configuration: OS patches reset /etc/sysctl.conf to defaults. You need to reconfigure and persist these settings:

net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 9

Apply with ‘sysctl -p’. These values send keepalive probes every 5 minutes (300s) which is well within your firewall’s 600s timeout. The 9 probes with 60s intervals provide 540 seconds of retry time before declaring connection dead.

Database Listener Diagnostics: Enable listener tracing to confirm where connections are failing. Check listener.ora for SQLNET.EXPIRE_TIME parameter - set it to 10 minutes to have Oracle send its own keepalive packets independent of OS settings. This provides defense-in-depth. Review listener logs with timestamps correlated to ORA-03113 errors to determine if termination is client-side, network-side, or server-side.

Connection Pool Tuning: Your UCP configuration needs three critical changes:

pds.setValidateConnectionOnBorrow(true);
pds.setConnectionValidationTimeout(5);
pds.setInactiveConnectionTimeout(300);

ValidateConnectionOnBorrow ensures dead connections are detected before being handed to application. Set validation timeout to 5 seconds for quick detection. InactiveConnectionTimeout of 300 seconds recycles idle connections before firewall timeout.

Additionally, configure SQLNET.RECV_TIMEOUT and SQLNET.SEND_TIMEOUT in sqlnet.ora on the database server to 300 seconds. This ensures the database server itself detects dead clients.

Root Cause: Your ORA-03113 errors occur because the OS patch reset TCP keepalive to 7200 seconds (2 hours), but your firewall drops connections at 600 seconds (10 minutes). Idle pooled connections are terminated by the firewall, but neither the OS nor the application detects this until attempting to use the connection, resulting in protocol adapter errors.

Verification: After implementing these changes, monitor for 48 hours. You should see no more ORA-03113/ORA-03114 errors. Use ‘netstat -anp | grep ESTABLISHED’ to verify connections remain stable. The combination of OS keepalive (300s), Oracle SQLNET.EXPIRE_TIME (600s), and UCP validation ensures connections are either kept alive or properly recycled before becoming stale.

This is almost certainly TCP keepalive related. OS patches often reset network kernel parameters to defaults. Check your current settings with ‘sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes’. Default values are often too aggressive for database connections.

Good point. We do have a firewall with 600 second idle timeout. Could this be causing the issue even if the database connection is actively used?

We’re using UCP (Universal Connection Pool) on the Java application side. The pool has min 10 and max 50 connections with no validation query configured currently.

Yes, if your application uses connection pooling with idle connections, they can be terminated by the firewall. You need to tune both OS TCP keepalive and application connection pool settings. What connection pool are you using on the application side?

You’re right - the values were reset to defaults. tcp_keepalive_time is 7200 seconds. Should I increase this for database connections? What values do you recommend?