OPC UA connections from shop floor control module timing out after Azure migration

Shop floor control module in ft-10.0 loses OPC UA connections to PLCs every 10-15 minutes after migrating to Azure cloud. The OpcUaClient shows connection timeout errors and we’re experiencing production data loss. Connection timeout is currently 20 seconds, and network latency from Azure to on-premise PLCs averages 80ms. We deployed an OPC UA gateway on-premise but sessions still drop. Configuration snippet:


opcua.sessionTimeout=20000
opcua.requestTimeout=15000
opcua.maxReconnectDelay=30000

Is this a gateway deployment issue or do we need session pooling? The data loss is affecting real-time production tracking.

Here’s the complete solution addressing all four focus areas:

1. OPC UA Gateway Deployment: Configure your Kepware gateway for cloud connectivity:


# Kepware OPC UA Server Settings
gateway.opcua.sessionTimeout=120000
gateway.opcua.maxSessionCount=100
gateway.opcua.enableSessionReuse=true
gateway.opcua.subscriptionAggregation=true

Deploy the gateway with redundancy: run two Kepware instances in a failover configuration. The gateway should maintain persistent sessions to the PLCs and handle session multiplexing for cloud clients.

2. Connection Timeout Tuning: Update OpcUaClient configuration in ft-10.0:


opcua.sessionTimeout=120000
opcua.requestTimeout=45000
opcua.maxReconnectDelay=60000
opcua.keepAliveInterval=30000
opcua.publishingInterval=1000

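The 120-second session timeout gives the session room to survive latency spikes and brief reconnects, and the 30-second keepalive keeps traffic flowing well inside the Azure VPN idle timeout. The 45-second request timeout handles slow responses during network congestion.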
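
As a quick sanity check on how these values relate to each other, here is a minimal Python sketch. It is not part of ft-10.0: `parse_properties` and `check_timeouts` are hypothetical helper names, the 300000 ms idle timeout is the VPN value used in this thread, and the "at least two keepalives per session timeout" rule is my own rule of thumb rather than a documented requirement.

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_timeouts(props, vpn_idle_timeout_ms=300000):
    """Return a list of warnings for timeout values that conflict."""
    session = int(props["opcua.sessionTimeout"])
    request = int(props["opcua.requestTimeout"])
    keepalive = int(props.get("opcua.keepAliveInterval", 0))
    warnings = []
    if request >= session:
        warnings.append("requestTimeout should be below sessionTimeout")
    if keepalive == 0:
        warnings.append("no keepAliveInterval set; idle sessions can expire")
    else:
        # Assumed rule of thumb: at least two keepalives per session timeout.
        if keepalive * 2 > session:
            warnings.append("fewer than two keepalives fit in sessionTimeout")
        if keepalive >= vpn_idle_timeout_ms:
            warnings.append("keepAliveInterval is not below the VPN idle timeout")
    return warnings


ORIGINAL = """
opcua.sessionTimeout=20000
opcua.requestTimeout=15000
opcua.maxReconnectDelay=30000
"""

TUNED = """
opcua.sessionTimeout=120000
opcua.requestTimeout=45000
opcua.maxReconnectDelay=60000
opcua.keepAliveInterval=30000
opcua.publishingInterval=1000
"""

original_warnings = check_timeouts(parse_properties(ORIGINAL))
tuned_warnings = check_timeouts(parse_properties(TUNED))
print(original_warnings)
print(tuned_warnings)
```

Run against the two snippets from this thread, the original configuration is flagged only for the missing keepalive, and the tuned one passes all three checks.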

3. Network Latency Optimization: Configure Azure VPN Gateway for industrial protocols:


vpnGateway.idleTimeout=300000
vpnGateway.tcpKeepAlive=60000
vpnGateway.enableBGP=true
vpnGateway.sku=VpnGw2

Upgrade to the VpnGw2 SKU for higher throughput, which can also reduce latency under load. Enable BGP for faster failover. Set TCP keepalive to 60 seconds so idle connections are not dropped.

Optimize routing with Azure Route Tables:


# Force all OPC UA traffic through optimized path
Route: OPC-UA-Traffic
Address prefix: [On-premise-PLC-subnet]
Next hop: VPN Gateway
Metric: 100

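Consider Azure ExpressRoute if budget allows. It typically gives lower and more consistent latency than a VPN over the public internet; going from your 80ms to something like 20-30ms is plausible, but the actual figure depends on your location and provider.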

4. Session Pooling: Implement session pooling in the shop floor control module:


opcua.sessionPool.enabled=true
opcua.sessionPool.minSessions=5
opcua.sessionPool.maxSessions=20
opcua.sessionPool.sessionIdleTimeout=300000
opcua.sessionPool.validateOnBorrow=true
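
To illustrate what these settings mean, here is a minimal Python sketch of a pool with minimum and maximum sizes and validation on borrow. It is an illustration of the general technique, not ft-10.0's implementation: `FakeSession`, `SessionPool`, `borrow` and `give_back` are hypothetical names, the idle timeout is not modelled, and the pool is single-threaded.

```python
class FakeSession:
    """Hypothetical stand-in for a real OPC UA session."""

    def __init__(self):
        self.connected = True

    def is_valid(self):
        return self.connected


class SessionPool:
    """Single-threaded pool sketch; add locking before sharing across threads."""

    def __init__(self, factory, min_sessions=5, max_sessions=20):
        self._factory = factory
        self._max = max_sessions
        self._idle = []
        self._created = 0
        # Pre-create the minimum number of sessions (minSessions).
        for _ in range(min_sessions):
            self._idle.append(self._new_session())

    def _new_session(self):
        self._created += 1
        return self._factory()

    def borrow(self):
        """Return a validated session, discarding dead ones (validateOnBorrow)."""
        while self._idle:
            session = self._idle.pop()
            if session.is_valid():
                return session
            # Dead session: drop it and free its slot in the pool.
            self._created -= 1
        if self._created >= self._max:
            raise RuntimeError("session pool exhausted")
        return self._new_session()

    def give_back(self, session):
        self._idle.append(session)


pool = SessionPool(FakeSession, min_sessions=2, max_sessions=3)
stale = pool.borrow()
stale.connected = False        # simulate a session dropped by the network
pool.give_back(stale)
fresh = pool.borrow()          # the dead session is skipped
print(fresh.is_valid(), fresh is stale)   # True False
```

The point of validating on borrow is that a session killed by a network drop is caught before it is handed to a caller, not when a read fails mid-operation.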

Configure subscription keepalive to maintain active sessions:


opcua.subscription.keepAlive.enabled=true
opcua.subscription.keepAlive.interval=10000
opcua.subscription.keepAlive.maxNotifications=3

This creates a heartbeat subscription that sends keepalive messages every 10 seconds, preventing session timeouts during idle periods.
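
In OPC UA terms, a subscription sends a keepalive when the maximum keepalive count of consecutive publishing intervals passes with no data to report, and the lifetime count must be at least three times the keepalive count. The sketch below works out those counts for the values in this answer. How ft-10.0 maps `opcua.subscription.keepAlive.interval` onto these parameters is an assumption on my part, and `subscription_counts` is a hypothetical helper.

```python
def subscription_counts(publishing_interval_ms, keepalive_interval_ms):
    """Derive OPC UA keepalive and lifetime counts from intervals in ms."""
    # Floor division so the keepalive period does not exceed the requested interval.
    max_keepalive_count = max(1, keepalive_interval_ms // publishing_interval_ms)
    # The lifetime count must be at least three times the keepalive count.
    lifetime_count = 3 * max_keepalive_count
    return max_keepalive_count, lifetime_count


keepalive_count, lifetime_count = subscription_counts(1000, 10000)
lifetime_ms = lifetime_count * 1000
print(keepalive_count, lifetime_count, lifetime_ms)   # 10 30 30000
```

With a 1000 ms publishing interval and a 10-second keepalive, that gives a keepalive count of 10 and a minimum subscription lifetime of 30 seconds, comfortably inside the 120-second session timeout.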

Data Optimization Strategies:

Implement tag grouping by update frequency:


# Fast tags (process variables): 500ms
# Medium tags (status): 2000ms
# Slow tags (configuration): 30000ms
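
A minimal Python sketch of that grouping, so each class of tag ends up in its own subscription with its own publishing interval. The tag names and the `group_tags` helper are made up for illustration; only the three intervals come from the list above.

```python
from collections import defaultdict

# Publishing interval in milliseconds per tag class, from the list above.
INTERVALS_MS = {"fast": 500, "medium": 2000, "slow": 30000}


def group_tags(tags):
    """Group (tag_name, tag_class) pairs by publishing interval."""
    groups = defaultdict(list)
    for name, tag_class in tags:
        groups[INTERVALS_MS[tag_class]].append(name)
    return dict(groups)


# Hypothetical tag names.
tags = [
    ("Line1.Temperature", "fast"),
    ("Line1.MotorStatus", "medium"),
    ("Line1.RecipeId", "slow"),
    ("Line1.Pressure", "fast"),
]
subscriptions = group_tags(tags)
for interval_ms, names in sorted(subscriptions.items()):
    print(interval_ms, names)
```

One subscription per interval keeps the slow configuration tags from being published as often as the process variables, which cuts traffic over the VPN link.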

Enable data buffering on the gateway:


gateway.buffer.enabled=true
gateway.buffer.maxSize=10000
gateway.buffer.flushInterval=5000

This buffers data during temporary disconnections, preventing data loss as long as the outage fits within the configured maxSize.
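
To show how a bounded buffer behaves, here is a minimal store-and-forward sketch in Python. It is not Kepware's implementation; `StoreAndForwardBuffer` and the `send` callback are hypothetical, and the sketch assumes the oldest samples are evicted once the buffer is full.

```python
from collections import deque


class StoreAndForwardBuffer:
    """Bounded buffer that keeps the newest samples while the link is down."""

    def __init__(self, max_size=10000):
        self._items = deque(maxlen=max_size)
        self.dropped = 0

    def add(self, sample):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # the oldest sample is about to be evicted
        self._items.append(sample)

    def flush(self, send):
        """Send buffered samples oldest first; stop at the first failure."""
        sent = 0
        while self._items:
            if not send(self._items[0]):
                break
            self._items.popleft()
            sent += 1
        return sent


buffer = StoreAndForwardBuffer(max_size=3)
for sample in range(5):          # five samples arrive while disconnected
    buffer.add(sample)

received = []
sent = buffer.flush(lambda s: received.append(s) or True)
print(sent, received, buffer.dropped)   # 3 [2, 3, 4] 2
```

Note that the two oldest samples are lost once the buffer overflows, so size maxSize for the longest outage you expect at your full tag rate.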

Monitoring Configuration:

Set up monitoring for OPC UA health:


monitor.opcua.sessionState=true
monitor.opcua.subscriptionHealth=true
monitor.opcua.dataQuality=true
alert.opcua.sessionTimeout=true
alert.opcua.dataLoss=true

Configure Azure Network Watcher to monitor VPN connection quality and set alerts for latency spikes above 100ms.
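
If you also want a check on the application side, the alert rule can be sketched in a few lines of Python. This is independent of Network Watcher: `LatencyMonitor` is a hypothetical class, and averaging over a small rolling window is my own choice to avoid alerting on a single slow sample.

```python
from collections import deque


class LatencyMonitor:
    """Flag when the rolling average latency exceeds a threshold."""

    def __init__(self, threshold_ms=100.0, window=5):
        self.threshold_ms = threshold_ms
        self._samples = deque(maxlen=window)

    def record(self, latency_ms):
        """Add a sample; return True if the rolling average is over the threshold."""
        self._samples.append(latency_ms)
        return sum(self._samples) / len(self._samples) > self.threshold_ms


monitor = LatencyMonitor(threshold_ms=100.0, window=3)
alerts = [monitor.record(ms) for ms in (80, 85, 90, 160, 170)]
print(alerts)   # [False, False, False, True, True]
```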

Implementation Steps:

  1. Update Azure VPN Gateway settings and verify TCP keepalive
  2. Reconfigure Kepware gateway with session management
  3. Update ft-10.0 OpcUaClient timeouts and enable session pooling
  4. Implement subscription keepalive with appropriate intervals
  5. Test with gradual load increase - start with 10 tags, then scale to full production
  6. Monitor for 48 hours and tune publishing intervals based on actual data patterns

After implementing these changes, your OPC UA connections should remain stable, and gateway buffering should cover short outages without data loss. The combination of proper timeout values, session pooling, and network optimization addresses the dropout pattern you’re experiencing.

For your Kepware gateway, make sure you’ve enabled the OPC UA Advanced plugin and configured session management properly. The gateway should be doing subscription aggregation and session keepalive on behalf of the cloud clients. This reduces the impact of network latency on session stability.

Good points. We’re using Azure VPN Gateway with default settings. I found the idle timeout is set to 5 minutes, which could explain the dropouts. For the OPC UA gateway, we’re using Kepware but haven’t configured any session pooling. Should we be maintaining persistent sessions even when no data is actively being read?