Your issue stems from three interconnected problems that need coordinated fixes:
1. Device ID Deduplication Logic:
The core issue is that your MES resource management isn’t checking for existing resources before creation. Implement proper deduplication:
// Pseudocode - Resource registration with deduplication:
1. Receive device registration/connection event from IoT gateway
2. Query resource management: SELECT * FROM resources WHERE deviceId = incoming.deviceId
3. IF resource exists: Update connection status and last_seen timestamp
4. ELSE: Create new resource entry with deviceId and metadata
5. Log registration action with timestamp for audit trail
// See: Resource Management API Documentation Section 5.1
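This prevents duplicate creation by always checking existence first. The check-and-create step must be atomic, for example by enforcing a unique constraint on deviceId or by using a single upsert statement. Otherwise two concurrent reconnections can both pass the existence check and both create a resource.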
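One way to get that atomicity is to let the database enforce uniqueness instead of relying on a separate SELECT. The following is a minimal sketch, assuming a SQL-backed resource store. SQLite is used purely for illustration, and the table, column, and function names (`resources`, `device_id`, `status`, `last_seen`, `register_or_update`) are hypothetical, not part of any MES API.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open the (hypothetical) resource store; device_id is the unique key."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resources ("
        " device_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " last_seen REAL NOT NULL)"
    )
    return conn


def register_or_update(conn, device_id, now=None):
    """Create the resource if it is new, otherwise refresh its status.

    The PRIMARY KEY constraint guarantees at most one row per device_id,
    even if two registrations for the same device arrive concurrently.
    """
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO resources (device_id, status, last_seen)"
            " VALUES (?, 'online', ?)",
            (device_id, now),
        )
        created = cur.rowcount == 1  # 0 means the row already existed
        if not created:
            conn.execute(
                "UPDATE resources SET status = 'online', last_seen = ?"
                " WHERE device_id = ?",
                (now, device_id),
            )
    return "created" if created else "updated"
```

Calling `register_or_update` twice with the same deviceId yields one row: the first call creates it, the second only updates the status and timestamp.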
2. Event Processing Rules Configuration:
You need to distinguish between different device lifecycle events. Have the IoT gateway send a distinct event type for each one, and configure the event processing rules so that each type maps to its own action:
<eventProcessing>
<rule eventType="device.register" action="createResource" deduplication="deviceId"/>
<rule eventType="device.connect" action="updateConnectionStatus" deduplication="none"/>
<rule eventType="device.disconnect" action="markOffline" gracePeriod="300"/>
<rule eventType="device.heartbeat" action="updateLastSeen" deduplication="none"/>
</eventProcessing>
The critical change is having separate event types. Registration creates resources (with deduplication check), connection events just update status, and heartbeats maintain the last-seen timestamp without triggering any resource operations.
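To illustrate how rules like these drive routing, here is a small sketch that loads the rule set and looks up the action for an incoming event. It assumes the rules are available as an XML string. The helper names `load_rules` and `action_for` are made up for this example, and the way your MES actually evaluates rules may differ.

```python
import xml.etree.ElementTree as ET

RULES_XML = """
<eventProcessing>
  <rule eventType="device.register" action="createResource" deduplication="deviceId"/>
  <rule eventType="device.connect" action="updateConnectionStatus" deduplication="none"/>
  <rule eventType="device.disconnect" action="markOffline" gracePeriod="300"/>
  <rule eventType="device.heartbeat" action="updateLastSeen" deduplication="none"/>
</eventProcessing>
"""


def load_rules(xml_text):
    """Return a dict mapping each eventType to that rule's attributes."""
    root = ET.fromstring(xml_text)
    return {rule.get("eventType"): dict(rule.attrib) for rule in root.findall("rule")}


def action_for(event, rules):
    """Look up the action for an event; unknown types are rejected, not guessed."""
    rule = rules.get(event.get("eventType"))
    if rule is None:
        raise ValueError(f"no rule for event type {event.get('eventType')!r}")
    return rule["action"]


rules = load_rules(RULES_XML)
```

Rejecting unknown event types instead of falling back to resource creation is a deliberate choice here: a misclassified event then fails loudly rather than producing another duplicate.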
3. Heartbeat vs Registration Event Distinction:
Modify your device firmware or IoT gateway to properly categorize events:
- Registration Event: Sent ONLY during initial device provisioning or after factory reset. Includes full device metadata (model, capabilities, location).
- Connection Event: Sent when device connects to network after being offline. Includes deviceId and connection timestamp only.
- Heartbeat Event: Sent periodically (every 30-60 seconds) while connected. Minimal payload with deviceId and sequence number.
- Disconnection Event: Sent by gateway (not device) when connection lost. Triggers grace period timer.
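The grace period on disconnection could be tracked along these lines. This is a sketch only: the `ConnectionTracker` class, its method names, and the intermediate `pending_offline` state are assumptions for illustration. The 300-second default matches the gracePeriod value in the rules above.

```python
import time

GRACE_PERIOD_SECONDS = 300  # matches gracePeriod="300" in the rules


class ConnectionTracker:
    """Marks a device offline only if it stays disconnected past the grace period."""

    def __init__(self, grace_period=GRACE_PERIOD_SECONDS):
        self.grace_period = grace_period
        self.status = {}           # device_id -> "online" | "pending_offline" | "offline"
        self.disconnected_at = {}  # device_id -> time the disconnect was reported

    def on_connect(self, device_id):
        # A reconnect inside the grace period cancels the pending timer.
        self.status[device_id] = "online"
        self.disconnected_at.pop(device_id, None)

    def on_disconnect(self, device_id, now=None):
        now = time.time() if now is None else now
        self.status[device_id] = "pending_offline"
        self.disconnected_at[device_id] = now

    def expire(self, now=None):
        """Mark devices offline whose grace period has elapsed; return their IDs."""
        now = time.time() if now is None else now
        expired = [
            device_id
            for device_id, since in self.disconnected_at.items()
            if now - since >= self.grace_period
        ]
        for device_id in expired:
            self.status[device_id] = "offline"
            del self.disconnected_at[device_id]
        return expired
```

`expire` would be called periodically (for example from a scheduler). A device that reconnects within five minutes never reaches the offline state, so a short network blip does not trigger any resource operation.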
Update your device event payload to include the event type, using the same names as the event processing rules:
{
  "deviceId": "Machine-A-001",
  "eventType": "device.connect",
  "timestamp": 1704617823,
  "sequenceNumber": 12847
}
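On the receiving side, it helps to validate the payload before it reaches any resource logic. A minimal sketch, assuming the payload is JSON with the fields shown above and that event type names match the ones used in the event processing rules. `DeviceEvent` and `parse_event` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional

KNOWN_EVENT_TYPES = {
    "device.register",
    "device.connect",
    "device.disconnect",
    "device.heartbeat",
}


@dataclass(frozen=True)
class DeviceEvent:
    device_id: str
    event_type: str
    timestamp: int
    sequence_number: Optional[int] = None


def parse_event(raw: str) -> DeviceEvent:
    """Parse a JSON payload and reject anything without a known event type."""
    data = json.loads(raw)
    event_type = data.get("eventType")
    if event_type not in KNOWN_EVENT_TYPES:
        raise ValueError(f"unknown or missing eventType: {event_type!r}")
    device_id = data.get("deviceId")
    if not isinstance(device_id, str) or not device_id:
        raise ValueError("deviceId must be a non-empty string")
    timestamp = data.get("timestamp")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(timestamp, bool) or not isinstance(timestamp, int):
        raise ValueError("timestamp must be an integer (epoch seconds)")
    return DeviceEvent(device_id, event_type, timestamp, data.get("sequenceNumber"))
```

Payloads from older firmware that omit the event type are rejected here. Whether to reject them or treat them as connection events during the rollout is a decision to make with the IoT team.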
Implementation Strategy:
Phase 1 (Immediate - Fix MES Side):
- Add deduplication check to resource creation logic
- Update event processing rules to handle different event types
- Configure 5-minute grace period for disconnections
Phase 2 (Coordinate with IoT Team):
- Update IoT gateway to send proper event types
- Modify device firmware if needed to support lifecycle events
- Test with pilot devices before rolling out to all equipment
Phase 3 (Cleanup):
- Write script to identify and merge existing duplicate resources
- Set up monitoring for duplicate detection
- Create alerts for abnormal registration patterns
Cleanup Existing Duplicates:
Once the deduplication check is live (so that no new duplicates appear mid-cleanup), merge the duplicates that already exist. Query resources for IDs matching patterns like “deviceId-dup1” and merge each one back into its original entry. Reassign any work orders or schedules from duplicate resources to the original resource entry.
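The merge step can be planned before touching any data. The sketch below works on plain Python collections and assumes duplicates follow the "-dupN" suffix pattern mentioned above. `plan_merges` and `reassign_work_orders` are illustrative names, and a real script would read and write through your MES API or database.

```python
import re
from collections import defaultdict

# Matches IDs such as "Machine-A-001-dup1"; "base" is the original deviceId.
DUP_SUFFIX = re.compile(r"^(?P<base>.+)-dup\d+$")


def plan_merges(resource_ids):
    """Map each original resource ID to the duplicate IDs that should merge into it.

    Duplicates whose original no longer exists are left out so they can be
    reviewed manually instead of being merged blindly.
    """
    ids = set(resource_ids)
    merges = defaultdict(list)
    for rid in sorted(ids):
        match = DUP_SUFFIX.match(rid)
        if match and match.group("base") in ids:
            merges[match.group("base")].append(rid)
    return dict(merges)


def reassign_work_orders(work_orders, merges):
    """Return a copy of {work_order_id: resource_id} pointing at the originals."""
    redirect = {dup: base for base, dups in merges.items() for dup in dups}
    return {wo: redirect.get(rid, rid) for wo, rid in work_orders.items()}
```

Reviewing the output of `plan_merges` before applying it gives you a dry run: you can confirm that every duplicate maps to the expected original before any work orders are moved.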
Testing:
Simulate network interruptions in a test environment:
- Register test device normally
- Force disconnect for 60 seconds
- Allow reconnection
- Verify no duplicate resource created
- Confirm connection status updated correctly
- Test with multiple rapid disconnect/reconnect cycles
This comprehensive approach addresses the root cause while providing immediate mitigation and long-term prevention.