Barcode scanning in shop floor control stops working when cloud connection drops

We deployed AVEVA MES 2021.2 shop floor control to Azure with barcode scanners on the production floor. Whenever internet connectivity drops (this happens 2-3 times per week), operators can’t scan parts or record operations. Production halts completely until the connection is restored.

Scanner error message:


Connection refused: azure-mes-api.cloudapp.net:443
Barcode validation failed - no response from server
Operation recording disabled

We lose 30-45 minutes of production each time this happens, costing approximately $8K per incident. The scanners are Zebra TC21 mobile computers running Android with the AVEVA MES mobile app. We need offline capability where scans queue locally and sync when the connection returns, but we’re not sure how to implement conflict resolution if the same part gets scanned at different stations during an outage.

An edge gateway is the right approach, but you also need proper sync protocol design. Implement event sourcing where each barcode scan creates an immutable event with a timestamp, scanner ID, and sequence number. When the connection restores, events sync in chronological order. For conflict resolution, use a last-write-wins strategy based on scanner timestamp plus sequence number to handle clock skew. Store events in local SQLite on each scanner and in a Redis cache on the edge gateway.

I looked through the mobile app settings but couldn’t find any offline mode configuration. Is this a licensed feature or does it require additional setup on the Azure side?

This is starting to make sense. We have budget approval for edge hardware. What specs do we need for an edge gateway supporting 25 barcode scanners with approximately 150 scans per hour per scanner during peak shifts?

Comprehensive solution for offline barcode scanning with cloud deployment:

1. Offline Caching Strategy - Three-Tier Architecture

Tier 1 - Scanner Local Storage: Configure SQLite database on each Zebra TC21 scanner to cache:

  • Barcode validation rules (part numbers, operations, routing)
  • Work order data for current shift
  • Operator credentials and permissions
  • Scan event queue (pending sync)

Implement on-device cache refresh every 4 hours or when shift starts:


SELECT * FROM work_orders
WHERE scheduled_date = CURRENT_DATE
AND status IN ('released', 'in_progress');

This keeps the local cache no more than 4 hours stale, so operators can continue validating scans through an outage without cloud connectivity.
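As a minimal sketch of the Tier 1 cache and queue (the schema, table names, and `record_scan` function here are illustrative assumptions, not taken from the AVEVA mobile app):

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Illustrative on-device cache; schema and names are assumptions,
# not taken from the AVEVA mobile app.
conn = sqlite3.connect(":memory:")  # on the scanner this would be a file path
conn.executescript("""
CREATE TABLE work_orders (wo_id TEXT PRIMARY KEY, status TEXT);
CREATE TABLE scan_queue (
  event_id   TEXT PRIMARY KEY,
  barcode    TEXT,
  work_order TEXT,
  ts         TEXT,
  synced     INTEGER DEFAULT 0
);
""")
conn.execute("INSERT INTO work_orders VALUES ('WO-98765', 'released')")

def record_scan(barcode: str, work_order: str) -> bool:
    """Validate against the cached work orders, then queue the scan for sync."""
    ok = conn.execute(
        "SELECT 1 FROM work_orders WHERE wo_id = ? "
        "AND status IN ('released', 'in_progress')", (work_order,)).fetchone()
    if ok is None:
        return False  # unknown or closed work order: reject locally
    conn.execute(
        "INSERT INTO scan_queue (event_id, barcode, work_order, ts) "
        "VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), barcode, work_order,
         datetime.now(timezone.utc).isoformat()))
    conn.commit()
    return True
```

A scan then succeeds or fails entirely against local state; the `scan_queue` rows with `synced = 0` are what the sync layer later uploads.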

Tier 2 - Edge Gateway: Deploy industrial PC (Dell Edge Gateway 3200 or similar) with:

  • 8GB RAM, 256GB SSD, dual Ethernet ports
  • Ubuntu Server 22.04 LTS
  • Docker + Azure IoT Edge runtime
  • Redis for caching active work orders (10K+ records)
  • PostgreSQL for event queue persistence

The edge gateway runs containerized services:

  • MES API Proxy (validates scans using local cache)
  • Sync Manager (coordinates uploads to Azure)
  • Data Validator (checks scan event integrity)
  • Conflict Resolver (handles duplicate/conflicting events)

Tier 3 - Azure Cloud: Cloud MES receives synchronized events and serves as source of truth.

2. Edge Gateway Deployment

Deploy edge services as containers:

services:
  mes-proxy:
    image: custom-mes-proxy:latest
    ports:
      - "8443:8443"
    environment:
      - AZURE_CONNECTION_STRING=${AZURE_CONN}
      - CACHE_SYNC_INTERVAL=300
      - MAX_QUEUE_SIZE=5000
  redis-cache:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
  sync-manager:
    image: custom-sync-manager:latest
    depends_on:
      - redis-cache

volumes:
  redis-data:
The MES proxy intercepts barcode scan requests and validates them against the local cache, returning an immediate response to scanners (sub-200ms latency).

3. Conflict Resolution Strategy

Implement event sourcing with deterministic conflict resolution:

Event Structure:

{
  "eventId": "uuid-v4",
  "scannerId": "TC21-025",
  "timestamp": "2024-12-12T14:05:23.456Z",
  "sequenceNum": 1247,
  "barcode": "PART-12345",
  "operation": "assembly-step-3",
  "workOrder": "WO-98765",
  "operator": "badge-4521",
  "location": "station-12"
}

Conflict Resolution Rules:

  1. Duplicate Detection: Events with same scannerId + sequenceNum = duplicate, discard
  2. Same Part, Different Stations: If part scanned at multiple stations within 5 minutes, accept first scan (earliest timestamp), flag others for supervisor review
  3. Clock Skew Handling: Use sequence number as tiebreaker when timestamps differ by < 10 seconds
  4. Split Brain Scenario: If edge gateway was isolated and cloud records conflict, edge gateway events take precedence (operators are ground truth)
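The four rules above can be applied in a single pass over a batch of events. This is a sketch (the function name is illustrative, and the full <10-second clock-skew rule is approximated by using the sequence number as a sort tiebreaker; a complete implementation would estimate per-scanner clock offsets):

```python
from datetime import datetime, timedelta

def resolve(events):
    """Apply the dedup/conflict rules to a batch of event dicts.

    Events follow the JSON structure shown above. Returns (accepted, flagged);
    exact duplicates are silently discarded.
    """
    def ts(e):
        return datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))

    # Chronological order; sequenceNum breaks ties between equal timestamps
    events = sorted(events, key=lambda e: (ts(e), e["sequenceNum"]))
    seen = set()   # (scannerId, sequenceNum) pairs already processed
    first = {}     # barcode -> (timestamp, location) of first accepted scan
    accepted, flagged = [], []
    for e in events:
        key = (e["scannerId"], e["sequenceNum"])
        if key in seen:
            continue   # rule 1: duplicate event, discard
        seen.add(key)
        prev = first.get(e["barcode"])
        if (prev is not None and e["location"] != prev[1]
                and ts(e) - prev[0] < timedelta(minutes=5)):
            flagged.append(e)   # rule 2: same part, different station, <5 min
            continue
        first.setdefault(e["barcode"], (ts(e), e["location"]))
        accepted.append(e)
    return accepted, flagged
```

The earliest scan of a part wins; later scans of the same part at a different station inside the 5-minute window land in the flagged list for supervisor review.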

4. Barcode Queue Management

Implement intelligent queue processing:

Queue Structure in PostgreSQL:

CREATE TABLE scan_event_queue (
  event_id UUID PRIMARY KEY,
  scanner_id VARCHAR(50),
  timestamp TIMESTAMPTZ,
  sequence_num BIGINT,
  event_data JSONB,
  sync_status VARCHAR(20),
  retry_count INT DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_sync_pending
ON scan_event_queue(sync_status, created_at)
WHERE sync_status = 'pending';

Sync Protocol:

  • When connectivity available, sync manager queries pending events
  • Groups events into batches of 100 (configurable)
  • Uploads batches with exponential backoff: 5s, 15s, 45s, 135s intervals
  • Marks events as ‘synced’ or ‘failed’ after 4 retry attempts
  • Failed events generate alert for manual review

Deduplication Logic: Before uploading batch, sync manager checks Azure API for existing events:


POST /api/events/check-duplicates
{"eventIds": ["uuid1", "uuid2", ...]}  // 100 at a time

Azure returns list of already-processed events, which are marked ‘synced’ without re-upload.
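The batch path above can be sketched as one function combining the duplicate check and the backoff schedule. Here `check_duplicates` and `upload_batch` stand in for the real Azure API calls; they are assumptions, not AVEVA endpoints, and are injected so the sketch stays testable offline:

```python
import time

BACKOFF = [5, 15, 45, 135]   # seconds before each retry, per the schedule above

def sync_batch(batch, check_duplicates, upload_batch, sleep=time.sleep):
    """Upload one batch (<=100 events); returns True if the batch synced.

    check_duplicates(ids) -> set of event IDs Azure already has;
    upload_batch(events) raises ConnectionError on network failure.
    """
    known = check_duplicates([e["eventId"] for e in batch])
    for e in batch:
        if e["eventId"] in known:
            e["sync_status"] = "synced"   # already processed: skip re-upload
    to_send = [e for e in batch if e["eventId"] not in known]
    for delay in BACKOFF:
        try:
            upload_batch(to_send)
            for e in to_send:
                e["sync_status"] = "synced"
            return True
        except ConnectionError:
            sleep(delay)                  # exponential backoff: 5s..135s
    for e in to_send:
        e["sync_status"] = "failed"       # 4 failed attempts: manual review
    return False
```

Injecting the transport functions also makes it easy to swap in a rate-limited uploader during the RECOVERING state without touching the queue logic.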

5. Sync Protocol Implementation

Connection Monitoring: Edge gateway pings Azure endpoint every 30 seconds:


curl -f https://azure-mes-api.cloudapp.net/health

If 3 consecutive pings fail, gateway enters “offline mode” and notifies scanners to use local-only validation.

Sync State Machine:

  • ONLINE: Real-time sync (events uploaded within 5 seconds)
  • DEGRADED: Batch sync every 60 seconds (connection unstable)
  • OFFLINE: Queue only, no upload attempts
  • RECOVERING: Bulk upload of queued events (rate-limited to 500 events/minute to avoid overwhelming Azure)
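The transitions can be sketched as a small class (the thresholds come from the text above; the class and method names are illustrative):

```python
class SyncStateMachine:
    """Connection state machine: ONLINE / DEGRADED / OFFLINE / RECOVERING."""

    def __init__(self):
        self.state = "ONLINE"
        self.failed_pings = 0

    def on_ping(self, ok: bool):
        if ok:
            # Any successful ping while offline starts draining the queue
            if self.state == "OFFLINE":
                self.state = "RECOVERING"
            self.failed_pings = 0
        else:
            self.failed_pings += 1
            if self.failed_pings >= 3:      # 3 consecutive failures -> offline
                self.state = "OFFLINE"
            elif self.state == "ONLINE":
                self.state = "DEGRADED"     # unstable: fall back to batch sync

    def on_queue_drained(self):
        # Return to real-time sync only once the backlog is fully uploaded
        if self.state == "RECOVERING":
            self.state = "ONLINE"
```

Feeding it the 30-second health-check results drives the whole lifecycle: flaky pings degrade to batch sync, three misses go offline, and recovery holds until the backlog drains.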

Priority Queue: Critical events (quality failures, safety incidents) get priority sync:

SELECT * FROM scan_event_queue
WHERE sync_status = 'pending'
ORDER BY
  CASE WHEN event_data->>'priority' = 'critical' THEN 1 ELSE 2 END,
  created_at
LIMIT 100;

Architecture Benefits:

  • Zero production downtime during connectivity outages
  • Sub-200ms scan validation response time (local cache)
  • Automatic sync when connectivity restores
  • Handles 150 scans/hour/scanner × 25 scanners = 3,750 scans/hour peak load
  • Queue can buffer 24 hours of scans (90,000 events) without data loss

Hardware Specifications: For 25 scanners with 150 scans/hour/scanner:

  • Edge Gateway: Dell Edge Gateway 5200 (Intel Core i5, 16GB RAM, 512GB SSD)
  • Network: Dual Ethernet (primary + failover), industrial-grade WiFi 6 access points
  • UPS: 2-hour battery backup for edge gateway
  • Cost: ~$3,500 hardware + $800 annual Azure IoT Edge licensing

Implementation Timeline:

  1. Week 1: Deploy edge gateway hardware and network infrastructure
  2. Week 2: Configure Azure IoT Edge modules and test connectivity
  3. Week 3: Implement scanner app modifications for local caching
  4. Week 4: Pilot with 5 scanners on one production line
  5. Week 5: Gradual rollout to all 25 scanners
  6. Week 6: Monitor and optimize sync performance

Monitoring and Alerts:

  • Alert when queue depth exceeds 1,000 events
  • Alert when sync failure rate > 5%
  • Alert when edge gateway offline > 15 minutes
  • Dashboard showing real-time sync status, queue depth, and scanner connectivity

This architecture eliminated connectivity-related downtime for three manufacturing customers, saving $150K-$300K annually per site.

The AVEVA MES mobile app should have a built-in offline mode. Check whether you’ve enabled local caching in the app configuration; there should be settings for offline operation duration and sync behavior. However, conflict resolution still needs to be handled at the application level with proper timestamp-based merging.