Device registry sync fails after Greengrass edge reboot with new devices missing

After rebooting our Greengrass v2 edge nodes for maintenance, newly added devices don’t appear in the device registry. Devices were registered in AWS IoT Core before the reboot, but the Greengrass local registry isn’t syncing with the cloud registry.

Greengrass logs show registry component status:


Registry sync attempt failed
Component: aws.greengrass.clientdevices.Registry
Error: Unable to fetch device list from cloud

Network connectivity appears normal: other Greengrass components sync fine. The IAM permissions for registry operations should be included in our standard Greengrass role. The devices show as registered in the AWS IoT console but aren’t accessible through Greengrass, so we can’t manage them remotely. Has anyone seen registry sync issues specific to the client devices component after reboots?

Checked the TES role and it has iot:DescribeThing but I don’t see iot:ListThingPrincipals. Could that single missing permission cause the entire registry sync to fail? The error message says “unable to fetch device list” which sounds like a permissions issue.

Here’s the comprehensive solution addressing all three focus areas:

Registry Component Status: First, verify the client devices registry component is running and check its version:

sudo /greengrass/v2/bin/greengrass-cli component list
# Look for aws.greengrass.clientdevices.Registry

Check component logs for detailed errors:

sudo tail -f /greengrass/v2/logs/aws.greengrass.clientdevices.Registry.log

Verify component configuration:

sudo grep -A 20 clientdevices.Registry /greengrass/v2/config/effectiveConfig.yaml

Ensure the component recipe includes cloud sync:

ComponentConfiguration:
  DefaultConfiguration:
    syncCloudSettings: true
    cloudSyncInterval: 300

Network Connectivity: Test connectivity to all required IoT endpoints:

# IoT Core endpoint
curl -I https://iot.us-east-1.amazonaws.com

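# IoT Data endpoint (devices normally use the account-specific endpoint returned by
# `aws iot describe-endpoint --endpoint-type iot:Data-ATS`; substitute it here if you have it)
curl -I https://data.iot.us-east-1.amazonaws.com

# Credentials endpoint (account-specific form comes from
# `aws iot describe-endpoint --endpoint-type iot:CredentialProvider`)
curl -I https://credentials.iot.us-east-1.amazonaws.com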

If using VPC endpoints, verify endpoint policies allow the registry component’s operations:

{
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": [
      "iot:DescribeThing",
      "iot:ListThingPrincipals",
      "iot:DescribeCertificate"
    ],
    "Resource": "*"
  }]
}

Check DNS resolution for IoT endpoints:

nslookup iot.us-east-1.amazonaws.com
nslookup data.iot.us-east-1.amazonaws.com

IAM Permission for Registry: Update your Token Exchange Service role with required permissions:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "iot:DescribeThing",
      "iot:ListThingPrincipals",
      "iot:DescribeCertificate",
      "iot:GetThingShadow",
      "iot:UpdateThingShadow"
    ],
    "Resource": "*"
  }]
}

Apply the updated policy:

aws iam put-role-policy \
  --role-name GreengrassTESRole \
  --policy-name IoTRegistryAccess \
  --policy-document file://registry-policy.json
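
Before or after applying, it can help to confirm that the policy file really lists every action the registry needs. The sketch below is a plain-text check against a local JSON file; `missing_actions` is a hypothetical helper name, and it does not evaluate wildcards, Deny statements, or conditions, so treat a clean result as a hint rather than proof.

```bash
#!/usr/bin/env bash
# Print each required action that does not appear in the policy file.
# Usage: missing_actions <policy.json> <action>...
missing_actions() {
  local policy_file=$1
  shift
  local action
  for action in "$@"; do
    # -F matches a fixed string; the surrounding quotes stop
    # "iot:DescribeThing" from matching a longer action name.
    if ! grep -Fq "\"$action\"" "$policy_file"; then
      printf '%s\n' "$action"
    fi
  done
}

# Example:
#   missing_actions registry-policy.json \
#     iot:DescribeThing iot:ListThingPrincipals iot:DescribeCertificate
```

Empty output means every listed action was found in the file.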

Complete Resolution Steps:

  1. Add missing IAM permissions to TES role (especially iot:ListThingPrincipals)

  2. Clear registry component cache:

    sudo systemctl stop greengrass
    sudo rm -rf /greengrass/v2/work/aws.greengrass.clientdevices.Registry
    sudo systemctl start greengrass

  3. Force immediate registry sync:

    sudo /greengrass/v2/bin/greengrass-cli component restart \
      --names aws.greengrass.clientdevices.Registry

  4. Monitor sync progress in logs:

    sudo tail -f /greengrass/v2/logs/aws.greengrass.clientdevices.Registry.log | grep -i sync

  5. Verify devices appear in local registry:

    # Check registry database
    sudo sqlite3 /greengrass/v2/work/aws.greengrass.clientdevices.Registry/registry.db \
      "SELECT * FROM devices;"

  6. Test device connectivity through Greengrass:

    # From client device, attempt connection
    mosquitto_pub -h greengrass-core-ip -p 8883 \
      --cert device.crt --key device.key --cafile root-ca.pem \
      -t test/topic -m "test message"

Key Insights:
- The registry component requires specific IAM permissions beyond basic Greengrass operations
- Network connectivity must include both control plane (iot.region) and data plane (data.iot.region) endpoints
- Registry cache can become corrupted during reboots, requiring manual cleanup
- Credential refresh requires Greengrass restart after IAM policy updates

After applying these fixes, registry sync should complete within 5 minutes and newly registered devices will appear in the Greengrass local registry for remote management.
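
Instead of watching `tail -f` by hand, the five-minute window can be scripted as a bounded wait on the log. This is a rough sketch: `wait_for_log` is a hypothetical helper, and the success pattern is a placeholder because the thread does not quote the message the component writes when a sync completes.

```bash
#!/usr/bin/env bash
# Poll a log file until a pattern appears or the timeout (seconds) expires.
# Usage: wait_for_log <log_file> <pattern> [timeout=300] [interval=5]
wait_for_log() {
  local log_file=$1
  local pattern=$2
  local timeout=${3:-300}
  local interval=${4:-5}
  local waited=0
  while :; do
    # Case-insensitive extended-regex match anywhere in the file
    if grep -Eiq -- "$pattern" "$log_file" 2>/dev/null; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example (pattern is an assumption; adjust to what your log actually prints):
#   wait_for_log /greengrass/v2/logs/aws.greengrass.clientdevices.Registry.log \
#     'sync (completed|succeeded)' 300 5 && echo "sync seen" || echo "no sync within 5 minutes"
```

The function scans the whole file, so a match left over from before the restart also counts. A bare `sync` pattern would match the "Registry sync attempt failed" line as well, so the pattern should be specific to success.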

The client devices registry component needs specific IAM permissions beyond the standard Greengrass role. Check whether your Token Exchange Service role includes iot:DescribeThing and iot:ListThingPrincipals. Without these, the registry component can’t fetch device details from IoT Core even though other components work fine.

Another consideration - check if your registry component configuration has the correct cloud sync settings. The component has a syncCloudSettings parameter that controls whether it fetches from IoT Core. If this got changed or corrupted during the reboot, it would explain why sync fails. Look at the component recipe and verify syncCloudSettings is true.
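
A quick way to check that setting without reading the whole file is to look for the key only in the lines that follow the component name. This is a sketch under assumptions: `sync_enabled` is a hypothetical helper, and the 20-line window mirrors the `grep -A 20` used earlier in the thread, so it may need widening for larger configurations.

```bash
#!/usr/bin/env bash
# Succeed if "syncCloudSettings: true" appears within 20 lines after the
# component name in the given config file.
# Usage: sync_enabled <config_file> [component_name]
sync_enabled() {
  local config_file=$1
  local component=${2:-aws.greengrass.clientdevices.Registry}
  grep -F -A 20 -- "$component" "$config_file" \
    | grep -Eq '^[[:space:]]*syncCloudSettings:[[:space:]]*"?true"?[[:space:]]*$'
}

# Example (the file is root-owned, so run as root):
#   sync_enabled /greengrass/v2/config/effectiveConfig.yaml \
#     && echo "cloud sync enabled" || echo "cloud sync not enabled or not found"
```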

Yes, that missing permission would definitely cause sync failures. The registry component needs to list all principals (certificates) attached to each Thing to properly sync the registry. Add iot:ListThingPrincipals and also iot:DescribeCertificate to the role. After updating the role, you need to restart the Greengrass core - it doesn’t pick up new IAM permissions automatically due to credential caching.

I’ve seen this before where the registry component’s local cache gets corrupted during unexpected reboots. Even with correct permissions and network access, the component might need its cache cleared. Try removing /greengrass/v2/work/aws.greengrass.clientdevices.Registry and restarting Greengrass. This forces a full resync from IoT Core.
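
If deleting the work directory outright feels risky, a more cautious variant is to move it aside so the old cache can be inspected or restored later. The sketch below assumes Greengrass is stopped first; `archive_dir` is a hypothetical helper, not part of any Greengrass tooling.

```bash
#!/usr/bin/env bash
# Move a directory to a timestamped sibling and print the backup path.
# Usage: archive_dir <dir>
archive_dir() {
  local dir=$1
  local backup
  backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
  if [ ! -d "$dir" ]; then
    echo "nothing to archive: $dir" >&2
    return 1
  fi
  mv -- "$dir" "$backup" && printf '%s\n' "$backup"
}

# Example (run as root, with Greengrass stopped):
#   systemctl stop greengrass
#   archive_dir /greengrass/v2/work/aws.greengrass.clientdevices.Registry
#   systemctl start greengrass
```

Once the resync has been confirmed, the `.bak.*` directory can be removed.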