Data stream billing rule fails to apply correct charges for multi-tenant setup

We’re running a multi-tenant IoT platform where each customer’s devices stream telemetry data through AWS IoT Core. Our billing engine should apply different rates based on custom attributes extracted from the message payload, but we’re seeing inconsistent charge calculations.

The IoT Rules Engine SQL is supposed to extract tenant_id and data_volume from incoming messages, then route to our billing Lambda:

SELECT tenant_id, data_volume, timestamp
FROM 'device/+/telemetry'
WHERE data_volume > 0

For some tenants, charges are calculated correctly at $0.05/MB, but others show $0.00 even though their devices are actively streaming. Our CloudWatch logs show the rule is triggering, but the billing Lambda receives null values for tenant_id intermittently. This is causing significant revenue leakage - we’ve missed approximately $12K in charges over the past month for about 30% of our tenant base. Has anyone dealt with attribute extraction issues in IoT Rules Engine affecting downstream billing calculations?

We had a similar multi-tenant billing issue last year. One thing that helped was adding a republish error action to your IoT rule. Configure it to send failed messages to a separate topic like ‘billing/errors’ so you can analyze what’s actually failing. We discovered that some messages had tenant_id as a number instead of string, which our Lambda wasn’t handling. The type coercion was failing silently.
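For the number-vs-string problem specifically, the fix on the Lambda side is to normalize before any billing lookup. A minimal sketch, assuming the rule’s output arrives in the Lambda event as-is (normalize_tenant_id is a hypothetical helper name, not part of any AWS SDK):

```python
def normalize_tenant_id(raw):
    """Coerce tenant_id to a canonical string, rejecting empty/None values.

    Devices were observed sending tenant_id as both a number and a string,
    so accept either and normalize before any billing lookup.
    """
    if raw is None:
        return None
    if isinstance(raw, bool):  # bool is a subclass of int; reject it explicitly
        return None
    if isinstance(raw, (int, float)):
        # a numeric ID like 42 or 42.0 becomes the string "42"
        raw = int(raw)
    text = str(raw).strip()
    return text or None
```

Returning None for anything unusable gives you a single, explicit branch to route to your error topic instead of a silent coercion failure.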

Before you change device firmware, you can handle both formats in your IoT Rules Engine SQL with get() fallback logic — check the top-level field via get(*, 'tenant_id') first, and fall back to get(metadata, 'tenant_id') when it’s missing. However, for production billing scenarios, I’d recommend creating separate rules for legacy and current message formats to avoid any ambiguity. The performance impact is minimal and you get clearer audit trails for charge allocation. Also verify your Lambda has proper error handling for malformed messages - silent failures in billing are the worst kind.
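To make the two-rule split concrete, here is one way it might look, assuming legacy devices nest tenant_id under a metadata object (both rules subscribe to the same topic filter; the first statement is the current-format rule, the second the legacy rule — field paths are illustrative, not confirmed from the original post):

```sql
SELECT tenant_id, data_volume, timestamp
FROM 'device/+/telemetry'
WHERE NOT isUndefined(tenant_id)

SELECT get(metadata, 'tenant_id') as tenant_id, data_volume, timestamp
FROM 'device/+/telemetry'
WHERE isUndefined(tenant_id) AND NOT isUndefined(get(metadata, 'tenant_id'))
```

The mutually exclusive WHERE clauses guarantee each message matches exactly one rule, which is what gives you the clean per-format audit trail.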

I’ve implemented several IoT billing systems and this is a common pattern. Here’s a comprehensive solution addressing all three aspects:

Custom Attribute Extraction: Modify your IoT Rules Engine SQL to handle schema variations robustly:

SELECT
  CASE isUndefined(tenant_id)
    WHEN true THEN
      CASE isUndefined(metadata.tenant_id)
        WHEN true THEN 'UNKNOWN'
        ELSE metadata.tenant_id
      END
    ELSE tenant_id
  END as tenant_id,
  cast(data_volume as Decimal) as data_volume,
  timestamp,
  topic(2) as device_id
FROM 'device/+/telemetry'
WHERE NOT isUndefined(data_volume)

The CASE expression checks the top-level field first, then the nested metadata path, and finally falls back to ‘UNKNOWN’. Using ‘UNKNOWN’ lets you identify and quarantine problematic messages. (Note there is no coalesce() in IoT Rules Engine SQL, so the fallback has to be spelled out with CASE and isUndefined().)

Multi-tenant Charge Allocation: In your billing Lambda, implement a two-phase validation:

# Quarantine messages whose tenant_id could not be extracted
if tenant_id in (None, 'UNKNOWN'):
    publish_to_dlq(message)
    return

# Look up the tenant's configured rate and compute the charge
tenant_config = get_tenant_billing_config(tenant_id)
charge = calculate_charge(data_volume, tenant_config['rate'])

Create a DLQ (Dead Letter Queue) topic for failed extractions. This prevents revenue leakage while you investigate root causes.
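The calculate_charge function referenced above is left undefined; a sketch using Decimal so money math never accumulates float rounding error, assuming data_volume arrives in megabytes and rate is dollars per MB (as in the $0.05/MB rate from the question):

```python
from decimal import Decimal, ROUND_HALF_UP

def calculate_charge(data_volume_mb, rate_per_mb):
    """Return the charge in dollars, rounded to the cent.

    Convert inputs through str() so float arguments don't smuggle in
    binary rounding error (Decimal(0.1) != Decimal('0.1')).
    """
    volume = Decimal(str(data_volume_mb))
    rate = Decimal(str(rate_per_mb))
    return (volume * rate).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
```

For example, calculate_charge(120, '0.05') yields Decimal('6.00'). Store rates as strings or Decimals in your tenant config for the same reason.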

IoT Rules Engine SQL Syntax Best Practices:

  • Always use explicit type casting for numeric fields (cast(data_volume as Decimal))
  • Add the device identifier from the topic path using topic() function for audit trails
  • Filter on the field’s presence rather than > 0, so extraction failures surface early instead of being silently dropped
  • Enable CloudWatch Logs for your rule to see actual SQL execution results
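One caveat on topic(): it is 1-indexed over the slash-separated topic, so for ‘device/dev-123/telemetry’, topic(2) is the device ID. If you want to sanity-check that assumption in unit tests before relying on it for audit trails, a throwaway Python equivalent (hypothetical helper, not an AWS API):

```python
def topic_segment(topic, n):
    """Mimic IoT SQL topic(n): the 1-indexed segment of a '/'-separated topic."""
    segments = topic.split('/')
    if not 1 <= n <= len(segments):
        raise IndexError(f"topic({n}) out of range for {topic!r}")
    return segments[n - 1]
```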

For the $12K revenue leakage, query your CloudWatch Logs Insights with:

fields @timestamp, tenant_id, data_volume
| filter tenant_id = 'UNKNOWN' or not ispresent(tenant_id)
| stats count() by bin(5m)

This shows when extractions failed. Cross-reference with your device deployment history to identify which firmware versions need updates.
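Once you have the unbilled rows back from Logs Insights, a back-of-the-envelope recovery estimate is just volume times rate. A sketch assuming data_volume is in MB and the flat $0.05/MB rate from the original post (the row shape is illustrative):

```python
def estimate_leakage(unbilled_rows, rate_per_mb=0.05):
    """Sum unbilled data volume (MB) and price it at the flat rate.

    unbilled_rows: iterable of dicts shaped like the Logs Insights
    results, e.g. {'data_volume': 12.5}, for rows whose tenant_id was lost.
    """
    total_mb = sum(float(row.get('data_volume') or 0) for row in unbilled_rows)
    return round(total_mb * rate_per_mb, 2)
```

This only bounds the loss; actually re-billing those messages still requires recovering each row’s tenant from your device registry.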

Finally, implement idempotency in your billing Lambda using DynamoDB to track processed message IDs. This prevents double-charging when you reprocess the missed messages. Set up a separate reconciliation job that runs daily to compare IoT Core message counts against billing records - any discrepancy triggers alerts.
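The idempotency check reduces to “exactly one worker may claim each message ID.” An in-memory sketch of that contract; in production the set would be a DynamoDB table written with a conditional put so that duplicates are rejected atomically (class and method names are hypothetical):

```python
class IdempotencyGuard:
    """Track processed message IDs so reprocessing never double-charges.

    Stand-in for a DynamoDB table: in production, claim() would be a
    conditional write that fails if an item for message_id already
    exists, so exactly one worker wins for each message.
    """
    def __init__(self):
        self._seen = set()

    def claim(self, message_id):
        """Return True only for the first attempt to process message_id."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True
```

The billing Lambda then charges only when claim() returns True, which makes the backfill of the missed messages safe to run repeatedly.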

The key is treating billing as a critical path with proper error handling, monitoring, and reconciliation rather than assuming the happy path always works.

I’ve seen this before. The wildcard in your topic filter ‘device/+/telemetry’ might be masking the real issue: if the payload structure isn’t consistent across all devices, the rule still matches, but the extraction fails. Check whether the tenant_id field is always present at the same JSON path level - sometimes devices send nested objects and the SQL SELECT doesn’t handle the missing path gracefully. Try adding explicit presence checks in your WHERE clause, or use the get() function and test its result with isUndefined() before trusting it.

Check your IoT Core metrics in CloudWatch - specifically RulesExecuted vs ActionsFailed. If you’re seeing a high ActionsFailed count, the issue is in the Lambda invocation, not the SQL extraction. Also, make sure your Lambda has adequate concurrency limits. We had throttling issues during peak hours that caused intermittent billing failures until we configured reserved concurrency for our billing Lambda.
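For sizing that reserved concurrency, Little’s law gives a quick estimate: concurrent executions ≈ invocations per second × average duration. A minimal sketch, with an illustrative headroom factor for bursts (the numbers are assumptions, not from the original post):

```python
import math

def required_concurrency(invocations_per_sec, avg_duration_sec, headroom=1.5):
    """Estimate Lambda concurrency via Little's law, padded for bursts."""
    steady_state = invocations_per_sec * avg_duration_sec
    return math.ceil(steady_state * headroom)
```

For example, 200 msgs/sec at 250 ms average duration needs about 50 concurrent executions at steady state, so reserving 75 leaves room for spikes without starving other functions in the account.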