Device registration integration fails with external CRM - callback timeout errors

When registering new devices in IoT Hub, our integration with an external CRM system fails with callback timeout errors. This breaks the device mapping workflow where we associate IoT devices with customer accounts in the CRM.

The integration flow:

  1. Device registers in IoT Hub
  2. Azure Function triggers on device creation event
  3. Function calls CRM API to create device record
  4. CRM calls back to confirm device-customer association
  5. Function updates IoT Hub device twin with CRM reference

Step 4 fails with timeout after 30 seconds:


POST /api/device-callback HTTP/1.1
Host: our-function.azurewebsites.net
Error: Request timeout after 30000ms

Our retry policy allows 3 attempts with exponential backoff, but every retry times out. The CRM API team says their callback typically completes in 5-10 seconds. The integration logs show the callback request arriving, but the Function appears to stop responding. Device mapping remains incomplete, which blocks device onboarding. Any ideas on handling CRM API callback timeouts?

That’s exactly the issue - Azure Functions scale out across multiple instances, so the callback might be routed to a different instance than the one waiting for it. You can’t rely on in-memory state across HTTP requests. Implement a durable function with external events, or use a queue/blob storage to coordinate between the initial call and the callback. Store the correlation ID in a table so any instance can handle the callback.

Makes sense. We’ll switch to durable functions. But I’m also seeing errors in the integration log about the CRM API returning 500 errors intermittently before timing out. Could the CRM API instability be causing the callback to fail even if we fix the async pattern? Should we implement circuit breaker logic to handle CRM API failures gracefully?

Azure Functions default to a 5-minute timeout on the Consumption plan, but HTTP-triggered functions must respond within 230 seconds. If your function waits synchronously for the CRM callback, it will time out. You need an async pattern: have the initial function return immediately with 202 Accepted, then handle the callback in a separate function invocation. Use durable functions or a queue-based pattern.

Fixing your CRM integration timeout means addressing three areas (callback handling, retry policy, and integration logging) on top of a robust async pattern:

1. CRM API Callback Timeout:

The root problem is synchronous processing in a stateless environment. Azure Functions instances don’t maintain state between HTTP requests, so waiting for a callback in the same function execution is unreliable.

Implement Durable Functions with external events:

[FunctionName("RegisterDevice")]
public static async Task Run(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var deviceData = context.GetInput<DeviceRegistration>();

    // Call CRM API. The CRM must echo the orchestration instance ID
    // (context.InstanceId) back in its callback so the event can be routed.
    await context.CallActivityAsync("CallCrmApi", deviceData);

    // Wait for the callback or a durable timer, whichever completes first
    using (var cts = new CancellationTokenSource())
    {
        var callbackEvent = context.WaitForExternalEvent<CrmCallback>("CrmCallback");
        var timeoutTask = context.CreateTimer(context.CurrentUtcDateTime.AddSeconds(45), cts.Token);

        var completedTask = await Task.WhenAny(callbackEvent, timeoutTask);

        if (completedTask == timeoutTask) {
            // Handle timeout - retry or dead letter
            await context.CallActivityAsync("HandleTimeout", deviceData);
        } else {
            // Cancel the pending timer so the orchestration can complete
            cts.Cancel();
            // Process callback
            var callback = await callbackEvent;
            await context.CallActivityAsync("UpdateDeviceTwin", callback);
        }
    }
}

The callback endpoint raises the external event:

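[FunctionName("CrmCallback")]
public static async Task<IActionResult> Callback(
    [HttpTrigger(AuthorizationLevel.Function, "post", Route = "device-callback")] HttpRequest req,
    [DurableClient] IDurableOrchestrationClient client)
{
    // Read and deserialize the JSON body (JsonConvert is from Newtonsoft.Json)
    var body = await new StreamReader(req.Body).ReadToEndAsync();
    var callback = JsonConvert.DeserializeObject<CrmCallback>(body);
    if (callback?.OrchestrationId == null) return new BadRequestResult();

    // Hand the payload to the waiting orchestration and respond right away
    await client.RaiseEventAsync(callback.OrchestrationId, "CrmCallback", callback);
    return new OkResult();
}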

2. Retry Policy Configuration:

Implement comprehensive retry and resilience patterns:

CRM API call with Polly:

var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .Or<TaskCanceledException>()
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
        onRetry: (exception, timeSpan, retryCount, context) => {
            log.LogWarning($"CRM API retry {retryCount} after {timeSpan.TotalSeconds}s");
        }
    );

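// Keep this policy in a static field: a breaker created per invocation never
// accumulates failures, and each Function instance tracks its own state.
var circuitBreaker = Policy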
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(
        exceptionsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromMinutes(2),
        onBreak: (exception, duration) => {
            log.LogError("Circuit breaker opened - CRM API unavailable");
        },
        onReset: () => log.LogInformation("Circuit breaker reset")
    );

var policy = Policy.WrapAsync(retryPolicy, circuitBreaker);

await policy.ExecuteAsync(async () => {
    var response = await httpClient.PostAsync(crmApiUrl, content);
    response.EnsureSuccessStatusCode();
});
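
To make the retry timing concrete, here is a small sketch of the delays that the sleepDurationProvider formula above produces. It uses no Polly types; BackoffSchedule is a hypothetical helper name, and the only thing taken from the snippet is the 2^attempt formula, with attempt starting at 1 for the first retry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class BackoffSchedule
{
    // Same formula as the sleepDurationProvider above: 2^attempt seconds,
    // where attempt is 1 for the first retry.
    public static TimeSpan DelayFor(int attempt) =>
        TimeSpan.FromSeconds(Math.Pow(2, attempt));

    // Delays before retries 1..retryCount, in order.
    public static IReadOnlyList<TimeSpan> Delays(int retryCount) =>
        Enumerable.Range(1, retryCount).Select(DelayFor).ToList();
}
```

With retryCount: 3 this gives waits of 2, 4 and 8 seconds, so the policy spends 14 seconds sleeping on top of the duration of up to four CRM calls. That total is worth comparing against any timeout further up the chain.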

3. Integration Log Review:

Implement comprehensive logging and monitoring:

  • Log correlation ID at each step for end-to-end tracing
  • Track timing: IoT Hub registration → CRM API call → Callback received → Twin update
  • Monitor metrics:
    • CRM API response time (P50, P95, P99)
    • Callback timeout rate
    • Circuit breaker state changes
    • Dead letter queue depth

Use Application Insights with custom dimensions:

telemetryClient.TrackEvent("DeviceRegistration",
    new Dictionary<string, string> {
        {"deviceId", deviceId},
        {"correlationId", correlationId},
        {"stage", "CrmApiCall"},
        {"crmApiResponseTime", responseTime.ToString()}
    });

Create alerts for:

  • CRM API response time >5 seconds (P95)
  • Callback timeout rate >10%
  • Circuit breaker open state
  • Dead letter queue depth >50 items
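
In practice these thresholds would live in Application Insights or Azure Monitor alert rules. The sketch below only pins down what the first two thresholds mean, assuming a nearest-rank percentile; AlertChecks and its method names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper that spells out the alert conditions listed above.
public static class AlertChecks
{
    // Nearest-rank percentile: the smallest sample such that at least
    // `percentile` percent of the samples are less than or equal to it.
    public static double Percentile(double[] samples, double percentile)
    {
        if (samples.Length == 0)
            throw new ArgumentException("No samples.", nameof(samples));
        var sorted = samples.OrderBy(x => x).ToArray();
        int rank = (int)Math.Ceiling(percentile / 100.0 * sorted.Length);
        return sorted[Math.Max(rank, 1) - 1];
    }

    // Fires when the P95 CRM API response time exceeds 5 seconds.
    public static bool ResponseTimeAlert(double[] responseSeconds) =>
        Percentile(responseSeconds, 95) > 5.0;

    // Fires when more than 10% of callbacks timed out.
    public static bool TimeoutRateAlert(int timedOut, int total) =>
        total > 0 && (double)timedOut / total > 0.10;
}
```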

Additional Resilience Patterns:

Implement fallback when CRM is unavailable:

  • Store device registration in Azure Table Storage with status ‘Pending’
  • Schedule background job to retry CRM integration every 15 minutes
  • Send notification to operations team if integration fails for >2 hours
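
A minimal sketch of the decisions behind that fallback, assuming each pending row records when the integration first failed and when it was last attempted. PendingRegistration and PendingPolicy are hypothetical names, and the storage access itself is left out.

```csharp
using System;

// Hypothetical shape of a 'Pending' row; real storage columns may differ.
public sealed record PendingRegistration(
    string DeviceId,
    DateTimeOffset FirstFailedAt,
    DateTimeOffset LastAttemptAt);

public static class PendingPolicy
{
    public static readonly TimeSpan RetryInterval = TimeSpan.FromMinutes(15);
    public static readonly TimeSpan EscalateAfter = TimeSpan.FromHours(2);

    // True when at least 15 minutes have passed since the last CRM attempt.
    public static bool IsRetryDue(PendingRegistration p, DateTimeOffset now) =>
        now - p.LastAttemptAt >= RetryInterval;

    // True when the integration has been failing for more than 2 hours.
    public static bool ShouldNotifyOperations(PendingRegistration p, DateTimeOffset now) =>
        now - p.FirstFailedAt > EscalateAfter;
}
```

The two checks are independent on purpose: a registration that has already been escalated should normally still be retried.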

Validate callback authenticity:

  • CRM should sign callbacks with HMAC
  • Verify signature before processing callback
  • Check correlation ID matches expected value
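
The signature check could look roughly like this. It assumes the CRM sends a hex-encoded HMAC-SHA256 of the raw request body, computed with a shared secret; the actual algorithm, encoding and header name depend on what the CRM supports. CallbackSignature is a hypothetical helper, and Convert.ToHexString/FromHexString require .NET 5 or later.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; the signing scheme is an assumption, not the CRM's documented one.
public static class CallbackSignature
{
    // Hex-encoded HMAC-SHA256 of the raw request body.
    public static string Compute(string body, byte[] sharedSecret)
    {
        using var hmac = new HMACSHA256(sharedSecret);
        return Convert.ToHexString(hmac.ComputeHash(Encoding.UTF8.GetBytes(body)));
    }

    // Returns true only if the provided signature matches the body.
    public static bool IsValid(string body, string signatureHex, byte[] sharedSecret)
    {
        if (string.IsNullOrEmpty(signatureHex)) return false;

        byte[] provided;
        try { provided = Convert.FromHexString(signatureHex); }
        catch (FormatException) { return false; }

        using var hmac = new HMACSHA256(sharedSecret);
        byte[] expected = hmac.ComputeHash(Encoding.UTF8.GetBytes(body));

        // Constant-time comparison avoids leaking how many bytes matched.
        return CryptographicOperations.FixedTimeEquals(expected, provided);
    }
}
```

The callback function would run IsValid against the raw body before deserializing it and return 401 on failure.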

Optimize for batch operations:

  • If registering multiple devices, batch CRM API calls
  • Use fan-out/fan-in pattern in durable functions
  • Process callbacks asynchronously without blocking

With this architecture, a delayed or intermittently failing CRM callback no longer blocks device registration. The durable function pattern handles timeouts explicitly, retry policies absorb transient failures, and correlated logging makes integration issues quicker to diagnose.

Absolutely implement a circuit breaker and retry with exponential backoff. Use the Polly library for .NET functions. Configure it to open the circuit after 5 consecutive failures, wait 60 seconds before retrying, and gradually increase the wait time. Also add fallback logic: if CRM integration fails, store the device registration in a dead letter queue for manual processing later. This prevents blocking device onboarding when CRM is down.
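
For anyone unfamiliar with the pattern, the breaker behaviour described here can be pictured as a small state machine. The following is a hand-rolled illustration, not Polly's implementation: SimpleCircuitBreaker is a hypothetical class, it is not thread-safe, and it leaves out the gradually increasing wait time.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// Illustrative only; use Polly's circuit breaker in real code.
public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private int _consecutiveFailures;
    private DateTimeOffset _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan breakDuration)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
    }

    // Returns true if a call may be attempted at time `now`.
    public bool AllowRequest(DateTimeOffset now)
    {
        // Once the break has elapsed, move to HalfOpen and allow trial calls.
        if (State == CircuitState.Open && now - _openedAt >= _breakDuration)
            State = CircuitState.HalfOpen;
        return State != CircuitState.Open;
    }

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        State = CircuitState.Closed;
    }

    public void RecordFailure(DateTimeOffset now)
    {
        _consecutiveFailures++;
        // A failed trial call re-opens the circuit at once; otherwise it
        // opens when the consecutive-failure threshold is reached.
        if (State == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
        {
            State = CircuitState.Open;
            _openedAt = now;
        }
    }
}
```

While AllowRequest returns false, the caller skips the CRM call and goes straight to the fallback, such as the dead letter queue.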

We’re using Consumption plan, so the 230-second limit applies. But our function should complete in under 30 seconds - we call the CRM API (2-3 seconds), then wait for callback. The callback arrives within 10 seconds according to our logs, but the function doesn’t process it. Could there be a connection pooling issue preventing the callback HTTP request from reaching the same function instance?