Your CRM integration timeout issue requires addressing all three focus areas with a robust async pattern:
1. CRM API Callback Timeout:
The root problem is synchronous processing in a stateless environment. Azure Functions instances don’t maintain state between HTTP requests, so waiting for a callback in the same function execution is unreliable.
Implement Durable Functions with external events:
[FunctionName("RegisterDevice")]
public static async Task Run(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var deviceData = context.GetInput<DeviceRegistration>();

    // Call CRM API
    await context.CallActivityAsync("CallCrmApi", deviceData);

    // Wait for the callback, racing it against a durable timer
    using var timeoutCts = new CancellationTokenSource();
    Task<CrmCallback> callbackEvent = context.WaitForExternalEvent<CrmCallback>("CrmCallback");
    Task timeoutTask = context.CreateTimer(context.CurrentUtcDateTime.AddSeconds(45), timeoutCts.Token);

    Task completedTask = await Task.WhenAny(callbackEvent, timeoutTask);
    if (completedTask == timeoutTask)
    {
        // Handle timeout - retry or dead-letter
        await context.CallActivityAsync("HandleTimeout", deviceData);
    }
    else
    {
        // Cancel the timer so the orchestration can complete, then process the callback
        timeoutCts.Cancel();
        var callback = await callbackEvent;
        await context.CallActivityAsync("UpdateDeviceTwin", callback);
    }
}
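The orchestrator is launched by a separate client function. A minimal HTTP-triggered starter might look like this (the route and function names here are illustrative, not from your existing app):

```csharp
// Hypothetical starter: receives the registration request, launches the
// orchestration, and returns the standard Durable Functions status endpoints.
[FunctionName("RegisterDevice_Start")]
public static async Task<HttpResponseMessage> Start(
    [HttpTrigger(AuthorizationLevel.Function, "post", Route = "devices/register")] HttpRequestMessage req,
    [DurableClient] IDurableOrchestrationClient starter)
{
    var deviceData = await req.Content.ReadAsAsync<DeviceRegistration>();
    string instanceId = await starter.StartNewAsync("RegisterDevice", deviceData);
    // This instance ID is what the CRM must echo back (e.g. as OrchestrationId)
    // so the callback endpoint can raise the event on the right instance.
    return starter.CreateCheckStatusResponse(req, instanceId);
}
```

Pass the instance ID to the CRM as part of the outbound API call so it can include it in the callback payload.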
The callback endpoint raises the external event:
[FunctionName("CrmCallback")]
public static async Task<IActionResult> Callback(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
    [DurableClient] IDurableOrchestrationClient client)
{
    // HttpRequest has no ReadAsAsync - read and deserialize the body manually
    var body = await new StreamReader(req.Body).ReadToEndAsync();
    var callback = JsonConvert.DeserializeObject<CrmCallback>(body);
    await client.RaiseEventAsync(callback.OrchestrationId, "CrmCallback", callback);
    return new OkResult();
}
2. Retry Policy Configuration:
Implement comprehensive retry and resilience patterns:
CRM API call with Polly:
var retryPolicy = Policy
.Handle<HttpRequestException>()
.Or<TaskCanceledException>()
.WaitAndRetryAsync(
retryCount: 3,
sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
onRetry: (exception, timeSpan, retryCount, context) => {
log.LogWarning($"CRM API retry {retryCount} after {timeSpan.TotalSeconds}s");
}
);
var circuitBreaker = Policy
.Handle<HttpRequestException>()
.CircuitBreakerAsync(
exceptionsAllowedBeforeBreaking: 5,
durationOfBreak: TimeSpan.FromMinutes(2),
onBreak: (exception, duration) => {
log.LogError("Circuit breaker opened - CRM API unavailable");
},
onReset: () => log.LogInformation("Circuit breaker reset")
);
var policy = Policy.WrapAsync(retryPolicy, circuitBreaker);
await policy.ExecuteAsync(async () => {
var response = await httpClient.PostAsync(crmApiUrl, content);
response.EnsureSuccessStatusCode();
});
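If the function app uses dependency injection, the same policies can be attached once at HttpClient registration time instead of wrapped around each call. A sketch using the Microsoft.Extensions.Http.Polly package (the client name and base-URL setting are assumptions):

```csharp
// In the FunctionsStartup Configure method: register a named HttpClient with
// retry and circuit-breaker policies applied to every outgoing CRM request.
builder.Services.AddHttpClient("CrmApi", c =>
    {
        c.BaseAddress = new Uri(Environment.GetEnvironmentVariable("CrmApiBaseUrl"));
        c.Timeout = TimeSpan.FromSeconds(30);
    })
    // Handles HttpRequestException, 5xx, and 408 responses
    .AddTransientHttpErrorPolicy(p => p.WaitAndRetryAsync(
        3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt))))
    .AddTransientHttpErrorPolicy(p => p.CircuitBreakerAsync(
        5, TimeSpan.FromMinutes(2)));
```

This keeps resilience configuration in one place and means activity functions only need to request the named client from IHttpClientFactory.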
3. Integration Log Review:
Implement comprehensive logging and monitoring:
- Log correlation ID at each step for end-to-end tracing
- Track timing: IoT Hub registration → CRM API call → Callback received → Twin update
- Monitor metrics:
- CRM API response time (P50, P95, P99)
- Callback timeout rate
- Circuit breaker state changes
- Dead letter queue depth
Use Application Insights with custom dimensions:
telemetryClient.TrackEvent("DeviceRegistration",
new Dictionary<string, string> {
{"deviceId", deviceId},
{"correlationId", correlationId},
{"stage", "CrmApiCall"},
{"crmApiResponseTime", responseTime.ToString()}
});
Create alerts for:
- CRM API response time >5 seconds (P95)
- Callback timeout rate >10%
- Circuit breaker open state
- Dead letter queue depth >50 items
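As a concrete example, the callback-timeout-rate alert can be backed by a log query over the custom events shown above (the "CallbackTimeout" stage value is an assumption; match whatever your tracking code actually emits):

```kusto
// Percentage of registrations that ended in a CRM callback timeout, last 15 minutes
customEvents
| where timestamp > ago(15m)
| where name == "DeviceRegistration"
| summarize timeouts = countif(tostring(customDimensions.stage) == "CallbackTimeout"),
            total = count()
| extend timeoutRatePct = 100.0 * timeouts / todouble(total)
```

Wire this into a scheduled log-search alert that fires when timeoutRatePct exceeds 10.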
Additional Resilience Patterns:
Implement fallback when CRM is unavailable:
- Store device registration in Azure Table Storage with status ‘Pending’
- Schedule background job to retry CRM integration every 15 minutes
- Send notification to operations team if integration fails for >2 hours
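A minimal sketch of that background retry job, assuming a table named DeviceRegistrations and a helper TryCrmRegistrationAsync (both hypothetical, using the Azure.Data.Tables client):

```csharp
// Timer-triggered retry: every 15 minutes, re-attempt CRM registration for
// rows still marked 'Pending' and escalate any stuck for more than 2 hours.
[FunctionName("RetryPendingCrmRegistrations")]
public static async Task Run(
    [TimerTrigger("0 */15 * * * *")] TimerInfo timer,
    ILogger log)
{
    var table = new TableClient(
        Environment.GetEnvironmentVariable("StorageConnectionString"),
        "DeviceRegistrations");

    await foreach (var entity in table.QueryAsync<TableEntity>(filter: "Status eq 'Pending'"))
    {
        if (await TryCrmRegistrationAsync(entity))  // hypothetical helper
        {
            entity["Status"] = "Completed";
        }
        else if (DateTimeOffset.UtcNow - (entity.Timestamp ?? DateTimeOffset.UtcNow) > TimeSpan.FromHours(2))
        {
            entity["Status"] = "Escalated";
            log.LogError("CRM registration stuck >2h for device {DeviceId}", entity.RowKey);
            // NotifyOperationsAsync(entity) - hypothetical notification hook
        }
        await table.UpdateEntityAsync(entity, entity.ETag);
    }
}
```

Writing the Pending row before the first CRM attempt also gives you a durable audit trail of every registration attempt.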
Validate callback authenticity:
- CRM should sign callbacks with HMAC
- Verify signature before processing callback
- Check correlation ID matches expected value
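The HMAC check can be a small helper called at the top of the callback function before deserializing anything. A sketch assuming HMAC-SHA256 over the raw body with a shared secret (the header name and Base64 encoding are assumptions; match whatever the CRM actually sends):

```csharp
// Verify the signature the CRM attached to the callback request.
private static bool IsValidCallback(string body, string signatureHeader, string sharedSecret)
{
    using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(sharedSecret));
    var computed = Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(body)));

    // Constant-time comparison to avoid leaking signature bytes via timing
    return CryptographicOperations.FixedTimeEquals(
        Encoding.UTF8.GetBytes(computed),
        Encoding.UTF8.GetBytes(signatureHeader ?? string.Empty));
}
```

Reject the request with a 401 if this returns false, and only then raise the external event.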
Optimize for batch operations:
- If registering multiple devices, batch CRM API calls
- Use fan-out/fan-in pattern in durable functions
- Process callbacks asynchronously without blocking
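The fan-out/fan-in pattern reuses the single-device orchestrator above as a sub-orchestration. A sketch (the batch orchestrator name is illustrative):

```csharp
// Fan-out/fan-in: run one registration sub-orchestration per device in
// parallel, then resume only when every one has completed or timed out.
[FunctionName("RegisterDeviceBatch")]
public static async Task Run(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var devices = context.GetInput<List<DeviceRegistration>>();

    // Fan out: schedule all sub-orchestrations without awaiting individually
    var tasks = devices
        .Select(d => context.CallSubOrchestratorAsync("RegisterDevice", d))
        .ToList();

    // Fan in: wait for the whole batch
    await Task.WhenAll(tasks);
}
```

Each sub-orchestration waits for its own CRM callback independently, so one slow callback never blocks the rest of the batch.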
With this architecture, device registration will succeed even if CRM callbacks are delayed or fail intermittently. The durable function pattern handles timeouts gracefully, retry policies manage transient failures, and comprehensive logging enables quick diagnosis of integration issues.