Workflow task stuck in 'In Progress' when RPA bot fails and exception not caught

We’re experiencing an issue where workflow tasks remain stuck in ‘In Progress’ state when our RPA bot fails during execution. The bot connects via the RPA connector module to process approval requests, but when it encounters errors (network timeout, data validation failure), the workflow doesn’t transition back to a recoverable state.

Our current microflow setup:


$result := CALL RPA_Connector.ExecuteBot($TaskId, $BotConfig)
IF $result/Success = false THEN
  LOG 'Bot execution failed'
  // no workflow state transition here, so the task stays 'In Progress'
END

The problem is the workflow state transition doesn’t trigger when the bot fails, leaving tasks orphaned. This blocks downstream approvals and requires manual intervention. We’re on Mendix 9.18 with the latest RPA connector. Has anyone configured proper exception handling for RPA bot failures in workflow scenarios?

Chen brings up a good point about the Workflow Commons module. Mike, make sure you’re using the proper workflow state transition actions rather than just updating the task entity directly. Direct entity updates won’t trigger the workflow engine’s state machine properly.

I’ve seen this before. The issue is your exception microflow isn’t properly configured to handle workflow state rollback. When the RPA bot fails, you need to explicitly call the workflow engine to transition the task back to an assignable state. Just logging the error won’t update the workflow context.

Here’s a comprehensive solution addressing all three aspects: exception microflow configuration, workflow state transitions, and RPA bot error propagation.

1. Exception Microflow Configuration: Create a dedicated error handling microflow that wraps your RPA bot execution:


TRY
  CALL RPA_Connector.ExecuteBot($Task, $BotConfig)
  CALL WorkflowCommons.CompleteTask($Task)   // happy path: notify the workflow engine
CATCH
  // transition through the workflow engine, not via a direct entity update
  CALL WorkflowCommons.SetTaskStatus($Task, 'Failed')
  CALL System.CreateLogMessage('RPA bot failed: ' + $latestError/Message)
  COMMIT $Task
END

2. Workflow State Transitions: The critical piece is using WorkflowCommons.SetTaskStatus rather than direct entity updates. This ensures the workflow engine’s state machine is properly notified. Configure your workflow to have explicit failure paths that allow retry or escalation. In your workflow definition, add a conditional split after the RPA task that checks task status and routes to either the next step or a retry/escalation path.
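A sketch of that conditional split in the same pseudocode style (the RetryCount attribute and the retry limit of 3 are assumptions for illustration, not standard Workflow Commons names):

DECISION $Task/Status   // conditional split after the RPA task in the workflow definition
  CASE 'Completed':
    CONTINUE to next approval step
  CASE 'Failed':
    IF $Task/RetryCount < 3 THEN
      SET $Task/RetryCount := $Task/RetryCount + 1
      JUMP back to RPA task            // retry path
    ELSE
      ROUTE to manual escalation task  // human takes over
END

With this in place, a 'Failed' status never dead-ends the workflow: it either loops back for a retry or hands the task to a person.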

3. RPA Bot Error Propagation: Implement a heartbeat mechanism in your RPA connector configuration:

  • Set the bot execution timeout to a realistic value (e.g., 300 seconds)
  • Configure the connector to return explicit failure status on timeout
  • Add a scheduled event that runs every 2 minutes and flags tasks that have been ‘In Progress’ longer than their expected duration

Scheduled event microflow:


$StalledTasks := RETRIEVE WorkflowTask
  WHERE Status = 'InProgress'
    AND StartTime < [%CurrentDateTime%] - 300 * [%SecondLength%]
FOR EACH $StalledTask IN $StalledTasks
  CALL WorkflowCommons.SetTaskStatus($StalledTask, 'Failed')
  CALL NotificationService.SendAlert('Task timeout', $StalledTask)
END

Additional Configuration: In your RPA connector settings, enable ‘Propagate Exceptions’ and set ‘Return Detailed Error Info’ to true. This ensures error context from the bot is passed back to Mendix properly.

Create a configuration entity to store expected duration per bot type, allowing your timeout detection to be intelligent about what constitutes a ‘stuck’ task versus a legitimately long-running operation.
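Sketched in the same pseudocode, with a hypothetical BotTypeConfiguration entity (the entity and attribute names are illustrative, not part of any standard module):

// Per-bot-type timeout check inside the scheduled event
$Config := RETRIEVE BotTypeConfiguration WHERE BotType = $Task/BotType
$MaxSeconds := $Config/ExpectedDurationSeconds * 1.5   // 150% of expected duration
IF secondsBetween($Task/StartTime, [%CurrentDateTime%]) > $MaxSeconds THEN
  CALL WorkflowCommons.SetTaskStatus($Task, 'Failed')
  CALL NotificationService.SendAlert('Task timeout', $Task)
END

Storing the expected duration per bot type keeps a legitimately long-running bot from being killed by the same threshold that catches a genuinely stuck one.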

This approach handles all three failure scenarios: bot returns error, bot times out, and bot crashes without response. The workflow will always transition to a recoverable state, preventing orphaned tasks.

For proper RPA error propagation, you need to wrap your bot execution in a try-catch pattern and ensure the workflow context is updated regardless of outcome. The key is using the Workflow Commons module’s state management actions. When a bot fails, you should transition the task to a ‘Failed’ status that allows reassignment or retry. We track expected duration per bot type in a configuration entity, then flag any execution exceeding 150% of expected time as potentially stuck.