Automated invoice extraction using RPA bots for accounts payable workflow

We implemented an automated invoice extraction solution using RPA bots integrated with Creatio 8.4 for our accounts payable workflow. The goal was to eliminate manual data entry and reduce invoice processing time by 80%.

Our process receives 2,000+ invoices daily via email in various formats (PDF, scanned images, Excel). Before automation, AP staff manually entered invoice details into Creatio, which was time-consuming and error-prone. We deployed RPA bots with OCR capabilities to extract invoice fields (vendor, amount, line items, dates) and automatically create records in Creatio.

The implementation involved RPA bot integration with our email system, OCR field extraction for unstructured invoice formats, and sophisticated exception handling for invoices that couldn’t be processed automatically. The bots now handle 85% of invoices without human intervention, routing only complex cases or extraction failures to manual review.

Processing time dropped from 15 minutes per invoice to under 2 minutes, and accuracy improved from 92% to 98%. The project paid for itself within 4 months through reduced labor costs and faster payment cycles.

The multi-pass OCR approach is smart. Did you encounter any performance bottlenecks when processing 2,000 invoices daily? How many concurrent bot instances are you running, and how did you optimize the RPA bot integration with Creatio’s API to avoid rate limiting or connection issues? We’re planning a similar volume and want to size our infrastructure correctly.

Great questions, Emily. We set an 85% confidence threshold for automatic processing. Below that, invoices route to a manual review queue in Creatio where AP staff can verify and correct the extracted data. If the first extraction pass comes back below 70% confidence, the bot runs a second pass with a different OCR engine; this improved our straight-through processing rate by about 12%.

For validation, we have three levels: field-level validation (format checks), business rule validation (amounts within expected ranges), and PO matching for purchase order-based invoices. About 40% of our invoices have POs, and we do three-way matching automatically (invoice, PO, receiving document). Discrepancies above 5% trigger manual review.
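The three-way match can be sketched as two tolerance checks. This is a minimal illustration, not the poster's implementation: it assumes the 5% tolerance is measured relative to the PO amount for price and relative to the received quantity for quantity, and `ThreeWayInput`, `variance`, and `match_three_way` are hypothetical names rather than Creatio or UiPath APIs.

```python
import math
from dataclasses import dataclass

TOLERANCE = 0.05  # discrepancies above 5% are routed to manual review


@dataclass
class ThreeWayInput:
    invoice_amount: float  # from the extracted invoice
    invoice_qty: float     # from the extracted invoice
    po_amount: float       # from the purchase order
    received_qty: float    # from the receiving document


def variance(actual: float, expected: float) -> float:
    """Relative difference of actual vs. expected."""
    if expected == 0:
        return 0.0 if actual == 0 else math.inf
    return abs(actual - expected) / abs(expected)


def match_three_way(doc: ThreeWayInput) -> list:
    """Return discrepancy messages; an empty list means the documents match."""
    issues = []
    amount_var = variance(doc.invoice_amount, doc.po_amount)
    if amount_var > TOLERANCE:
        issues.append(f"Invoice/PO amount variance {amount_var:.1%}")
    qty_var = variance(doc.invoice_qty, doc.received_qty)
    if qty_var > TOLERANCE:
        issues.append(f"Invoice/receipt quantity variance {qty_var:.1%}")
    return issues


print(match_three_way(ThreeWayInput(5200, 100, 5000, 100)))  # [] (4% is within tolerance)
```

Returning a list of messages instead of a boolean lets the exception case carry a readable reason for the reviewer.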

How did you handle exception scenarios where the OCR confidence was low? Did you implement a confidence threshold that routes invoices to manual review, or do the bots attempt multiple extraction passes? Also curious about your validation rules - do you cross-check extracted amounts against PO data before creating the Creatio records?

I’d also like to understand your exception handling strategy better. When invoices fail OCR extraction completely, how does the workflow transition back to manual processing? Is there a feedback loop where corrected data improves the OCR model over time? And what about handling duplicate invoices - did you implement duplicate detection logic in the RPA bot or rely on Creatio’s duplicate management?

Let me provide a comprehensive overview of the implementation covering all the technical aspects.

RPA Bot Integration Architecture: We deployed 8 concurrent UiPath bot instances on dedicated VMs to handle the 2,000 daily invoices. Each bot processes 15-20 invoices per hour, so the pool handles roughly 120-160 invoices per hour, which gives us capacity for peak loads. The bots integrate with Creatio through its REST API, using OAuth2 authentication for secure communication.
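For anyone sizing a similar setup, the capacity math can be checked directly. This is a rough sketch that assumes the bots run around the clock, since the thread doesn't state operating hours, and uses the conservative 15 invoices/hour figure. The 30% headroom value is an illustrative assumption and was not stated by the poster.

```python
import math


def daily_capacity(bots: int, invoices_per_hour: float, hours_per_day: float) -> float:
    """Invoices the bot pool can process in one day."""
    return bots * invoices_per_hour * hours_per_day


def bots_needed(daily_volume: int, invoices_per_hour: float,
                hours_per_day: float, headroom: float = 0.0) -> int:
    """Smallest bot count covering daily_volume plus a fractional headroom."""
    return math.ceil(daily_volume * (1 + headroom) / (invoices_per_hour * hours_per_day))


print(daily_capacity(8, 15, 24))                # 2880 (conservative rate, 24h operation)
print(daily_capacity(8, 15, 8))                 # 960 (a single 8-hour shift falls short)
print(bots_needed(2000, 15, 24, headroom=0.3))  # 8
```

The main point is that operating hours matter as much as bot count. Eight bots cover 2,000 invoices/day only if they run well beyond a single business shift.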

To avoid API rate limiting, we implemented a queue-based processing pattern. Bots fetch invoices from an email monitoring queue in batches of 50, process them locally (OCR extraction happens on the bot VM), then submit the extracted data to Creatio in bulk API calls. This reduced API calls by 75% compared to individual invoice submissions. We also configured connection pooling and retry logic with exponential backoff to handle temporary API unavailability.
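The batching step is simple, but it is where most of the API-call savings come from. In the sketch below, `submit_bulk` is a hypothetical stand-in for whatever bulk endpoint your Creatio integration uses. The code doesn't model Creatio's actual bulk API, and connection pooling and retries are left out here.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def submit_in_batches(invoices: Sequence[dict],
                      submit_bulk: Callable[[List[dict]], None],
                      batch_size: int = 50) -> int:
    """Send extracted invoices in bulk calls; return the number of API calls made."""
    calls = 0
    for batch in batched(invoices, batch_size):
        submit_bulk(list(batch))  # hypothetical: one bulk request per batch
        calls += 1
    return calls


sent = []
calls = submit_in_batches([{"invoice_number": f"INV-{i}"} for i in range(120)], sent.append)
print(calls, [len(b) for b in sent])  # 3 [50, 50, 20]
```

On Python 3.12+, `itertools.batched` can replace the helper function.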

The integration flow: Email monitor bot → Invoice queue (database) → Processing bots (OCR extraction) → Validation service → Creatio API submission → Exception queue for failures. This architecture allows us to scale bot instances independently and provides resilience through queue persistence.
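Here is a single-process walk through the stages above, showing how each invoice ends up either submitted or in the exception queue. This is a simplification. In the described architecture, each stage runs as a separate worker against a persistent database-backed queue, which the in-memory `deque` only stands in for. The `extract`, `validate`, and `submit` callables are hypothetical placeholders.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def run_pipeline(invoices: Iterable,
                 extract: Callable,
                 validate: Callable,
                 submit: Callable) -> Tuple[list, List[tuple]]:
    """Route each invoice through extraction, validation, and submission."""
    invoice_queue = deque(invoices)  # stands in for the persistent, database-backed queue
    submitted, exception_queue = [], []
    while invoice_queue:
        invoice = invoice_queue.popleft()
        try:
            data = extract(invoice)    # processing bot: OCR extraction
            errors = validate(data)    # validation service
            if errors:
                exception_queue.append((invoice, "validation", errors))
                continue
            submit(data)               # Creatio API submission
            submitted.append(invoice)
        except Exception as exc:       # extraction or API failure
            exception_queue.append((invoice, "processing", [repr(exc)]))
    return submitted, exception_queue


ok, failed = run_pipeline(
    [0, 1, 2],
    extract=lambda x: {"id": x} if x != 2 else 1 / 0,
    validate=lambda d: ["odd id"] if d["id"] % 2 else [],
    submit=lambda d: None,
)
print(ok, [(inv, kind) for inv, kind, _ in failed])  # [0] [(1, 'validation'), (2, 'processing')]
```

Business failures (validation) and technical failures (processing) are kept separate, which matches the split between the manual review queue and the technical exception queue described later in the thread.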

OCR Field Extraction Approach: We use a three-tier strategy based on invoice complexity:

  1. Template matching (60% of invoices): For known vendor formats, we use coordinate-based field extraction. The bot identifies the vendor from header text, loads the corresponding template, and extracts fields from predefined coordinates. This is the fastest and most accurate method for structured invoices.

  2. AI-powered extraction (25% of invoices): For variable formats, we use UiPath Document Understanding with a custom-trained ML model. The model identifies field labels (“Invoice Date:”, “Total Amount:”, etc.) and extracts adjacent values regardless of position. Training on 5,000 historical invoices gave us 94% accuracy on variable formats.

  3. Hybrid extraction (15% of invoices): For complex or partially structured invoices, we combine template matching for header fields with AI extraction for line items. This handles invoices that have standard headers but variable table structures.
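The tier selection and the coordinate-based template step might look roughly like this. The sketch makes several assumptions. OCR output is modeled as `(text, x, y)` tokens and templates as field-to-bounding-box maps; UiPath's actual output format differs. Whether a vendor's line-item table is "standard" is passed in as a flag, and vendor identification from header text is not shown. All names are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
Word = Tuple[str, float, float]          # (text, x, y) of one OCR token


class Strategy(Enum):
    TEMPLATE = "template"  # known vendor, standard layout
    HYBRID = "hybrid"      # known header layout, variable line-item table
    AI = "ai"              # unknown or variable layout


def choose_strategy(vendor_id: str, templates: Dict[str, Dict[str, Box]],
                    standard_line_items: bool) -> Strategy:
    if vendor_id not in templates:
        return Strategy.AI
    return Strategy.TEMPLATE if standard_line_items else Strategy.HYBRID


def extract_by_template(words: List[Word], template: Dict[str, Box]) -> Dict[str, str]:
    """Coordinate-based extraction: join the OCR tokens that fall inside each field's box."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        hits = [text for text, x, y in words if x0 <= x <= x1 and y0 <= y <= y1]
        result[field] = " ".join(hits)
    return result


templates = {"ACME": {"invoice_number": (0, 0, 20, 20), "invoice_date": (40, 0, 60, 20)}}
words = [("INV-1001", 10, 10), ("2024-01-05", 50, 10), ("Total", 10, 90)]
print(choose_strategy("ACME", templates, standard_line_items=True))  # Strategy.TEMPLATE
print(extract_by_template(words, templates["ACME"]))
```

In the hybrid tier, `extract_by_template` would cover the header fields, and the line items would go to the ML model.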

For each extraction, we capture confidence scores per field. Fields below 85% confidence are flagged for manual verification even if the overall invoice is processed automatically. This granular confidence tracking improved our accuracy significantly.
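Per-field flagging is just a threshold filter over the field confidences. The sketch below assumes each extracted field comes back as a `(value, confidence)` pair with confidence on a 0-1 scale. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

FIELD_THRESHOLD = 0.85  # per-field confidence below this is flagged for verification


def fields_to_verify(fields: Dict[str, Tuple[str, float]],
                     threshold: float = FIELD_THRESHOLD) -> List[str]:
    """Return names of extracted fields whose confidence is below the threshold."""
    return sorted(name for name, (_value, conf) in fields.items() if conf < threshold)


extracted = {
    "vendor": ("ACME Corp", 0.97),
    "total": ("5,200.00", 0.81),
    "invoice_date": ("2024-01-05", 0.85),
}
print(fields_to_verify(extracted))  # ['total']
```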

Exception Handling Strategy: We handle exceptions at three levels:

Level 1 - Extraction Failures: If OCR confidence is below 70%, the bot attempts a second pass using an alternative OCR engine (we use both UiPath OCR and Google Vision API). If the second pass also fails, the invoice routes to the manual review queue in Creatio with the best-effort extraction results pre-filled. AP staff can see what the bot extracted and correct errors, which is faster than starting from scratch.
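The two-pass fallback combined with the 85% auto-processing threshold from the earlier reply can be sketched as follows. It assumes each engine reports a single overall confidence per invoice. Here `primary` and `secondary` are hypothetical wrappers around the UiPath OCR and Google Vision calls, and those wrappers are not shown.

```python
from typing import Callable, Dict, NamedTuple, Tuple

SECOND_PASS_BELOW = 0.70  # run the alternative engine below this confidence
AUTO_PROCESS_AT = 0.85    # straight-through processing at or above this confidence


class Extraction(NamedTuple):
    fields: Dict[str, str]
    confidence: float  # overall invoice confidence reported by the engine
    engine: str


def extract_with_fallback(image: bytes,
                          primary: Callable[[bytes], Extraction],
                          secondary: Callable[[bytes], Extraction]) -> Tuple[str, Extraction]:
    """Return (route, best_extraction); route is 'auto' or 'manual_review'."""
    best = primary(image)
    if best.confidence < SECOND_PASS_BELOW:
        second = secondary(image)
        if second.confidence > best.confidence:
            best = second
    route = "auto" if best.confidence >= AUTO_PROCESS_AT else "manual_review"
    return route, best  # manual-review cases keep best.fields as pre-filled values
```

Keeping the better of the two passes, rather than always the second, means the best-effort extraction shown to AP staff is never worse than the first attempt.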

Level 2 - Validation Failures: After successful extraction, invoices go through validation rules (amount format, date validity, vendor exists in master data, PO matching for PO-based invoices). Validation failures create exception cases in Creatio with specific error descriptions. For example, “PO amount mismatch: Invoice $5,300, PO $5,000 (6% variance)” tells AP staff exactly what to fix, so they can resolve the case quickly.
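The rule set above can be sketched as a function that returns human-readable error descriptions in the style of the example message. It applies the 5% PO tolerance mentioned earlier in the thread. The input dict keys, the `vendor_master` set, and `validate_invoice` itself are hypothetical. The "not in the future" date rule is one assumed reading of "date validity".

```python
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import Optional


def validate_invoice(inv: dict, vendor_master: set,
                     po_amount: Optional[Decimal] = None,
                     tolerance: Decimal = Decimal("0.05"),
                     today: Optional[date] = None) -> list:
    """Return error descriptions; an empty list means the invoice passed."""
    errors = []
    today = today or date.today()

    # Amount format: must parse as a positive, finite number
    amount = None
    try:
        amount = Decimal(str(inv.get("amount", "")).replace(",", ""))
        if not amount.is_finite() or amount <= 0:
            errors.append(f"Invalid amount: {inv.get('amount')!r}")
            amount = None
    except InvalidOperation:
        errors.append(f"Invalid amount: {inv.get('amount')!r}")

    # Date validity: ISO format and not in the future
    try:
        invoice_date = date.fromisoformat(str(inv.get("invoice_date", "")))
        if invoice_date > today:
            errors.append(f"Invoice date {invoice_date} is in the future")
    except ValueError:
        errors.append(f"Invalid invoice date: {inv.get('invoice_date')!r}")

    # Vendor must exist in master data
    if inv.get("vendor_id") not in vendor_master:
        errors.append(f"Unknown vendor: {inv.get('vendor_id')!r}")

    # PO matching, only for PO-based invoices whose amount parsed
    if po_amount is not None and amount is not None:
        variance = abs(amount - po_amount) / po_amount
        if variance > tolerance:
            errors.append(
                f"PO amount mismatch: Invoice ${amount:,.2f}, "
                f"PO ${po_amount:,.2f} ({float(variance):.0%} variance)"
            )
    return errors
```

`Decimal` avoids float rounding surprises in money comparisons. Collecting every failure, instead of stopping at the first, gives the reviewer the full picture in one exception case.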

Level 3 - Processing Failures: API errors, network issues, or system unavailability trigger automatic retry with exponential backoff (3 retries over 30 minutes). After retry exhaustion, invoices move to a technical exception queue monitored by the RPA support team. This separates technical issues from business exceptions.
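One way to read "3 retries over 30 minutes" is as a doubling delay schedule whose waits sum to 30 minutes. That interpretation is an assumption, and the sketch below is only illustrative. It catches Python's built-in `ConnectionError` and `TimeoutError`, whereas a real HTTP client raises its own exception types. The function names and the technical-queue tuple format are hypothetical.

```python
import time
from typing import Callable, List, Optional, Tuple

TRANSIENT = (ConnectionError, TimeoutError)


def backoff_delays(retries: int = 3, total_seconds: float = 1800.0,
                   factor: float = 2.0) -> List[float]:
    """Exponential delays d, d*factor, d*factor**2, ... summing to total_seconds (factor > 1)."""
    base = total_seconds * (factor - 1) / (factor ** retries - 1)
    return [base * factor ** i for i in range(retries)]


def call_with_retry(fn: Callable, delays: List[float], sleep: Callable = time.sleep):
    """Call fn(); after each transient failure wait the next delay and try again.
    The final attempt's exception propagates to the caller."""
    for delay in delays:
        try:
            return fn()
        except TRANSIENT:
            sleep(delay)
    return fn()


def submit_or_park(fn: Callable, technical_queue: List[Tuple[str, str]], invoice_id: str,
                   sleep: Callable = time.sleep) -> Optional[object]:
    """Retry transient API errors; once retries are exhausted, park the invoice for support."""
    try:
        return call_with_retry(fn, backoff_delays(), sleep=sleep)
    except TRANSIENT as exc:
        technical_queue.append((invoice_id, repr(exc)))
        return None


print([round(d) for d in backoff_delays()])  # [257, 514, 1029]
```

With several bots retrying against the same API, adding random jitter to each delay is a common refinement that keeps the bots from retrying in lockstep.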

We implemented a feedback loop for continuous improvement. When AP staff correct OCR errors in manual review, the corrections are logged with the original invoice image. Monthly, we retrain the AI model using these corrections as additional training data. This improved our straight-through processing rate from 78% at launch to 85% after 6 months.
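The correction log only needs the fields a reviewer actually changed or added, paired with the original image. The sketch below assumes a JSON-lines log and uses hypothetical field names. The monthly retraining in UiPath Document Understanding is not shown.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass
class Correction:
    invoice_image: str  # path or URI of the original invoice image
    field: str
    extracted: str      # what the bot produced ("" if the field was missed)
    corrected: str      # what AP staff entered in manual review
    confidence: float   # extraction confidence for the field (0.0 if missed)


def diff_corrections(image: str,
                     extracted: Dict[str, Tuple[str, float]],
                     reviewed: Dict[str, str]) -> List[Correction]:
    """Keep only the fields a reviewer changed or added."""
    out = []
    for field, (value, conf) in extracted.items():
        final = reviewed.get(field, value)
        if final != value:
            out.append(Correction(image, field, value, final, conf))
    for field, final in reviewed.items():
        if field not in extracted:
            out.append(Correction(image, field, "", final, 0.0))
    return out


def to_jsonl(corrections: List[Correction]) -> str:
    """One JSON object per line, appended to the log used for retraining."""
    return "\n".join(json.dumps(asdict(c)) for c in corrections)
```

Logging the original confidence alongside each correction also shows whether errors cluster at particular confidence levels, which can help tune the 70% and 85% thresholds.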

We implemented duplicate detection at the RPA layer, before Creatio submission. The bot calculates a hash from vendor ID, invoice number, and amount, then checks it against a duplicate cache (Redis) and Creatio’s existing invoices. This prevents duplicate API calls and reduces load on Creatio’s duplicate management system.
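Here is a sketch of the hash-and-check step. An in-memory `set` stands in for the Redis cache. The sketch also assumes the same key is stored on each Creatio invoice record, for example as a hypothetical custom field, so that `exists_in_creatio` can look existing invoices up by it. The normalization rules (case, spaces, commas, two-decimal amounts) are illustrative choices.

```python
import hashlib
from decimal import Decimal
from typing import Callable, Set


def duplicate_key(vendor_id: str, invoice_number: str, amount: str) -> str:
    """SHA-256 of the normalized vendor ID, invoice number, and amount."""
    normalized = "|".join([
        vendor_id.strip().upper(),
        invoice_number.replace(" ", "").upper(),
        str(Decimal(amount.replace(",", "")).quantize(Decimal("0.01"))),
    ])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DuplicateChecker:
    def __init__(self, exists_in_creatio: Callable[[str], bool]):
        self._cache: Set[str] = set()                # stands in for the shared Redis cache
        self._exists_in_creatio = exists_in_creatio  # hypothetical lookup by stored key

    def is_duplicate(self, key: str) -> bool:
        if key in self._cache:                       # cheap check first, no API call
            return True
        if self._exists_in_creatio(key):             # fall back to Creatio's existing records
            self._cache.add(key)
            return True
        return False

    def mark_submitted(self, key: str) -> None:
        """Record the key only after Creatio accepted the invoice."""
        self._cache.add(key)
```

The key is marked only after a successful submission, so an invoice that failed mid-submission isn't later rejected as its own duplicate. When several bots share the cache, an atomic check-and-set, such as Redis `SET` with the `NX` option, avoids two bots racing on the same invoice.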

The most valuable lesson was building comprehensive monitoring. We track 15+ metrics in real-time: invoices processed per hour, OCR confidence distribution, exception rate by category, API response times, bot utilization, and processing cost per invoice. This visibility allowed us to identify bottlenecks quickly and continuously optimize the solution.
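A few of these metrics can be computed from per-invoice event records. The sketch assumes a hypothetical `InvoiceEvent` record with an overall confidence, an exception category (`None` for straight-through invoices), and the API response time. The confidence buckets reuse the 70% and 85% thresholds from earlier in the thread.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InvoiceEvent:
    confidence: float                  # overall extraction confidence
    exception_category: Optional[str]  # None = processed straight through
    api_ms: float                      # Creatio API response time


def summarize(events: List[InvoiceEvent]) -> dict:
    """Compute a handful of monitoring metrics over a batch of events."""
    n = len(events)
    if n == 0:
        return {}
    exceptions = Counter(e.exception_category for e in events if e.exception_category)
    buckets = Counter(
        "<70%" if e.confidence < 0.70 else "70-85%" if e.confidence < 0.85 else ">=85%"
        for e in events
    )
    return {
        "straight_through_rate": 1 - sum(exceptions.values()) / n,
        "exception_rate_by_category": {k: v / n for k, v in exceptions.items()},
        "confidence_distribution": dict(buckets),
        "avg_api_ms": sum(e.api_ms for e in events) / n,
    }
```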

The 80% reduction in processing time and the 98% accuracy weren’t achieved immediately; they resulted from 3 months of iterative refinement based on production data and user feedback. The key was starting with a solid architecture that could accommodate improvements without major rework.