Automated extraction of structured JSON from supplier invoices using VLM for supply planning integration

I wanted to share our implementation of automated invoice processing that eliminated 90% of manual data entry in our supplier collaboration workflow with Blue Yonder Luminate.

The Challenge: We receive 800-1,200 supplier invoices daily in various formats (PDF, scanned images, email attachments). Our procurement team was manually entering invoice data into Blue Yonder’s supply planning module - a process that took 3-4 minutes per invoice and was prone to entry errors. With 15 people spending 30% of their time on data entry, we needed automation.

The Solution: We built an automated pipeline using a vision language model (VLM) to extract structured data from invoice documents and push it to Blue Yonder via API:

  1. Invoice ingestion from email and supplier portals
  2. VLM processing to extract invoice fields (vendor, PO number, line items, amounts, dates)
  3. JSON schema validation to ensure data quality
  4. Automated API integration with Blue Yonder supply-planning module

The system processes invoices in under 30 seconds with 96% accuracy, requiring human review only for edge cases.
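The four stages above can be sketched as a simple orchestrator. This is a minimal illustration, not our production code - every function body here is a stand-in for the real service it names:

```python
# Skeleton of the four-stage pipeline. All stages are stubbed; in production
# each is a separate service (ingestion, VLM inference, validation, API layer).

def ingest(source: bytes) -> list[bytes]:
    """Stage 1: pull invoice documents from email/portal (stubbed)."""
    return [source]

def extract(document: bytes) -> dict:
    """Stage 2: VLM extraction (stubbed with a fixed result)."""
    return {"invoiceNumber": "INV-001", "totalAmount": 125.0}

def validate(invoice: dict) -> bool:
    """Stage 3: minimal schema check - required fields must be present."""
    required = {"invoiceNumber", "totalAmount"}
    return required <= invoice.keys()

def post_to_blue_yonder(invoice: dict) -> str:
    """Stage 4: API posting (stubbed)."""
    return "posted"

def process(source: bytes) -> list[str]:
    """Run each ingested document through extract -> validate -> post;
    anything that fails validation is routed to human review instead."""
    results = []
    for doc in ingest(source):
        invoice = extract(doc)
        results.append(post_to_blue_yonder(invoice) if validate(invoice)
                       else "review")
    return results
```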

Happy to share the complete implementation details and results:

VLM Fine-Tuning Approach: We started with GPT-4 Vision for proof-of-concept because it required no training and achieved 78% accuracy out-of-the-box. However, the API costs added up at scale (800-1,200 invoices daily × $0.03 per invoice comes to roughly $700-1,100 per month, over $10,000 per year, just for extraction), on top of the accuracy gap.

We pivoted to fine-tuning an open-source Donut (Document Understanding Transformer) model. The fine-tuning process:

  1. Collected 2,000 invoice samples from our top 50 suppliers
  2. Manually annotated all fields (vendor, invoice number, dates, line items, amounts)
  3. Split data: 1,600 training, 200 validation, 200 test
  4. Fine-tuned Donut model for 20 epochs on our annotated dataset
  5. Achieved 96% field-level accuracy on test set

The fine-tuning took 3 days on 4×A100 GPUs (cloud compute cost: $800). The production model runs on a single GPU server (inference time: 2-3 seconds per invoice). This reduced our per-invoice processing cost from $0.03 to $0.002 - a 15x cost reduction.

JSON Schema Adherence: Our universal JSON schema follows this structure:

  • invoiceNumber (required)
  • vendorId (required)
  • vendorName (required)
  • invoiceDate (required, ISO 8601)
  • dueDate (optional, ISO 8601)
  • purchaseOrderNumber (optional)
  • currency (required, ISO 4217 code)
  • subtotal (required)
  • taxAmount (optional)
  • totalAmount (required)
  • lineItems (array, optional):
    • itemNumber
    • description
    • quantity
    • unitPrice
    • lineTotal

The VLM outputs raw JSON, which passes through a validation pipeline:

  1. JSON structure validation (all required fields present)
  2. Data type validation (numbers are numeric, dates are valid)
  3. Business rule validation (line totals sum to invoice total within $0.50 tolerance)
  4. Cross-reference validation (vendor ID exists in Blue Yonder)

Invoices that pass all validations (92% of total) proceed to automatic API posting. Failed validations (8%) go to human review with extracted data pre-filled.
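The four validation stages can be sketched in a single function. This is an illustrative sketch, not our production validator - the vendor cross-reference is stubbed as a set lookup where the real system queries Blue Yonder:

```python
from datetime import date

REQUIRED = ["invoiceNumber", "vendorId", "vendorName", "invoiceDate",
            "currency", "subtotal", "totalAmount"]
KNOWN_VENDORS = {"V-100", "V-200"}  # stand-in for the Blue Yonder lookup

def validate_invoice(inv: dict) -> list[str]:
    """Return a list of validation errors; an empty list means auto-post."""
    errors = []
    # 1. Structure: all required schema fields present
    errors += [f"missing field: {f}" for f in REQUIRED if f not in inv]
    # 2. Types: amounts numeric, dates valid ISO 8601
    for f in ("subtotal", "taxAmount", "totalAmount"):
        if f in inv and not isinstance(inv[f], (int, float)):
            errors.append(f"non-numeric: {f}")
    for f in ("invoiceDate", "dueDate"):
        if f in inv:
            try:
                date.fromisoformat(inv[f])
            except (TypeError, ValueError):
                errors.append(f"bad date: {f}")
    # 3. Business rule: line totals sum to invoice total within $0.50
    items = inv.get("lineItems") or []
    if items and isinstance(inv.get("totalAmount"), (int, float)):
        if abs(sum(i["lineTotal"] for i in items) - inv["totalAmount"]) > 0.50:
            errors.append("line totals do not sum to invoice total")
    # 4. Cross-reference: vendor must exist in Blue Yonder (stubbed above)
    if inv.get("vendorId") not in KNOWN_VENDORS:
        errors.append("unknown vendor")
    return errors
```

Anything returned here pre-fills the human review queue; an empty list goes straight to API posting.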

Invoice Field Extraction Process: The VLM processes each invoice page as an image:

  1. Convert PDF to images (300 DPI)
  2. Feed images to fine-tuned Donut model
  3. Model outputs JSON with extracted fields and confidence scores
  4. For multi-page invoices, merge extracted data from all pages
  5. Apply confidence thresholds (fields with <85% confidence flagged for review)

Key challenge: Handling invoice variations. Our suppliers use 50+ different invoice templates. Fine-tuning on diverse examples taught the model to generalize across formats. For completely new formats (new suppliers), accuracy drops to 85-90% initially, then improves as we add examples to the training set.
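Steps 4-5 above (multi-page merging plus confidence thresholds) look roughly like this. A sketch with assumed data shapes - each page's extraction is modeled as a dict of field -> (value, confidence):

```python
CONFIDENCE_THRESHOLD = 0.85  # fields below this go to human review

def merge_pages(pages: list[dict]) -> tuple[dict, list[str]]:
    """Merge per-page extractions and flag low-confidence fields.
    Scalar fields keep the highest-confidence value seen across pages;
    lineItems are concatenated, carrying the weakest page's confidence."""
    merged, flagged = {}, []
    for page in pages:
        for field, (value, conf) in page.items():
            if field == "lineItems":
                merged.setdefault(field, ([], 1.0))
                merged[field] = (merged[field][0] + value,
                                 min(merged[field][1], conf))
            elif field not in merged or conf > merged[field][1]:
                merged[field] = (value, conf)
    for field, (_, conf) in merged.items():
        if conf < CONFIDENCE_THRESHOLD:
            flagged.append(field)
    return {f: v for f, (v, _) in merged.items()}, flagged
```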

API Integration with Blue Yonder: Our reconciliation service handles the Blue Yonder integration:

  1. Vendor lookup: Match extracted vendor name/ID to Blue Yonder supplier records using fuzzy matching
  2. PO validation: If PO number present, verify it exists and is in ‘open’ status via Blue Yonder’s purchase order API
  3. Line item matching: Match invoice line items to PO line items by part number
  4. Discrepancy detection: Flag quantity, price, or item mismatches for three-way matching
  5. Invoice creation: For matching invoices, POST to Blue Yonder’s invoice API endpoint
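Step 1's fuzzy vendor lookup can be done with the standard library alone. A sketch against an in-memory vendor master (the real lookup hits Blue Yonder supplier records, and the 0.8 cutoff is an assumed threshold, not a documented value):

```python
import difflib
from typing import Optional

# Stand-in vendor master; production queries Blue Yonder's supplier records.
VENDORS = {"V-100": "Acme Industrial Supply", "V-200": "Globex Manufacturing"}

def match_vendor(extracted_name: str, cutoff: float = 0.8) -> Optional[str]:
    """Fuzzy-match an extracted vendor name to a vendor ID.
    Returns None when no candidate clears the similarity cutoff,
    which routes the invoice to human review."""
    names = list(VENDORS.values())
    hits = difflib.get_close_matches(extracted_name, names, n=1, cutoff=cutoff)
    if not hits:
        return None
    return next(vid for vid, name in VENDORS.items() if name == hits[0])
```

In practice we match on vendor ID first and fall back to name matching only when the extracted ID fails the cross-reference check.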

The API integration uses Blue Yonder’s RESTful supply-planning APIs with OAuth 2.0 authentication. We batch invoice creation requests (up to 50 invoices per API call) to minimize API overhead.
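The batching itself is trivial; here is a sketch of the batch-then-POST shape. The endpoint URL and payload structure below are placeholders, not Blue Yonder's documented API, and the OAuth token is assumed to have been obtained separately:

```python
import json
import urllib.request
from typing import Iterator

BATCH_SIZE = 50  # up to 50 invoices per API call

def batches(invoices: list, size: int = BATCH_SIZE) -> Iterator[list]:
    """Split validated invoices into API-sized batches."""
    for i in range(0, len(invoices), size):
        yield invoices[i:i + size]

def post_batch(batch: list, token: str, url: str) -> None:
    """POST one batch with an OAuth 2.0 bearer token.
    URL and body shape are illustrative placeholders."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"invoices": batch}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # network call, not exercised here
        resp.read()
```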

Edge cases we handle:

  • Partial deliveries: One PO, multiple invoices - we track received quantities and validate against remaining open PO balance
  • Multiple POs per invoice: Supplier combines multiple orders on one invoice - we split the invoice data and create separate records in Blue Yonder
  • No PO match: For non-PO invoices (maintenance, services), we create invoice records with manual approval workflow
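The partial-delivery case reduces to tracking the remaining open balance per PO line. A minimal sketch with illustrative field names:

```python
def open_balance(ordered_qty: float, invoiced_qty: float) -> float:
    """Quantity still open on a PO line after prior invoices."""
    return ordered_qty - invoiced_qty

def check_partial_delivery(po_line: dict, invoice_qty: float) -> str:
    """Validate a new invoice line against the remaining open PO balance.
    Over-invoiced lines are routed to three-way matching review."""
    remaining = open_balance(po_line["ordered"], po_line["invoiced"])
    return "ok" if invoice_qty <= remaining else "over-invoiced"
```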

Infrastructure and Performance: Production architecture:

  • Invoice ingestion service (Python/FastAPI)
  • VLM inference server (1× GPU instance, NVIDIA T4)
  • Validation and reconciliation service (Python)
  • Blue Yonder API integration layer
  • Human review queue (React web app)
  • PostgreSQL database for audit trail

Processing performance:

  • Average invoice processing time: 28 seconds end-to-end
  • VLM inference: 2-3 seconds
  • Validation and reconciliation: 5-8 seconds
  • Blue Yonder API call: 15-20 seconds
  • Throughput: 120 invoices/hour (single GPU)

We process invoices in batches throughout the day as they arrive via email or supplier portals.

ROI and Business Impact: Cost savings:

  • Eliminated 4.5 FTEs of manual data-entry work (15 people × 30% of their time) = $270,000/year in labor costs
  • Reduced data entry errors from 2.3% to 0.4% = $85,000/year in error correction costs
  • Faster invoice processing improved early payment discounts capture = $45,000/year
  • Total annual savings: $400,000

Investment:

  • Initial development (3 engineers × 4 months): $180,000
  • Infrastructure (GPU server, cloud services): $24,000/year
  • Ongoing maintenance (0.5 FTE): $60,000/year
  • Total first-year cost: $264,000

Payback period: 8 months

Year 2+ ROI: 375% annually

Quality improvements:

  • Invoice processing time: 3-4 minutes → 28 seconds (86% reduction)
  • Data entry accuracy: 97.7% → 99.6%
  • Invoices processed within 24 hours: 65% → 98%
  • Procurement team satisfaction: Eliminated tedious data entry, allowing focus on supplier relationship management and exception handling

Lessons Learned:

  1. Fine-tuning is essential for production accuracy - generic VLMs aren’t sufficient for domain-specific documents
  2. Start with high-volume, standard formats to maximize ROI, then expand to edge cases
  3. Human-in-the-loop for edge cases (8%) maintains quality while automating the majority (92%)
  4. Feedback loop for continuous improvement - quarterly retraining with corrected examples improves accuracy
  5. API integration complexity often exceeds ML complexity - plan accordingly

This implementation has been transformative for our procurement operations, freeing our team to focus on strategic supplier relationships rather than manual data entry.

Can you share more details about the end-to-end architecture and ROI? I’m building a business case for similar automation and would love to understand the infrastructure requirements, processing time, and actual cost savings you achieved.

The API integration was actually the most complex part. We built a reconciliation service that:

  1. Looks up the supplier in Blue Yonder by vendor ID or name
  2. If PO number is present, validates it exists and is open
  3. Matches invoice line items to PO line items by part number
  4. Flags discrepancies (quantity, price, or item mismatches)
  5. For matching invoices, creates the invoice record via Blue Yonder’s supply-planning API
  6. For mismatches, routes to procurement for three-way matching review

We use Blue Yonder’s invoice API endpoints to create invoice headers and line items. The API is RESTful and well-documented, which made integration straightforward. The trickiest part was handling partial deliveries where one PO generates multiple invoices.

What about the Blue Yonder API integration piece? How do you map the extracted invoice data to Blue Yonder’s supply planning data model? I assume you need to match vendor IDs, validate PO numbers against existing purchase orders, and handle exceptions where the invoice doesn’t match the PO.

We used GPT-4 Vision initially for prototyping, then moved to a fine-tuned Donut model for production to reduce API costs. The fine-tuning was critical - out-of-the-box models achieved only 78% accuracy because our suppliers use non-standard invoice formats. We collected 2,000 annotated invoice samples covering our top 50 suppliers and fine-tuned the model, which boosted accuracy to 96%. For new supplier formats, we have a feedback loop where operators correct extraction errors and those corrections are added to the training set quarterly.