Let me walk through our complete implementation covering all the key components:
OCR and VLM Integration:
We use a two-stage approach:
// Stage 1: PDF to image conversion
const images = await pdfToImages('contract.pdf', { dpi: 300 });

// Stage 2: VLM extraction with a structured prompt
const prompt = `Extract supplier terms as JSON:
{leadTime, moq, pricing, paymentTerms}`;
const terms = await vlmExtract(images, prompt);
The VLM handles complex layouts, multi-column text, and table extraction far better than pure OCR: we achieve 94% first-pass accuracy versus 67% with Tesseract OCR.
JSON Schema Mapping:
Our standardized schema normalizes supplier variations:
{
  "supplierId": "SUP-12345",
  "leadTimeDays": 21,
  "moqUnits": 500,
  "pricingTiers": [
    {"quantity": 500, "unitPrice": 12.50},
    {"quantity": 1000, "unitPrice": 11.75}
  ],
  "paymentTerms": "NET30"
}
The VLM is prompted to map variations (“delivery time”, “turnaround”, “lead time”) to the standard “leadTimeDays” field. We provide 10-15 example mappings in the prompt to guide the model.
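As a safety net behind the prompt-driven mapping, the same synonym idea can run as a deterministic post-processing step. This is a minimal sketch with a hypothetical synonym table and helper name; the actual mapping lives in the VLM prompt:

```javascript
// Hypothetical fallback normalizer: maps common contract phrasings to the
// standard schema field names when the VLM returns a raw label instead.
const FIELD_SYNONYMS = {
  leadTimeDays: ['lead time', 'delivery time', 'turnaround'],
  moqUnits: ['moq', 'minimum order quantity', 'min. order'],
  paymentTerms: ['payment terms', 'terms of payment', 'net terms'],
};

function normalizeFieldName(rawLabel) {
  const needle = rawLabel.trim().toLowerCase();
  for (const [field, synonyms] of Object.entries(FIELD_SYNONYMS)) {
    if (synonyms.some((s) => needle.includes(s))) return field;
  }
  return null; // unknown label -> route to manual review
}
```

Unrecognized labels return null rather than guessing, so ambiguous contracts fall through to the review queue instead of silently landing in the wrong field.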
Data Validation Rules:
Multi-layer validation catches errors before API submission:
- Type validation: leadTimeDays must be integer, prices must be decimal
- Range validation: leadTime 1-180 days, MOQ > 0, prices > 0
- Business logic: pricing tiers must be ascending by quantity
- Referential integrity: supplierId must exist in master data
The validation failure rate is about 6%, mostly due to ambiguous contract language. Failed validations route to the manual review queue.
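The rules above can be sketched as a single validator over the schema shown earlier. Function and parameter names here are illustrative, not our production code:

```javascript
// Sketch of the multi-layer validation for the supplier-terms schema.
// Returns an array of error strings; empty means the record can be submitted.
function validateTerms(terms, knownSupplierIds) {
  const errors = [];

  // Type and range validation
  if (!Number.isInteger(terms.leadTimeDays) || terms.leadTimeDays < 1 || terms.leadTimeDays > 180) {
    errors.push('leadTimeDays must be an integer between 1 and 180');
  }
  if (!Number.isInteger(terms.moqUnits) || terms.moqUnits <= 0) {
    errors.push('moqUnits must be a positive integer');
  }

  // Business logic: positive prices, tiers strictly ascending by quantity
  const tiers = terms.pricingTiers ?? [];
  for (let i = 0; i < tiers.length; i++) {
    if (!(tiers[i].unitPrice > 0)) errors.push(`tier ${i}: unitPrice must be > 0`);
    if (i > 0 && tiers[i].quantity <= tiers[i - 1].quantity) {
      errors.push(`tier ${i}: quantities must be strictly ascending`);
    }
  }

  // Referential integrity against master data
  if (!knownSupplierIds.has(terms.supplierId)) {
    errors.push(`unknown supplierId ${terms.supplierId}`);
  }

  return errors;
}
```

Collecting all errors in one pass (rather than failing fast) lets the review queue highlight every issue in a contract at once.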
API Integration:
We use Infor’s standard supply planning APIs with custom error handling:
POST /api/v1/suppliers/{id}/terms
// Payload: validated JSON schema
// Response: confirmation or detailed error
The integration includes retry logic for transient failures and detailed logging for audit trails. We batch updates during off-peak hours to minimize impact on planning operations.
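A minimal sketch of the terms-update call: the endpoint path follows the one above, but the injected transport function (`httpPost` resolving to a `{ status, body }` shape) is an assumption for illustration, which also makes the helper testable without a live ERP:

```javascript
// Hypothetical submission helper; the transport is injected so it can be
// swapped for Infor's client, a logging wrapper, or a test stub.
async function submitTerms(httpPost, supplierId, payload) {
  const path = `/api/v1/suppliers/${supplierId}/terms`;
  const res = await httpPost(path, payload);
  if (res.status >= 200 && res.status < 300) {
    return res.body; // confirmation from the planning API
  }
  // Detailed error for the audit log and the retry layer above this call
  throw new Error(`POST ${path} failed with ${res.status}: ${JSON.stringify(res.body)}`);
}
```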
Error Handling:
Comprehensive error handling at each stage:
- PDF corruption: Alert procurement team for manual processing
- VLM extraction confidence < 85%: Route to manual review
- Validation failure: Queue for correction with highlighted issues
- API failure: Retry with exponential backoff, alert if persistent
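The retry policy in the last bullet can be sketched as a small wrapper; the attempt count and base delay here are illustrative defaults, not our production settings:

```javascript
// Retry with exponential backoff for transient API failures.
async function withRetry(fn, { attempts = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === attempts - 1) throw err; // persistent failure -> alert upstream
      const delayMs = baseDelayMs * 2 ** attempt; // 500, 1000, 2000, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

The final rethrow is what triggers the "alert if persistent" path: once retries are exhausted, the error propagates to the caller's alerting logic instead of being swallowed.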
We maintain a manual review queue that typically holds 6-8% of contracts requiring human intervention. Reviewers see the extracted data side by side with the original PDF and can correct any errors before submission.
Results:
After 6 months in production:
- Processing time: 2-3 hours → 5 minutes per contract
- Accuracy: 94% fully automated, 6% requiring minor corrections
- ROI: System paid for itself in 4 months through time savings
- Dashboard data freshness: Updated within hours vs weeks
- Planning accuracy improved: better lead-time data drove a 12% reduction in stockouts
The key success factors were: (1) using VLM instead of pure OCR for better accuracy, (2) comprehensive validation to catch errors early, (3) keeping humans in the loop for edge cases, and (4) robust error handling throughout the pipeline.