Automated supplier contract JSON extraction and supply planning dashboard integration reduced manual data entry by 85%

We implemented an automated system to extract supplier contract terms from PDF documents and feed them into our supply planning dashboards in IS 2022.2. Previously, our team manually reviewed 50-80 supplier contracts monthly to update lead times, minimum order quantities, and pricing tiers in the planning system.

The solution combines OCR with vision-language models to extract structured data, maps it to a standardized JSON schema, validates against business rules, and pushes updates via API to our supply planning dashboards. Processing time dropped from 2-3 hours per contract to under 5 minutes, with 94% accuracy on first pass. Here’s how we built it and the challenges we overcame.

Great question. We built a comprehensive JSON schema with validation rules that normalize the variations. The VLM prompt includes examples of how to map different terminologies to standard fields. Then we have a validation layer that checks data types, ranges, and business logic. For instance, lead times must be 1-180 days, MOQ must be positive, pricing tiers must be ascending. If validation fails, the contract goes to a manual review queue where a user can correct and resubmit.

How do you handle validation? I imagine suppliers use different terminology - some say “delivery time” others say “lead time,” some specify MOQ in units, others in dollars. How did you standardize that for the supply planning system?

Let me walk through our complete implementation covering all the key components:

OCR and VLM Integration: We use a two-stage approach:

// Stage 1: PDF to image conversion
const images = pdfToImages(contract.pdf, {dpi: 300});

// Stage 2: VLM extraction with structured prompt
const prompt = `Extract supplier terms as JSON:
{leadTime, moq, pricing, paymentTerms}`;
const extracted = vlmExtract(images, prompt);

The VLM handles complex layouts, multi-column text, and table extraction far better than pure OCR. We achieve 94% accuracy on first pass vs 67% with Tesseract OCR.

JSON Schema Mapping: Our standardized schema normalizes supplier variations:

{
  "supplierId": "SUP-12345",
  "leadTimeDays": 21,
  "moqUnits": 500,
  "pricingTiers": [
    {"quantity": 500, "unitPrice": 12.50},
    {"quantity": 1000, "unitPrice": 11.75}
  ],
  "paymentTerms": "NET30"
}

The VLM is prompted to map variations (“delivery time”, “turnaround”, “lead time”) to the standard “leadTimeDays” field. We provide 10-15 example mappings in the prompt to guide the model.
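
To make the example-mapping idea concrete, here is a minimal sketch of how such a prompt could be assembled from a synonym table. It is an illustration, not the production prompt: `TERM_MAPPINGS` and `buildExtractionPrompt` are hypothetical names, and only the three lead-time variations come from the description above.

```javascript
// Hypothetical synonym table: supplier wording -> standard schema field.
// Only the leadTimeDays variations come from the post; the rest are illustrative.
const TERM_MAPPINGS = {
  leadTimeDays: ["delivery time", "turnaround", "lead time"],
  moqUnits: ["minimum order quantity", "MOQ", "minimum purchase"],
  paymentTerms: ["payment terms", "terms of payment"],
};

// Build a prompt that lists each standard field with the phrases that map to it.
function buildExtractionPrompt(mappings) {
  const lines = Object.entries(mappings).map(
    ([field, synonyms]) =>
      `- "${field}": also written as ${synonyms.map((s) => `"${s}"`).join(", ")}`
  );
  return [
    "Extract supplier terms as JSON using exactly these field names:",
    ...lines,
    "Return only the JSON object.",
  ].join("\n");
}

console.log(buildExtractionPrompt(TERM_MAPPINGS));
```

Keeping the mappings in a table rather than in the prompt text means a new supplier phrasing can be added without rewriting the prompt itself.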

Data Validation Rules: Multi-layer validation catches errors before API submission:

  • Type validation: leadTimeDays must be an integer, unitPrice must be a decimal number
  • Range validation: leadTimeDays between 1 and 180, moqUnits > 0, unitPrice > 0
  • Business logic: pricing tiers must be ascending by quantity
  • Referential integrity: supplierId must exist in master data

The validation failure rate is about 6%, mostly due to ambiguous contract language. Contracts that fail validation are routed to the manual review queue.
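
The four rule types above can be sketched as a single function. This is a minimal sketch under stated assumptions rather than the production validator: `validateTerms` is a hypothetical name, `knownSupplierIds` stands in for a master-data lookup, and "ascending" is read here as strictly ascending.

```javascript
// Validate extracted terms against the four rule types listed above.
// knownSupplierIds is a hypothetical stand-in for a master-data lookup.
function validateTerms(terms, knownSupplierIds) {
  const errors = [];

  // Type + range: lead time must be an integer between 1 and 180 days.
  if (!Number.isInteger(terms.leadTimeDays) ||
      terms.leadTimeDays < 1 || terms.leadTimeDays > 180) {
    errors.push("leadTimeDays must be an integer between 1 and 180");
  }

  // Range: MOQ must be a positive number.
  if (typeof terms.moqUnits !== "number" || !(terms.moqUnits > 0)) {
    errors.push("moqUnits must be greater than 0");
  }

  const tiers = Array.isArray(terms.pricingTiers) ? terms.pricingTiers : null;
  if (!tiers) errors.push("pricingTiers must be an array");

  (tiers ?? []).forEach((tier, i) => {
    // Range: every price must be a positive number.
    if (typeof tier.unitPrice !== "number" || !(tier.unitPrice > 0)) {
      errors.push(`pricingTiers[${i}].unitPrice must be greater than 0`);
    }
    // Business logic: quantities must be strictly ascending (assumption).
    if (i > 0 && !(tier.quantity > tiers[i - 1].quantity)) {
      errors.push(`pricingTiers[${i}].quantity must exceed the previous tier`);
    }
  });

  // Referential integrity: the supplier must exist in master data.
  if (!knownSupplierIds.has(terms.supplierId)) {
    errors.push(`unknown supplierId: ${terms.supplierId}`);
  }

  return { valid: errors.length === 0, errors };
}

// Usage with the sample document from the schema section.
const sample = {
  supplierId: "SUP-12345",
  leadTimeDays: 21,
  moqUnits: 500,
  pricingTiers: [
    { quantity: 500, unitPrice: 12.5 },
    { quantity: 1000, unitPrice: 11.75 },
  ],
  paymentTerms: "NET30",
};
const result = validateTerms(sample, new Set(["SUP-12345"]));
console.log(result.valid, result.errors);
```

Collecting every error instead of stopping at the first one lets a reviewer see all highlighted issues in a single pass.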

API Integration: We use Infor’s standard supply planning APIs with custom error handling:

POST /api/v1/suppliers/{id}/terms
// Payload: validated JSON schema
// Response: confirmation or detailed error

The integration includes retry logic for transient failures and detailed logging for audit trails. We batch updates during off-peak hours to minimize impact on planning operations.
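
For anyone planning something similar, here is a minimal sketch of retry logic with exponential backoff. The names `postWithRetry`, `backoffDelayMs`, and `isTransient` are hypothetical, and treating HTTP 429 and 5xx as transient is an assumption, not documented Infor behavior. The `send` argument is any async function that performs the POST and returns an object with a `status` field, which keeps the sketch independent of a particular HTTP client.

```javascript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: rate limiting (429) and server errors (5xx) are worth retrying.
function isTransient(status) {
  return status === 429 || status >= 500;
}

// `send` is a caller-supplied async function returning { status, ... }.
// Errors thrown by `send` itself (for example network failures) propagate
// without a retry in this sketch.
async function postWithRetry(send, payload, { maxAttempts = 4, baseMs = 500 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await send(payload);
    if (response.status < 400) return response; // success
    if (!isTransient(response.status)) {
      // Permanent error: retrying would not help.
      throw new Error(`request rejected with status ${response.status}`);
    }
    if (attempt < maxAttempts - 1) {
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  throw new Error(`still failing after ${maxAttempts} attempts`);
}

// Demo with a stub that returns 503 twice, then 200.
let demoCalls = 0;
const flakySend = async () => (++demoCalls < 3 ? { status: 503 } : { status: 200 });
postWithRetry(flakySend, { leadTimeDays: 21 }, { baseMs: 10 })
  .then((res) => console.log(res.status, "after", demoCalls, "calls"));
```

The final thrown error is the point where an alert for a persistent failure would be raised.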

Error Handling: Comprehensive error handling at each stage:

  • PDF corruption: Alert procurement team for manual processing
  • VLM extraction confidence < 85%: Route to manual review
  • Validation failure: Queue for correction with highlighted issues
  • API failure: Retry with exponential backoff, alert if persistent

We maintain a manual review queue that typically has 6-8% of contracts requiring human intervention. Reviewers see the extracted data side-by-side with the original PDF and can correct any errors before submission.
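
The routing rules above can be sketched as one decision function. This is an illustration under assumptions: `routeContract` and the action names are hypothetical, and extraction confidence is assumed to be reported as a number between 0 and 1. API failures are left out because they are handled by the retry path after submission.

```javascript
// From the list above: extraction confidence below 85% goes to manual review.
const CONFIDENCE_THRESHOLD = 0.85;

// Decide what happens next to a contract, checking stages in pipeline order.
function routeContract({ pdfReadable, confidence, validationErrors }) {
  if (!pdfReadable) {
    return { action: "ALERT_PROCUREMENT", reason: "PDF could not be read" };
  }
  if (confidence < CONFIDENCE_THRESHOLD) {
    return {
      action: "MANUAL_REVIEW",
      reason: `extraction confidence ${confidence} is below ${CONFIDENCE_THRESHOLD}`,
    };
  }
  if (validationErrors.length > 0) {
    // Pass the issues along so the reviewer sees them highlighted.
    return { action: "MANUAL_REVIEW", reason: "validation failed", issues: validationErrors };
  }
  return { action: "SUBMIT_TO_API", reason: "all checks passed" };
}

console.log(routeContract({ pdfReadable: true, confidence: 0.8, validationErrors: [] }).action);
```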

Results: After 6 months in production:

  • Processing time: 2-3 hours → 5 minutes per contract
  • Accuracy: 94% fully automated, 6% requiring minor corrections
  • ROI: System paid for itself in 4 months through time savings
  • Dashboard data freshness: Updated within hours vs weeks
  • Planning accuracy: more accurate lead time data led to a 12% reduction in stockouts

The key success factors were: (1) using VLM instead of pure OCR for better accuracy, (2) comprehensive validation to catch errors early, (3) keeping humans in the loop for edge cases, and (4) robust error handling throughout the pipeline.

We actually moved away from pure OCR to a vision-language model approach. Using a VLM (similar to GPT-4 Vision), we feed the PDF pages as images with specific prompts asking for contract terms in JSON format. The VLM understands table structure and context much better than traditional OCR. For example, it correctly interprets “Net 30” as payment terms and “2-3 weeks” as lead time, even when formatting varies across supplier contracts.
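
The VLM does this interpretation itself, but a small deterministic normalizer can serve as a cross-check on values like these. The sketch below is an assumption-laden illustration, not part of the pipeline described here: `parseLeadTimeDays` and `normalizePaymentTerms` are hypothetical helpers, a range such as "2-3 weeks" is resolved to its upper bound, and a month is approximated as 30 days.

```javascript
// Convert free-text durations such as "2-3 weeks" or "21 days" to whole days.
// For a range, this sketch takes the upper bound (an assumption: the more
// conservative value for planning). Returns null when nothing matches.
function parseLeadTimeDays(text) {
  const match = /(\d+)(?:\s*(?:-|to)\s*(\d+))?\s*(day|week|month)s?/i.exec(text);
  if (!match) return null;
  const upper = Number(match[2] ?? match[1]);
  const daysPerUnit = { day: 1, week: 7, month: 30 }; // month = 30 days is an approximation
  return upper * daysPerUnit[match[3].toLowerCase()];
}

// Normalize "Net 30", "net-30", or "NET30" to the schema's "NET30" form.
// Returns null for anything that is not a net-days term.
function normalizePaymentTerms(text) {
  const match = /net[\s-]*(\d+)/i.exec(text);
  return match ? `NET${match[1]}` : null;
}

console.log(parseLeadTimeDays("2-3 weeks"), normalizePaymentTerms("Net 30"));
```

If the normalizer and the VLM disagree on a value, that mismatch is a reasonable signal to send the contract to manual review.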

What about the API integration with IS 2022.2? Did you use standard Infor APIs or did you have to build custom endpoints? We’re planning something similar and want to understand the integration architecture.