Fine-tuned VLM extracts BOM data from scanned manufacturing drawings

We automated BOM extraction from scanned engineering drawings using a fine-tuned Vision Language Model integrated with Blue Yonder Luminate 2023.2. Previously, planners manually transcribed component data from PDFs and images, a process that took 2-3 hours per complex assembly.

Our VLM fine-tuning approach uses a training set of 500+ annotated manufacturing drawings. The model extracts part numbers, quantities, descriptions, and hierarchical relationships, then outputs structured JSON matching our ERP schema. We validate extracted data against Blue Yonder’s manufacturing planning module requirements before ingestion.
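As a rough illustration of what that intermediate payload looks like (the field names, part numbers, and confidence values below are invented for the example, not our actual ERP schema):

```python
import json

# Illustrative shape of an extracted BOM payload. Each item carries the
# attributes the model reads off the drawing plus a parent link that
# encodes the assembly tree and a per-item extraction confidence.
extracted_bom = {
    "drawing_id": "DWG-1042",  # hypothetical identifier
    "items": [
        {"part_number": "ASM-100", "quantity": 1,
         "description": "Main assembly", "parent": None, "confidence": 0.97},
        {"part_number": "SUB-110", "quantity": 2,
         "description": "Bracket sub-assembly", "parent": "ASM-100", "confidence": 0.93},
        {"part_number": "P-1101", "quantity": 8,
         "description": "M6 hex bolt", "parent": "SUB-110", "confidence": 0.99},
    ],
}

# Serialize for the downstream validation step.
payload = json.dumps(extracted_bom, indent=2)
print(payload)
```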

The system processes drawings in Swift (our custom automation layer), handles multi-page documents, and maintains 94% extraction accuracy. Integration with BY’s planning workflows reduced BOM entry time by 85% and eliminated transcription errors that previously caused material shortages.

Excellent implementation case study. Let me offer a technical perspective on the critical success factors here.

VLM Fine-Tuning Strategy: Your approach of combining real-world drawings with targeted augmentation is optimal. The 500+ base dataset with hierarchical labeling addresses the core challenge: manufacturing BOMs aren’t flat lists but structured trees. Training the model to recognize parent-child relationships through visual cues (indentation, connector lines, assembly bubbles) is what elevates this beyond simple OCR. The 40-hour training investment with multimodal transformers gives you domain-specific understanding that generic vision models lack. Consider expanding your augmentation to include different CAD software outputs (AutoCAD vs SolidWorks styles) if you handle multi-source drawings.

Document-to-JSON Extraction Pipeline: Your two-stage architecture (VLM→intermediate JSON→BY schema) is the right pattern. Direct VLM-to-ERP mapping creates brittle integrations. The intermediate format gives you flexibility to handle schema evolution in Blue Yonder updates without retraining the model. Your confidence thresholding at 85% with a manual review queue balances automation efficiency with data quality, which is critical in manufacturing, where BOM errors cascade into material procurement and production scheduling failures. The 94% overall accuracy you’re achieving exceeds industry benchmarks for document extraction (typically 85-90% for structured forms).
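The thresholded routing can be sketched in a few lines; the 85% cutoff comes from the write-up, while the BY-style record fields (`ITEM_ID`, `QTY_PER`, and so on) are placeholders, not Blue Yonder’s actual schema:

```python
# Stage two of the pipeline: items at or above the confidence threshold
# are mapped into an ERP-shaped record; the rest go to a review queue.
CONFIDENCE_THRESHOLD = 0.85

def to_by_record(item):
    """Map one intermediate-JSON item to a BY-style record (field names illustrative)."""
    return {
        "ITEM_ID": item["part_number"],
        "QTY_PER": item["quantity"],
        "DESCRIPTION": item["description"],
        "PARENT_ITEM": item.get("parent"),
    }

def route_items(items):
    """Split extracted items into auto-ingest records and a manual review queue."""
    auto, review = [], []
    for item in items:
        if item["confidence"] >= CONFIDENCE_THRESHOLD:
            auto.append(to_by_record(item))
        else:
            review.append(item)
    return auto, review

items = [
    {"part_number": "P-1101", "quantity": 8, "description": "M6 hex bolt",
     "parent": "SUB-110", "confidence": 0.99},
    {"part_number": "P-1102", "quantity": 4, "description": "Washer",
     "parent": "SUB-110", "confidence": 0.62},  # smudged scan, low confidence
]
auto, review = route_items(items)
print(len(auto), len(review))  # 1 1
```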

ERP Schema Adherence: The validation layer checking required fields, data types, and BY-specific conventions (hierarchical level encoding) prevents the silent data corruption that plagues automated integrations. Manufacturing planning modules are particularly sensitive to malformed BOMs: missing UOM fields or incorrect parent linkages break MRP calculations. Your approach of validating against Blue Yonder’s manufacturing planning API schema ensures compatibility with downstream processes like material requirements planning, capacity scheduling, and shop floor execution.
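A minimal sketch of such a validation gate, with hypothetical field names and rules standing in for the real BY requirements (required fields present, types correct, parent references resolving within the document):

```python
# Pre-ingestion validation gate: collect every violation rather than
# failing fast, so reviewers see the full picture for a drawing.
REQUIRED = {"part_number": str, "quantity": int, "description": str}

def validate_bom(items):
    """Return a list of human-readable validation errors (empty = clean)."""
    errors = []
    known = {i.get("part_number") for i in items}
    for idx, item in enumerate(items):
        for field, typ in REQUIRED.items():
            if field not in item:
                errors.append(f"item {idx}: missing {field}")
            elif not isinstance(item[field], typ):
                errors.append(f"item {idx}: {field} should be {typ.__name__}")
        parent = item.get("parent")
        if parent is not None and parent not in known:
            errors.append(f"item {idx}: unknown parent {parent}")
    return errors

good = [{"part_number": "A", "quantity": 1, "description": "root", "parent": None}]
bad = [{"part_number": "B", "quantity": "2", "description": "bolt", "parent": "A"}]
print(validate_bom(good))  # []
print(validate_bom(bad))   # type error plus dangling parent reference
```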

Operational Impact: 85% reduction in BOM entry time translates to significant planner productivity gains, but the elimination of transcription errors is the bigger win. Manual BOM entry errors cause material shortages (stockouts during production runs), excess inventory (over-ordering due to quantity mistakes), and schedule delays (rework when assemblies don’t match). Your 12% manual review rate for edge cases is excellent: you’ve automated the routine 88% while preserving quality controls for exceptions.

Scaling Considerations: As you expand, monitor model drift; manufacturing drawing standards evolve, and periodic retraining with new examples maintains accuracy. Consider implementing active learning where manual review corrections automatically feed back into training data. Also explore multi-language support if you operate globally; technical drawings often mix languages in annotations.
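That feedback loop can be as simple as capturing each reviewer correction as a fresh labeled example for the next retraining run (names below are illustrative):

```python
# Active-learning capture: only corrections that actually changed the
# model's output are worth adding to the retraining set.
retraining_set = []

def record_correction(drawing_id, model_output, corrected_output):
    """Store a reviewer-corrected extraction as a new training example."""
    if model_output != corrected_output:
        retraining_set.append({"drawing": drawing_id, "label": corrected_output})

record_correction("DWG-1042", {"quantity": 6}, {"quantity": 8})  # reviewer fixed qty
record_correction("DWG-1043", {"quantity": 4}, {"quantity": 4})  # no change, skipped
print(len(retraining_set))  # 1
```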

This is a strong example of applied ML in supply chain operations, demonstrating how domain-specific fine-tuning delivers practical business value beyond generic AI capabilities.

What about edge cases such as handwritten annotations, poor scan quality, or non-standard drawing formats? Manufacturing floors often have legacy documents that don’t follow current CAD standards. Does your system handle these gracefully, or does it require preprocessing?

How does your document-to-JSON extraction maintain consistency with Blue Yonder’s schema requirements? We’ve struggled with field mapping mismatches when integrating external data sources. Do you use a validation layer, or does the VLM output directly conform to BY’s expected structure?

We started with a multimodal transformer base and fine-tuned using 500 real drawings plus 200 augmented variations (rotations, quality degradation, annotation styles). Training took about 40 hours on a GPU cluster. The key was labeling hierarchical BOM structures (parent assemblies, sub-assemblies, individual components) so the model learns relational context, not just isolated part numbers. We also trained it to recognize standard drawing symbols and callouts specific to our industry.
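To show why the relational labeling matters, here is a sketch of reconstructing the assembly tree from level-encoded rows the way a drawing’s BOM table typically lists them (levels and part numbers invented for illustration):

```python
# Stack-based tree reconstruction: each row's indentation level decides
# which open assembly it attaches to.
def build_tree(rows):
    """rows: [(level, part_number)] in drawing order -> nested dict tree."""
    root = {"part": None, "children": []}
    stack = [(-1, root)]  # sentinel so level-0 rows attach to the root
    for level, part in rows:
        node = {"part": part, "children": []}
        # Pop back to the nearest ancestor that is shallower than this row.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root

rows = [(0, "ASM-100"), (1, "SUB-110"), (2, "P-1101"), (1, "SUB-120")]
tree = build_tree(rows)
print(tree["children"][0]["part"])  # ASM-100
```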

Great question. We do basic preprocessing (deskewing, contrast enhancement, noise reduction), but the VLM handles moderate quality variations well thanks to the augmented training data. Handwritten annotations are trickier; we trained specifically on those, but accuracy drops to around 78%. For legacy formats, we maintain a fallback queue where low-confidence extractions get flagged for human verification. About 12% of documents need this manual check, which is still far better than the 100% manual entry we had before.
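As a toy illustration of the contrast-enhancement step (a real pipeline would use an imaging library such as OpenCV; this only shows the min-max stretch idea on a strip of 8-bit grayscale pixels):

```python
# Min-max contrast stretch: remap the narrow intensity range of a faded
# scan onto the full 0-255 range.
def stretch_contrast(pixels):
    lo, hi = min(pixels), max(pixels)
    if hi == lo:               # flat image: nothing to stretch
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]

faded_scan = [100, 110, 120, 130]    # low-contrast strip of a faded scan
print(stretch_contrast(faded_scan))  # [0, 85, 170, 255]
```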