We’re designing a genealogy tracking system for our pharmaceutical manufacturing line using FactoryTalk MES 10.0. The regulatory team needs complete traceability from raw materials through finished goods, with the ability to perform forward/backward trace within 4 hours of a quality incident.
I’m evaluating two architectural approaches and would love to hear experiences from others who’ve implemented genealogy tracking at scale:
Approach A: Batch ETL
Run scheduled ETL jobs every 15 minutes to extract genealogy events from the shop floor systems, transform them into lineage relationships, and load them into a central traceability database. Simpler to implement and maintain.
Approach B: Event Streaming
Capture every material movement, assembly operation, and component consumption as a real-time event via Kafka or a similar message queue. Build the lineage graph incrementally as events arrive. More complex, but potentially much lower latency.
Our volumes: 200 production orders/day, 50,000 component transactions/day, 12 production lines. What design decisions did you make and why?
For pharmaceutical compliance, I’d lean toward event streaming despite the complexity. The 4-hour trace requirement seems generous now, but when you have a contamination event, you want answers in minutes not hours. We implemented Kafka-based streaming and can now trace any batch in under 5 minutes. The real-time visibility also helps with in-process quality decisions - if a component fails inspection, we can immediately identify all WIP that consumed it.
The choice really depends on your message queue infrastructure maturity. If you already have Kafka deployed for other systems, event streaming is the way to go. But if you’d be standing up messaging infrastructure just for genealogy, that’s significant operational overhead. Consider also that batch ETL gives you natural checkpoint recovery - if a job fails, you just re-run it. With streaming, you need to handle exactly-once delivery semantics, consumer group rebalancing, offset management, etc. That said, streaming enables real-time dashboards and proactive quality monitoring that batch can’t match.
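The exactly-once concern above is usually handled with an idempotent consumer: deduplicate by event ID and commit the offset only after the write succeeds. A minimal sketch in plain Python (the event shape, lot names, and in-memory stores are illustrative; a real deployment would use a Kafka consumer with manual commits and a durable dedup table):

```python
# Sketch: idempotent event processing with explicit offset checkpoints.
# Hypothetical event shape; in production this loop would be a Kafka
# consumer (e.g. confluent-kafka) with auto-commit disabled.

processed_ids = set()      # dedup store (a DB table in practice)
lineage = {}               # child lot -> set of parent lots
committed_offset = -1      # last offset safely persisted

def handle_event(offset, event):
    """Apply one consumption event exactly once, then checkpoint."""
    global committed_offset
    if event["event_id"] in processed_ids:   # replayed after a crash: skip
        committed_offset = offset
        return
    lineage.setdefault(event["child_lot"], set()).add(event["parent_lot"])
    processed_ids.add(event["event_id"])
    committed_offset = offset   # commit only after the write succeeded

events = [
    (0, {"event_id": "e1", "parent_lot": "RM-100", "child_lot": "WIP-7"}),
    (1, {"event_id": "e2", "parent_lot": "RM-101", "child_lot": "WIP-7"}),
    (1, {"event_id": "e2", "parent_lot": "RM-101", "child_lot": "WIP-7"}),  # duplicate delivery
]
for off, ev in events:
    handle_event(off, ev)

print(lineage)            # {'WIP-7': {'RM-100', 'RM-101'}}
print(committed_offset)   # 1
```

The duplicate delivery in the example is absorbed without double-counting the lineage edge, which is the property that makes at-least-once delivery safe.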
We went with batch ETL for similar volumes and it works fine for our compliance needs. The 15-minute latency is acceptable because our quality holds are typically 2-4 hours anyway before materials get released. The operational simplicity is worth it - our small IT team can manage the ETL jobs without specialized streaming expertise. We did optimize the ETL queries to run in under 3 minutes so we have a reliable 15-minute cycle.
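The "just re-run it" recovery property usually comes from a watermark-based incremental extract. A sketch using SQLite (the table and column names are made up, not the poster's actual schema):

```python
import sqlite3

# Sketch of watermark-based incremental ETL: each run extracts only
# transactions newer than the last successful load, so rerunning a
# failed job is safe. Schema and data are illustrative.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE component_txn (id INTEGER, parent_lot TEXT, child_lot TEXT, ts INTEGER)")
src.executemany("INSERT INTO component_txn VALUES (?,?,?,?)",
                [(1, "RM-100", "WIP-7", 100),
                 (2, "RM-101", "WIP-7", 200),
                 (3, "WIP-7", "FG-1", 300)])

def run_etl(conn, watermark):
    """Extract everything after the watermark; advance it on success."""
    rows = conn.execute(
        "SELECT id, parent_lot, child_lot, ts FROM component_txn "
        "WHERE ts > ? ORDER BY ts", (watermark,)).fetchall()
    # ... transform + load into the traceability warehouse here ...
    new_watermark = rows[-1][3] if rows else watermark
    return rows, new_watermark

rows, wm = run_etl(src, 100)   # previous cycle already loaded up to ts=100
print(len(rows), wm)           # 2 300
```

Persisting the watermark only after a successful load is what gives batch ETL its natural checkpoint recovery.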
Don’t forget about compliance reporting requirements. For FDA audits, we need to produce genealogy reports going back 3 years. Batch ETL naturally creates a stable, queryable historical record. With streaming, you need to think about long-term storage and how you’ll query the event log for historical traces. We use batch ETL specifically because it creates clean dimensional models that our quality analysts can query directly without needing streaming platform expertise.
Kafka Topic Configuration (if you go streaming)
Retention: 7 days minimum (supports replay for error recovery)
Replication factor: 3 (protects against broker loss)
Consumer group: 3-4 instances (handles peak load with redundancy)
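Expressed as topic configuration, those settings might look like the following sketch (the topic name is hypothetical; the keys follow Kafka's standard topic-level config names):

```python
# Sketch: the settings above as Kafka topic configuration.
RETENTION_MS = 7 * 24 * 60 * 60 * 1000   # 7 days, for replay/recovery

genealogy_topic = {
    "topic": "genealogy-events",           # hypothetical topic name
    "replication_factor": 3,               # survives loss of a broker
    "config": {"retention.ms": str(RETENTION_MS)},
}
# A consumer group of 3-4 instances then reads this topic in parallel.
print(genealogy_topic["config"]["retention.ms"])  # 604800000
```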
Compliance Reporting Requirements
Regardless of approach:
Maintain immutable audit trail of all lineage data changes
Implement data retention policies (typically 3-7 years for pharma)
Design report templates for common regulatory queries (forward trace, backward trace, lot genealogy)
Validate trace accuracy with known-good test scenarios quarterly
Document your lineage calculation methodology for audit purposes
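Forward and backward trace are the same traversal run over opposite edge directions. A minimal sketch over an in-memory lineage graph (the lot IDs and edge data are illustrative, not a validated trace methodology):

```python
from collections import deque

# Sketch: forward/backward trace over a lineage graph stored as
# parent -> children edges. Data is illustrative.
children = {"RM-100": {"WIP-7"}, "WIP-7": {"FG-1", "FG-2"}}
parents = {}
for p, cs in children.items():
    for c in cs:
        parents.setdefault(c, set()).add(p)

def trace(start, edges):
    """BFS over one edge direction; returns every reachable lot."""
    seen, queue = set(), deque([start])
    while queue:
        lot = queue.popleft()
        for nxt in edges.get(lot, ()):   # no edges means a boundary lot
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(trace("RM-100", children))  # forward: {'WIP-7', 'FG-1', 'FG-2'}
print(trace("FG-1", parents))     # backward: {'WIP-7', 'RM-100'}
```

Known-good test scenarios like these two traces are exactly what the quarterly validation point above would exercise.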
Traceability Latency Tolerance
Your 4-hour requirement is actually quite relaxed:
Batch ETL at 15-min intervals gives a worst-case latency of roughly 15 minutes plus job runtime (well within the requirement)
This suggests batch approach is sufficient unless you have other real-time needs
Consider if quality holds, in-process decisions, or production optimization would benefit from real-time lineage
My Recommendation for Your Scenario
Start with batch ETL because:
Your latency requirement (4 hours) is easily met with 15-min ETL cycles
Operational simplicity reduces risk for initial deployment
Your volumes (50K transactions/day) are manageable in batch windows
You avoid the complexity and cost of standing up new streaming infrastructure
Compliance reporting is naturally supported with dimensional warehouse model
Design the ETL with these future-proofing elements:
Use CDC patterns so you could later stream those same change events
Structure your lineage data model to support both batch loads and event appends
Build abstraction layer so consuming applications don’t depend on ETL implementation
This gives you a working system quickly while keeping the door open to add event streaming later if business needs evolve (e.g., real-time quality monitoring, predictive analytics).
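The abstraction-layer point can be sketched concretely: if the batch loader and a future stream consumer both write through one append-only API, downstream trace queries never depend on which path fed the store. Names here are illustrative, not a prescribed design:

```python
# Sketch of the abstraction-layer idea: an append-only lineage store
# whose write path is interchangeable between batch ETL and streaming.

class LineageStore:
    """Append-only lineage store; callers don't know the ingest path."""
    def __init__(self):
        self._edges = []   # (parent_lot, child_lot, source) tuples

    def append(self, parent_lot, child_lot, source):
        self._edges.append((parent_lot, child_lot, source))

    def parents_of(self, lot):
        return {p for p, c, _ in self._edges if c == lot}

store = LineageStore()
# Today: the 15-minute batch job writes through the same API...
for p, c in [("RM-100", "WIP-7"), ("RM-101", "WIP-7")]:
    store.append(p, c, source="batch_etl")
# ...later, a CDC/stream consumer can write the same way, with no
# changes to the applications that query the store.
store.append("WIP-7", "FG-1", source="stream")

print(store.parents_of("WIP-7"))  # {'RM-100', 'RM-101'}
```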
Good points on the operational complexity trade-offs. Our site doesn’t currently have Kafka, so we’d be building that capability from scratch. What about hybrid approaches? Could we do batch ETL for the core lineage data but stream critical quality events? Or does that create consistency problems between the two data paths?
Hybrid can work but you need careful design to avoid data inconsistencies. We run batch ETL for historical lineage (older than 24 hours) and event streaming for recent activity. The key is having a clear boundary and ensuring your queries know which data source to use based on timestamp. For example, real-time trace queries hit the streaming data store for the last 24 hours, then fall back to the batch-loaded warehouse for older history. This gives you fast recent traces without the cost of streaming everything forever.
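The timestamp boundary described above amounts to a small routing rule in the query layer. A sketch (store names and the 24-hour cutoff are taken from the reply; everything else is illustrative):

```python
from datetime import datetime, timedelta, timezone

# Sketch: route trace queries to the streaming store for the last
# 24 hours and to the batch warehouse for anything older.
BOUNDARY = timedelta(hours=24)

def pick_source(event_time, now):
    """Return which store should serve a trace for this timestamp."""
    return "stream_store" if now - event_time < BOUNDARY else "warehouse"

now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(hours=2)
old = now - timedelta(days=3)
print(pick_source(recent, now))  # stream_store
print(pick_source(old, now))     # warehouse
```

Keeping this rule in one place is what prevents the two data paths from silently answering the same question differently.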