Quality inspection batch processing delays with large lot sizes

We’re experiencing significant delays in our quality inspection batch processing when dealing with large lot sizes (500+ units). The system uses custom sampling plan logic that queries historical data to determine inspection frequency, but the SQL queries are taking 15-20 seconds to complete during peak production hours.

Our batch job scheduling runs every 5 minutes, but when lots exceed 300 units, the inspection queue backs up and creates bottlenecks on the production line. Operators are waiting 3-4 minutes between lot completions just for the system to assign inspection tasks.

Current query structure:

SELECT i.*, s.sampling_rate FROM inspections i
JOIN sampling_plans s ON i.plan_id = s.id
WHERE i.lot_size > 500 AND i.status = 'pending'

Is there a better approach to optimize the custom sampling plan logic and batch job scheduling to handle high-volume scenarios? We’re on FT 10.0.

Your batch job frequency might be too aggressive for the workload. Running every 5 minutes when the queries themselves take 15-20 seconds means you're constantly churning. Consider adjusting the batch scheduler to run every 10-15 minutes during peak hours, processing larger batches per run instead. Also, look into partitioning your inspections table by date or status - a table with millions of rows is definitely hurting performance. Archive completed inspections older than 90 days to a separate table.
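A minimal sketch of the archive step, shown with SQLite and hypothetical table/column names (`inspections`, `inspections_archive`, `completed_at` are assumptions - adapt to FT's actual schema):

```python
import sqlite3

# Hypothetical schema; FT's real tables and columns will differ.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE inspections (
    id INTEGER PRIMARY KEY, lot_id INTEGER, plan_id INTEGER,
    lot_size INTEGER, status TEXT, completed_at TEXT)""")
conn.execute("CREATE TABLE inspections_archive AS SELECT * FROM inspections WHERE 0")

def archive_completed(conn, days=90):
    """Move completed inspections older than `days` into the archive table."""
    cutoff = (f"-{days} days",)
    with conn:  # single transaction: copy, then delete
        conn.execute("""INSERT INTO inspections_archive
                        SELECT * FROM inspections
                        WHERE status = 'completed'
                          AND completed_at < datetime('now', ?)""", cutoff)
        conn.execute("""DELETE FROM inspections
                        WHERE status = 'completed'
                          AND completed_at < datetime('now', ?)""", cutoff)
```

Doing the copy and delete inside one transaction keeps the two tables consistent even if the job is interrupted.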

We solved almost this exact problem last year on FT 10.0. You need to tackle all three focus areas systematically to see real improvement.

Custom Sampling Plan Logic Optimization: First, implement a cached sampling plan lookup table that refreshes hourly instead of calculating on every query. Create a materialized view or dedicated cache table:

CREATE TABLE sampling_cache AS
SELECT id AS plan_id, lot_size_min, lot_size_max, sampling_rate,
       CURRENT_TIMESTAMP AS last_updated
FROM sampling_plans

Update this during off-peak hours (2-4 AM) so real-time queries just read cached values.
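A sketch of that refresh job, assuming `sampling_plans` carries `lot_size_min`/`lot_size_max` columns (SQLite here for illustration; the real schema and scheduler hookup are up to you):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: sampling_plans(id, lot_size_min, lot_size_max, sampling_rate)
conn.execute("""CREATE TABLE sampling_plans (
    id INTEGER PRIMARY KEY, lot_size_min INTEGER,
    lot_size_max INTEGER, sampling_rate REAL)""")
conn.execute("""CREATE TABLE sampling_cache (
    plan_id INTEGER PRIMARY KEY, lot_size_min INTEGER,
    lot_size_max INTEGER, sampling_rate REAL, last_updated TEXT)""")

def refresh_sampling_cache(conn):
    """Rebuild the cache atomically; schedule this during off-peak hours."""
    with conn:  # one transaction: readers never see a half-built cache
        conn.execute("DELETE FROM sampling_cache")
        conn.execute("""INSERT INTO sampling_cache
                        SELECT id, lot_size_min, lot_size_max, sampling_rate,
                               datetime('now')
                        FROM sampling_plans""")
```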

SQL Query Optimization: Your current query has no supporting indexes and pulls every column with SELECT i.*. Rewrite it to be more selective:

SELECT i.id, i.lot_id, sc.sampling_rate
FROM inspections i
INNER JOIN sampling_cache sc ON i.plan_id = sc.plan_id
WHERE i.status = 'pending' AND i.lot_size BETWEEN sc.lot_size_min AND sc.lot_size_max

Add composite indexes on (status, lot_size) and partition the inspections table by creation_date. Archive records older than 6 months to reduce table size from 2M to under 500K active records.
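One way to sanity-check the composite index, sketched with SQLite's EXPLAIN QUERY PLAN (index and table names are illustrative; your database's plan syntax will differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE inspections (
    id INTEGER PRIMARY KEY, lot_id INTEGER, plan_id INTEGER,
    lot_size INTEGER, status TEXT)""")
# Composite index: equality column (status) first, range column (lot_size) second.
conn.execute("CREATE INDEX idx_status_lot_size ON inspections (status, lot_size)")

plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT id FROM inspections
    WHERE status = 'pending' AND lot_size > 500""").fetchall()
# The plan should report a search using idx_status_lot_size, not a full scan.
```

Column order matters: putting the equality predicate (`status`) before the range predicate (`lot_size`) lets the index satisfy both conditions in one seek.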

Batch Job Scheduling: Adjust your scheduler based on production load patterns. Instead of fixed 5-minute intervals, implement dynamic scheduling:

  • Peak hours (6 AM - 2 PM): Run every 10 minutes with batch size limit of 200 lots
  • Normal hours (2 PM - 10 PM): Run every 7 minutes with batch size 300
  • Off-peak (10 PM - 6 AM): Run every 15 minutes, no batch limit
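The tiers above reduce to a small lookup function; the windows and limits are the ones listed, but treat them as tunables for your own load profile:

```python
def schedule_params(hour):
    """Map an hour of day (0-23) to (interval_minutes, batch_limit).
    A batch_limit of None means no cap, per the off-peak tier."""
    if 6 <= hour < 14:        # peak: 6 AM - 2 PM
        return 10, 200
    if 14 <= hour < 22:       # normal: 2 PM - 10 PM
        return 7, 300
    return 15, None           # off-peak: 10 PM - 6 AM
```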

Use batch commits for inspection assignments - group 50-100 assignments per transaction instead of individual commits. This reduced our processing time from 18 seconds per batch to 2-3 seconds.

Implement asynchronous inspection task creation using FT’s job queue system. When a large lot completes, queue the inspection assignment as a background job rather than blocking the completion transaction.
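FT's job queue API isn't shown here, so the following is a generic Python sketch of the pattern only: lot completion merely enqueues the assignment, and a background worker does the actual task creation.

```python
import queue
import threading

inspection_jobs = queue.Queue()
assigned = []  # stand-in for persisted inspection assignments

def worker():
    while True:
        lot_id = inspection_jobs.get()
        if lot_id is None:            # sentinel: shut the worker down
            break
        assigned.append(lot_id)       # real code would create inspection tasks here
        inspection_jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def complete_lot(lot_id):
    """Completion path: enqueue the assignment instead of blocking on it."""
    inspection_jobs.put(lot_id)
```

The completion transaction returns as soon as the job is queued, so operators aren't waiting on task creation.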

Results we achieved:

  • Query time: 15s → 1.2s average
  • Batch processing: 3-4 min → 30-45 seconds
  • Queue backup: Eliminated during peak hours
  • Operator wait time: 3-4 min → under 1 minute

The key is addressing all three areas together. Optimizing just the SQL won’t help much if your batch scheduling is still inefficient, and caching won’t matter if you’re still doing individual commits. Implement these changes in a test environment first and monitor with FT’s performance metrics dashboard.

Adding to the caching suggestion - also consider implementing batch commits for inspection assignments rather than individual commits. If you’re assigning 500 inspection tasks one at a time, that’s 500 database round trips. Batch them into groups of 50-100 and commit once per group. This can dramatically reduce transaction overhead and improve throughput during peak periods.
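A sketch of that grouping with sqlite3's executemany, committing once per chunk (the `inspection_assignments` table name is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical assignments table; adapt to the real schema.
conn.execute("""CREATE TABLE inspection_assignments (
    lot_id INTEGER, inspector_id INTEGER)""")

def assign_in_batches(conn, assignments, batch_size=100):
    """One commit per group of assignments instead of one per row."""
    for start in range(0, len(assignments), batch_size):
        chunk = assignments[start:start + batch_size]
        with conn:  # commits once per chunk
            conn.executemany(
                "INSERT INTO inspection_assignments (lot_id, inspector_id) VALUES (?, ?)",
                chunk)
```

With 500 assignments and a batch size of 100, that's 5 commits instead of 500.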

Good point on the indexing. I ran EXPLAIN and confirmed we’re doing a full table scan on inspections. The table has about 2 million historical records. I added an index on lot_size and status, which helped a bit, but we’re still seeing 8-10 second delays. The sampling plan logic itself seems to be the main culprit - it’s recalculating sampling rates on every query instead of caching results.

The custom sampling plan logic should definitely be cached. We implemented a similar optimization where sampling rates are pre-calculated during off-peak hours and stored in a lookup table. This reduced our query time from 12 seconds to under 1 second. The batch scheduler then just reads from the cached values instead of recalculating every time.