I’ve diagnosed this exact issue before. Here’s what happened and how to fix it:
Root Cause Analysis:
Bulk imports of large defect sets (>2000 records) can overwhelm the duplicate detection service, causing it to enter a protective mode. The system logs record this only as a threshold adjustment, but the effect is broader: the active matching threshold is raised and the similarity index is left fragmented.
Fuzzy Matching Configuration:
Your configuration looks correct, but after bulk imports the system temporarily raises the threshold to prevent false positives during the import flood. The first line below is your configured setting; the second is the value actually in effect at runtime:
defect.duplicate.threshold=85
defect.duplicate.threshold.active=95
Check the active threshold value in the admin console under Defect Tracking > Detection Settings. If it’s higher than 85, that explains why obvious duplicates aren’t being caught.
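To see why a jump from 85 to 95 matters, here is a minimal Python sketch of percentage-based fuzzy matching. It assumes the score is a normalized Levenshtein similarity (100 * (1 - distance / longer length)); the product's real scoring formula may differ, and the function names and sample summaries are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> float:
    """Assumed scoring: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def is_duplicate(a: str, b: str, threshold: float) -> bool:
    """A pair counts as a duplicate when its score reaches the threshold."""
    return similarity_percent(a, b) >= threshold


if __name__ == "__main__":
    first = "Login button unresponsive on Safari"
    second = "Login button not responsive on Safari"
    for threshold in (85, 95):
        print(threshold, is_duplicate(first, second, threshold))
```

With these two summaries the pair is caught at 85 but missed at 95, which is the same pattern as obvious duplicates slipping through while the active threshold is elevated.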
Bulk Import Validation:
The fact that validation completed without errors is misleading. The bulk import process disables fuzzy matching during the import itself (for performance), which means:
- Duplicates within the imported batch aren’t detected
- The similarity index becomes fragmented
- Post-import duplicate detection uses the fragmented index
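Since duplicates inside the batch are not checked during the import, a cheap pre-check on the CSV before importing can catch the most obvious ones. The following Python sketch is one way to do that, not a feature of the tool: it groups rows whose summaries become identical after normalization. The `id` and `summary` column names and the sample rows are assumptions.

```python
import csv
import io
import re
from collections import defaultdict


def normalize(summary: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", summary.lower())
    return " ".join(cleaned.split())


def find_in_batch_duplicates(rows, key="summary"):
    """Return groups of row indexes whose normalized summaries collide."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[normalize(row[key])].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]


# Hypothetical CSV content; real export columns may be named differently.
SAMPLE = """id,summary
D-1,Login fails on Safari
D-2,login  fails on safari!
D-3,Export to PDF hangs
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(find_in_batch_duplicates(rows))  # prints [[0, 1]]
```

This only catches near-exact matches (case, spacing, punctuation). True fuzzy duplicates still need the retroactive scan after the index is rebuilt.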
Duplicate Consolidation:
Here’s the fix process:
- Stop the duplicate detection service temporarily
- Reset the active threshold to match your configuration
- Rebuild the similarity index with full re-indexing:
alm-admin reindex --module=defects --mode=full --algorithm=levenshtein
- Run a retroactive duplicate scan on the imported defects:
alm-admin scan-duplicates --date-range=2025-10-18:2025-10-30 --action=flag
Because of --action=flag, this identifies duplicates created during and after the import and marks them for review without auto-merging them.
Defect Lifecycle Management:
Configure the duplicate detection to handle all workflow states, not just New defects. Add this to your configuration:
defect.duplicate.check.states=New,Open,InProgress,Reopen
defect.duplicate.merge.states=New,Reopen
This ensures detection runs for defects in all four listed states but only allows automatic merging for defects in the New and Reopen states (to prevent data loss).
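The interaction between the two lists can be sketched as a small decision function. This is an illustration of the intended behavior, not the product's code: the `Action` enum and `action_for` are hypothetical names, and flagging (rather than ignoring) duplicates in checked but non-mergeable states is an assumption.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"  # state is not checked, or no duplicate was found
    FLAG = "flag"      # duplicate found, left for manual review
    MERGE = "merge"    # duplicate found, eligible for automatic merge


def parse_states(value: str) -> frozenset:
    """Parse a comma-separated state list as it appears in the config lines."""
    return frozenset(s.strip() for s in value.split(",") if s.strip())


CHECK_STATES = parse_states("New,Open,InProgress,Reopen")
MERGE_STATES = parse_states("New,Reopen")


def action_for(state: str, is_duplicate: bool) -> Action:
    """Decide what happens to a defect in the given workflow state."""
    if not is_duplicate or state not in CHECK_STATES:
        return Action.IGNORE
    if state in MERGE_STATES:
        return Action.MERGE
    return Action.FLAG


if __name__ == "__main__":
    for state in ("New", "Open", "InProgress", "Reopen", "Closed"):
        print(state, action_for(state, is_duplicate=True).value)
```

Keeping the merge list a subset of the check list is the point: a defect already being worked on (Open, InProgress) gets flagged so nobody loses comments or attachments to an automatic merge.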
Prevention for Future Imports:
For bulk imports exceeding 1000 defects, use the staged import mode:
alm-import --file=defects.csv --mode=staged --batch-size=500 --enable-duplicate-check
This imports in smaller batches with duplicate detection enabled between batches, preventing the issue from recurring.
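Conceptually, staged mode works like the Python sketch below: split the records into fixed-size batches and run a duplicate check after each batch before moving on. The `import_batch` and `check_duplicates` callbacks are hypothetical stand-ins, not part of alm-import, and the tool's internals may be organized differently.

```python
from itertools import islice


def batched(records, batch_size):
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def staged_import(records, batch_size, import_batch, check_duplicates):
    """Import batch by batch, checking duplicates between batches.

    import_batch and check_duplicates are caller-supplied callables;
    they are placeholders for whatever the real import pipeline does.
    """
    imported = 0
    for batch in batched(records, batch_size):
        import_batch(batch)
        check_duplicates(batch)
        imported += len(batch)
    return imported


if __name__ == "__main__":
    log = []
    total = staged_import(
        range(1200),
        batch_size=500,
        import_batch=lambda b: log.append(("import", len(b))),
        check_duplicates=lambda b: log.append(("check", len(b))),
    )
    print(total, log)
```

With 1200 records and a batch size of 500 this produces batches of 500, 500, and 200, each followed by its own check, so the detection service never sees the whole set at once.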
After following these steps, your duplicate detection should return to normal operation within 2-4 hours as the index stabilizes.