Duplicate bug detection not working in defect-tracking module after bulk import

After performing a bulk import of 2,500 defects into our ALM 24 defect-tracking module, the duplicate bug detection has stopped working. We’re now seeing obvious duplicates being created that should have been flagged and consolidated.

The fuzzy matching configuration appears unchanged:


defect.duplicate.threshold=85
defect.matching.fields=summary,description
defect.similarity.algorithm=levenshtein

Before the bulk import, the system would catch duplicates with 85%+ similarity. Now identical defects with the same summary are being created as separate entries. The bulk import validation completed without errors, but something seems to have broken the duplicate detection logic. Has anyone experienced data quality degradation after large imports?
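For reference, the 85 threshold is a normalized Levenshtein similarity percentage. A minimal sketch of how such a score is typically computed (illustrative only, not the product's actual implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity_pct(a: str, b: str) -> float:
    """Normalize the distance to a percentage: 100.0 means identical."""
    if not a and not b:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / max(len(a), len(b)))

# Two near-identical summaries clear an 85% threshold:
assert similarity_pct("Login page crashes on submit",
                      "Login page crashes on Submit") >= 85
```

With this scoring, defects whose summaries differ by only a character or two land well above 85 and should be flagged, which is why the post-import behavior is suspicious.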

Look at the duplicate detection service logs specifically. After bulk imports, the service sometimes gets overloaded and switches to a degraded mode where it only checks exact matches instead of fuzzy matching. You might see log entries about threshold adjustments or algorithm fallbacks that explain why the Levenshtein similarity isn’t being calculated anymore.

I’ve seen this happen when the similarity index needs rebuilding. Large bulk imports can corrupt the index used for fuzzy matching. Try running the index rebuild utility from the admin console. It takes a while with 2,500+ defects, but it should restore duplicate detection functionality.

I’ve diagnosed this exact issue before. Here’s what happened and how to fix it:

Root Cause Analysis: Bulk imports of large defect sets (>2000 records) can overwhelm the duplicate detection service, causing it to enter a protective mode. The system logs show this as a threshold adjustment, but what actually happens is more significant.

Fuzzy Matching Configuration: Your configuration looks correct, but after bulk imports, the system temporarily increases the threshold to prevent false positives during the import flood:


defect.duplicate.threshold=85         # your configured setting
defect.duplicate.threshold.active=95  # actual runtime value

Check the active threshold value in the admin console under Defect Tracking > Detection Settings. If it’s higher than 85, that explains why obvious duplicates aren’t being caught.

Bulk Import Validation: The validation completing without errors is misleading. The bulk import process disables fuzzy matching during the import itself (for performance), which means:

  • Duplicates within the imported batch aren’t detected
  • The similarity index becomes fragmented
  • Post-import duplicate detection uses the fragmented index

Duplicate Consolidation: Here’s the fix process:

  1. Stop the duplicate detection service temporarily
  2. Reset the active threshold to match your configuration
  3. Rebuild the similarity index with full re-indexing:

alm-admin reindex --module=defects --mode=full --algorithm=levenshtein
  4. Run a retroactive duplicate scan on the imported defects:

alm-admin scan-duplicates --date-range=2025-10-18:2025-10-30 --action=flag

This will identify duplicates created during and after the import without auto-merging them.

Defect Lifecycle Management: Configure the duplicate detection to handle all workflow states, not just New defects. Add this to your configuration:


defect.duplicate.check.states=New,Open,InProgress,Reopen
defect.duplicate.merge.states=New,Reopen

This ensures detection works across states but only allows automatic merging for New and Reopened defects (to prevent data loss).
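The two-tier behavior those settings describe can be sketched as a small decision function (the state and setting names mirror the config above; the logic is illustrative, not the product's actual code):

```python
CHECK_STATES = {"New", "Open", "InProgress", "Reopen"}  # defect.duplicate.check.states
MERGE_STATES = {"New", "Reopen"}                        # defect.duplicate.merge.states

def handle_candidate(defect_state: str, similarity: float,
                     threshold: float = 85.0) -> str:
    """Decide what to do with a potential duplicate: detection runs for
    all check states, but auto-merge is restricted to safe states."""
    if defect_state not in CHECK_STATES or similarity < threshold:
        return "ignore"
    return "auto-merge" if defect_state in MERGE_STATES else "flag-for-review"

assert handle_candidate("New", 92.0) == "auto-merge"
assert handle_candidate("InProgress", 92.0) == "flag-for-review"
assert handle_candidate("Closed", 99.0) == "ignore"
```

Flagging rather than merging in-progress defects preserves their history and assignments while still surfacing the duplication.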

Prevention for Future Imports: For bulk imports exceeding 1000 defects, use the staged import mode:


alm-import --file=defects.csv --mode=staged --batch-size=500 --enable-duplicate-check

This imports in smaller batches with duplicate detection enabled between batches, preventing the issue from recurring.
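The staged-mode behavior can be approximated as follows (a minimal sketch assuming a pluggable duplicate predicate; function and parameter names are illustrative, not the importer's API):

```python
def staged_import(records, batch_size=500, is_duplicate=None):
    """Import in batches, checking each incoming record against
    everything already imported so duplicates are flagged, not created."""
    imported, flagged = [], []
    for start in range(0, len(records), batch_size):
        for rec in records[start:start + batch_size]:
            if is_duplicate and any(is_duplicate(rec, old) for old in imported):
                flagged.append(rec)   # flagged instead of created
            else:
                imported.append(rec)
    return imported, flagged

# Usage with a trivial exact-summary check standing in for fuzzy matching:
records = [{"summary": "A"}, {"summary": "B"}, {"summary": "A"}]
done, dupes = staged_import(
    records, batch_size=2,
    is_duplicate=lambda a, b: a["summary"] == b["summary"])
assert len(done) == 2 and len(dupes) == 1
```

Because each batch is validated against the already-imported set, the index never has to absorb thousands of unchecked records at once.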

After following these steps, your duplicate detection should return to normal operation within 2-4 hours as the index stabilizes.

Check if the bulk import bypassed the duplicate detection service. When imports are done in batch mode, they sometimes skip validation steps for performance reasons. You might need to run a post-import duplicate consolidation process to catch the duplicates that were created during the import.

Another thing to verify: was the bulk import done with a regular user account or a service account? Some accounts have permissions that intentionally bypass duplicate checking, which can affect subsequent operations if that account remains the active context for the defect module.

Ran the index rebuild overnight, but duplicate detection still isn’t working. I’m noticing that defects created before the bulk import are being matched correctly, but anything created after the import (whether manual or bulk) doesn’t trigger duplicate detection. Could the import have changed some system-level configuration?

Check the defect lifecycle management settings. If the bulk import included defects in various workflow states, it might have triggered a safeguard that disables duplicate detection for non-New status defects. This is a common issue when importing historical defects that are already closed or in progress.