Advanced Settings: Semantic Match Similarity Thresholds
Redirect mapping uses three confidence zones:
1: High Confidence Match Threshold
This is the minimum cosine similarity required to treat a redirect suggestion as a Confident Match.
- ≥ 0.90 means the old and new URLs are extremely close in meaning.
- These can be auto-mapped safely in most migrations.
- Higher threshold = fewer but more accurate auto-matches.
- Lower threshold = more auto-matches but a higher risk of incorrect redirects slipping through without review.
Recommended Default: 0.90
2: Low Semantic Match Review Zone
Matches that fall between the Low Threshold and the High Threshold are classified as Low Semantic Matches.
- Often correct but not reliable enough to auto-map.
- Useful for surfacing candidates that need human validation.
Typical Range:
0.80–0.89 → Review recommended
Frequently seen for pages that share product families or topic themes.
Recommended Default: 0.80
3: No Reliable Match Threshold
If the highest similarity score for a URL falls below this threshold, the system labels it as No Match.
Indicates no meaningful semantic alignment was found.
These should be manually assessed or intentionally left unmapped (returning 404 or, optionally, 410).
Typical Behavior:
< 0.80 → Move to manual review.
Recommended Default: 0.80
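The three zones can be sketched as a simple classification over the best similarity score for each URL. This is a minimal illustration, not the tool's actual code: the function name, the zone labels, and the constants are assumptions, and it assumes the Low Threshold and the No Reliable Match Threshold share one value, as the defaults above suggest.

```python
# Hypothetical names; the values mirror the recommended defaults above.
HIGH_THRESHOLD = 0.90  # minimum score for a Confident Match
LOW_THRESHOLD = 0.80   # minimum score for the review zone; below this is No Match


def classify_match(score: float,
                   high: float = HIGH_THRESHOLD,
                   low: float = LOW_THRESHOLD) -> str:
    """Map the best similarity score for a URL to one of the three zones."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score >= high:
        return "confident_match"     # candidate for auto-mapping
    if score >= low:
        return "low_semantic_match"  # needs human validation
    return "no_match"                # manual review, or leave unmapped
```

With the defaults, a score of 0.93 would land in the confident zone, 0.85 in the review zone, and 0.72 in No Match.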
What Does the Similarity Score Mean?
This score comes from cosine similarity between multi-view embeddings (Path + Meta).
- 1.0 = identical meaning
- 0.0 = completely unrelated
The closer to 1, the stronger the match.
Higher scores come from shared slugs, strong title/H1 overlap, similar product attributes, or consistent category structures.
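For reference, cosine similarity between two embedding vectors can be computed as below. This is a generic sketch: it treats each URL as a single vector that is already combined, because how the tool merges the Path and Meta views is not described here. The function name is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score 1.0 regardless of their length, and orthogonal vectors score 0.0.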
How to Tune Your Similarity Thresholds
Run the tool on a sample of URLs where the correct mappings are known (e.g., your test migration set). Compare different threshold combinations and measure accuracy:
1: Start with defaults:
- High Match: 0.90
- Low Match: 0.80
2: If too many URLs fall into Review Zone:
- Lower high threshold to 0.88
- Or raise low threshold to tighten the review window (scores below it become No Match)
3: If auto-matches contain errors:
- Increase high threshold to 0.92–0.95
- Keep low threshold unchanged
4: If almost nothing matches automatically:
- Reduce high threshold to 0.85–0.88
- Only do this when site structure or content is very inconsistent
Goal:
Balance precision (share of auto-matches that are correct) and coverage (percentage of URLs auto-mapped).
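One way to compare threshold settings on a known-good sample is to compute precision and coverage for each candidate high threshold. The sketch below assumes you can export, for each old URL, its best similarity score and whether the top suggestion matched the known mapping. The Sample type and function name are hypothetical, not part of the tool.

```python
from typing import Iterable, NamedTuple, Tuple


class Sample(NamedTuple):
    score: float      # best similarity score found for the old URL
    is_correct: bool  # True if the top suggestion equals the known mapping


def evaluate_high_threshold(samples: Iterable[Sample],
                            high: float) -> Tuple[float, float]:
    """Return (precision, coverage) of auto-matching at a given high threshold."""
    rows = list(samples)
    if not rows:
        return 0.0, 0.0
    auto = [s for s in rows if s.score >= high]  # URLs that would be auto-mapped
    coverage = len(auto) / len(rows)
    # Precision is undefined with no auto-matches; report 0.0 in that case.
    precision = sum(s.is_correct for s in auto) / len(auto) if auto else 0.0
    return precision, coverage
```

Running this for several values (for example 0.85, 0.90, and 0.95) on the same sample shows how much coverage each increase in precision costs.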

