CRM deduplication rules that work
After 18-24 months, most CRMs accumulate 20-30% duplicate contacts and accounts. Built-in dedup tools (HubSpot, Salesforce, etc.) catch exact matches. The real duplicates are subtler, and catching them takes rules.
How duplicates form:
- Phone number variants: +1-555-555-1234 vs (555) 555-1234 vs 5555551234.
- Email case: John@example.com vs john@example.com.
- Name variations: "Robert Smith" vs "Bob Smith" vs "R. Smith".
- Company suffixes: "ACME Corp" vs "ACME Corp." vs "ACME Corporation".
- Imports that add new records on top of existing ones instead of updating them.
- Web forms, chat widgets, and integrations that create new records without first checking for an existing match.
Rules that work:
- Normalize on save. Lowercase emails. Standardize phone formats (E.164). Strip company suffixes and trailing punctuation. Compare normalized forms.
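- Fuzzy match for names. Use Levenshtein distance or similar. "Smith" vs "Smyth" within 1 edit = potential match. Edit distance will not link nicknames such as "Bob" and "Robert"; those need a nickname lookup table.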
- Multi-field scoring. Don't dedup on one field. Email + phone + name + company → score. High score = auto-merge, medium = flag for review.
- Never auto-merge without backup. Always keep original records for 30 days post-merge.
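The normalization, fuzzy-matching, and scoring rules above can be sketched in Python as follows. This is a minimal illustration, not a production matcher: the suffix list, the field names (email, phone, name, company), the weights (50/25/15/10), and the thresholds (75 and 40) are all assumptions chosen for the example, and the phone normalization assumes 10-digit national numbers with a default country code of 1.

```python
import re

# Hypothetical suffix list; extend it for your own data.
COMPANY_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "llc", "ltd", "co"}


def normalize_email(email: str) -> str:
    """Lowercase and trim an email address."""
    return email.strip().lower()


def normalize_phone(phone: str, default_country_code: str = "1") -> str:
    """Rough E.164-style form: digits only, with a leading '+'.

    Assumes a bare 10-digit number belongs to the default country.
    """
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits if digits else ""


def normalize_company(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    while len(words) > 1 and words[-1] in COMPANY_SUFFIXES:
        words.pop()
    return " ".join(words)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def same(x: str, y: str) -> bool:
    """Equal and non-empty, so two blank fields never count as a match."""
    return bool(x) and x == y


def match_score(a: dict, b: dict) -> int:
    """Combine per-field matches into one score (weights are illustrative)."""
    score = 0
    if same(normalize_email(a["email"]), normalize_email(b["email"])):
        score += 50
    if same(normalize_phone(a["phone"]), normalize_phone(b["phone"])):
        score += 25
    name_a, name_b = a["name"].strip().lower(), b["name"].strip().lower()
    if name_a and name_b and levenshtein(name_a, name_b) <= 1:
        score += 15
    if same(normalize_company(a["company"]), normalize_company(b["company"])):
        score += 10
    return score


def decide(score: int) -> str:
    """Map a score to an action (thresholds are illustrative)."""
    if score >= 75:
        return "auto-merge"
    if score >= 40:
        return "review"
    return "distinct"
```

With these assumed weights, a pair that agrees on email and phone clears the auto-merge threshold even when the names differ ("Robert Smith" vs "Bob Smith"), while a pair that agrees on email alone lands in the review band. Nicknames are not handled here; edit distance only catches near-spellings.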
Tools: built-in dedup is fine for exact duplicates. For fuzzy matching: PieSync (since acquired by HubSpot), CleanData, or custom scripts on exported data.
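As a rough idea of what a custom script on exported data might look like, the sketch below groups rows of a CSV export by lowercased, trimmed email and reports any group with more than one record. The column names (id, email) and the function name find_duplicate_groups are hypothetical; a real export will have its own headers, and a real script would compare more than one field.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_groups(csv_file, key_column="email"):
    """Group exported rows by normalized key; return groups with 2+ rows."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row.get(key_column) or "").strip().lower()
        if key:  # skip rows with no value in the key column
            groups[key].append(row)
    return {k: rows for k, rows in groups.items() if len(rows) > 1}


# Inline sample standing in for an exported file (assumed columns).
sample = io.StringIO(
    "id,email\n"
    "1,John@example.com\n"
    "2,john@example.com \n"
    "3,jane@example.com\n"
)

dupes = find_duplicate_groups(sample)
for email, rows in dupes.items():
    print(email, [r["id"] for r in rows])
```

For a real export, pass an open file instead of the inline sample, for example open("contacts.csv", newline="", encoding="utf-8"), where the file name is a placeholder.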