CRM deduplication rules that work

After 18-24 months, most CRMs accumulate 20-30% duplicate contacts and accounts. Built-in dedup tools (HubSpot, Salesforce, etc.) catch exact matches. The duplicates they miss are subtler: near-matches that differ in formatting, spelling, or abbreviation.

How duplicates form:

  • Phone number variants: +1-555-555-1234 vs (555) 555-1234 vs 5555551234.
  • Email case: John@example.com vs john@example.com.
  • Name variations: "Robert Smith" vs "Bob Smith" vs "R. Smith".
  • Company suffixes: "ACME Corp" vs "ACME Corp." vs "ACME Corporation".
  • Bulk imports that add new records on top of existing ones instead of updating them.
  • Web forms, chat widgets, and integrations that create separate records without checking for an existing match.

Rules that work:

  • Normalize on save. Lowercase emails. Standardize phone formats (E.164). Strip company suffixes and trailing punctuation. Compare normalized forms.
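  • Fuzzy match for names. Use Levenshtein distance or a similar edit-distance measure. "Smith" vs "Smyth" is one edit apart, so treat it as a potential match. Edit distance will not link nicknames such as "Robert" and "Bob"; those need a nickname lookup table.
  • Multi-field scoring. Don't dedup on a single field. Combine email, phone, name, and company matches into one score: a high score means auto-merge, a medium score means flag for review.
  • Never auto-merge without a backup. Always keep the original records for 30 days post-merge so a bad merge can be undone.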
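The first three rules can be sketched in a few lines of Python. This is a minimal illustration, not a production matcher: the suffix list, the field names (email, phone, name, company), the weights, and the thresholds are all hypothetical and would need tuning against your own data. The phone normalizer assumes 10-digit North American numbers and only approximates E.164. For brevity the sketch normalizes at compare time; with "normalize on save" you would store the normalized values once and compare those.

```python
import re

# Hypothetical suffix list; extend it for your own data.
COMPANY_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "llc", "ltd", "co"}


def normalize_email(email: str) -> str:
    """Lowercase and trim so John@example.com equals john@example.com."""
    return email.strip().lower()


def normalize_phone(phone: str, default_country_code: str = "1") -> str:
    """Rough E.164-style form. Assumes 10-digit national numbers."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits if digits else ""


def normalize_company(name: str) -> str:
    """Lowercase, drop punctuation, strip trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    while words and words[-1] in COMPANY_SUFFIXES:
        words.pop()
    return " ".join(words)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def match_score(a: dict, b: dict) -> int:
    """Combine field matches into one score. Weights are illustrative."""
    score = 0
    ea, eb = normalize_email(a["email"]), normalize_email(b["email"])
    if ea and ea == eb:
        score += 50
    pa, pb = normalize_phone(a["phone"]), normalize_phone(b["phone"])
    if pa and pa == pb:
        score += 30
    na, nb = a["name"].strip().lower(), b["name"].strip().lower()
    if na and nb and levenshtein(na, nb) <= 1:
        score += 10
    ca, cb = normalize_company(a["company"]), normalize_company(b["company"])
    if ca and ca == cb:
        score += 10
    return score


def decide(score: int, auto_merge_at: int = 80, review_at: int = 50) -> str:
    """Map a score to an action. Thresholds are illustrative."""
    if score >= auto_merge_at:
        return "auto-merge"
    if score >= review_at:
        return "review"
    return "distinct"


record_a = {"email": "John@example.com", "phone": "+1-555-555-1234",
            "name": "John Smith", "company": "ACME Corp."}
record_b = {"email": "john@example.com", "phone": "(555) 555-1234",
            "name": "John Smyth", "company": "ACME Corporation"}
print(decide(match_score(record_a, record_b)))  # prints "auto-merge"
```

Weighting email and phone above name and company reflects that the first two are close to unique per person, while names and companies are shared. A pair that matches only on name and company scores 20 here and is left alone.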

Tools: built-in dedup is fine for exact duplicates. For fuzzy matching, options include PieSync (now part of HubSpot), CleanData, or custom scripts run on exported data.
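As a rough idea of what a custom script on exported data might look like, the sketch below groups the rows of a CSV export by lowercased email and reports any group with more than one record. The column names (id, email) and the inline sample data are assumptions for illustration; a real export will have its own headers and should be read from a file rather than a string.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_groups(rows, key_field="email"):
    """Group record ids by a normalized key; keep groups with 2+ records."""
    groups = defaultdict(list)
    for row in rows:
        key = (row.get(key_field) or "").strip().lower()
        if key:  # skip records with no value for the key field
            groups[key].append(row["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical export; in practice use open("export.csv", newline="").
SAMPLE_EXPORT = """id,email
1,John@example.com
2,john@example.com
3,jane@example.com
"""

duplicates = find_duplicate_groups(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
print(duplicates)  # {'john@example.com': ['1', '2']}
```

The output is a candidate list only. Review it, and back up the affected records, before merging anything in the CRM.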