matching misspelled names
- To: mathgroup at smc.vnet.net
- Subject: [mg100306] matching misspelled names
- From: Jess <jesscobrien at gmail.com>
- Date: Sun, 31 May 2009 06:37:13 -0400 (EDT)
Hi,

I would like to compare two very large lists of names to produce a shortlist of possible matches, i.e. people from list A who also appear in list B. However, because English is not the local language, most names have several alternative spellings. In addition, depending on the context, the same person may be referred to by their full name, with one or more middle names and family names, or only by some subset of these.

I imagine that comparing lists where names differ by only one or a few typos is fairly simple. But is there a way to do this in Mathematica that can also handle the kinds of variation I've outlined?

I was thinking of grouping the names into clusters, isolating the clusters that contain a list A person, and then generating a list of the closest matches within each of those clusters. Is there a simple way to do this, or a better approach?

Thanks,
Jess
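A minimal sketch of the shortlisting idea, under several assumptions. It is written in Python only so that it runs standalone. Names are split into whitespace-separated tokens. Tokens are compared with a length-normalized Levenshtein edit distance, which tolerates spelling variants. A name pair is scored by how well each token of the *shorter* name matches some token of the longer one, which tolerates missing middle or family names. The threshold of 0.75, the k=3 cutoff, and all function names are hypothetical choices. Comparing every A name against every B name is too slow for very large lists, so a crude "blocking" pre-filter stands in for the clustering step: only B names that share a token initial with the A name are scored. That filter will miss variants that differ in their first letter (e.g. Kristina/Christina), so a phonetic key might be a better choice there. In Mathematica itself, the per-token distance could presumably come from the built-in `EditDistance` (version 7 and later), and `Nearest` with a `DistanceFunction` option could handle the candidate lookup.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each insertion, deletion or substitution costs 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def token_similarity(t1: str, t2: str) -> float:
    """1.0 for identical tokens, dropping towards 0.0 as edits accumulate."""
    longest = max(len(t1), len(t2))
    return 1.0 if longest == 0 else 1.0 - edit_distance(t1, t2) / longest


def name_score(name_a: str, name_b: str) -> float:
    """Average best per-token similarity over the tokens of the shorter name.

    Only the shorter name's tokens have to find a partner, so a partial name
    can still score 1.0 against a fuller one. Note that one token of the
    longer name may serve as the best partner for several tokens.
    """
    tokens_a, tokens_b = name_a.lower().split(), name_b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0
    shorter, longer = sorted((tokens_a, tokens_b), key=len)
    best = [max(token_similarity(t, u) for u in longer) for t in shorter]
    return sum(best) / len(best)


def build_block_index(names):
    """Assumed cheap pre-filter: first letter of each token -> names containing it."""
    index = defaultdict(set)
    for name in names:
        for token in name.lower().split():
            index[token[0]].add(name)
    return index


def shortlist(list_a, list_b, k=3, threshold=0.75):
    """For each name in list A, up to k names from list B scoring >= threshold."""
    index = build_block_index(list_b)
    result = {}
    for a in list_a:
        # Only score list B names sharing a token initial with `a`.
        candidates = set()
        for token in a.lower().split():
            candidates |= index.get(token[0], set())
        scored = [(name_score(a, b), b) for b in candidates]
        scored = [pair for pair in scored if pair[0] >= threshold]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, then by name
        result[a] = [b for _, b in scored[:k]]
    return result


if __name__ == "__main__":
    # Made-up example names, for illustration only.
    list_a = ["Ana Silva", "Joao Pereira"]
    list_b = ["Ana Maria Silva Santos", "Anna Sylva", "Joao Manuel Pereira", "Pedro Costa"]
    for name, matches in shortlist(list_a, list_b).items():
        print(name, "->", matches)
```

With these made-up lists, "Ana Silva" should be shortlisted against both "Ana Maria Silva Santos" (a partial-name match) and "Anna Sylva" (a spelling variant). "Joao Pereira" should match only "Joao Manuel Pereira". The scores are only a ranking aid for producing a shortlist to check by hand, not a decision that two records are the same person.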