Re: Matching string in Mathematica
- To: mathgroup at smc.vnet.net
- Subject: [mg83528] Re: Matching string in Mathematica
- From: congruentialuminaire at yahoo.com
- Date: Thu, 22 Nov 2007 04:47:49 -0500 (EST)
- References: <fhu6tp$6uj$1@smc.vnet.net>
Hello Mark:

Here is a brief "design sketch" outlining how I would approach this problem.

Although the solution is related to string matching, I would start with a different approach: cluster analysis. Mathematica has a function for this (new in V6), namely FindClusters. There are many possible choices for the DistanceFunction option, and you would need to explore which one suits your application.

You also need a measure for picking the "center" of each cluster. Then you can map each record to its respective center and decide which string matching/replacement methods to use to "clean up" the data.

Finally, my impression is that lots of people want to keep some apparent "misspellings" that are in fact distinct names (e.g. Kirsten vs. Kristen).

HTH.

Regards,
Roger W.

On Nov 20, 12:46 am, "Coleman, Mark" <Mark.Cole... at LibertyMutual.com> wrote:
> Greetings
>
> I've got a large-sized file with a list of names (a couple of million
> records). I'd like to organize this data by individual name, so that I
> <snipped/>
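
As a rough illustration of the clustering idea in the reply above, here is a minimal sketch. Everything specific in it is an assumption made for illustration: the sample list names is made up, EditDistance is just one possible DistanceFunction, and the helper center (which picks the member with the smallest total distance to the rest of its cluster) is a hypothetical choice of "center", not something the original post prescribes.

```mathematica
(* Hypothetical sample data; the real file has a couple of million records *)
names = {"Kirsten", "Kristen", "Kirstin", "Mark", "Marc", "Roger", "Rodger"};

(* Group similar strings; EditDistance is one assumed choice of distance *)
clusters = FindClusters[names, DistanceFunction -> EditDistance];

(* Hypothetical "center": the member with the smallest total edit
   distance to all members of its own cluster *)
center[c_List] := Module[{costs},
  costs = Map[Function[s, Total[Map[EditDistance[s, #] &, c]]], c];
  c[[First[Ordering[costs]]]]
]

(* One rule per record, sending it to the center of its cluster *)
rules = Flatten[Map[Function[c, Thread[c -> center[c]]], clusters]];

(* Map every record to its respective center *)
cleaned = names /. rules
```

How the names are grouped depends on the distance function and on how many clusters FindClusters settles on, so the result should be inspected before anything is replaced. Names that should stay distinct (Kirsten vs. Kristen) may well land in the same cluster. For a file of this size it would probably also make sense to cluster the distinct names, or a sample of them, rather than every record.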