MathGroup Archive 2007

Re: Matching string in Mathematica

  • To: mathgroup at smc.vnet.net
  • Subject: [mg83528] Re: Matching string in Mathematica
  • From: congruentialuminaire at yahoo.com
  • Date: Thu, 22 Nov 2007 04:47:49 -0500 (EST)
  • References: <fhu6tp$6uj$1@smc.vnet.net>

Hello Mark:

Here is a brief "design sketch" outlining how I would approach this
problem...

Although the solution to this problem is related to string matching, I
would start with a different approach: cluster analysis. Mathematica 6
adds a function for this, namely FindClusters.

There are many possible choices for the DistanceFunction option
(EditDistance is a natural starting point for strings), so you would
need to experiment to find one that suits your application.
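
As a minimal sketch of the clustering step, assuming the names are
already loaded as a list of strings, something like the following
could be a starting point. The sample list "names" is made up for
illustration, and "ciDistance" is a hypothetical helper rather than a
built-in; EditDistance is only one candidate distance.

```mathematica
(* Hypothetical sample data; the real list would be read from the file *)
names = {"Kirsten Smith", "Kristen Smith", "Kirsten Smyth",
   "Mark Coleman", "Mark Colman", "Marc Coleman"};

(* Cluster using the built-in edit distance between strings *)
clusters = FindClusters[names, DistanceFunction -> EditDistance]

(* Illustrative custom distance: edit distance that ignores case *)
ciDistance[a_String, b_String] :=
  EditDistance[ToLowerCase[a], ToLowerCase[b]]

(* Ask for exactly 2 clusters, using the custom distance *)
FindClusters[names, 2, DistanceFunction -> ciDistance]
```

With a couple of million records, comparing every pair of names is
likely to be too slow, so you would probably want to pre-group the data
(for example by first letter) and cluster within each group.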

Of course, you also need a measure for picking the "center" of
each cluster.

Then you can map each record to its respective center.
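
One possible way to sketch these two steps is shown below. It takes the
"center" to be the medoid, i.e. the member with the smallest total edit
distance to the rest of its cluster. That choice, the helper name
"medoid", and the sample clusters are all assumptions for illustration,
not the only reasonable option.

```mathematica
(* Hypothetical clusters, e.g. as returned by FindClusters *)
clusters = {{"Kirsten Smith", "Kristen Smith", "Kirsten Smyth"},
   {"Mark Coleman", "Mark Colman", "Marc Coleman"}};

(* "Center" taken as the medoid: the member with the smallest
   total edit distance to all members of its cluster *)
medoid[cluster_List] :=
  First[SortBy[cluster,
    Function[s, Total[EditDistance[s, #] & /@ cluster]]]]

(* One replacement rule per record: record -> center of its cluster *)
rules = Flatten[Function[c, Thread[c -> medoid[c]]] /@ clusters];

(* Map some records to their respective centers *)
{"Mark Colman", "Kirsten Smyth"} /. rules
```

Using a list of rules keeps the mapping easy to inspect and edit before
you apply it to the full data set.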

Then you can determine which string match/replacement methods to use
to "clean up" the data.

Finally, my impression is that lots of people want to keep some
apparent "misspellings" that are really distinct names (e.g. Kirsten
vs. Kristen).

HTH.

Regards..Roger W.

On Nov 20, 12:46 am, "Coleman, Mark" <Mark.Cole... at LibertyMutual.com>
wrote:
> Greetings
>
> I've got a large-sized file with a list of names (a couple of million
> records). I'd like to organize this data by individual name, so that I
> <snipped/>