MathGroup Archive 2004

Re: Merging lists if an element in each partially matches?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg48733] Re: Merging lists if an element in each partially matches?
  • From: astanoff_otez_ceci at yahoo.fr (astanoff)
  • Date: Fri, 11 Jun 2004 23:58:59 -0400 (EDT)
  • References: <cabpeo$ol2$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Charles Koehler wrote:

> Hello,

> I typically need to merge separate sets of data into one list.  If
> each file contains the same sample I can join them quickly and easily
> using various merge and sort routines that have been discussed here in
> the past, such as myMatch5, etc.  This is great when the sample
> names match exactly; however, I am attempting to deal with sample names
> that do not match exactly.  They will differ only in the length of the
> name; one list may contain for example a sample name of 3_78457_5 and
> the second may only have 3_78457 or 78457_5.

> It should be possible to search the 2 data lists for columns that have
> the largest run of consecutively matching characters, and assume that
> is the correct match.  Would it be possible to develop a similarity
> criterion?  I can see that this type of function would be very useful in
> things more important than this.

> Any suggestions greatly appreciated.

> Sincerely,

> Charles Koehler

--
Charles,
You could use a function "similar" to this one:
In[1]:= similar[alpha_String, beta_String]:=
Module[{charalpha, charbeta, leng, padalpha, padbeta,
        roalpha, transalpha, countbeta},
      (* split both names into characters; pad the shorter one with 0 *)
      charalpha = Characters[alpha];
      charbeta = Characters[beta];
      leng = Max[Length[charalpha],Length[charbeta]];
      padalpha = PadRight[charalpha,leng];
      padbeta = PadRight[charbeta,leng];
      (* pair beta, position by position, with every cyclic rotation of alpha *)
      roalpha = RotateLeft[padalpha,#]& /@ (Range[leng]-1);
      transalpha = Transpose[{padbeta,#}]& /@ roalpha;
      (* number of matching positions for each rotation *)
      countbeta = Count[#,{x_,x_}]& /@ transalpha;
      (* best rotation, as a percentage of the longer length *)
      100. Max[countbeta]/leng
      ];

In[2]:=similar["3_78457_5","3_78457"]

Out[2]=77.7778

In[3]:=similar["3_78457_5","78457_5"]
Out[3]=77.7778

In[4]:=similar["3_78457_5","3_78457_5"]
Out[4]=100.

In[5]:=similar["3_78457_5","666"]
Out[5]=0
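
If you would rather score by the longest run of consecutive matching characters, as your question suggests, here is a minimal sketch. It assumes a Mathematica version that provides LongestCommonSubsequence (which returns the longest contiguous common substring) and MaximalBy; both were added after this post was written, so treat that as an assumption about your setup. The names runSimilar and bestMatch are made up for illustration.

```mathematica
(* Hypothetical helper: length of the longest shared contiguous run,
   as a percentage of the longer of the two names *)
runSimilar[alpha_String, beta_String] :=
  100. StringLength[LongestCommonSubsequence[alpha, beta]]/
    Max[StringLength[alpha], StringLength[beta]]

(* Hypothetical helper: the candidate name that scores highest against name
   (the first one, if several tie) *)
bestMatch[name_String, candidates : {__String}] :=
  First[MaximalBy[candidates, runSimilar[name, #] &]]

runSimilar["3_78457_5", "3_78457"]
bestMatch["3_78457_5", {"666", "3_78457", "12345"}]
```

For "3_78457_5" against "3_78457" the shared run is "3_78457" (7 of 9 characters), so the score should come out near 77.8, and bestMatch should pick "3_78457" from that candidate list. Unlike the rotation count, this measure does not wrap around the ends of the names.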

hth
---
va
 
--
0% ads! Nothing but happiness and real members!
Sign up too, without further delay!!
Message posted from http://www.gyptis.org, a BBS active since 1995.