Re: Merging lists if an element in each partially matches?
- To: mathgroup at smc.vnet.net
- Subject: [mg48733] Re: Merging lists if an element in each partially matches?
- From: astanoff_otez_ceci at yahoo.fr (astanoff)
- Date: Fri, 11 Jun 2004 23:58:59 -0400 (EDT)
- References: <cabpeo$ol2$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Charles Koehler wrote:
> Hello,
> I typically need to merge separate sets of data into one list. If
> each file contains the same sample I can join them quickly and easily
> using various merge and sort routines that have been discussed here in
> the past, such as myMatch5, etc. This is great when the sample
> names match exactly; however, I am attempting to deal with sample names
> that do not match exactly. They will differ only in the length of the
> name; one list may contain, for example, a sample name of 3_78457_5 and
> the second may only have 3_78457 or 78457_5.
> It should be possible to search the 2 data lists for columns that have
> the largest run of consecutively matching characters, and assume that
> is the correct match. Would it be possible to develop a similarity
> criterion? I can see that this type of function would be very useful in
> things more important than this.
> Any suggestions greatly appreciated.
> Sincerely,
> Charles Koehler

--

Charles,

You could use a function "similar" to this one:

In[1]:=
similar[alpha_String, beta_String] :=
  Module[{charalpha, charbeta, leng, padalpha, padbeta,
      roalpha, transalpha, countbeta},
    (* split both names into lists of characters *)
    charalpha = Characters[alpha];
    charbeta = Characters[beta];
    (* pad the shorter list with 0 so both have the same length *)
    leng = Max[Length[charalpha], Length[charbeta]];
    padalpha = PadRight[charalpha, leng];
    padbeta = PadRight[charbeta, leng];
    (* pair every cyclic rotation of alpha with beta, position by position;
       rotating beta against alpha instead yields the same set of counts,
       so a single pass is enough *)
    roalpha = RotateLeft[padalpha, #] & /@ (Range[leng] - 1);
    transalpha = Transpose[{padbeta, #}] & /@ roalpha;
    (* for each rotation, count the positions where the characters agree *)
    countbeta = Count[#, {x_, x_}] & /@ transalpha;
    (* best rotation, as a percentage of the longer name *)
    100. Max[countbeta]/leng
  ];

In[2]:= similar["3_78457_5","3_78457"]
Out[2]= 77.7778

In[3]:= similar["3_78457_5","78457_5"]
Out[3]= 77.7778

In[4]:= similar["3_78457_5","3_78457_5"]
Out[4]= 100.

In[5]:= similar["3_78457_5","666"]
Out[5]= 0

hth

---
va
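
As a rough sketch of how the score could drive the merge itself (this is not part of the original reply): the code below assumes that similar[a, b] from the post above is already defined, and that each data list is a list of rows whose first element is the sample name. The names bestPosition and mergeBySimilarity, and the default threshold of 70, are hypothetical choices made for illustration only.

```mathematica
(* Assumes similar[a, b] from the post above has already been evaluated. *)

(* Hypothetical helper: position and score of the candidate name
   that is most similar to name. *)
bestPosition[name_String, candidates : {__String}] :=
  Module[{scores, pos},
    scores = similar[name, #] & /@ candidates;
    pos = First[Ordering[scores, -1]];  (* position of the largest score *)
    {pos, scores[[pos]]}
  ];

(* Hypothetical helper: append to each row of list1 the data columns of
   the row of list2 whose name scores highest, provided that score
   reaches the threshold; otherwise the row is returned unchanged.
   The threshold value 70. is an assumption and should be tuned. *)
mergeBySimilarity[list1_List, list2_List, threshold_ : 70.] :=
  Module[{names2 = First /@ list2},
    Map[
      Function[row,
        With[{best = bestPosition[First[row], names2]},
          If[best[[2]] >= threshold,
            Join[row, Rest[list2[[best[[1]]]]]],
            row]]],
      list1]
  ];

(* Example data, invented for illustration *)
data1 = {{"3_78457_5", 1.2}, {"4_11111_2", 3.4}};
data2 = {{"78457_5", "A"}, {"4_11111", "B"}};

mergeBySimilarity[data1, data2]
```

Each row of data1 is paired with the row of data2 whose name scores highest, and rows with no sufficiently similar partner are left as they are. Because the score only counts agreeing positions under a cyclic shift, unrelated names of similar shape can still score well, so it is worth checking the matched pairs by eye before trusting the merged list.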