Re: Merging lists if an element in each partially matches?
- To: mathgroup at smc.vnet.net
- Subject: [mg48733] Re: Merging lists if an element in each partially matches?
- From: astanoff_otez_ceci at yahoo.fr (astanoff)
- Date: Fri, 11 Jun 2004 23:58:59 -0400 (EDT)
- References: <cabpeo$ol2$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Charles Koehler wrote:
> Hello,
> I typically need to merge separate sets of data into one list. If
> each file contains the same sample I can join them quickly and easily
> using various merge and sort routines that have been discussed here in
> the past, such as myMatch5, etc. This is great when the sample
> names match exactly; however, I am attempting to deal with sample names
> that do not match exactly. They will differ only in the length of the
> name; one list may contain, for example, a sample name of 3_78457_5 and
> the second may only have 3_78457 or 78457_5.
> It should be possible to search the 2 data lists for columns that have
> the largest run of consecutively matching characters, and assume that
> is the correct match. Would it be possible to develop a similarity
> criterion? I can see that this type of function would be very useful in
> things more important than this.
> Any suggestions greatly appreciated.
> Sincerely,
> Charles Koehler
--
Charles,
You could use a function "similar" to this one:
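In[1]:= similar[alpha_String, beta_String]:=
Module[{charalpha, charbeta, leng, padalpha, padbeta,
robeta, transbeta, countalpha, roalpha, transalpha, countbeta},
(* split both names into characters; pad the shorter list with 0 *)
charalpha = Characters[alpha];
charbeta = Characters[beta];
leng = Max[Length[charalpha],Length[charbeta]];
padalpha = PadRight[charalpha,leng];
padbeta = PadRight[charbeta,leng];
(* rotate beta against alpha and count position-wise matches; *)
(* Count needs a pattern such as {x_,x_}, not a pure function *)
robeta = RotateLeft[padbeta,#]& /@ (Range[leng]-1);
transbeta = Transpose[{padalpha,#}]& /@ robeta;
countalpha = Count[#,{x_,x_}]& /@ transbeta;
(* same with alpha rotated against beta *)
roalpha = RotateLeft[padalpha,#]& /@ (Range[leng]-1);
transalpha = Transpose[{padbeta,#}]& /@ roalpha;
countbeta = Count[#,{x_,x_}]& /@ transalpha;
(* best alignment, as a percentage of the longer name *)
100. Max[countalpha,countbeta]/leng
];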
In[2]:=similar["3_78457_5","3_78457"]
Out[2]=77.7778
In[3]:=similar["3_78457_5","78457_5"]
Out[3]=77.7778
In[4]:=similar["3_78457_5","3_78457_5"]
Out[4]=100.
In[5]:=similar["3_78457_5","666"]
Out[5]=0
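
The criterion suggested in the question, the longest run of consecutively matching characters, can also be scored directly. What follows is only a minimal sketch, assuming a Mathematica version that provides LongestCommonSubsequence (which returns the longest contiguous common run of two strings). The names runScore and bestMatch are hypothetical helpers introduced for illustration; they are not part of the original post.

```mathematica
(* Hypothetical helper: length of the longest contiguous run shared by
   two names, as a percentage of the longer name. *)
runScore[a_String, b_String] :=
  100. StringLength[LongestCommonSubsequence[a, b]]/
    Max[StringLength[a], StringLength[b]]

(* Hypothetical helper: return the best-scoring candidate for a name,
   together with its score. *)
bestMatch[name_String, candidates : {__String}] :=
  Module[{scores, pos},
    scores = runScore[name, #] & /@ candidates;
    pos = First[Ordering[scores, -1]];  (* position of the largest score *)
    {candidates[[pos]], scores[[pos]]}]

(* Example call with made-up candidate names *)
bestMatch["3_78457_5", {"3_78457", "666", "9_11111_2"}]
```

With these made-up candidates the call should pick "3_78457", since it shares a run of 7 of the 9 characters. Note that "3_78457" and "78457_5" score the same against "3_78457_5", so when merging real data it is probably wise to accept a match only above some threshold and to check ties by hand.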
hth
---
va
--
Message posted from http://www.gyptis.org, a BBS active since 1995.