Charles Koehler wrote:
> Hello,
>
> I typically need to merge separate sets of data into one list. If
> each file contains the same samples I can join them quickly and easily
> using various merge and sort routines that have been discussed here in
> the past, such as myMatch5, etc. This is great when the sample
> names match exactly; however, I am attempting to deal with sample names
> that do not match exactly. They will differ only in the length of the
> name; one list may contain, for example, a sample name of 3_78457_5 and
> the second may only have 3_78457 or 78457_5.
>
> It should be possible to search the 2 data lists for columns that have
> the largest run of consecutively matching characters, and assume that
> is the correct match. Would it be possible to develop a similarity
> criterion? I can see that this type of function would be very useful in
> things more important than this.
>
> Any suggestions greatly appreciated.
>
> Sincerely,
> Charles Koehler

Charles,

You could use a function "similar" to this one. It pads both names to the
same length, tries every cyclic shift of one name against the other, and
reports the best count of position-by-position matches as a percentage of
the longer name's length:

In[1]:= similar[alpha_String, beta_String] :=
  Module[{charalpha, charbeta, leng, padalpha, padbeta,
      roalpha, transalpha, counts},
    charalpha = Characters[alpha];
    charbeta = Characters[beta];
    leng = Max[Length[charalpha], Length[charbeta]];
    (* PadRight pads with the integer 0, which never equals a character *)
    padalpha = PadRight[charalpha, leng];
    padbeta = PadRight[charbeta, leng];
    (* every cyclic shift of alpha, paired position by position with beta;
       shifting beta against alpha instead gives the same counts *)
    roalpha = RotateLeft[padalpha, #]& /@ (Range[leng] - 1);
    transalpha = Transpose[{padbeta, #}]& /@ roalpha;
    (* for each shift, the number of positions where the characters agree *)
    counts = Count[#, {x_, x_}]& /@ transalpha;
    100. Max[counts]/leng
  ];

In[2]:= similar["3_78457_5","3_78457"]
Out[2]= 77.7778

In[3]:= similar["3_78457_5","78457_5"]
Out[3]= 77.7778

In[4]:= similar["3_78457_5","3_78457_5"]
Out[4]= 100.

In[5]:= similar["3_78457_5","666"]
Out[5]= 0
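
The question also asks about "the largest run of consecutively matching characters", which is a slightly different criterion from counting matching positions. What follows is only a minimal sketch of that idea, not part of the function above. It assumes a Mathematica version that provides the built-ins LongestCommonSubsequence (which returns the longest contiguous run common to two strings) and MaximalBy. The names runSimilarity and bestMatch are hypothetical, chosen here for illustration, and both arguments are assumed to be non-empty.

```mathematica
(* Hypothetical helper: length of the longest contiguous run shared by
   a and b, as a percentage of the longer string's length.
   Assumes neither string is empty. *)
runSimilarity[a_String, b_String] :=
  100. StringLength[LongestCommonSubsequence[a, b]]/
    Max[StringLength[a], StringLength[b]]

(* Hypothetical helper: the candidate whose shared run with name is
   longest relative to the longer string; on a tie the first one wins. *)
bestMatch[name_String, candidates : {__String}] :=
  First[MaximalBy[candidates, runSimilarity[name, #] &]]

(* Example call with made-up sample names *)
bestMatch["3_78457_5", {"4_11111", "3_78457", "78999_5"}]
```

One possible design note: because the score is scaled by the longer name, a short name that is fully contained in a long one scores below 100, so a threshold would still have to be chosen before accepting a match automatically.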