Re: Pattern matching
- To: mathgroup at smc.vnet.net
- Subject: [mg34001] Re: [mg33912] Pattern matching
- From: John Leary <leary at paradise.net.nz>
- Date: Fri, 26 Apr 2002 03:27:22 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Thank you all very much. As you pointed out, the form that I was searching for was something like:

    QMMatchQ[s_String,{p__String}]:=Or @@(MatchQ[Characters[s],Characters[#]/."?"->_]& /@ {p});

Then I can make the comparison of lists without a Do-loop.

Mr Hartmut asked about relative timings: I checked, and there seems to be little difference between the speed of

    Select[listData,QMMatchQ[#,listTemplates]&]

and

    Cases[listData,_?(QMMatchQ[#,listTemplates]&)]

for any list length. The Do-loop is about as good as the others for small lists, but rapidly falls behind as the list sizes increase.

The Intersection function seems to be implemented differently in different versions of Mathematica, so I couldn't test the Intersection approach. I think it would be considerably faster than the others.

Again, many thanks for your help.

Best regards
John Leary

At 23:04 23/04/2002 -0400, BobHanlon at aol.com wrote:
>In a message dated 4/23/02 9:39:43 AM, leary at paradise.net.nz writes:
>
> >Can you help me please - there must be a simple solution to this problem,
> >but I can't find it.
> >
> >From a list of character strings and a list of templates, I need to
> >produce a list of all strings that match any of the templates. For example:
> >
> >listData={"18K0F3C--" , "2K40GXX--" , "400HGXX--" , "5M00G1F--" , "960KG1D--"}
> >listTemplates={"???H?????" , "???K?????"}
> >result={"400HGXX--","960KG1D--"}
> >
> >In the templates, ? is a wild-card that represents a single character.
> >The data strings contain only alpha-numeric characters and hyphens - no
> >other characters.
> >There are no special requirements for the result: duplication and random
> >order are acceptable.
> >
> >I searched the MathGroup archive and found a very useful function that does
> >exactly what I want, but it works only on individual strings, not lists of
> >strings (msg00051):
> >
> >QMMatchQ[s_String, p_String] := MatchQ[Characters[s], Characters[p] /. "?" -> _ ]
> >
> >I tried to use it in the following way, but the result is a list of the
> >matching templates, not the matching strings:
> >
> >QMMatchQ[s_String, p_String] := MatchQ[Characters[s], Characters[p] /. "?" -> _ ]
> >SetOptions[Intersection, SameTest -> (QMMatchQ[#1,#2]& )];
> >result=Intersection[listData,listTemplates]
> >{"???H?????","???K?????"}
> >
> >It ought to be a small step from there to the result that I need, but I
> >can't find a simple solution.
> >
> >One alternative approach would be a Do loop:
> >
> >b={};
> >Do[b=Append[b,Select[listData,QMMatchQ[#,listTemplates[[n]]]&]],{n,1,Length[listTemplates]}]
> >
> >This works but seems to be very slow for large lists. In the real case,
> >listData can be very large - up to 250,000 elements - and the Do loop
> >approach doesn't seem to be optimum.
>
>listData={"18K0F3C--","2K40GXX--",
>    "400HGXX--","5M00G1F--","960KG1D--"};
>
>listTemplates={"???H?????","???K?????"};
>
>Clear[QMMatchQ];
>
>QMMatchQ[s_String,{p__String}]:=Or @@
>    (MatchQ[Characters[s],
>        Characters[#]/."?"->_]& /@ {p});
>
>Select[listData,
>    QMMatchQ[#,listTemplates]&]
>
>{"400HGXX--", "960KG1D--"}
>
>or
>
>Cases[listData,
>    _?(QMMatchQ[#,listTemplates]&)]
>
>{"400HGXX--", "960KG1D--"}
>
>Bob Hanlon
>Chantilly, VA USA
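
Editorial sketch (not part of the original thread): the multi-template QMMatchQ above rebuilds the character patterns from the templates for every data string it tests. One possible variation, assuming the same listData and listTemplates as in the example, is to convert the templates to patterns once and combine them with Alternatives. The name templatePatterns is hypothetical and does not appear in the thread, and no timings have been taken for this variant.

```mathematica
listData = {"18K0F3C--", "2K40GXX--", "400HGXX--", "5M00G1F--", "960KG1D--"};
listTemplates = {"???H?????", "???K?????"};

(* Build one list pattern per template, turning each "?" into a blank, *)
(* then join them so that a match against any one template succeeds.  *)
(* templatePatterns is an illustrative name, not from the thread.      *)
templatePatterns =
  Alternatives @@ ((Characters[#] /. "?" -> _) & /@ listTemplates);

(* Keep the strings whose character list matches any template pattern. *)
result = Select[listData, MatchQ[Characters[#], templatePatterns] &]
```

On the sample data this selects the same two strings as the Select and Cases forms quoted above, since only "400HGXX--" and "960KG1D--" have H or K as their fourth character.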