RE: Pattern matching

*To*: mathgroup at smc.vnet.net
*Subject*: [mg33950] RE: [mg33912] Pattern matching
*From*: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
*Date*: Wed, 24 Apr 2002 01:22:05 -0400 (EDT)
*Sender*: owner-wri-mathgroup at wolfram.com

> -----Original Message-----
> From: John Leary [mailto:leary at paradise.net.nz]
> Sent: Tuesday, April 23, 2002 1:13 PM
> To: mathgroup at smc.vnet.net
> Subject: [mg33950] [mg33912] Pattern matching
>
> Greetings
>
> Can you help me please - there must be a simple solution to this problem,
> but I can't find it.
>
> From a list of character strings and a list of templates, I need to
> produce a list of all strings that match any of the templates. For example:
>
> listData={"18K0F3C--" , "2K40GXX--" , "400HGXX--" ,
>           "5M00G1F--" , "960KG1D--"}
> listTemplates={"???H?????" , "???K?????"}
> result={"400HGXX--","960KG1D--"}
>
> In the templates, ? is a wild-card that represents a single character.
> The data strings contain only alpha-numeric characters and hyphens - no
> other characters.
> There are no special requirements for the result: duplication and random
> order are acceptable.
>
> I searched the MathGroup archive and found a very useful function that does
> exactly what I want, but it works only on individual strings, not lists of
> strings (msg00051):
>
> QMMatchQ[s_String, p_String] := MatchQ[Characters[s], Characters[p] /. "?" -> _ ]
>
> I tried to use it in the following way, but the result is a list of the
> matching templates, not the matching strings:
>
> QMMatchQ[s_String, p_String] := MatchQ[Characters[s], Characters[p] /. "?" -> _ ]
> SetOptions[Intersection, SameTest -> (QMMatchQ[#1,#2]& )];
> result=Intersection[listData,listTemplates]
> {"???H?????","???K?????"}
>
> It ought to be a small step from there to the result that I need, but I
> can't find a simple solution.
>
> One alternative approach would be a Do loop:
>
> b={};
> Do[b=Append[b,Select[listData,QMMatchQ[#,listTemplates[[n]]]&] ],
>   {n,1,Length[listTemplates]}]
>
> This works but seems to be very slow for large lists. In the real case,
> listData can be very large - up to 250,000 elements - and the Do loop
> approach doesn't seem to be optimum.
>
> I would be very grateful for your help.
>
> Regards
>
> John Leary

John,

perhaps the simplest way to do it is:

In[11]:= Intersection[listData, listTemplates, SameTest -> QMMatchQ]
Out[11]= {"400HGXX--", "960KG1D--"}

In any case, for each element of listData the templates are tried one after another until one succeeds or none is left. In your example only one test could be skipped, as can be seen from:

In[17]:= c = 0;
         Intersection[listData, listTemplates,
           SameTest -> ((++c; QMMatchQ[#1, #2]) &)]
Out[17]= {"400HGXX--", "960KG1D--"}

In[18]:= c
Out[18]= 9

There are of course other ways to do it, e.g.

In[10]:= Select[listData,
           Function[s, Or @@ (QMMatchQ[s, #] &) /@ listTemplates]]
Out[10]= {"400HGXX--", "960KG1D--"}

Here every template is checked against every element of listData, i.e. 10 calls to QMMatchQ.

Or, observing

In[12]:= Outer[QMMatchQ, listData, listTemplates]
Out[12]= {{False, False}, {False, False}, {True, False}, {False, False}, {False, True}}

we might get the idea

In[16]:= MapThread[If[#1, #2, Unevaluated[Sequence[]]] &,
           {Or @@@ Outer[QMMatchQ, listData, listTemplates], listData}]
Out[16]= {"400HGXX--", "960KG1D--"}

but the additional list operations make this version even less efficient.

--
Hartmut
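For readers on a later Mathematica version: the string-pattern functions added after this thread (StringExpression, and StringMatchQ with general string patterns) make it possible to test the templates directly on the strings, without splitting each data string into a character list. The following is a minimal sketch under that assumption. templateToPattern and anyTemplate are hypothetical names introduced here, and the sketch assumes the templates contain only "?", letters, digits and hyphens. It has not been timed against the approaches above on 250,000 strings.

```mathematica
listData = {"18K0F3C--", "2K40GXX--", "400HGXX--", "5M00G1F--", "960KG1D--"};
listTemplates = {"???H?????", "???K?????"};

(* Hypothetical helper: turn one template into a string pattern.
   Each "?" becomes _, which matches exactly one character in a string
   pattern; every other character must match literally.
   Assumes templates contain only "?", letters, digits and hyphens. *)
templateToPattern[t_String] := StringExpression @@ (Characters[t] /. "?" -> _)

(* Combine all templates into a single Alternatives pattern, so each data
   string is passed to StringMatchQ once instead of once per template. *)
anyTemplate = Alternatives @@ (templateToPattern /@ listTemplates);

(* StringMatchQ tests whether the whole string matches the pattern, so the
   lengths must agree, just as with QMMatchQ. *)
matches = Select[listData, StringMatchQ[#, anyTemplate] &]
(* {"400HGXX--", "960KG1D--"} *)
```

StringMatchQ also accepts a list of strings as its first argument, so Pick[listData, StringMatchQ[listData, anyTemplate]] is an equivalent form that avoids the pure function.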