RE: Pattern matching

*To*: mathgroup at smc.vnet.net
*Subject*: [mg33954] RE: [mg33912] Pattern matching
*From*: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
*Date*: Wed, 24 Apr 2002 01:22:15 -0400 (EDT)
*Sender*: owner-wri-mathgroup at wolfram.com

John,

here is another idea, which I first had in mind but then lost sight of. If in your case all strings in listData have the same length and only capital letters occur, then you can recast the problem so that it is solved by StringMatchQ, which presumably is faster than QMMatchQ (please try and report). In a StringMatchQ pattern, "@" matches one or more characters that are not uppercase letters, hence the conversion to lower case and the equal-length requirement.

So define

In[27]:= listData2 = ToLowerCase[listData]
Out[27]= {"18k0f3c--", "2k40gxx--", "400hgxx--", "5m00g1f--", "960kg1d--"}

In[28]:= listTemplates2 = ToLowerCase[StringReplace[listTemplates, "?" -> "@"]]
Out[28]= {"@@@h@@@@@", "@@@k@@@@@"}

Now use

In[29]:= Intersection[listData2, listTemplates2, SameTest -> StringMatchQ]
Out[29]= {"400hgxx--", "960kg1d--"}

--
Hartmut

> -----Original Message-----
> From: Wolf, Hartmut
> Sent: Tuesday, April 23, 2002 5:28 PM
> To: 'John Leary'; mathgroup at smc.vnet.net
> Subject: [mg33954] RE: [mg33912] Pattern matching
>
> > -----Original Message-----
> > From: John Leary [mailto:leary at paradise.net.nz]
> > Sent: Tuesday, April 23, 2002 1:13 PM
> > To: mathgroup at smc.vnet.net
> > Subject: [mg33954] [mg33912] Pattern matching
> >
> > Greetings
> >
> > Can you help me please - there must be a simple solution to this problem, but I can't find it.
> >
> > From a list of character strings and a list of templates, I need to produce a list of all strings that match any of the templates. For example:
> >
> > listData={"18K0F3C--" , "2K40GXX--" , "400HGXX--" , "5M00G1F--" , "960KG1D--"}
> > listTemplates={"???H?????" , "???K?????"}
> > result={"400HGXX--","960KG1D--"}
> >
> > In the templates, ? is a wild-card that represents a single character. The data strings contain only alpha-numeric characters and hyphens - no other characters.
> > There are no special requirements for the result: duplication and random order are acceptable.
> >
> > I searched the MathGroup archive and found a very useful function that does exactly what I want, but it works only on individual strings, not lists of strings (msg00051):
> >
> > QMMatchQ[s_String, p_String] := MatchQ[Characters[s], Characters[p] /. "?" -> _ ]
> >
> > I tried to use it in the following way, but the result is a list of the matching templates, not the matching strings:
> >
> > QMMatchQ[s_String, p_String] := MatchQ[Characters[s], Characters[p] /. "?" -> _ ]
> > SetOptions[Intersection, SameTest -> (QMMatchQ[#1,#2]& )];
> > result=Intersection[listData,listTemplates]
> > {"???H?????","???K?????"}
> >
> > It ought to be a small step from there to the result that I need, but I can't find a simple solution.
> >
> > One alternative approach would be a Do loop:
> >
> > b={};
> > Do[b=Append[b,Select[listData,QMMatchQ[#,listTemplates[[n]]]&]],{n,1,Length[listTemplates]}]
> >
> > This works but seems to be very slow for large lists. In the real case, listData can be very large - up to 250,000 elements - and the Do loop approach doesn't seem to be optimum.
> >
> > I would be very grateful for your help.
> >
> > Regards
> >
> > John Leary
> >
>
> John,
>
> perhaps the most simple way to do it is:
>
> In[11]:= Intersection[listData, listTemplates, SameTest -> QMMatchQ]
> Out[11]= {"400HGXX--", "960KG1D--"}
>
> But in any case for each element in listData all listTemplates are tried until success occurs or none is left. In your example only one test could be skipped, as is seen by
>
> In[17]:= c = 0;
>          Intersection[listData, listTemplates, SameTest -> ((++c; QMMatchQ[#1, #2]) &)]
> Out[17]= {"400HGXX--", "960KG1D--"}
>
> In[18]:= c
> Out[18]= 9
>
> There are of course other ways to do it, e.g.
>
> In[10]:= Select[listData, Function[s, Or @@ (QMMatchQ[s, #] &) /@ listTemplates]]
> Out[10]= {"400HGXX--", "960KG1D--"}
>
> Here all templates are checked for the listData, i.e. 10 calls to QMMatchQ.
> Or observing
>
> In[12]:= Outer[QMMatchQ, listData, listTemplates]
> Out[12]= {{False, False}, {False, False}, {True, False}, {False, False}, {False, True}}
>
> we might get the idea
>
> In[16]:= MapThread[If[#1, #2, Unevaluated[Sequence[]]] &,
>            {Or @@@ Outer[QMMatchQ, listData, listTemplates], listData}]
> Out[16]= {"400HGXX--", "960KG1D--"}
>
> but this contains additional list operations to make it even less performant.
>
> --
> Hartmut
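
The two ideas in this thread, the Select over all templates and the lower-cased "@" templates for StringMatchQ, can presumably be combined so that the original upper-case strings are returned rather than their lower-case copies. The following is only a sketch under the thread's assumptions (all strings have the same length, no lower-case letters in the data); the names templatesLower and matchesAnyQ are hypothetical and do not appear in the original messages, and no timing claim is made.

```mathematica
(* Illustrative sketch, not from the original messages.
   templatesLower and matchesAnyQ are hypothetical names. *)
listData = {"18K0F3C--", "2K40GXX--", "400HGXX--", "5M00G1F--", "960KG1D--"};
listTemplates = {"???H?????", "???K?????"};

(* Same conversion as In[28]: "?" becomes "@", then everything is lower-cased *)
templatesLower = ToLowerCase[StringReplace[listTemplates, "?" -> "@"]];

(* True if the lower-cased string matches at least one converted template *)
matchesAnyQ[s_String] :=
  Or @@ Map[StringMatchQ[ToLowerCase[s], #] &, templatesLower]

(* Select tests each original string and keeps it unchanged *)
Select[listData, matchesAnyQ]
```

For the example data this should give {"400HGXX--", "960KG1D--"}. Unlike Intersection, Select keeps the input order and any duplicates, which the original question states is acceptable.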