MathGroup Archive: February 2000 [00046]

[Date Index] [Thread Index] [Author Index]
Re: A question about pattern matching
To: mathgroup at smc.vnet.net
Subject: [mg21871] Re: [mg21837] A question about pattern matching
From: Bojan Bistrovic <bojanb at physics.odu.edu>
Date: Wed, 2 Feb 2000 22:54:28 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com
> Here's a question about pattern matching:
> 
> When one has to work with unknown number of arguments, it's necessary to 
> use double or triple Blanks; now, here's an example in pattern matching:
> 
> In[1]:= ReplaceList[ f[a,b,c,MyRule1->True,MyRule2->True,MyRule3->True],
>   f[aaa___, bbb___Rule] -> {{aaa},{bbb}}]
> 
> Out[1]= {{{a,b,c},{MyRule1 -> True,MyRule2 -> True,MyRule3 -> True}},
> 	{{a,b,c, MyRule1 -> True},{MyRule2 -> True,MyRule3 -> True}},
> 	{{a,b,c, MyRule1 -> True,MyRule2 -> True},{MyRule3 -> True}},
> 	{{a,b,c, MyRule1 -> True,MyRule2 -> True, MyRule3 -> True},{}}}
> 
> Now, when I started using them some time ago, it looked so logical that 
> in the example above the only match should be 
> 
> {{{a,b,c},{MyRule1 -> True,MyRule2 -> True,MyRule3 -> True}}}
> 
> that I never bothered to check the manual. After a few weeks of infinite
> loops I realized the error, so now I have to use something like
> 
> In[2]:= ReplaceList[f[a,b,c,MyRule1->True,MyRule2->True,MyRule3->True],
>   f[aaa___ /; FreeQ[{aaa},_Rule,1], bbb___Rule] -> {{aaa},{bbb}}]
> 
> Out[2]= {{{a,b,c},{MyRule1 -> True,MyRule2 -> True,MyRule3 -> True}}}
> 
> to achieve the same effect. I realize that the present behavior of 
> Mathematica isn't completely nonlogical, and that one might want this 
> behavior sometimes, but it seems to me that it would be more logical to 
> have the behavior I just presented. The question is: what do you think
> would be the more logical behavior: the one just described or the one
> that's currently implemented? I propose a vote; post your answer or email it
> directly to me and after a while, I'll post the results of the vote to 
> the group.  [Please send your responses to the author - moderator.]
> 
> Bojan
> 
First of all, let me start by thanking everybody who took the trouble to
answer my previous posting on this subject. 

I see that I haven't made my thoughts clear enough on this question, so I'll
try to explain better. First, I'm using ReplaceList just for demonstrating
purposes. The real question is how are patterns being matched. A few people
have answered this post by saying that it's perfectly logical and well
documented that if you have a set {a1,a2,a3,b1->b2,b3->b4, b5->b6} there's no
requirement that AAA___ shouldn't match objects with the head Rule. I agree,
it's perfectly legal and it's documented; it's just that from my viewpoint, 
if you're using a construct with two patterns AAA___,BBB___Rule, you WANT 
all the rules to be assigned to pattern BBB and all the other objects to 
AAA; if you wanted ALL the objects being assigned to AAA, you would simply 
use a single pattern AAA___ which would simply match everything. I have 
thought about it for a long, long time, and I simply couldn't imagine an 
example where you'd use two patterns like AAA___,BBB___Rule and WANT the 
rules to be matched by AAA as well as BBB that couldn't be handled by a 
single pattern AAA___. There's no reason to restrict this to rules (altho 
that's probably the most common usage of a triple-blank pattern at the 
end of a function); if you have a function with a (unknown) variable
number of arguments and you want to match a group of arguments with the same
head somewhere in the expression, the reason is that you want some operation
performed on them and not on anything else, so you're not likely to want some
other more general pattern to match those objects as well. I realize that this
could be just a product of my twisted mind :-), but like I said, I can't think
of an example where you'd want the present behavior regardless of it's
perfectly valid logic. The way I see it, there should be a general (global) 
option that would control this behavior even if I'm wrong and there are 
situations where you want the present behavior. For speed if for nothing else; 
here's an example: if I use something like aaa___,bbb___Rule it produces 
(Length[bbb]+1 matches; now if you want that bbb___Rule matches ALL the rules 
and aaa___ NONE, just one survives regardless of the method you use for 
discarding the unwanted matches:

In[1]:= Timing[ReplaceList[ f@@Union[Table[Random[],{kk,1000}],
        Table[Rule[Random[],Random[]],{kk,1000}]],
	f[aaa___,bbb___Rule] -> {{aaa},{bbb}}];]

Out[1]= {1.8 Second,Null}

In[2]:= Timing[ReplaceList[ f@@Union[Table[Random[],{kk,1000}],
        Table[Rule[Random[],Random[]],{kk,1000}]],
	f[aaa___/;FreeQ[{aaa},_Rule,1],bbb___Rule] -> {{aaa},{bbb}}];]

Out[2]= {1.83 Second,Null}

In[3]:= Timing[ReplaceList[ f@@Union[Table[Random[],{kk,1000}],
        Table[Rule[Random[],Random[]],{kk,1000}]],
	f[aaa___/;MemberQ[{aaa},_Rule],bbb___Rule] -> {{aaa},{bbb}}];]

Out[3]= {3.03 Second,Null}

In[4]:= Timing[ReplaceList[ f@@Union[Table[Random[],{kk,1000}],
        Table[Rule[Random[],Random[]],{kk,1000}]],
	f[aaa___?(Head[#] =!= Rule &),bbb___Rule] -> {{aaa},{bbb}}];]

Out[4]= {16.63 Second,Null}

In[5]:= Timing[ReplaceList[ f@@Union[Table[Random[],{kk,1000}],
        Table[Rule[Random[],Random[]],{kk,1000}]],
	f[aaa : (___?((Head[#] =!= Rule) &)) ,bbb___Rule] -> {{aaa},{bbb}}];]

Out[5]= {16.64 Second,Null}

In[6]:= Timing[ReplaceList[ f@@Union[Table[Random[],{kk,1000}],
        Table[Rule[Random[],Random[]],{kk,1000}]],
    f[aaa___ /;  !Or @@ OptionQ /@ {aaa},  bbb___ /; OptionQ[{bbb}]]
   :> {{aaa}, Flatten[{bbb}]}];]

Out[6]= {9.8 Second,Null}

The use of ReplaceList is totally irrelevant; it's just for this example.
Judging from the time needed to produce the result, it appears that all 
matches are still produced and then tested by FreeQ or something else. 
Obviously, FreeQ produces the fastest results; other ways of discarding the 
unwanted matches are even slower. In this example, there are 1001 matches 
and 1000 are discarded to get a single one. It looks obvious (altho it might 
not be) that if the pattern-matching mechanism stopped after finding the one 
match needed (which is by the way the first one) instead of finding other 
1000 unwanted ones, a LOT of time would be saved. I don't know how it works
internally, does Mathematica first finds the match we want or does it have
some algorithm that finds them all and then sorts them. So, the time needed to
find the wanted match might not be equal to the 1/1001 of total time, but I'm
pretty sure it's not the LAST match found which is the only case where no
time would be saved (for matching part). Even if it is, still, the time 
needed to write those unwanted matches somewhere in memory instead of just 
discarding them is still linear in number of matches. These examples are made
under Linux with 64M of memory; for REALLY large expressions, if the operating 
systems runs out of memory (because it had to write down unwanted solutions) 
and starts memory paging, it makes the system completely inoperable for all 
practical purposes. When I try changing the number of rules from 1000 ro 2000
in the above examples, I have to kill the MathKernel because it has grown over
150M of memory an the system is using 99.99% of the time for memory swapping.
As I said earlier, yes, it IS perfectly logical and perfectly legal the way it
works now, but it wastes the time and memory, and I still haven't found an
example where you HAVE to have the present behavior that couldn't be handled
by either a single pattern aaa___ or by calling some function like Partition
or something else. It appears that not many people have encountered this
problem in performance drop, but it IS a problem, and a solvable one. 

Bye, Bojan

--
-------------------------------------------------------------
Bojan Bistrovic,                       bojanb at physics.odu.edu  
Old Dominion University, Physics Department,      Norfolk, VA
-------------------------------------------------------------
Prev by Date: Re: Re: Problem with evaluation of delayed rules
Next by Date: Protein Folding
Previous by thread: Re: Re: Problem with evaluation of delayed rules
Next by thread: Re: A question about pattern matching