RE: RE: Why is my Implementation of Sorted Trees So Slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg36532] RE: [mg36502] RE: [mg36485] Why is my Implementation of Sorted Trees So Slow?
- From: "DrBob" <drbob at bigfoot.com>
- Date: Wed, 11 Sep 2002 03:27:53 -0400 (EDT)
- Reply-to: <drbob at bigfoot.com>
- Sender: owner-wri-mathgroup at wolfram.com
Johannes, I wasn't trying to write a truly efficient algorithm -- I consciously tried not to change the algorithm (or even the storage method) a lot, so that Husain could move along from where he was. Of course a heap would be more efficient. In fact, given the built-in Sort and Ordering functions, I see no reason to build sorted binary trees in Mathematica at all (as I said in my first post) -- except as a step in the process of learning. Your solution (on my machine) is equivalent in speed to Mihajlo's simple modification of Husain's original -- just by replacing Null with anything else -- and using "" instead of {} in your code makes no difference. The following code is Husain's, but uses List rather than node and {} rather than Null, and builds precisely the same tree as your code. Yet it's 10% slower than your code. I think that's simply because there are more rules -- the same reason my code is slower, because I have even more rules, so that more time is spent on pattern-matching. Clear[add] add[{}, x_Real] := {x, {}, {}} add[{}, {x_Real, lower_, higher_}] = {x, lower, higher}; add[{x_Real, lower_, higher_}, y_Real] := If[x > y, {x, add[ lower, y], higher}, {x, lower, add[higher, y]}] add[{x_Real, lowerx_, higherx_}, { y_Real, lowery_, highery_}] := If[x > y, {x, add[lowerx, {y, lowery, highery}], higherx}, {x, lowerx, add[higherx, {y, lowery, highery}]}] In fact, using List instead of node slowed the algorithm compared to Mihajlo's solution, by the same 10%. I think that's because it requires more pattern matching, to distinguish {} from other List forms in the tree at each call. As for the inefficiency inherent in copying trees, how would you avoid that, other than using something like a heap? Even with a heap, it has to be a global variable (not a function argument) to avoid copying. Bobby -----Original Message----- From: Johannes Ludsteck To: mathgroup at smc.vnet.net [mailto:johannes.ludsteck at wiwi.uni-regensburg.de] Subject: [mg36532] Re: [mg36502] RE: [mg36485] Why is my Implementation of Sorted Trees So Slow? Dear DrBob, I fear that your diagnosis of the efficiency problems in Husain's binary tree implementation is not correct. Furthermore your code is inefficient and awkward. Consider the following implementation using list to represent the tree. It is three lines long and is three times faster than yours on my computer though it uses twice as much memory as your representation. myadd[{},x_]:={x,{},{}}; myadd[{v_,a_,b_},x_]:=If[v>x, {v,myadd[a,x],b}, {v,a,myadd[b,x]}] t={}; Timing[Do[t=myadd[t,Random[]],{5000}];] {1.98 Second,Null} The real problem with the implementation is probably that the tree structure has to be 'copied' each time when add is called. You have to write tree = add[tree, x] If you wrote t2 = add[t1, x] then you have *two* trees t1 and t2 which differ by the one element inserted by add. To the best of my knowledge, the internal representation of t1 and t2 in Mathematica is a tree too. But in principle the user code is duplicated in Mathematica. A further problem of all direct implementations is that they cannot be compiled (since they use pattern matching and the tree is a nested structure). Thus I guess that the efficient way to implement tree data structures is the indirect one using heaps (i.e. a combination of contiguous arrays with length known in advance and hashing). Best regards, Johannes Ludsteck On 9 Sep 2002, at 0:29, DrBob wrote: > You're not doing anything dumb as far as I can see, but you're using far > more memory than necessary. Here are statistics for your algorithm on > my machine: > > SeedRandom[5]; > nums = Null > Do[nums = add[nums, Random[]], {5000}]; // Timing > nums // LeafCount > nums // ByteCount > Count[nums, Null, Infinity] > Count[nums, node[_, Null, _], Infinity] > Count[nums, node[_, _, Null], Infinity] > Count[nums, node[_, Null, Null], Infinity] > Count[nums, node[___], Infinity] > Count[nums, node[_, _node | _?NumericQ, _node | _?NumericQ], Infinity] > Depth[nums] > > {4.406000000000001*Second, Null} > 15001 > 240000 > 5001 (* Null values in the tree *) > 2458 (* nodes without left offspring *) > 2543 (* nodes without right offspring *) > 1669 (* leaf nodes *) > 4999 (* total nodes *) > 1667 (* nodes with both left and right offspring *) > 28 (* minus one, makes 27 levels in the tree *) > > It's clearly not ideal to store 5001 Nulls for 5000 actual values! > > Here's a method that doesn't change the algorithm much, but changes the > storage method a great deal: > > First of all, I'm lazy, so I'll redefine <,>,<=,>=, etc. to make the > code simpler: > > Unprotect[Less, Greater, LessEqual, GreaterEqual]; > Less[a : _node ..] := Less @@ (First /@ {a}) > LessEqual[a : _node ..] := LessEqual @@ (First /@ {a}) > Greater[a : _node ..] := Greater @@ (First /@ {a}) > GreaterEqual[a : _node ..] := GreaterEqual @@ (First /@ {a}) > Protect[Less, Greater, LessEqual, GreaterEqual]; > > Again because I'm lazy, I'll define "node" so that I don't have to > consciously avoid leaf nodes and unnecessary nesting: > > ClearAll[node] > node[a___, node[b_], c___] = node[a, b, c]; > node[node[a__]] = node[a]; > > Here's my "add" function: > > Clear[add] > add[Null, x_] = node[x]; > add[x_, (y_)?NumericQ] := add[node[x], node[y]] > add[(x_)?NumericQ, y_] := add[node[x], node[y]] > add[node[x_], node[y_]] := If[x > y, node[x, y], node[y, x]] (* <-- > THERE! *) > add[node[x_, lower_], y_node] := > Which[y >= node[x], node[x, lower, y], > y <= node[lower], node[lower, y, x], True, node[y, lower, x]] > add[node[x_, lower_, higher_], y_node] /; > node[y] <= node[x] := node[x, add[lower, y], higher] > add[node[x_, lower_, higher_], y_node] := node[x, lower, add[higher, > y]] > > Where I have a pointer is the code that prevents adding right-offspring > to a leaf node. That avoids nodes like your node[a,Null,c]. When you > would have node[a,b,Null], I have node[a,b]. When you would have > node[a,Null,Null], I use a itself. > > Timing and space requirements are much better now: > > SeedRandom[5]; > nums = Null > Timing[Do[nums = add[nums, Random[]], {5000}]; ] > > {2.109*Second, Null} > > LeafCount[nums] > ByteCount[nums] > Count[nums, Null, Infinity] > Count[nums, node[_], Infinity] > Count[nums, node[_, _], Infinity] > Count[nums, node[_, _, _], Infinity] > Count[nums, node[___], Infinity] > Depth[nums] > > 7852 (* 48% fewer "leaf" expressions, > mostly due to Nulls and right-offspring eliminated *) > 165624 (* 31% less storage *) > 0 (* no Nulls, versus 5001 *) > 0 (* no leaf nodes, versus 1669 *) > (* no nodes without left-offspring, versus 2458 *) > 705 (* nodes without right offspring, versus 2543 *) > 2146 (* nodes with both left and right offspring, versus 1667 *) > 2851 (* total nodes *) > 21 (* 21 tree levels versus 27 *) > > The logical structure is identical to yours except that there are no > nodes with only right-offspring. This incidentally tends to reduce tree > depth. > > The storage format is far different: there are no trivial (childless) > nodes, and nodes with left-offspring but no right-offspring are stored > without a Null on the right. > > I could make this quite a bit faster, I think, if I spent time on the > code to eliminate changes to Less, Greater, etc. and the frequent > nesting followed by unnesting that goes on in the algorithm. It might > take twice as much code, however, so I like it as is. > > You're probably better off storing these things in a heap, of course. > > Or -- better yet -- use the built-in Sort and Ordering functions. > > Bobby Treat > > -----Original Message----- > From: Husain Ali Al-Mohssen [mailto:husain at MIT.EDU] To: mathgroup at smc.vnet.net > Subject: [mg36532] [mg36502] [mg36485] Why is my Implementation of Sorted Trees So Slow? > > > Hi all, > > I am trying to implement a very simple sorted tree to quickly store some > real numbers I need. I have written an add, delete, minimum, and pop > (delete the lowest value) function and they seem to work ok but are very > slow. Let's just look @ my implementation of the add part: > nums=Null;(*my initial blank Tree) > In[326]:= > Clear[add] > > In[327]:= > add[Null,x_Real]:=node[x,Null,Null] > > add[Null,node[x_Real,lower_,higher_]]=node[x,lower,higher] > > In[328]:= > add[node[x_Real,lower_,higher_],y_Real]:= > If[x>y,node[x,add[lower,y],higher],node[x,lower,add[higher,y]]] > > In[288]:= > add[node[x_Real,lowerx_,higherx_],node[y_Real,lowery_,highery_]]:=If[x>y > , > node[x,add[lowerx,node[y,lowery,highery]],higherx], > node[x,lowerx,add[higherx,node[y,lowery,highery]]] > ] > > > > Now this is my attempt to test how fast my add works: > > SeedRandom[5]; > Do[nums=add[nums,Random[]],{5000}];//Timing > > Out[333]= > {13.279 Second,Null} > > (running on Mathematica 4.1 on a win2k VMWare machine running on Linux > RH7.3 > running on an 1.4GHz Athlon with 1GB of ram). > > Questions: > 1. Is this as fast as I can get my code to run? > 2. Am I doing something obviously stupid? > 3. would Compiling things help? > > > Thanks, > Husain > > > <><><><><><><><><><><><> Johannes Ludsteck Economics Department University of Regensburg Universitaetsstrasse 31 93053 Regensburg Phone +49/0941/943-2741