RE: Why is my Implementation of Sorted Trees So Slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg36505] RE: [mg36485] Why is my Implementation of Sorted Trees So Slow?
- From: "DrBob" <drbob at bigfoot.com>
- Date: Tue, 10 Sep 2002 06:24:32 -0400 (EDT)
- Reply-to: <drbob at bigfoot.com>
- Sender: owner-wri-mathgroup at wolfram.com
I made the algorithm more than twice as fast with very little change, while eliminating modifications to <,>,<=, and >=. Most of the improvement, in fact, was due to eliminating those changes AFTER the code was changed so they wouldn't be used anyway. That's because the comparisons appear everywhere, so adding rules to those functions really slows down pattern matching. Clear[add, node] node[a___, node[b_], c___] = node[a, b, c]; node[node[a__]] = node[a]; add[Null, x_] = node[x]; add[x_, (y_)?NumericQ] := add[node[x], node[y]] add[(x_)?NumericQ, y_] := add[node[x], node[y]] add[node[x_], node[y_]] := If[x >= y, node[x, y], node[y, x]] add[node[x_, lower_], y_node] := Which[First[y] >= x, node[x, lower, y], First[y] <= lower, node[lower, y, x], True, node[y, lower, x]] add[node[x_, lower_, higher_], y_node] /; First[y] <= x := node[x, add[lower, y], higher] add[node[x_, lower_, higher_], y_node] := node[x, lower, add[higher, y]] SeedRandom[5]; nums = Null Timing[Do[nums = add[nums, Random[]], {5000}]; ] {0.968*Second, Null} None of the other stats changed. Bobby Treat -----Original Message----- From: DrBob [mailto:drbob at bigfoot.com] To: mathgroup at smc.vnet.net Subject: [mg36505] RE: [mg36485] Why is my Implementation of Sorted Trees So Slow? You're not doing anything dumb as far as I can see, but you're using far more memory than necessary. Here are statistics for your algorithm on my machine: SeedRandom[5]; nums = Null Do[nums = add[nums, Random[]], {5000}]; // Timing nums // LeafCount nums // ByteCount Count[nums, Null, Infinity] Count[nums, node[_, Null, _], Infinity] Count[nums, node[_, _, Null], Infinity] Count[nums, node[_, Null, Null], Infinity] Count[nums, node[___], Infinity] Count[nums, node[_, _node | _?NumericQ, _node | _?NumericQ], Infinity] Depth[nums] {4.406000000000001*Second, Null} 15001 240000 5001 (* Null values in the tree *) 2458 (* nodes without left offspring *) 2543 (* nodes without right offspring *) 1669 (* leaf nodes *) 4999 (* total nodes *) 1667 (* nodes with both left and right offspring *) 28 (* minus one, makes 27 levels in the tree *) It's clearly not ideal to store 5001 Nulls for 5000 actual values! Here's a method that doesn't change the algorithm much, but changes the storage method a great deal: First of all, I'm lazy, so I'll redefine <,>,<=,>=, etc. to make the code simpler: Unprotect[Less, Greater, LessEqual, GreaterEqual]; Less[a : _node ..] := Less @@ (First /@ {a}) LessEqual[a : _node ..] := LessEqual @@ (First /@ {a}) Greater[a : _node ..] := Greater @@ (First /@ {a}) GreaterEqual[a : _node ..] := GreaterEqual @@ (First /@ {a}) Protect[Less, Greater, LessEqual, GreaterEqual]; Again because I'm lazy, I'll define "node" so that I don't have to consciously avoid leaf nodes and unnecessary nesting: ClearAll[node] node[a___, node[b_], c___] = node[a, b, c]; node[node[a__]] = node[a]; Here's my "add" function: Clear[add] add[Null, x_] = node[x]; add[x_, (y_)?NumericQ] := add[node[x], node[y]] add[(x_)?NumericQ, y_] := add[node[x], node[y]] add[node[x_], node[y_]] := If[x > y, node[x, y], node[y, x]] (* <-- THERE! *) add[node[x_, lower_], y_node] := Which[y >= node[x], node[x, lower, y], y <= node[lower], node[lower, y, x], True, node[y, lower, x]] add[node[x_, lower_, higher_], y_node] /; node[y] <= node[x] := node[x, add[lower, y], higher] add[node[x_, lower_, higher_], y_node] := node[x, lower, add[higher, y]] Where I have a pointer is the code that prevents adding right-offspring to a leaf node. That avoids nodes like your node[a,Null,c]. When you would have node[a,b,Null], I have node[a,b]. When you would have node[a,Null,Null], I use a itself. Timing and space requirements are much better now: SeedRandom[5]; nums = Null Timing[Do[nums = add[nums, Random[]], {5000}]; ] {2.109*Second, Null} LeafCount[nums] ByteCount[nums] Count[nums, Null, Infinity] Count[nums, node[_], Infinity] Count[nums, node[_, _], Infinity] Count[nums, node[_, _, _], Infinity] Count[nums, node[___], Infinity] Depth[nums] 7852 (* 48% fewer "leaf" expressions, mostly due to Nulls and right-offspring eliminated *) 165624 (* 31% less storage *) 0 (* no Nulls, versus 5001 *) 0 (* no leaf nodes, versus 1669 *) (* no nodes without left-offspring, versus 2458 *) 705 (* nodes without right offspring, versus 2543 *) 2146 (* nodes with both left and right offspring, versus 1667 *) 2851 (* total nodes *) 21 (* 21 tree levels versus 27 *) The logical structure is identical to yours except that there are no nodes with only right-offspring. This incidentally tends to reduce tree depth. The storage format is far different: there are no trivial (childless) nodes, and nodes with left-offspring but no right-offspring are stored without a Null on the right. I could make this quite a bit faster, I think, if I spent time on the code to eliminate changes to Less, Greater, etc. and the frequent nesting followed by unnesting that goes on in the algorithm. It might take twice as much code, however, so I like it as is. You're probably better off storing these things in a heap, of course. Or -- better yet -- use the built-in Sort and Ordering functions. Bobby Treat -----Original Message----- From: Husain Ali Al-Mohssen [mailto:husain at MIT.EDU] To: mathgroup at smc.vnet.net Subject: [mg36505] [mg36485] Why is my Implementation of Sorted Trees So Slow? Hi all, I am trying to implement a very simple sorted tree to quickly store some real numbers I need. I have written an add, delete, minimum, and pop (delete the lowest value) function and they seem to work ok but are very slow. Let's just look @ my implementation of the add part: nums=Null;(*my initial blank Tree) In[326]:= Clear[add] In[327]:= add[Null,x_Real]:=node[x,Null,Null] add[Null,node[x_Real,lower_,higher_]]=node[x,lower,higher] In[328]:= add[node[x_Real,lower_,higher_],y_Real]:= If[x>y,node[x,add[lower,y],higher],node[x,lower,add[higher,y]]] In[288]:= add[node[x_Real,lowerx_,higherx_],node[y_Real,lowery_,highery_]]:=If[x>y , node[x,add[lowerx,node[y,lowery,highery]],higherx], node[x,lowerx,add[higherx,node[y,lowery,highery]]] ] Now this is my attempt to test how fast my add works: SeedRandom[5]; Do[nums=add[nums,Random[]],{5000}];//Timing Out[333]= {13.279 Second,Null} (running on Mathematica 4.1 on a win2k VMWare machine running on Linux RH7.3 running on an 1.4GHz Athlon with 1GB of ram). Questions: 1. Is this as fast as I can get my code to run? 2. Am I doing something obviously stupid? 3. would Compiling things help? Thanks, Husain