RE: RE: Why is my Implementation of Sorted Trees So Slow?

*To*: mathgroup at smc.vnet.net*Subject*: [mg36516] RE: [mg36502] RE: [mg36485] Why is my Implementation of Sorted Trees So Slow?*From*: "DrBob" <drbob at bigfoot.com>*Date*: Tue, 10 Sep 2002 06:24:49 -0400 (EDT)*Reply-to*: <drbob at bigfoot.com>*Sender*: owner-wri-mathgroup at wolfram.com

On my machine, replacing Null with "" made Husain's code 8 times faster on the same test: 0.592 seconds versus 4.688. I suppose making the same change in my code makes no difference because there are no Nulls in the tree (except one, very briefly), or perhaps because I have fewer patterns that involve Nulls. By rearranging the order of my rules, and deleting a couple that were never used, I made my algorithm about 15% faster, getting the time down to 0.906. Mihajlo's amazing discovery beats that, of course -- it's 50% faster. I suppose what we've found is that (1) my algorith makes the code four to five times faster by reducing expression complexity, tree depth, and memory use, but it's a slower algorithm otherwise (too many rules?); and (2) Mathematica pattern matching and code execution has a real problem with Null used as if it means something. Bobby -----Original Message----- From: Mihajlo Vanevic [mailto:mvane at EUnet.yu] To: mathgroup at smc.vnet.net Subject: [mg36516] Re: [mg36502] RE: [mg36485] Why is my Implementation of Sorted Trees So Slow? Check this out: Changing Null to "NULL" (or anything else) speeds up the Hussain's code!! Clear[add]; add["NULL", x_Real] := node[x, "NULL", "NULL"]; add[node[x_Real, lower_, higher_], y_Real] /; x > y := node[x, add[lower, y], higher]; add[node[x_Real, lower_, higher_], y_Real] := node[x, lower, add[higher, y]]; nums = "NULL"; SeedRandom[5]; Do[nums = add[nums, Random[]], {5000}] // Timing Out {5.11 Second, Null} Clear[add]; add[Null, x_Real] := node[x, Null, Null]; add[node[x_Real, lower_, higher_], y_Real] /; x > y := node[x, add[lower, y], higher]; add[node[x_Real, lower_, higher_], y_Real] := node[x, lower, add[higher, y]]; nums = Null; SeedRandom[5]; Do[nums = add[nums, Random[]], {5000}] // Timing Out {41.03 Second, Null} Also: Simplify[1==Null] 1==Null doesn't evaluate to false!! Btw. changing Null to "NULL" in your code doesn't infulence the speed... Regards, Mihajlo Vanevic mvane at EUnet.yu 2002-09-09 ************************************************************** * At 2002-09-09, 00:29:00 * DrBob, drbob at bigfoot.com wrote: ************************************************************** >You're not doing anything dumb as far as I can see, but you're using far >more memory than necessary. Here are statistics for your algorithm on >my machine: > >SeedRandom[5]; >nums = Null >Do[nums = add[nums, Random[]], {5000}]; // Timing >nums // LeafCount >nums // ByteCount >Count[nums, Null, Infinity] >Count[nums, node[_, Null, _], Infinity] >Count[nums, node[_, _, Null], Infinity] >Count[nums, node[_, Null, Null], Infinity] >Count[nums, node[___], Infinity] >Count[nums, node[_, _node | _?NumericQ, _node | _?NumericQ], Infinity] >Depth[nums] > >{4.406000000000001*Second, Null} >15001 >240000 >5001 (* Null values in the tree *) >2458 (* nodes without left offspring *) >2543 (* nodes without right offspring *) >1669 (* leaf nodes *) >4999 (* total nodes *) >1667 (* nodes with both left and right offspring *) >28 (* minus one, makes 27 levels in the tree *) > >It's clearly not ideal to store 5001 Nulls for 5000 actual values! > >Here's a method that doesn't change the algorithm much, but changes the >storage method a great deal: > >First of all, I'm lazy, so I'll redefine <,>,<=,>=, etc. to make the >code simpler: > >Unprotect[Less, Greater, LessEqual, GreaterEqual]; >Less[a : _node ..] := Less @@ (First /@ {a}) >LessEqual[a : _node ..] := LessEqual @@ (First /@ {a}) >Greater[a : _node ..] := Greater @@ (First /@ {a}) >GreaterEqual[a : _node ..] := GreaterEqual @@ (First /@ {a}) >Protect[Less, Greater, LessEqual, GreaterEqual]; > >Again because I'm lazy, I'll define "node" so that I don't have to >consciously avoid leaf nodes and unnecessary nesting: > >ClearAll[node] >node[a___, node[b_], c___] = node[a, b, c]; >node[node[a__]] = node[a]; > >Here's my "add" function: > >Clear[add] >add[Null, x_] = node[x]; >add[x_, (y_)?NumericQ] := add[node[x], node[y]] >add[(x_)?NumericQ, y_] := add[node[x], node[y]] >add[node[x_], node[y_]] := If[x > y, node[x, y], node[y, x]] (* <-- >THERE! *) >add[node[x_, lower_], y_node] := > Which[y >= node[x], node[x, lower, y], > y <= node[lower], node[lower, y, x], True, node[y, lower, x]] >add[node[x_, lower_, higher_], y_node] /; > node[y] <= node[x] := node[x, add[lower, y], higher] >add[node[x_, lower_, higher_], y_node] := node[x, lower, add[higher, >y]] > >Where I have a pointer is the code that prevents adding right-offspring >to a leaf node. That avoids nodes like your node[a,Null,c]. When you >would have node[a,b,Null], I have node[a,b]. When you would have >node[a,Null,Null], I use a itself. > >Timing and space requirements are much better now: > >SeedRandom[5]; >nums = Null >Timing[Do[nums = add[nums, Random[]], {5000}]; ] > >{2.109*Second, Null} > >LeafCount[nums] >ByteCount[nums] >Count[nums, Null, Infinity] >Count[nums, node[_], Infinity] >Count[nums, node[_, _], Infinity] >Count[nums, node[_, _, _], Infinity] >Count[nums, node[___], Infinity] >Depth[nums] > >7852 (* 48 fewer "leaf" expressions, > mostly due to Nulls and right-offspring eliminated *) >165624 (* 31 less storage *) >0 (* no Nulls, versus 5001 *) >0 (* no leaf nodes, versus 1669 *) > (* no nodes without left-offspring, versus 2458 *) >705 (* nodes without right offspring, versus 2543 *) >2146 (* nodes with both left and right offspring, versus 1667 *) >2851 (* total nodes *) >21 (* 21 tree levels versus 27 *) > >The logical structure is identical to yours except that there are no >nodes with only right-offspring. This incidentally tends to reduce tree >depth. > >The storage format is far different: there are no trivial (childless) >nodes, and nodes with left-offspring but no right-offspring are stored >without a Null on the right. > >I could make this quite a bit faster, I think, if I spent time on the >code to eliminate changes to Less, Greater, etc. and the frequent >nesting followed by unnesting that goes on in the algorithm. It might >take twice as much code, however, so I like it as is. > >You're probably better off storing these things in a heap, of course. > >Or -- better yet -- use the built-in Sort and Ordering functions. > >Bobby Treat > >-----Original Message----- >From: Husain Ali Al-Mohssen [mailto:husain at MIT.EDU] To: mathgroup at smc.vnet.net >Subject: [mg36516] [mg36502] [mg36485] Why is my Implementation of Sorted Trees So Slow? > > >Hi all, > >I am trying to implement a very simple sorted tree to quickly store some >real numbers I need. I have written an add, delete, minimum, and pop >(delete the lowest value) function and they seem to work ok but are very >slow. Let's just look @ my implementation of the add part: >nums=Null;(*my initial blank Tree) >In[326]:= >Clear[add] > >In[327]:= >add[Null,x_Real]:=node[x,Null,Null] > >add[Null,node[x_Real,lower_,higher_]]=node[x,lower,higher] > >In[328]:= >add[node[x_Real,lower_,higher_],y_Real]:= > If[x>y,node[x,add[lower,y],higher],node[x,lower,add[higher,y]]] > >In[288]:= >add[node[x_Real,lowerx_,higherx_],node[y_Real,lowery_,highery_]]:=If[x> y >, > node[x,add[lowerx,node[y,lowery,highery]],higherx], > node[x,lowerx,add[higherx,node[y,lowery,highery]]] > ] > > > >Now this is my attempt to test how fast my add works: > >SeedRandom[5]; >Do[nums=add[nums,Random[]],{5000}];//Timing > >Out[333]= >{13.279 Second,Null} > >(running on Mathematica 4.1 on a win2k VMWare machine running on Linux >RH7.3 >running on an 1.4GHz Athlon with 1GB of ram). > >Questions: >1. Is this as fast as I can get my code to run? >2. Am I doing something obviously stupid? >3. would Compiling things help? > > >Thanks, >Husain **************************************************************