MathGroup Archive 2002

[Date Index] [Thread Index] [Author Index]

Search the Archive

RE: Why is my Implementation of Sorted Trees So Slow?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg36505] RE: [mg36485] Why is my Implementation of Sorted Trees So Slow?
  • From: "DrBob" <drbob at bigfoot.com>
  • Date: Tue, 10 Sep 2002 06:24:32 -0400 (EDT)
  • Reply-to: <drbob at bigfoot.com>
  • Sender: owner-wri-mathgroup at wolfram.com

I made the algorithm more than twice as fast with very little change,
while eliminating modifications to <,>,<=, and >=.  Most of the
improvement, in fact, was due to eliminating those changes AFTER the
code was changed so they wouldn't be used anyway.  That's because the
comparisons appear everywhere, so adding rules to those functions really
slows down pattern matching.

Clear[add, node]
node[a___, node[b_], c___] = node[a, b, c]; 
node[node[a__]] = node[a]; 
add[Null, x_] = node[x]; 
add[x_, (y_)?NumericQ] := add[node[x], 
   node[y]]
add[(x_)?NumericQ, y_] := add[node[x], 
   node[y]]
add[node[x_], node[y_]] := 
  If[x >= y, node[x, y], node[y, x]]
add[node[x_, lower_], y_node] := 
  Which[First[y] >= x, node[x, lower, y], 
   First[y] <= lower, node[lower, y, x], 
   True, node[y, lower, x]]
add[node[x_, lower_, higher_], y_node] /; 
   First[y] <= x := node[x, add[lower, y], 
   higher]
add[node[x_, lower_, higher_], y_node] := 
  node[x, lower, add[higher, y]]

SeedRandom[5]; 
nums = Null
Timing[Do[nums = add[nums, Random[]], {5000}]; ]

{0.968*Second, Null}

None of the other stats changed.

Bobby Treat

-----Original Message-----
From: DrBob [mailto:drbob at bigfoot.com] 
To: mathgroup at smc.vnet.net
Subject: [mg36505] RE: [mg36485] Why is my Implementation of Sorted Trees So Slow?


You're not doing anything dumb as far as I can see, but you're using far
more memory than necessary.  Here are statistics for your algorithm on
my machine:

SeedRandom[5];
nums = Null
Do[nums = add[nums, Random[]], {5000}]; // Timing
nums // LeafCount
nums // ByteCount
Count[nums, Null, Infinity]
Count[nums, node[_, Null, _], Infinity]
Count[nums, node[_, _, Null], Infinity]
Count[nums, node[_, Null, Null], Infinity]
Count[nums, node[___], Infinity]
Count[nums, node[_, _node | _?NumericQ, _node | _?NumericQ], Infinity]
Depth[nums]

{4.406000000000001*Second,   Null}
15001
240000
5001 (* Null values in the tree *)
2458 (* nodes without left offspring *)
2543 (* nodes without right offspring *)
1669 (* leaf nodes *)
4999 (* total nodes *)
1667 (* nodes with both left and right offspring *)
28   (* minus one, makes 27 levels in the tree *)

It's clearly not ideal to store 5001 Nulls for 5000 actual values!

Here's a method that doesn't change the algorithm much, but changes the
storage method a great deal:

First of all, I'm lazy, so I'll redefine <,>,<=,>=, etc. to make the
code simpler:

Unprotect[Less, Greater, LessEqual, GreaterEqual];
Less[a : _node ..] := Less @@ (First /@ {a})
LessEqual[a : _node ..] := LessEqual @@ (First /@ {a})
Greater[a : _node ..] := Greater @@ (First /@ {a})
GreaterEqual[a : _node ..] := GreaterEqual @@ (First /@ {a})
Protect[Less, Greater, LessEqual, GreaterEqual];

Again because I'm lazy, I'll define "node" so that I don't have to
consciously avoid leaf nodes and unnecessary nesting:

ClearAll[node]
node[a___, node[b_], c___] = node[a, b, c];
node[node[a__]] = node[a];

Here's my "add" function:

Clear[add]
add[Null, x_] = node[x];
add[x_, (y_)?NumericQ] := add[node[x],   node[y]]
add[(x_)?NumericQ, y_] := add[node[x],   node[y]]
add[node[x_], node[y_]] :=  If[x > y, node[x, y], node[y, x]] (* <--
THERE! *)
add[node[x_, lower_], y_node] :=
  Which[y >= node[x], node[x, lower, y],
   y <= node[lower], node[lower, y, x], True,   node[y, lower, x]]
add[node[x_, lower_, higher_], y_node] /;
   node[y] <= node[x] :=  node[x, add[lower, y], higher]
add[node[x_, lower_, higher_], y_node] :=  node[x, lower, add[higher,
y]]

Where I have a pointer is the code that prevents adding right-offspring
to a leaf node.  That avoids nodes like your node[a,Null,c].  When you
would have node[a,b,Null], I have node[a,b].  When you would have
node[a,Null,Null], I use a itself.

Timing and space requirements are much better now:

SeedRandom[5];
nums = Null
Timing[Do[nums = add[nums, Random[]], {5000}]; ]

{2.109*Second, Null}

LeafCount[nums]
ByteCount[nums]
Count[nums, Null, Infinity]
Count[nums, node[_], Infinity]
Count[nums, node[_, _],  Infinity]
Count[nums, node[_, _, _],  Infinity]
Count[nums, node[___], Infinity]
Depth[nums]

7852   (* 48% fewer "leaf" expressions,
          mostly due to Nulls and right-offspring eliminated *)
165624 (* 31% less storage *)
0    (* no Nulls, versus 5001 *)
0    (* no leaf nodes, versus 1669 *)
     (* no nodes without left-offspring, versus 2458 *)
705  (* nodes without right offspring, versus 2543 *)
2146 (* nodes with both left and right offspring, versus 1667 *)
2851 (* total nodes *)
21   (* 21 tree levels versus 27 *)

The logical structure is identical to yours except that there are no
nodes with only right-offspring.  This incidentally tends to reduce tree
depth.

The storage format is far different: there are no trivial (childless)
nodes, and nodes with left-offspring but no right-offspring are stored
without a Null on the right.

I could make this quite a bit faster, I think, if I spent time on the
code to eliminate changes to Less, Greater, etc. and the frequent
nesting followed by unnesting that goes on in the algorithm.  It might
take twice as much code, however, so I like it as is.

You're probably better off storing these things in a heap, of course.

Or -- better yet -- use the built-in Sort and Ordering functions.

Bobby Treat

-----Original Message-----
From: Husain Ali Al-Mohssen [mailto:husain at MIT.EDU]
To: mathgroup at smc.vnet.net
Subject: [mg36505] [mg36485] Why is my Implementation of Sorted Trees So Slow?


Hi all,

I am trying to implement a very simple sorted tree to quickly store some
real numbers I need. I have written an add, delete, minimum, and pop
(delete the lowest value) function and they seem to work ok but are very
slow. Let's just look @ my implementation of the add part:
nums=Null;(*my initial blank Tree)
In[326]:=
Clear[add]

In[327]:=
add[Null,x_Real]:=node[x,Null,Null]

add[Null,node[x_Real,lower_,higher_]]=node[x,lower,higher]

In[328]:=
add[node[x_Real,lower_,higher_],y_Real]:=
  If[x>y,node[x,add[lower,y],higher],node[x,lower,add[higher,y]]]

In[288]:=
add[node[x_Real,lowerx_,higherx_],node[y_Real,lowery_,highery_]]:=If[x>y
,
    node[x,add[lowerx,node[y,lowery,highery]],higherx],
    node[x,lowerx,add[higherx,node[y,lowery,highery]]]
    ]



Now this is my attempt to test how fast my add works:

SeedRandom[5];
Do[nums=add[nums,Random[]],{5000}];//Timing

Out[333]=
{13.279 Second,Null}

(running on Mathematica 4.1 on a win2k VMWare machine running on Linux
RH7.3
running on an 1.4GHz Athlon with 1GB of ram).

Questions:
1. Is this as fast as I can get my code to run?
2. Am I doing something obviously stupid?
3. would Compiling things help?


Thanks,
Husain




  • Prev by Date: Error in BinCounts function?
  • Next by Date: RE: RE: RE: Generating Two Unit Orthogonal Vectors
  • Previous by thread: RE: Why is my Implementation of Sorted Trees So Slow?
  • Next by thread: RE: RE: Why is my Implementation of Sorted Trees So Slow?