MathGroup Archive 2002

[Date Index] [Thread Index] [Author Index]

Search the Archive

RE: RE: Why is my Implementation of Sorted Trees So Slow?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg36532] RE: [mg36502] RE: [mg36485] Why is my Implementation of Sorted Trees So Slow?
  • From: "DrBob" <drbob at bigfoot.com>
  • Date: Wed, 11 Sep 2002 03:27:53 -0400 (EDT)
  • Reply-to: <drbob at bigfoot.com>
  • Sender: owner-wri-mathgroup at wolfram.com

Johannes,

I wasn't trying to write a truly efficient algorithm -- I consciously
tried not to change the algorithm (or even the storage method) a lot, so
that Husain could move along from where he was.  Of course a heap would
be more efficient.

In fact, given the built-in Sort and Ordering functions, I see no reason
to build sorted binary trees in Mathematica at all (as I said in my
first post) -- except as a step in the process of learning.

Your solution (on my machine) is equivalent in speed to Mihajlo's simple
modification of Husain's original -- just by replacing Null with
anything else -- and using "" instead of {} in your code makes no
difference.

The following code is Husain's, but uses List rather than node and {}
rather than Null, and builds precisely the same tree as your code.  Yet
it's 10% slower than your code.  I think that's simply because there are
more rules -- the same reason my code is slower, because I have even
more rules, so that more time is spent on pattern-matching.

Clear[add]
add[{}, x_Real] := {x, {}, {}}
add[{}, {x_Real, lower_, higher_}] = {x, lower, higher};
add[{x_Real, lower_, higher_}, y_Real] := If[x > y, {x, add[
    lower, y], higher}, {x, lower, add[higher, y]}]
add[{x_Real, lowerx_, higherx_}, {
      y_Real, lowery_, highery_}] := If[x > y, {x, add[lowerx, {y, 
    lowery, highery}], higherx}, {x, lowerx, add[higherx, {y, 
    lowery, highery}]}]

In fact, using List instead of node slowed the algorithm compared to
Mihajlo's solution, by the same 10%.  I think that's because it requires
more pattern matching, to distinguish {} from other List forms in the
tree at each call.

As for the inefficiency inherent in copying trees, how would you avoid
that, other than using something like a heap?  Even with a heap, it has
to be a global variable (not a function argument) to avoid copying.

Bobby

-----Original Message-----
From: Johannes Ludsteck
To: mathgroup at smc.vnet.net
[mailto:johannes.ludsteck at wiwi.uni-regensburg.de] 
Subject: [mg36532] Re: [mg36502] RE: [mg36485] Why is my Implementation of Sorted
Trees So Slow?


Dear DrBob,
I fear that your diagnosis of the efficiency problems
in Husain's binary tree implementation is not correct.
Furthermore your code is inefficient and awkward.

Consider the following implementation using list to represent
the tree. It is three lines long and is three times faster
than yours on my computer though it uses twice as much
memory as your representation. 

myadd[{},x_]:={x,{},{}};
myadd[{v_,a_,b_},x_]:=If[v>x,
   {v,myadd[a,x],b}, {v,a,myadd[b,x]}]

t={}; Timing[Do[t=myadd[t,Random[]],{5000}];]
{1.98 Second,Null}

The real problem with the implementation is probably that
the tree structure has to be 'copied' each time when
add is called. You have to write
tree = add[tree, x]
If you wrote
t2 = add[t1, x]
then you have *two* trees t1 and t2 which differ by the
one element inserted by add.
To the best of my knowledge, the internal representation
of t1 and t2 in Mathematica is a tree too. But in principle
the user code is duplicated in Mathematica.

A further problem of all direct implementations is that they
cannot be compiled (since they use pattern matching and
the tree is a nested structure).

Thus I guess that the efficient way to implement
tree data structures is the indirect one using heaps
(i.e. a combination of contiguous arrays with length known
in advance and hashing).

Best regards,
	Johannes Ludsteck





On 9 Sep 2002, at 0:29, DrBob wrote:

> You're not doing anything dumb as far as I can see, but you're using
far
> more memory than necessary.  Here are statistics for your algorithm on
> my machine:
> 
> SeedRandom[5];
> nums = Null
> Do[nums = add[nums, Random[]], {5000}]; // Timing
> nums // LeafCount
> nums // ByteCount
> Count[nums, Null, Infinity]
> Count[nums, node[_, Null, _], Infinity]
> Count[nums, node[_, _, Null], Infinity]
> Count[nums, node[_, Null, Null], Infinity]
> Count[nums, node[___], Infinity]
> Count[nums, node[_, _node | _?NumericQ, _node | _?NumericQ], Infinity]
> Depth[nums]
> 
> {4.406000000000001*Second,   Null}
> 15001
> 240000
> 5001 (* Null values in the tree *)
> 2458 (* nodes without left offspring *)
> 2543 (* nodes without right offspring *)
> 1669 (* leaf nodes *)
> 4999 (* total nodes *)
> 1667 (* nodes with both left and right offspring *)
> 28   (* minus one, makes 27 levels in the tree *)
> 
> It's clearly not ideal to store 5001 Nulls for 5000 actual values!
> 
> Here's a method that doesn't change the algorithm much, but changes
the
> storage method a great deal:
> 
> First of all, I'm lazy, so I'll redefine <,>,<=,>=, etc. to make the
> code simpler:
> 
> Unprotect[Less, Greater, LessEqual, GreaterEqual];
> Less[a : _node ..] := Less @@ (First /@ {a})
> LessEqual[a : _node ..] := LessEqual @@ (First /@ {a})
> Greater[a : _node ..] := Greater @@ (First /@ {a})
> GreaterEqual[a : _node ..] := GreaterEqual @@ (First /@ {a})
> Protect[Less, Greater, LessEqual, GreaterEqual];
> 
> Again because I'm lazy, I'll define "node" so that I don't have to
> consciously avoid leaf nodes and unnecessary nesting:
> 
> ClearAll[node]
> node[a___, node[b_], c___] = node[a, b, c];
> node[node[a__]] = node[a];
> 
> Here's my "add" function:
> 
> Clear[add]
> add[Null, x_] = node[x];
> add[x_, (y_)?NumericQ] := add[node[x],   node[y]]
> add[(x_)?NumericQ, y_] := add[node[x],   node[y]]
> add[node[x_], node[y_]] :=  If[x > y, node[x, y], node[y, x]] (* <--
> THERE! *)
> add[node[x_, lower_], y_node] :=
>   Which[y >= node[x], node[x, lower, y],
>    y <= node[lower], node[lower, y, x], True,   node[y, lower, x]]
> add[node[x_, lower_, higher_], y_node] /;
>    node[y] <= node[x] :=  node[x, add[lower, y], higher]
> add[node[x_, lower_, higher_], y_node] :=  node[x, lower, add[higher,
> y]]
> 
> Where I have a pointer is the code that prevents adding
right-offspring
> to a leaf node.  That avoids nodes like your node[a,Null,c].  When you
> would have node[a,b,Null], I have node[a,b].  When you would have
> node[a,Null,Null], I use a itself.
> 
> Timing and space requirements are much better now:
> 
> SeedRandom[5];
> nums = Null
> Timing[Do[nums = add[nums, Random[]], {5000}]; ]
> 
> {2.109*Second, Null}
> 
> LeafCount[nums]
> ByteCount[nums]
> Count[nums, Null, Infinity]
> Count[nums, node[_], Infinity]
> Count[nums, node[_, _],  Infinity]
> Count[nums, node[_, _, _],  Infinity]
> Count[nums, node[___], Infinity]
> Depth[nums]
> 
> 7852   (* 48% fewer "leaf" expressions,
>           mostly due to Nulls and right-offspring eliminated *)
> 165624 (* 31% less storage *)
> 0    (* no Nulls, versus 5001 *)
> 0    (* no leaf nodes, versus 1669 *)
>      (* no nodes without left-offspring, versus 2458 *)
> 705  (* nodes without right offspring, versus 2543 *)
> 2146 (* nodes with both left and right offspring, versus 1667 *)
> 2851 (* total nodes *)
> 21   (* 21 tree levels versus 27 *)
> 
> The logical structure is identical to yours except that there are no
> nodes with only right-offspring.  This incidentally tends to reduce
tree
> depth.
> 
> The storage format is far different: there are no trivial (childless)
> nodes, and nodes with left-offspring but no right-offspring are stored
> without a Null on the right.
> 
> I could make this quite a bit faster, I think, if I spent time on the
> code to eliminate changes to Less, Greater, etc. and the frequent
> nesting followed by unnesting that goes on in the algorithm.  It might
> take twice as much code, however, so I like it as is.
> 
> You're probably better off storing these things in a heap, of course.
> 
> Or -- better yet -- use the built-in Sort and Ordering functions.
> 
> Bobby Treat
> 
> -----Original Message-----
> From: Husain Ali Al-Mohssen [mailto:husain at MIT.EDU]
To: mathgroup at smc.vnet.net
> Subject: [mg36532] [mg36502] [mg36485] Why is my Implementation of Sorted Trees
So Slow?
> 
> 
> Hi all,
> 
> I am trying to implement a very simple sorted tree to quickly store
some
> real numbers I need. I have written an add, delete, minimum, and pop
> (delete the lowest value) function and they seem to work ok but are
very
> slow. Let's just look @ my implementation of the add part:
> nums=Null;(*my initial blank Tree)
> In[326]:=
> Clear[add]
> 
> In[327]:=
> add[Null,x_Real]:=node[x,Null,Null]
> 
> add[Null,node[x_Real,lower_,higher_]]=node[x,lower,higher]
> 
> In[328]:=
> add[node[x_Real,lower_,higher_],y_Real]:=
>   If[x>y,node[x,add[lower,y],higher],node[x,lower,add[higher,y]]]
> 
> In[288]:=
>
add[node[x_Real,lowerx_,higherx_],node[y_Real,lowery_,highery_]]:=If[x>y
> ,
>     node[x,add[lowerx,node[y,lowery,highery]],higherx],
>     node[x,lowerx,add[higherx,node[y,lowery,highery]]]
>     ]
> 
> 
> 
> Now this is my attempt to test how fast my add works:
> 
> SeedRandom[5];
> Do[nums=add[nums,Random[]],{5000}];//Timing
> 
> Out[333]=
> {13.279 Second,Null}
> 
> (running on Mathematica 4.1 on a win2k VMWare machine running on Linux
> RH7.3
> running on an 1.4GHz Athlon with 1GB of ram).
> 
> Questions:
> 1. Is this as fast as I can get my code to run?
> 2. Am I doing something obviously stupid?
> 3. would Compiling things help?
> 
> 
> Thanks,
> Husain
> 
> 
> 



<><><><><><><><><><><><>
Johannes Ludsteck
Economics Department
University of Regensburg
Universitaetsstrasse 31
93053 Regensburg
Phone +49/0941/943-2741




  • Prev by Date: Re: RE: Why is my Implementation of Sorted Trees So Slow?
  • Next by Date: Re: Plotting with logarithmic scale?
  • Previous by thread: Re: RE: Why is my Implementation of Sorted Trees So Slow?
  • Next by thread: Re: RE: Re: Why is my Implementation of Sorted Trees So Slow?