RE: RE: Re: Why is my Implementation of Sorted Trees So Slow?

*To*: mathgroup at smc.vnet.net
*Subject*: [mg36680] RE: [mg36649] RE: [mg36644] Re: [mg36485] Why is my Implementation of Sorted Trees So Slow?
*From*: "DrBob" <drbob at bigfoot.com>
*Date*: Wed, 18 Sep 2002 02:10:24 -0400 (EDT)
*Reply-to*: <drbob at bigfoot.com>
*Sender*: owner-wri-mathgroup at wolfram.com

Daniel, I'd say your algorithm (as an ALGORITHM) is precisely the same as Husain's. The implementation is different, but the logical steps are the same (except perhaps for <= vs. <), and so is the logical structure of the tree produced. Even the storage method is pretty much the same. Using {} instead of Null and List instead of 'node', and putting values in the middle position, doesn't add up to a change that could affect algorithmic complexity. Even the fact that Flatten gives a sorted List with your implementation is something you get "for free", and the advantage comes only when it's time to output a sorted list. It has nothing to do with the work of building the tree.

Algorithmic complexity (absent hidden factors under the kernel's control) is precisely the same for your solution and Husain's (mine too, I think). The speed differences we're seeing have to do with nuances of the kernel, pattern matching costs, etc. Our three nearly equivalent algorithms simply incur different hidden costs.

That's the case with many of the problems that turn into "speed wars" among us. We compare implementation speed (including hidden costs), not pure algorithmic complexity. Compiled code or packed arrays often make all the difference, and it usually happens in the background where we can't see it. As you've hinted below, the vagaries of Update can make a linear algorithm quadratic, but may NOT so plague a slightly different implementation of the very same algorithm.

So... I'm suggesting we might want to be careful when we leap to conclusions about "algorithmic complexity" based on a few timed experiments.

Bobby Treat

-----Original Message-----
From: danl at wolfram.com [mailto:danl at wolfram.com]
To: mathgroup at smc.vnet.net
Subject: [mg36680] Re: [mg36649] RE: [mg36644] Re: [mg36485] Why is my Implementation of Sorted Trees So Slow?
DrBob wrote:
>
> This code is almost identical to yours in principle, yet 30% faster:
>
> (compared with the code in your notebook, not the code in the post
> below)
>
> ClearAll[emptyTree, treeInsert];
> emptyTree = {};
>
> treeInsert[emptyTree, elem_] := {emptyTree, elem, emptyTree}
> treeInsert[tree_, elem_] /; SameQ[tree[[2]], elem] := tree
> treeInsert[tree_, elem_] /; OrderedQ[{tree[[2]], elem}] :=
>   {First[tree], tree[[2]], treeInsert[Last[tree], elem]}
> treeInsert[tree_, elem_] :=
>   {treeInsert[First[tree], elem], tree[[2]], Last[tree]}
>
> Here's an even simpler version, same speed:
>
> ClearAll[emptyTree, treeInsert];
> emptyTree = {};
>
> treeInsert[emptyTree, elem_] := {emptyTree, elem, emptyTree}
> treeInsert[tree_, elem_] := Which[
>   SameQ[tree[[2]], elem], tree,
>   OrderedQ[{tree[[2]], elem}],
>     {First@tree, tree[[2]], treeInsert[Last@tree, elem]},
>   True, {treeInsert[First@tree, elem], tree[[2]], Last@tree}]

I had tried variations on this but they all seemed ever so slightly slower. Might be version and OS dependent.

> I see one obvious advantage of your algorithm over Husain's and my
> earlier code, though I'm not sure it explains its better behavior. That
> is, while my algorithm made pattern matching more burdensome, yours
> virtually eliminated it.
>
> Also, the biggest improvement I've seen comes simply from replacing Null
> with anything else in Husain's code. Again, that seems to suggest that
> time spent on pattern matching is the problem. All the algorithms do
> approximately the same real work on the tree structure.
>
> I really like how Flatten changes your tree into a sorted list. Very
> nice!

That goes by the name of "tree sort". Using Flatten is basically a shortcut for walking the tree left-to-right.

> Bobby Treat

It seems I did not look hard enough at the original post or the follow-ups. In point of fact, regarding speed, there IS no obvious advantage of my code over that first version.
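For readers outside Mathematica, here is a minimal Python sketch of the same tree-sort idea (an illustrative port by the editor, not code from the thread): build the tree by repeated insertion, then an in-order walk of the {left, value, right} structure, which is what Flatten accomplishes for the Mathematica representation, yields the elements in sorted order.

```python
# Sketch of the common sorted-tree algorithm discussed in this thread,
# ported to Python for illustration. An empty tree is None; a node is
# a tuple (left, value, right), mirroring the {left, value, right} lists.

def tree_insert(tree, elem):
    """Insert elem, ignoring duplicates, as the posted treeInsert does."""
    if tree is None:
        return (None, elem, None)
    left, value, right = tree
    if elem == value:
        return tree                                  # duplicate: unchanged
    if value < elem:                                 # mirrors OrderedQ test
        return (left, value, tree_insert(right, elem))
    return (tree_insert(left, elem), value, right)

def flatten(tree):
    """In-order walk: the Python analogue of Flatten on the nested lists."""
    if tree is None:
        return []
    left, value, right = tree
    return flatten(left) + [value] + flatten(right)

tree = None
for x in [5, 3, 8, 1, 4]:
    tree = tree_insert(tree, x)
print(flatten(tree))  # [1, 3, 4, 5, 8]
```

Building the tree does the sorting work; reading it out is a single left-to-right traversal, which is why the sorted output comes "for free".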
The advantage is there, of course, as timing checks indicate. But it is quite far from obvious. The reason is related to the (rare) need for Update. It is pointed out in the manual that "infinite" evaluation is not in fact possible (no surprise thus far). A consequence is that Mathematica requires internal optimization features to keep reevaluation to a minimum. Update may be required in cases where such optimizations are overzealous.

What we have in the case of this example is, roughly, the opposite. The code that determines the need for reevaluation is simply not working sufficiently well. It does make the correct determination, but only after looking over the entire structure that is being evaluated. As we have a structure growing linearly in the number of iterations, and it gets mostly traversed each iteration, the algorithmic complexity becomes at least O(n^2). Not a disaster unless the appropriate complexity is smaller, which was the case for this example.

This is not a bug insofar as it is a (regrettably) necessary consequence of Mathematica evaluation semantics. The problems it can cause are all the same quite undesirable. We've made some inroads on this problem by substantially enlarging the class of expressions for which reevaluation may be quickly short-circuited, with the idea of fixing all but the most pathological of examples. The status of that work is unfortunately not known to me at present, though I'll try to find out about it.

Daniel Lichtblau
Wolfram Research
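The quadratic blow-up described above can be made concrete with a toy model (illustrative Python only; this is not the kernel's actual reevaluation code): if each of n insertions triggers a scan of the whole structure built so far, the total work is 1 + 2 + ... + n = n(n+1)/2, which is O(n^2), even though each insertion by itself is cheap.

```python
# Toy model of the cost pattern Daniel describes: a structure grows by
# one element per iteration, but some bookkeeping step (here, a length
# check standing in for a full reevaluation scan) walks the whole
# structure every time.

def build_with_rescan(n):
    """Return total traversal work for n insertions with a full rescan each."""
    ops = 0
    structure = []
    for i in range(n):
        structure.append(i)      # the insertion itself is O(1)
        ops += len(structure)    # models scanning the entire structure
    return ops

# Total work is the triangular number n(n+1)/2, i.e. O(n^2).
print(build_with_rescan(1000))  # 500500
```

The per-iteration scan, not the insertions, is what turns a linear build into quadratic total time.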