MathGroup Archive 2009

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Why is recursion so slow in Mathematica?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg100568] Re: Why is recursion so slow in Mathematica?
  • From: Daniel <janzon at gmail.com>
  • Date: Mon, 8 Jun 2009 02:07:03 -0400 (EDT)
  • References: <h0d6s8$spq$1@smc.vnet.net> <h0fvn1$ree$1@smc.vnet.net>

On 7 Juni, 11:02, Szabolcs Horv=E1t <szhor... at gmail.com> wrote:
> Daniel wrote:
> > This post is about functional programming in Mathematica versus other
> > functional languages such as OCaml, SML or Haskell. At least a naive
> > use of functional constructs in Mathematica is horrendously slow. Am I
> > doing something wrong? Or isn't Mathematica really suitable for
> > functional programming beyond toy programs? Couldn't the Wolfram team
> > make a more efficient implementation for recursion, as other
> > functional languages has done? (Instead of the naive C-like behavior
> > for recursively defined functions.)
>
> > As grounds for my question/argument, I wrote my own version of select,
> > as below
>
> > myselect[{}, predicate_] = {}
> > myselect[{head_, tail___}, predicate_] := If[predicate[head],
> >   Join[{head}, myselect[{tail}, predicate]],
> >   myselect[{tail}, predicate]
> >   ]
>
> > Then I tried this function on a 20.000 element vector with machine
> > size floats:
>
> > data = Table[Random[], {20000}];
> > $RecursionLimit = 100000;
> > Timing[data2 = myselect[data, # > 0.5 &];]
>
> > The result is {7.05644, Null}, and hundreds of MB of system memory are
> > allocated. On 1.7 GHZ dual core Intel machine with 1 GB of RAM. For
> > 20.000 floats! It's just a megabyte!
>
> > The following OCaml program executes in apparently no-time. It is not
> > compiled and does the same thing as the above Mathematica code. After
> > increasing the list by a factor of ten to 200.000 elements, it still
> > executes in a blink. (But with 2.000.000 elements my OCaml interpreter
> > says Stack overflow.)
>
> > let rec randlist n = if n=0 then [] else Random.float(1.0) :: randl=
ist
> > (n-1);;
>
> > let rec myselect = function
> >     [],predicate -> []
> >   |  x::xs,predicate -> if predicate(x) then x::myselect(xs,predica=
te)
> > else myselect(xs,predicate);;
>
> > let mypred x = x>0.5;;
>
> > let l=randlist(20000);;
> > let l2=myselect(l,mypred);;  (* lightning-fast compared to Mathemat=
ica
> > *)
>
> So you discovered that appending to arrays is slow while appending to
> linked lists is fast :)
>
> Actually you are not comparing the exact same algorithms.  The data
> structure you used in those languages was a linked list.  Removing or
> appending elements takes a constant time for linked lists.
> Mathematica's List is a vector-type data structure, so
> removing/appending elements takes a time proportional to the vector's
> length.
>
> One data structure is not better than the other, of course, they're just
> useful for different purposes.  For example, a linked list is unsuitabl=
e
> for applications where the elements need to be accessed randomly instead
> of sequentially.  Random access is often needed for numerical
> computations/simulations.
>
> For a fair comparison, use a linked list in Mathematica too.  This coul=
d
> look like {1,{2,{3,{}}}}
>
> myselect[{}, test_] = {};
> myselect[{head_, tail_}, test_] :=
>   If[test[head], {head, myselect[tail, test]}, myselect[tail, test]]
>
> toLinkedList[list_List] := Fold[{#2, #1} &, {}, list]
>
> data = toLinkedList@Table[Random[], {20000}];
>
> Block[{$RecursionLimit = \[Infinity]},
>   Timing[myselect[data, # > 0.5 &];]]
>
> This runs in 0.2 sec on my (not very fast) system.
>
> And you can check that the function works correctly:
>
> Block[{$RecursionLimit = \[Infinity]},
>   Flatten@myselect[data, # > 0.5 &] ===
>    Select[Flatten[data], # > 0.5 &]]
>
> About the thing that you call "naive C-like behavior": you probably mean
> that Mathematica doesn't automatically perform the tail-call
> optimization.  That is correct, you need to transform your recursions
> explicitly.
>
> Here's an example:
>
> myselect2[dest_, {head_, tail_}, test_] :=
>   If[test[head],
>    myselect2[{head, dest}, tail, test],
>    myselect2[dest, tail, test]]
>
> myselect2[dest_, {}, test_] := dest
>
> Note that it is $IterationLimit that we need to increase now and this
> version doesn't fill up the evaluation stack.
>
> Also note that this produces the result in the reverse order:
>
> Block[{$IterationLimit = \[Infinity]},
>   Reverse@Flatten@myselect2[{}, data, # > 0.5 &] ===
>    Select[Flatten[data], # > 0.5 &]]
>
> We could produce the result in the same order, but reverse linking (like
> {{{{},1},2},3}).
>
> I couldn't do any better than this using a singly-linked list.  It look=
s
> like OCaml (which I don't know BTW) uses a singly linked list too, and
> can't do the tail-call optimization, so its stack gets filled up.
>
> One more thing: even when using the same data structure and same
> algorithm, it is of course expected that a very high level interpreted
> language like Mathematica is not going to perform nearly as well as a
> low level compiled language like OCaml.  In high level languages the
> usual solution to this performance problem is built-in functions: don't
> implement Select in Mathematica itself.  Use the built-in one, which is
> written in C and much faster than a Mathematica implementation could ever=
 be.
>
> Hope this helps,
> Szabolcs



Oh, so they are arrays! Does that mean Mathematica has to reallocate
memory to fit in new elements?

Just a check, what's the space requirment for your representations of
linked list?

ByteCount[Fold[{#2,#1}&,{},Table[RandomReal[],{1000}]]

48024 bytes for 1000 eight-byte doubles. Same byte count for
RandomInteger, BTW! In C I guess it would be something like 8*1000
bytes for the data plus 4*1000 bytes for the pointers to the next list
elment. The Mathematica overhead seems reasonably ok to me, and it
seems to be linear with the number of elements.

Tail-recursion was a good tip, thanks a lot! :)  Just for fun I tried
a tail-recursive version of the standard list implementation of
myselect (i.e. using arrays). The time performance is almost the same,
but the tail-recursive versions doesn't eat memory like a monster.

In summary, with your help I feel confident to continue my journey
into Mathematica land :)  I hope one day I will be able to contribute
back to this group and the community.

Regards,
Daniel


  • Prev by Date: Re: Why is recursion so slow in Mathematica?
  • Next by Date: Seeking a Porter Stemmer done in Mathematica
  • Previous by thread: Re: Why is recursion so slow in Mathematica?
  • Next by thread: Problem with GraphicsColumn