MathGroup Archive: June 2009 [00201]

[Date Index] [Thread Index] [Author Index]

Re: Why is recursion so slow in Mathematica?

To: mathgroup at smc.vnet.net
Subject: [mg100545] Re: Why is recursion so slow in Mathematica?
From: Szabolcs Horvát <szhorvat at gmail.com>
Date: Sun, 7 Jun 2009 05:03:11 -0400 (EDT)
References: <h0d6s8$spq$1@smc.vnet.net>

Daniel wrote:
> This post is about functional programming in Mathematica versus other
> functional languages such as OCaml, SML or Haskell. At least a naive
> use of functional constructs in Mathematica is horrendously slow. Am I
> doing something wrong? Or isn't Mathematica really suitable for
> functional programming beyond toy programs? Couldn't the Wolfram team
> make a more efficient implementation for recursion, as other
> functional languages has done? (Instead of the naive C-like behavior
> for recursively defined functions.)
> 
> As grounds for my question/argument, I wrote my own version of select,
> as below
> 
> myselect[{}, predicate_] = {}
> myselect[{head_, tail___}, predicate_] := If[predicate[head],
>   Join[{head}, myselect[{tail}, predicate]],
>   myselect[{tail}, predicate]
>   ]
> 
> Then I tried this function on a 20.000 element vector with machine
> size floats:
> 
> data = Table[Random[], {20000}];
> $RecursionLimit = 100000;
> Timing[data2 = myselect[data, # > 0.5 &];]
> 
> The result is {7.05644, Null}, and hundreds of MB of system memory are
> allocated. On 1.7 GHZ dual core Intel machine with 1 GB of RAM. For
> 20.000 floats! It's just a megabyte!
> 
> The following OCaml program executes in apparently no-time. It is not
> compiled and does the same thing as the above Mathematica code. After
> increasing the list by a factor of ten to 200.000 elements, it still
> executes in a blink. (But with 2.000.000 elements my OCaml interpreter
> says Stack overflow.)
> 
> let rec randlist n = if n=0 then [] else Random.float(1.0) :: randlist
> (n-1);;
> 
> let rec myselect = function
>     [],predicate -> []
>   |  x::xs,predicate -> if predicate(x) then x::myselect(xs,predicate)
> else myselect(xs,predicate);;
> 
> let mypred x = x>0.5;;
> 
> let l=randlist(20000);;
> let l2=myselect(l,mypred);;  (* lightning-fast compared to Mathematica
> *)
> 

So you discovered that appending to arrays is slow while appending to 
linked lists is fast :)

Actually you are not comparing the exact same algorithms.  The data 
structure you used in those languages was a linked list.  Removing or 
appending elements takes a constant time for linked lists. 
Mathematica's List is a vector-type data structure, so 
removing/appending elements takes a time proportional to the vector's 
length.

One data structure is not better than the other, of course, they're just 
useful for different purposes.  For example, a linked list is unsuitable 
for applications where the elements need to be accessed randomly instead 
of sequentially.  Random access is often needed for numerical 
computations/simulations.

For a fair comparison, use a linked list in Mathematica too.  This could 
look like {1,{2,{3,{}}}}

myselect[{}, test_] = {};
myselect[{head_, tail_}, test_] :=
  If[test[head], {head, myselect[tail, test]}, myselect[tail, test]]

toLinkedList[list_List] := Fold[{#2, #1} &, {}, list]

data = toLinkedList@Table[Random[], {20000}];

Block[{$RecursionLimit = \[Infinity]},
  Timing[myselect[data, # > 0.5 &];]]

This runs in 0.2 sec on my (not very fast) system.

And you can check that the function works correctly:

Block[{$RecursionLimit = \[Infinity]},
  Flatten@myselect[data, # > 0.5 &] ===
   Select[Flatten[data], # > 0.5 &]]

About the thing that you call "naive C-like behavior": you probably mean 
that Mathematica doesn't automatically perform the tail-call 
optimization.  That is correct, you need to transform your recursions 
explicitly.

Here's an example:

myselect2[dest_, {head_, tail_}, test_] :=
  If[test[head],
   myselect2[{head, dest}, tail, test],
   myselect2[dest, tail, test]]

myselect2[dest_, {}, test_] := dest

Note that it is $IterationLimit that we need to increase now and this 
version doesn't fill up the evaluation stack.

Also note that this produces the result in the reverse order:

Block[{$IterationLimit = \[Infinity]},
  Reverse@Flatten@myselect2[{}, data, # > 0.5 &] ===
   Select[Flatten[data], # > 0.5 &]]

We could produce the result in the same order, but reverse linking (like 
{{{{},1},2},3}).

I couldn't do any better than this using a singly-linked list.  It looks 
like OCaml (which I don't know BTW) uses a singly linked list too, and 
can't do the tail-call optimization, so its stack gets filled up.

One more thing: even when using the same data structure and same 
algorithm, it is of course expected that a very high level interpreted 
language like Mathematica is not going to perform nearly as well as a 
low level compiled language like OCaml.  In high level languages the 
usual solution to this performance problem is built-in functions: don't 
implement Select in Mathematica itself.  Use the built-in one, which is 
written in C and much faster than a Mathematica implementation could ever be.

Hope this helps,
Szabolcs

Follow-Ups:
- Re: Why is recursion so slow in Mathematica?
  - From: Daniel <janzon@gmail.com>
- Re: Re: Why is recursion so slow in Mathematica?
  - From: DrMajorBob <btreat1@austin.rr.com>

Prev by Date: Re: Possible bug in Eigenvalues[ ] ???

Next by Date: Re: Why is recursion so slow in Mathematica?

Previous by thread: Re: MergeSort with replacerepeated - a follow-up

Next by thread: Re: Re: Why is recursion so slow in Mathematica?