MathGroup Archive 2006


Re: Efficiency of repeatedly appending to a List

  • To: mathgroup at smc.vnet.net
  • Subject: [mg71148] Re: [mg71091] Efficiency of repeatedly appending to a List
  • From: Sseziwa Mukasa <mukasa at jeol.com>
  • Date: Thu, 9 Nov 2006 03:38:31 -0500 (EST)
  • References: <200611081114.GAA22363@smc.vnet.net>

On Nov 8, 2006, at 6:14 AM, Andrew Moylan wrote:

> Hi all,
>
> Recently I asked about efficient ways to insert items into a sorted
> List such that the new list was also sorted [1]. I received
> suggestions to linearly search through the list and find the right
> place to insert the new item (search time = O(N), where N is the
> size of the list), and also suggestions to use a faster binary
> search to locate the right place to insert the new item (search
> time = O(Log[N])).

I'm one of those who naively suggested an O(N) search, and I should
know better.  I had also somehow never heard of the Insert command;
the two together are probably more efficient than Appending and then
Sorting, since the sort alone is O(N Log[N]).  Nor had I ever bothered
to check whether the Combinatorica package contains a binary search,
which it does.  So here's the code for a binary search and insertion:

<<DiscreteMath`Combinatorica`
insertSorted[lst_, elem_] :=
  Insert[lst, elem, Ceiling[BinarySearch[lst, elem]]]
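The same search-then-insert idea is available in other languages; as a point of comparison only (not the poster's code), here is the equivalent in Python using the standard-library bisect module, where the binary search is O(Log[N]) but the insertion itself still costs O(N) because elements must be shifted:

```python
import bisect

def insert_sorted(lst, elem):
    """Insert elem into the sorted list lst, keeping it sorted.

    bisect.insort_right finds the insertion point with a binary
    search (O(log N)) and then shifts the tail of the list to make
    room (O(N)), mirroring Insert + BinarySearch above.
    """
    bisect.insort_right(lst, elem)
    return lst

# For example, insert_sorted([1, 3, 5], 4) yields [1, 3, 4, 5].
```

As in the Mathematica version, the overall cost is dominated by the O(N) shift, so the binary search only saves on comparisons, not on data movement.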

> This caused me to wonder about the efficiency of appending items to
> Lists in Mathematica in general. Is it an O(N) operation (i.e.,
> something akin to a realloc in C)? Or does Mathematica employ some  
> sort
> of anticipatory memory allocation scheme to make adding new  
> elements to
> lists usually faster than O(N)?

One way to test this is to Append to lists of various sizes and look
for discontinuities and a general trend.  Because of restrictions on
the mailing list I can't attach a plot, but running this code plots
the timing of Append against list length:

Block[{data, lst, x},
  data = Table[{len,
      lst = Table[x, {len}];
      First[Timing[Append[lst, x]] /. Second -> 1]},
    {len, 1000, 30000}];
  ListPlot[data, PlotRange -> All, PlotJoined -> True,
    AxesLabel -> {"Length", "Time (s)"}]];

There is definitely an O(N) trend to the data; interestingly, the
variance of the timings also increases with list size.  There also
seems to be a discontinuity right around a list length of 2000 when
plotted on my machine (dual 2.7 GHz G5, Mac OS X).  The discontinuity
suggests that, at least for short lists, extra memory is allocated,
but in general appending to a list is an O(N) operation.  Changing
the type of x moves the discontinuity: setting x=1 moves it to around
4000 elements and yields a much lower slope.  I suppose there is room
for further research if one is really interested in this kind of
investigation, but given the times involved (less than 0.5 ms) I'm
not sure how useful this knowledge would be.  In my experience
Mathematica is not well suited to large-scale data manipulation of
this type, and one should probably choose a different tool if the
majority of operations are manipulations of large data structures in
memory.
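Incidentally, the anticipatory allocation scheme the question describes is exactly what some other languages use: CPython's list, for instance, over-allocates geometrically so that most appends are amortized O(1).  This is offered purely as a comparison, not as a statement about Mathematica's internals; the over-allocation can be observed from Python with sys.getsizeof:

```python
import sys

# Append 64 elements one at a time and record the list's allocated
# size after each append.  In CPython (an implementation detail, not
# a language guarantee) the list grows its capacity geometrically,
# so getsizeof stays flat between occasional jumps.
sizes = []
lst = []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Indices where the allocated size actually changed, i.e. where a
# reallocation (and O(N) copy) happened.
growth_points = [i for i in range(1, len(sizes)) if sizes[i] != sizes[i - 1]]
print(len(growth_points), "reallocations over", len(sizes), "appends")
```

Because reallocations occur at only a handful of lengths, the occasional O(N) copy is spread over many cheap appends, which is why such schemes give amortized constant-time append.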

Regards,

Ssezi

