Re: Efficiency of repeatedly appending to a List

• To: mathgroup at smc.vnet.net
• Subject: [mg71170] Re: Efficiency of repeatedly appending to a List
• From: "Philipp" <Philipp.M.O at gmail.com>
• Date: Thu, 9 Nov 2006 03:40:36 -0500 (EST)
• References: <eisg8h\$n5a\$1@smc.vnet.net>

```Andrew,

question/comment.

The most efficient/fastest way of keeping/extending a database of
sorted values I have discovered is by using Mathematica Indexed Object.

Let's assume that the data is initially stored in a list of a form

{{key1, data1}, ..., {keyM, dataM}}

for clarity let's further assume that keyK and dataK are String type.
For example,

In[1]:=  data = {{"key01", "data01"}, {"key02", "data02"},
{"key03", "data03"}, {"key04", "data04"},
{"key05", "data05"}, {"key06", "data06"},
{"key07", "data07"}, {"key08", "data08"},
{"key09", "data09"}, {"key10", "data10"},
{"key11", "data11"}, {"key12", "data12"},
{"key13", "data13"}, {"key14", "data14"}};

The following statement will assign the data to the Indexed Object
called dataArray,

In[2]:=  Clear[dataArray];
In[3]:=  Scan[(dataArray[First[#]] = Append[dataArray[First[#]] /.
{dataArray[_] -> {}}, #]) &, data]

or, in more elegant¹ way,

Scan[((#1[#2[1]] = Append[#1[#2[1]] /. {#1[_] -> {}}, #2])
&)[dataArray, #1] &, data];

The advantages of this solution are as follows:
1. the key could be "anything" (any syntactically correct expression)
2. several "records" can be associated with a single "key", e.g.,

In[4]:=  addData[dataArray_, r : {k_String, d_String}] :=
(dataArray[First[#]] = Append[dataArray[First[#]] /.
{dataArray[_] -> {}}, #]) &[{k, d}]

Out[5]=  {{key10, data10}, {key10, data10B}}

In[6]:=  dataArray["key10"]
Out[6]=  {{key10, data10}, {key10, data10B}}

3. The addition of a new "key" (In[4], In[5]) or retrieval of "data"
with a key (In[6]) is of O(1) time complexity, (more precisely, almost
constant in time, say O(>1), since Mathematica most probably uses a
hashing algorithm and there would be some collisions).

4. deleting a record is also O(1)

In[7]:=  deleteKey[dataArray_, key_String] := Unset[dataArray[key]]
In[8]:=  deleteKey[dataArray, "key03"]

Extracting the indices is O(n) and can be performed by the following
statement,

In[9]:=  idx = ReleaseHold[Apply[(#1 &), DownValues[#], {1}] /.
#[x_] :> x] &[dataArray]
Out[9]=  {key01, key02, key04, key05, key06, key07, key08, key09,
key10, key11, key12, key13, key14}

Converting values to a list is also O(n),

In[10]:=  vlist = ReleaseHold[Apply[(#2 &), DownValues[#], {1}] /.
#[x_] :> x] &[dataArray]
Out[10]=  {{{key01, data01}}, {{key02, data02}},
{{key04, data04}}, {{key05, data05}},
{{key06, data06}}, {{key07, data07}},
{{key08, data08}}, {{key09, data09}},
{{key10, data10}, {key10, data10B}},
{{key11, data11}}, {{key12, data12}},
{{key13, data13}}, {{key14, data14}}}

And, of course, operations such as

nextKey[dataArray_, key_String]
previousKey[dataArray_, key_String]

could be implemented also as O(n) by performing binary search on
indices (O(n) + O(Log[n]) ~= O(n)).

________________________
¹obscure, yet cryptic (but hey, we can do it, we've got the
technology)

=====================================================================

Some numbers indicating O(n) time complexity of Assignment (In[3]) and
Index Extraction (In[9]) operations.

data comprises a list of "records" in the form

{"IRV", {1991, 7, 24}, 41.97, 51.25, 40.45, 44.98, 696000}

i.e., each record is a randomly generated stock-market daily
price-volume
data.

The following plot shows normalised execution times of #=Assignment
(In[3]) and &=Index Extraction (In[9]) as functions of a number of
records in the data list. The number of records ranges from 20,000 to
400,000 in 20,000 steps.

#
30##
#
#                                                                &&
#                                       & Index extraction    &  &&
25##                                            In[9]           &
#                                                         &&
#                                                     &&
##                                                 &
20#                                               &                ##
#                                           &  &              #
#                                        &  &             ##
##                                      &          #  ##
15#                                   &&         ##  #
#                                &          #
#                            &&         ##
10##                       &&      #  ##
#                     &      ##  #        # Assignment In[3]
#                 &&     ##
#             &&  &&  #
5##         &  &&  ##
#          &  ##
#  &&  &&
#####&&################################################################
#                 #                  #                 #
100000             200000            300000

It appears from the above picture that both operations are "strongly"
linear indicating O(n) time complexity.

Cheers,
Philipp.

Appendix
========

Code to obtain the results and the plot.

RandomString[chars_List, len_Integer] :=
StringJoin[chars[[Table[Random[Integer, {1, Length[chars]}], {len}]]]]
Random3Date[] :=
Take[ToDate[Random[Integer, {1, IntegerPart[FromDate[Date[]]]}]], 3]

n = 400000;

crangeAZ = CharacterRange["A", "Z"];
data = Table[{RandomString[crangeAZ, Random[Integer, {1, 150}]],
Random3Date[], Apply[Sequence, Table[Random[Real, {0., 100.}], {4}]],
Random[Integer, {1, 10^6}]}, {n}];

Length[data]

x = Table[Clear[dataArray];
{#, First[Timing[Scan[(dataArray[First[#]] =
Append[dataArray[First[#]] /. {dataArray[_] -> {}}, #]) &,
Take[data, #]]]],
First[Timing[t = ReleaseHold[Apply[(#1 &), DownValues[#],
{1}] /. #[x_] :> x]] &[dataArray]], Length[t]} &[i], {i, 20000, n,
20000}] /. Second -> 1

{y, z} = Transpose[Apply[{{#1, #2/x[[1, 2]]}, {#1, #3/x[[1, 2]]}} &, x,
{1}]]

Needs["Graphics`MultipleListPlot`"];
MultipleListPlot[y, z]

Phil

> Hi all,
>
> Recently I asked about efficient ways to insert items into a sorted
> List such that the new list was also sorted [1]. I received suggestions
> to linearly search through the list and find the right place to insert
> the new time (search time = O(N), where N is the size of the list), and
> also suggestions to use a faster binary search to locate the right
> place to insert the new time (search time = O(Log[N])).
>
> This caused me to wonder about the efficiency of appending items to
> Lists in Mathematica in general. Is it an O(N) operation (i.e.,
> something akin to a realloc in C)? Or does Mathematica employ some sort
> of anticipatory memory allocation scheme to make adding new elements to
> lists usually faster than O(N)?
>
> Cheers,
> Andrew
>
> [1]