MathGroup Archive: November 2006 [00349]

[Date Index] [Thread Index] [Author Index]
Re: Efficiency of repeatedly appending to a List
To: mathgroup at smc.vnet.net
Subject: [mg71170] Re: Efficiency of repeatedly appending to a List
From: "Philipp" <Philipp.M.O at gmail.com>
Date: Thu, 9 Nov 2006 03:40:36 -0500 (EST)
References: <eisg8h$n5a$1@smc.vnet.net>
Andrew,

The following addresses your original question & some issues from this
question/comment.

The most efficient/fastest way of keeping/extending a database of
sorted values I have discovered is by using Mathematica Indexed Object.

Let's assume that the data is initially stored in a list of a form

{{key1, data1}, ..., {keyM, dataM}}

for clarity let's further assume that keyK and dataK are String type.
For example,

In[1]:=  data = {{"key01", "data01"}, {"key02", "data02"},
                 {"key03", "data03"}, {"key04", "data04"},
                 {"key05", "data05"}, {"key06", "data06"},
                 {"key07", "data07"}, {"key08", "data08"},
                 {"key09", "data09"}, {"key10", "data10"},
                 {"key11", "data11"}, {"key12", "data12"},
                 {"key13", "data13"}, {"key14", "data14"}};

The following statement will assign the data to the Indexed Object
called dataArray,

In[2]:=  Clear[dataArray];
In[3]:=  Scan[(dataArray[First[#]] = Append[dataArray[First[#]] /.
           {dataArray[_] -> {}}, #]) &, data]

or, in more elegant¹ way,

         Scan[((#1[#2[1]] = Append[#1[#2[1]] /. {#1[_] -> {}}, #2])
           &)[dataArray, #1] &, data];

The advantages of this solution are as follows:
1. the key could be "anything" (any syntactically correct expression)
2. several "records" can be associated with a single "key", e.g.,

In[4]:=  addData[dataArray_, r : {k_String, d_String}] :=
          (dataArray[First[#]] = Append[dataArray[First[#]] /.
            {dataArray[_] -> {}}, #]) &[{k, d}]

In[5]:=  addData[dataArray, {"key10", "data10B"}]
Out[5]=  {{key10, data10}, {key10, data10B}}

In[6]:=  dataArray["key10"]
Out[6]=  {{key10, data10}, {key10, data10B}}

3. The addition of a new "key" (In[4], In[5]) or retrieval of "data"
with a key (In[6]) is of O(1) time complexity, (more precisely, almost
constant in time, say O(>1), since Mathematica most probably uses a
hashing algorithm and there would be some collisions).

4. deleting a record is also O(1)

In[7]:=  deleteKey[dataArray_, key_String] := Unset[dataArray[key]]
In[8]:=  deleteKey[dataArray, "key03"]

Extracting the indices is O(n) and can be performed by the following
statement,

In[9]:=  idx = ReleaseHold[Apply[(#1 &), DownValues[#], {1}] /.
           #[x_] :> x] &[dataArray]
Out[9]=  {key01, key02, key04, key05, key06, key07, key08, key09,
           key10, key11, key12, key13, key14}

Converting values to a list is also O(n),

In[10]:=  vlist = ReleaseHold[Apply[(#2 &), DownValues[#], {1}] /.
            #[x_] :> x] &[dataArray]
Out[10]=  {{{key01, data01}}, {{key02, data02}},
           {{key04, data04}}, {{key05, data05}},
           {{key06, data06}}, {{key07, data07}},
           {{key08, data08}}, {{key09, data09}},
           {{key10, data10}, {key10, data10B}},
           {{key11, data11}}, {{key12, data12}},
           {{key13, data13}}, {{key14, data14}}}

And, of course, operations such as

nextKey[dataArray_, key_String]
previousKey[dataArray_, key_String]

could be implemented also as O(n) by performing binary search on
indices (O(n) + O(Log[n]) ~= O(n)).

________________________
¹obscure, yet cryptic (but hey, we can do it, we've got the
technology)

=====================================================================

Some numbers indicating O(n) time complexity of Assignment (In[3]) and
Index Extraction (In[9]) operations.

data comprises a list of "records" in the form

{"IRV", {1991, 7, 24}, 41.97, 51.25, 40.45, 44.98, 696000}

i.e., each record is a randomly generated stock-market daily
price-volume
data.

The following plot shows normalised execution times of #=Assignment
(In[3]) and &=Index Extraction (In[9]) as functions of a number of
records in the data list. The number of records ranges from 20,000 to
400,000 in 20,000 steps.


  #
30##
  #
  #                                                                &&
  #                                       & Index extraction    &  &&
25##                                            In[9]           &
  #                                                         &&
  #                                                     &&
  ##                                                 &
20#                                               &                ##
  #                                           &  &              #
  #                                        &  &             ##
  ##                                      &          #  ##
15#                                   &&         ##  #
  #                                &          #
  #                            &&         ##
10##                       &&      #  ##
  #                     &      ##  #        # Assignment In[3]
  #                 &&     ##
  #             &&  &&  #
 5##         &  &&  ##
  #          &  ##
  #  &&  &&
#####&&################################################################
  #                 #                  #                 #
                  100000             200000            300000

It appears from the above picture that both operations are "strongly"
linear indicating O(n) time complexity.

Cheers,
Philipp.


Appendix
========

Code to obtain the results and the plot.


RandomString[chars_List, len_Integer] :=
 StringJoin[chars[[Table[Random[Integer, {1, Length[chars]}], {len}]]]]
Random3Date[] :=
 Take[ToDate[Random[Integer, {1, IntegerPart[FromDate[Date[]]]}]], 3]

n = 400000;

crangeAZ = CharacterRange["A", "Z"];
data = Table[{RandomString[crangeAZ, Random[Integer, {1, 150}]],
  Random3Date[], Apply[Sequence, Table[Random[Real, {0., 100.}], {4}]],
    Random[Integer, {1, 10^6}]}, {n}];

Length[data]

x = Table[Clear[dataArray];
      {#, First[Timing[Scan[(dataArray[First[#]] =
    Append[dataArray[First[#]] /. {dataArray[_] -> {}}, #]) &,
             Take[data, #]]]],
             First[Timing[t = ReleaseHold[Apply[(#1 &), DownValues[#],
{1}] /. #[x_] :> x]] &[dataArray]], Length[t]} &[i], {i, 20000, n,
20000}] /. Second -> 1

{y, z} = Transpose[Apply[{{#1, #2/x[[1, 2]]}, {#1, #3/x[[1, 2]]}} &, x,
{1}]]

Needs["Graphics`MultipleListPlot`"];
MultipleListPlot[y, z]

Phil


> Hi all,
>
> Recently I asked about efficient ways to insert items into a sorted
> List such that the new list was also sorted [1]. I received suggestions
> to linearly search through the list and find the right place to insert
> the new time (search time = O(N), where N is the size of the list), and
> also suggestions to use a faster binary search to locate the right
> place to insert the new time (search time = O(Log[N])).
>
> This caused me to wonder about the efficiency of appending items to
> Lists in Mathematica in general. Is it an O(N) operation (i.e.,
> something akin to a realloc in C)? Or does Mathematica employ some sort
> of anticipatory memory allocation scheme to make adding new elements to
> lists usually faster than O(N)?
>
> Cheers,
> Andrew
>
> [1]
> http://groups-beta.google.com/group/comp.soft-sys.math.mathematica/browse_thread/thread/0c329c83dec5a1b1
Prev by Date: Re: Factor.....
Next by Date: RE: Simple Problem with Polyhedron::Hexahedron
Previous by thread: Re: Efficiency of repeatedly appending to a List
Next by thread: Re: Efficiency of repeatedly appending to a List