Re: Efficiency of repeatedly appending to a List
- To: mathgroup at smc.vnet.net
- Subject: [mg71170] Re: Efficiency of repeatedly appending to a List
- From: "Philipp" <Philipp.M.O at gmail.com>
- Date: Thu, 9 Nov 2006 03:40:36 -0500 (EST)
- References: <eisg8h$n5a$1@smc.vnet.net>
Andrew,
The following addresses your original question and some of the issues
raised in this thread.
The most efficient/fastest way I have found of keeping and extending a
database of sorted values is to use a Mathematica indexed object (a
symbol carrying DownValues).
Let's assume that the data is initially stored in a list of the form
{{key1, data1}, ..., {keyM, dataM}}
and, for clarity, let's further assume that keyK and dataK are Strings.
For example,
In[1]:= data = {{"key01", "data01"}, {"key02", "data02"},
{"key03", "data03"}, {"key04", "data04"},
{"key05", "data05"}, {"key06", "data06"},
{"key07", "data07"}, {"key08", "data08"},
{"key09", "data09"}, {"key10", "data10"},
{"key11", "data11"}, {"key12", "data12"},
{"key13", "data13"}, {"key14", "data14"}};
The following statement assigns the data to the indexed object
called dataArray,
In[2]:= Clear[dataArray];
In[3]:= Scan[(dataArray[First[#]] = Append[dataArray[First[#]] /.
{dataArray[_] -> {}}, #]) &, data]
or, in a more elegant¹ way,
Scan[((#1[#2[[1]]] = Append[#1[#2[[1]]] /. {#1[_] -> {}}, #2])
&)[dataArray, #1] &, data];
The advantages of this solution are as follows:
1. the key could be "anything" (any syntactically correct expression)
2. several "records" can be associated with a single "key", e.g.,
In[4]:= addData[dataArray_, r : {k_String, d_String}] :=
(dataArray[First[#]] = Append[dataArray[First[#]] /.
{dataArray[_] -> {}}, #]) &[{k, d}]
In[5]:= addData[dataArray, {"key10", "data10B"}]
Out[5]= {{key10, data10}, {key10, data10B}}
In[6]:= dataArray["key10"]
Out[6]= {{key10, data10}, {key10, data10B}}
3. The addition of a new "key" (In[4], In[5]) or retrieval of "data"
by key (In[6]) takes O(1) time (more precisely, close to constant
time, since Mathematica most probably uses a hashing algorithm
internally and there will be some collisions).
4. deleting a record is also O(1)
In[7]:= deleteKey[dataArray_, key_String] := Unset[dataArray[key]]
In[8]:= deleteKey[dataArray, "key03"]
Extracting the indices is O(n) and can be performed by the following
statement,
In[9]:= idx = ReleaseHold[Apply[(#1 &), DownValues[#], {1}] /.
#[x_] :> x] &[dataArray]
Out[9]= {key01, key02, key04, key05, key06, key07, key08, key09,
key10, key11, key12, key13, key14}
Converting values to a list is also O(n),
In[10]:= vlist = ReleaseHold[Apply[(#2 &), DownValues[#], {1}] /.
#[x_] :> x] &[dataArray]
Out[10]= {{{key01, data01}}, {{key02, data02}},
{{key04, data04}}, {{key05, data05}},
{{key06, data06}}, {{key07, data07}},
{{key08, data08}}, {{key09, data09}},
{{key10, data10}, {key10, data10B}},
{{key11, data11}}, {{key12, data12}},
{{key13, data13}}, {{key14, data14}}}
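If a flat {key, data} list is needed again (for example, to rebuild a
sorted database), one level of flattening recovers it from the nested
value list; this is also O(n). A minimal sketch, assuming vlist as
produced by In[10]:

```mathematica
(* vlist (from In[10]) groups records per key; flattening one level *)
(* recovers a plain {{key, data}, ...} list, and Sort orders it by  *)
(* key, since lists compare element by element in canonical order.  *)
flat = Flatten[vlist, 1];
sorted = Sort[flat]
```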
And, of course, operations such as
nextKey[dataArray_, key_String]
previousKey[dataArray_, key_String]
could also be implemented at O(n Log[n]) cost: extract the indices
(O(n)), sort them (O(n Log[n])), and binary-search for the key
(O(Log[n])).
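A minimal sketch of such a nextKey (the name is illustrative, not
from the original post); it rebuilds and sorts the index list on
every call, which dominates the cost:

```mathematica
(* Return the key that follows key in sorted order, or Null at the  *)
(* end. Extracting and sorting the index list dominates; the lookup *)
(* in the sorted list could be replaced by a binary search.         *)
nextKey[dataArray_, key_String] :=
  Module[{idx, pos},
    idx = Sort[ReleaseHold[Apply[(#1 &), DownValues[dataArray], {1}] /.
          dataArray[x_] :> x]];
    pos = Position[idx, key];
    If[pos === {} || pos[[1, 1]] == Length[idx],
      Null, idx[[pos[[1, 1]] + 1]]]]
```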
________________________
¹obscure, yet cryptic (but hey, we can do it, we've got the
technology)
=====================================================================
Some numbers indicating O(n) time complexity of Assignment (In[3]) and
Index Extraction (In[9]) operations.
data comprises a list of "records" of the form
{"IRV", {1991, 7, 24}, 41.97, 51.25, 40.45, 44.98, 696000}
i.e., each record is randomly generated stock-market daily
price-volume data.
The following plot shows normalised execution times of Assignment
(In[3], plotted with '#') and Index Extraction (In[9], plotted with
'&') as functions of the number of records in the data list. The
number of records ranges from 20,000 to 400,000 in steps of 20,000.
[ASCII plot: normalised execution time (roughly 1 to 30) versus
number of records (20,000 to 400,000); '#' marks Assignment (In[3]),
'&' marks Index Extraction (In[9]); both series rise roughly
linearly.]
It appears from the above picture that both operations are strongly
linear, indicating O(n) time complexity.
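One way to quantify the linearity (a sketch, assuming the timing
table x built by the Appendix code below, whose rows are
{n, tAssign, tExtract, count}):

```mathematica
(* Least-squares straight-line fits t = a + b n to the raw timings; *)
(* constant terms that are small relative to the linear terms over  *)
(* this range support the O(n) reading.                             *)
Fit[x[[All, {1, 2}]], {1, t}, t]   (* assignment times       *)
Fit[x[[All, {1, 3}]], {1, t}, t]   (* index-extraction times *)
```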
Cheers,
Philipp.
Appendix
========
Code to obtain the results and the plot.
RandomString[chars_List, len_Integer] :=
StringJoin[chars[[Table[Random[Integer, {1, Length[chars]}], {len}]]]]
Random3Date[] :=
Take[ToDate[Random[Integer, {1, IntegerPart[FromDate[Date[]]]}]], 3]
n = 400000;
crangeAZ = CharacterRange["A", "Z"];
data = Table[{RandomString[crangeAZ, Random[Integer, {1, 150}]],
Random3Date[], Apply[Sequence, Table[Random[Real, {0., 100.}], {4}]],
Random[Integer, {1, 10^6}]}, {n}];
Length[data]
x = Table[Clear[dataArray];
{#, First[Timing[Scan[(dataArray[First[#]] =
Append[dataArray[First[#]] /. {dataArray[_] -> {}}, #]) &,
Take[data, #]]]],
First[Timing[t = ReleaseHold[Apply[(#1 &), DownValues[#],
{1}] /. #[x_] :> x]] &[dataArray]], Length[t]} &[i], {i, 20000, n,
20000}] /. Second -> 1
{y, z} = Transpose[Apply[{{#1, #2/x[[1, 2]]}, {#1, #3/x[[1, 2]]}} &, x,
{1}]]
Needs["Graphics`MultipleListPlot`"];
MultipleListPlot[y, z]
> Hi all,
>
> Recently I asked about efficient ways to insert items into a sorted
> List such that the new list was also sorted [1]. I received suggestions
> to linearly search through the list and find the right place to insert
> the new time (search time = O(N), where N is the size of the list), and
> also suggestions to use a faster binary search to locate the right
> place to insert the new time (search time = O(Log[N])).
>
> This caused me to wonder about the efficiency of appending items to
> Lists in Mathematica in general. Is it an O(N) operation (i.e.,
> something akin to a realloc in C)? Or does Mathematica employ some sort
> of anticipatory memory allocation scheme to make adding new elements to
> lists usually faster than O(N)?
>
> Cheers,
> Andrew
>
> [1]
> http://groups-beta.google.com/group/comp.soft-sys.math.mathematica/browse_thread/thread/0c329c83dec5a1b1