MathGroup Archive: February 2001 [00092]

[Date Index] [Thread Index] [Author Index]
AW: Appending to Lists
To: mathgroup at smc.vnet.net
Subject: [mg27112] AW: [mg27045] Appending to Lists
From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.de>
Date: Sun, 4 Feb 2001 02:58:40 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com

-----Ursprüngliche Nachricht-----
Von: James Jones [mailto:j.k.jones at dl.ac.uk]
Gesendet: Donnerstag, 1. Februar 2001 09:01
An: mathgroup at smc.vnet.net
Betreff: [mg27045] Appending to Lists


Hi,

I have a function that creates a list (a) from another list (b). The list
elements are re-grouped in the new list according to a third list (c). A
Position command is applied to list (c) for an element, then with this
output the list (a) is created from list (b) at positions given by the
element position data, list (c). This is repeated for the large number of
elements in the original lists.
The Position command is necessary as different elements appear in the list a
different number of times.
However, with the large number of elements in the lists (approx 50,000 for a
simple list), this method is _very_ slow.
If any one can give me help in speeding this process up I would be very
grateful.

The data sets would look like this

      b                       c

    0.2                      1
    0.6                      2
    1.2                      3
    -0.2                     1
    0.5                       2
    0.3                       1
    0.7                       2
   -0.2                      1
   -0.6                      1

A List would then be created from this data ( the list (a) ) containing
vectors for 1, 2 and 3. The data in (b) is not important, and the order in
which elements in (c) drop out is not set.
In this case the (a) list should look like

a = { { 0.2, -0.2, -0.2, -0.6} , {0.6, 0.5, 0.7} , { 1.2 } }

My current function looks like this

Do[AppendTo[xfinal,
      Flatten[Part[X, #] & /@
          Position[Global`PARTICLE, i]]], {i, 1,
      Max[PARTICLE]}];

where xfinal is an (a) list, i.e. to be created.
          X is the (b) list , i.e. to be addressed, and
          PARTICLE is the (c) list. It is referenced by number.

and it is very slow!

Also, after producing this list, the different vector elements need to be
made the same length, and so 0.0 are added to the ends of all vector
elements shorter than the longest. My current function for doing this looks
like

table = Table[0.0, {Length[First[long]]}]; Print["Table Created!"];

Do[If[Length[Part[xfinal, i]] < Length[First[long]],
      AppendTo[Part[xfinal, i],
        Drop[table, (Length[Part[xfinal, i]])] ]], {i, 2,
      Length[xfinal]}];

where list (long) just sorts the list elements according to length.

This function is also very slow, and I was wondering, again, if anyone knew
a faster way of implementing this. Is the production of a table, once, and
then dropping bits off and appending the fastest method? Of course this
needs to be done tens of thousands of times per set of data so any small
speed increase would be very helpful ;->

Again, any help much appreciated,

James Jones
Daresbury Laboratory




Dear Jones,

here a proposal, how to do things eficient in input size:

b = {0.2, 0.6, 1.2, -0.2, 0.5, 0.3, 0.7, -0.2, -0.6};
c = {1, 2, 3, 1, 2, 1, 2, 1, 1};
 
Transpose[{c, b}]
Out[]=
{{1, 0.2}, {2, 0.6}, {3, 1.2}, {1, -0.2}, {2, 0.5}, {1, 0.3}, {2, 
    0.7}, {1, -0.2}, {1, -0.6}}

your measured values are now tagged with the particle (or is it a counter?)
Sorting these, means grouping of measurements to p.or c.

rawl = Sort@Transpose[{c, b}]
Out[]=
{{1, -0.6}, {1, -0.2}, {1, -0.2}, {1, 0.2}, {1, 0.3}, {2, 0.5}, {2, 0.6},
 {2, 0.7}, {3, 1.2}}

split makes this explicit in structure

Split[Sort@Transpose[{c, b}], First[#1] === First[#2] &]
Out[]=
{{{1, -0.6}, {1, -0.2}, {1, -0.2}, {1, 0.2}, {1, 0.3}}, 
 {{2, 0.5}, {2, 0.6}, {2, 0.7}}, {{3, 1.2}}}

Now we dispose off the tags

l = Map[Last, Split[Sort@Transpose[{c, b}], First[#1] === First[#2] &], {2}]
Out[]=
{{-0.6, -0.2, -0.2, 0.2, 0.3}, {0.5, 0.6, 0.7}, {1.2}}

We get the maximum length:

maxlen = Max @@ Length /@ l
Out[]=5

This makes all records to equal length (however I wonder why?)

PadRight[#, maxlen, 0.] & /@ l
Out[]=
{{0.2, -0.2, 0.3, -0.2, -0.6}, {0.6, 0.5, 0.7, 0., 0.}, {1.2, 0., 0., 0.,
0.}}
 
You obviously lost information through this procedure.
Alternatively you may group your events to p.or c. e.g.like this

With[{t = Table[0. , {3}]}, ReplacePart[t, #2, #1] & @@@ rawl]// 
    Transpose
Out[]=
{{-0.6, -0.2, -0.2, 0.2, 0.3, 0., 0., 0., 0.}, 
 {0., 0., 0., 0., 0., 0.5, 0.6, 0.7, 0.}, 
 {0., 0., 0., 0., 0., 0., 0., 0., 1.2}}

You might like to check whether this expression is more adequate for further
processing (case of the *un*paded result isn't)

Test for performance 
 
 3 Particles or Counters, 100,000 events:

In[6]:= nWhat = 3; nEvents = 100000;
In[7]:= btest = Table[Random[Real, {-1, 1}], {nEvents}];
In[8]:= ctest = Table[Random[Integer, {1, nWhat}], {nEvents}];
In[9]:=
Timing[With[{ltest = 
          Flatten[Part[btest, #] & /@ Position[ctest, #]] & /@
Range[nWhat]}, 
      With[{maxlen = Max @@ Length /@ ltest}, 
        PadRight[#, maxlen, 0.] & /@ ltest]]][[1]]
Out[9]= 1.392 Second
In[10]:=
Timing[With[{t0 = Table[0., {nWhat}]}, 
        ReplacePart[t0, #2, #1] & @@@ Transpose[{ctest, btest}]] // 
      Transpose][[1]]
Out[10]= 2.253 Second

Test 30 Particles or Counters, 100,000 events
In[20]:=
nWhat = 30; nEvents = 100000;
...
In[23]:=
Timing[With[{ltest = 
          Flatten[Part[btest, #] & /@ Position[ctest, #]] & /@
Range[nWhat]}, 
      With[{maxlen = Max @@ Length /@ ltest}, 
        PadRight[#, maxlen, 0.] & /@ ltest]]][[1]]
Out[23]=
3.465 Second
In[24]:=
Timing[With[{t0 = Table[0., {nWhat}]}, 
        ReplacePart[t0, #2, #1] & @@@ Transpose[{ctest, btest}]] // 
      Transpose][[1]]
Out[24]=
4.236 Second

Test 300 Particles or Counters, 30,000 events
The alternative eats up more memory, so I reduced the # of events here (for
my notebook computer. Yet both algorithms are clearly linear in # of events)
In[25]:=
nWhat = 300; nEvents = 30000;
...
In[28]:=
Timing[With[{ltest = 
          Flatten[Part[btest, #] & /@ Position[ctest, #]] & /@
Range[nWhat]}, 
      With[{maxlen = Max @@ Length /@ ltest}, 
        PadRight[#, maxlen, 0.] & /@ ltest]]][[1]]
Out[28]=
9.564 Second
In[29]:=
Timing[With[{t0 = Table[0., {nWhat}]}, 
        ReplacePart[t0, #2, #1] & @@@ Transpose[{ctest, btest}]] // 
      Transpose][[1]]
Out[29]=
4.857 Second

so you see the alternative behaves better for larger numbers of particles
(or was it counters?)

Test 1000 Particles or Counters, 10,000 events
In[30]:=
nWhat = 1000; nEvents = 10000;
...
In[33]:=
Timing[With[{ltest = 
          Flatten[Part[btest, #] & /@ Position[ctest, #]] & /@
Range[nWhat]}, 
      With[{maxlen = Max @@ Length /@ ltest}, 
        PadRight[#, maxlen, 0.] & /@ ltest]]][[1]]
Out[33]=
16.193 Second
In[34]:=
Timing[With[{t0 = Table[0., {nWhat}]}, 
        ReplacePart[t0, #2, #1] & @@@ Transpose[{ctest, btest}]] // 
      Transpose][[1]]
Out[34]=
5.238 Second

So, if you have enough real(!) memory, the alternative is clearly superior,
and still contains all information

-- Hartmut Wolf
Prev by Date: Re: Appending to Lists
Next by Date: RE: Appending to Lists
Previous by thread: FW: AW: LogPlot/Plot-Identity
Next by thread: MATHEMATICA PROGRAM