MathGroup Archive: April 2000 [00144]

[Date Index] [Thread Index] [Author Index]
Re: Fast List Manipulation & more
To: mathgroup at smc.vnet.net
Subject: [mg22981] Re: [mg22965] Fast List Manipulation & more
From: Hartmut Wolf <hwolf at debis.com>
Date: Sat, 8 Apr 2000 14:44:48 -0400 (EDT)
Organization: debis Systemhaus
References: <200004070654.CAA16531@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com
Mark Van De Vyver schrieb:
> 
> I have been looking over previous posts re: fast list extraction,
> manipulation, etc., and it seems that to get the 'fastest' routine depends
> on the specific nature of the problem.  I have not been able to recast my
> problem into one of the existing mg examples, partly I suspect because I
> cannot fully trandlate what is at time fairly cryptic code so I am posting
> this request even though there are some excellent pieces of example code out
> there (e.g. [mg12872] and the excerpts below)  I've included (what seems to
> be the fastest) snippets from previous posts that I thought might be
> helpful.
> 
> The problem is essentially this (using Mathematica 4.0 on NT 4.0 server):
> 
> With list l={{i,x(i),y(i),..}}, and list il={i} extract the Max and Min of
> all the x's, y's,... for each i.
> In another setting we may also use an InterpolatingFunction, and in another
> we want just the maximums of the x's and the maximums of just the y's
> etc.but these can wait, unless there aresimple, fast solutions?
> 
> il is built from l, but the i's are not neccessarily contiguous, both lists
> can be small, growing large Length[il]=1 to 2000, Length[l]=1 to 100,000.
> The relationship between the length of il and l is relatively
> straightforward, if il is length n then l is length k(n)+n, where k is a
> linear function of n, typically 3-50.
> 
> Here i, x, y,.. are Real i, can be large (+/-) and x, y typically in the
> range of -1 to1, unlikely to ever be 9+ (I've noticed in some examples that
> this can make a difference!).   In a seperate setting the x,y can be large,
> in the order of 10^2
> One other criterion is memory :)
> If a large l is fed in a small result list, say r, may be returned
> Length[il]===Length[r].
> Since this is going to be looping v.v.v. frequently it's important that Mathematica
> free memory, as well as be fast. (To be honest I've always stuggled to get
> Mathematica to give back what it takes in memory, even following the usual
> suggestions.)
> 
> What I'd like to know is if anyone can come up with something fatser than
> what I have below, and what do people think of some alternatives ideas
> (hacked from other MG posts) that I've not been able to get to work :(
> 
> Some ideas I had were to alter ss3 in [mg20688] to take the Max and Min
> rather then look at an interval at a time, though I'm not sure how to
> efficiently handle looking over x ,y, ...
> 
> (*  a slightly altered ss3 from [mg20688]  *)
> ss3[events_List, gaps_Interval] :=
>   Module[{t = Length[events],
>       se = Sort[events],
>       bag = {}},
>     Do[While[t > 0 && se[[t]] > gaps[[i, 2]], --t];
>       While[t > 0 && se[[t]] >= gaps[[i, 1]], bag = {bag, t}; --t], {i,
>         Length[gaps], 1, -1}]; Delete[se, List /@ Flatten[bag]]]
> 
>  It may also be possible to alter [mg12872], but now rather than
> /matchingselecting a single value from the larger list, we need to build a
> list of matching values...
> Another idea was to hack [mg21058] to build a list for each i and then look
> over that for the maximum and minmum, again though I was having trouble
> seeing how to modify this to tract the whole sequnces sublists {i, x, y,...}
> for each i by modifying:
> 
> (#2- #1) & @@@
>   Cases[DeleteCases[Partition[t, 2, 1], Alternatives @@ nt, {2}], {_, _}]
> 
> (*  some data  *)
> n = 500;
> tmp = Sort[ Flatten[Table[N[Random[Integer, {-n, n}]], {i, n}], 1]];
> l[n] = Table[{tmp[[i]] , Random[Real, {-9, 9}], Random[Real, {-9, 9}]},
> {i,n}];
> il[n] = Union[tmp];
> ClearAll[tmp, n];
> 
> (*  MV1 - no laughing :) *)
> mv1=
>   Compile[{{il, _Real, 1}, {l, _Real, 2}},
>         Table[({il[[j]], Max[#1], Min[#1]}) &
>             [#1[[2]] & /@ Select[#1, #1[[1]] == il[[j]] &] &
>             [l]],
>         {j, 1, Length[il]}
>         ]
>   ];
> 
> (*  Timing Pent Pro 200  *)
> 
> mv1[il[20], l[20]]; // Timing
> mv1[il[100], l[100]]; // Timing
> mv1[il[500], l[500]]; // Timing
> mv1[il[1000], l[1000]]; // Timing
> 
> Out[171]=
> {0. Second, Null}
> 
> Out[172]=
> {0.031 Second, Null}
> 
> Out[173]=
> {0.157 Second, Null}
> 
> Out[174]=
> {3.734 Second, Null}
> 
> Out[175]=
> {15.687 Second, Null}
> 

Mark,

I'm not shure whether I understand you, but this are my contemplations:

First to set up the test data (and the problem)

SeedRandom[7700660088]

...this is such that you can reproduce my results, and compare. Any test
starts here.

n = 500;

...this varied in the test with n=500, n=1000, n=100000.

tmp = Sort[Flatten[Table[N[Random[Integer, {-n, n}]], {i, n}], 1]];

l = Table[{tmp[[i]], Random[Real, {-9, 9}], Random[Real, {-9, 9}]}, {i,
n}];

xtr = Union[Table[{Random[Integer, {1, n}]}, {40}]]

...in the tests I used n/10 instead of 40 here.

il = Union[Extract[tmp, xtr]]

Length[il]
37

So not all elements of il need be in First/@ l, but there is none of il
not in First/@ l


To define the problem we execute a simple procedure #1, not meant to be
the production program, also used to verify a faster program.

pos = Flatten[Position[l, {#, _, _}] & /@ il, 1]

... we get the positions of the il in l, there are 

Length[pos]
54

...hits (some more as are elements of il)

el = Extract[l, pos]

...we extract the elements of l at those positions

Union @ (First /@ el) === il
True

...correct

eel = Transpose[Rest /@ el];

{Min[#], Max[#]} &  /@ eel

{{-8.76991, 8.91636}, {-8.90909, 8.9885}}

...this is the minimum and maximum of the x and the y from the positions
as determined by il.


Now two proposals:

procedure #2:

Off[Part::"partw"]

Module[{tl = 1, ti = 0, leni = Length[il], min = Infinity {1, 1}, 
    max = -Infinity {1, 1}, vals},
  While[++ti <= leni,
    While[l[[tl, 1]] < il[[ti]], tl++];
    While[l[[tl, 1]] == il[[ti]],
        min = MapThread[Min, {min, vals = Rest[l[[tl++]]]}];
        max = MapThread[Max, {max, vals}]]
    ];
  Transpose[{min, max}]]
{{-8.76991, 8.91636}, {-8.90909, 8.9885}}

On[Part::"partw"]

...when il refers to the last element of l, there will be the error
message, but then all the work has already been done and the result is
correct.


procedure #3 

ff[{min_, max_}, {i_, x_, y_}] :=
  If[x === -n, ilast = i; {min, max},
    If[i === ilast, 
      {MapThread[Min, {min, {x, y}}], MapThread[Max, {max, {x, y}}]},
      {min, max}]]

Module[{iil, sl},
      iil = Transpose[{il, #, #}] &[Table[-n, {Length[il]}]];
      sl = Sort[Join[ iil, l]];
      Block[{ilast = Undefined}, 
        Transpose[
          Fold[ff, {{Infinity, Infinity}, {-Infinity, -Infinity}}, sl]]]
      ] 

{{-8.76991, 8.91636}, {-8.90909, 8.9885}}


Now the results from timing on my machine (the first ondex on "time" is
n, the second is the # of the procedure:

time[500, 1] = {1.1420000000000101*Second, 
     {{-8.769911152534853, 8.91635750026802}, 
      {-8.909087068099602, 8.98850295101392}}}
 
time[500, 2] = {0.12999999999999545*Second, 
     {{-8.769911152534853, 8.91635750026802}, 
      {-8.909087068099602, 8.98850295101392}}}
 
time[500, 3] = {0.20999999999999375*Second, 
     {{-8.769911152534853, 8.91635750026802}, 
      {-8.909087068099602, 8.98850295101392}}}
 
time[1000, 1] = {4.646000000000001*Second, 
     {{-8.84899469707297, 8.687342737143382}, 
      {-8.96776630933692, 8.933065726429106}}}
 
time[1000, 2] = {0.2599999999999909*Second, 
     {{-8.84899469707297, 8.687342737143382}, 
      {-8.96776630933692, 8.933065726429106}}}
 
time[1000, 3] = {0.42099999999999227*Second, 
     {{-8.84899469707297, 8.687342737143382}, 
      {-8.96776630933692, 8.933065726429106}}}
 
time[100000, 2] = {24.796000000000035*Second, 
     {{-8.998994945020392, 8.999526141047475}, 
      {-8.998287850696661, 8.999946643578806}}}
 
time[100000, 3] = {44.584*Second, 
     {{-8.998994945020392, 8.999526141047475}, 
      {-8.998287850696661, 8.999946643578806}}}

You see that both procedures #2 and #3 seem to be linear in n. This is
should be the case for #2, yet not quite for #3 since there is a Sort in
it. But #1 appears to be at least quadratic in n and I avoided timing it
with n=100000.

Finally a remark:

In procedure #3 I used -n to mark off iil

iil = Transpose[{il, #, #}] &[Table[-n, {Length[il]}]];

I would have like to use -Infinity instead, but to my greatest
astonishment I observed 
 
Sort[{Infinity, -Infinity, 5, -5}]
{-5, 5, -\[Infinity], \[Infinity]}

OrderedQ[%]
True

Max[{-Infinity, 5}]
5

Min[{-Infinity, 5}]
-\[Infinity]

Min[{Infinity, 5}]
5

Max[{Infinity, 5}]
\[Infinity]



Hartmut
References:
- Fast List Manipulation & more
  - From: "Mark Van De Vyver" <mvyver@bigfoot.com>
Prev by Date: Re: RE:graphics export to LAtek and Word
Next by Date: selfdefined operators
Previous by thread: Fast List Manipulation & more
Next by thread: Logarithmic BarChart