Re: Re: Re: list manipulation

```Wolf, Hartmut schrieb:
>
> jim leddon schrieb:
> >
> > Hi Folks,
> >
> > The program below takes the each value of the list,"events" , finds
> > which of these values falls within the intervals {x,y} which comprise
> > the list, "gaps" , then removes these values from the events list and
> > outputs this modified list which is called "eout". I'm getting an error
> > for incomplete expression, although I'm not sure if the algorithm itself
> > is doing the job. These reason why I wrote the loop to cue on the
> > integer parts of both lists is because these lists will eventually be
> > quite large, about 3000 elements each in which case I wanted to make the
> > process more efficient.
> >
> > Thanks if anyone can help.
> > Debbie L.
> >
> > gaps = {{1,5.5}, {2,4.3}, {2.7, 3.1}, {3.002, 4.007}, {10.001,
> > 10.007}, {10.101, 11.001}, {11.007, 12.0}};
> >
> > events ={6.7, 8.9, 2.3, 2.789, 10, 11.002, 10.115, 3.02, 2.75};
> ---x---x---
>
> Hello Debbie,
>
> before I'll show you a solution to your problem, allow me to give you a
> general hint, as how to tackle the problem: First, try to specify the
> problem without thinking of any algorithm at all, second try to define
> one or better some typical, yet small test cases, and do them by hand.
> While doing so you only think in terms of your application which you
> know best -- no programming is involved so far. Only after then try to
> find an algorithm and if you have not already spotted elements for a
> solution through prior experience, let be guided by your thinking of
> solving the problem by hand; don't, don't think at performance
> premature, think at correctness only. Implement the algorithm, test it
> until you are convinced it is all right. Then make some tests with
> longer inputs and _observe_ the performance (observing means measuring).
> Only if it turns out to be too slow, i.e. it presumable would take days
> or years for your real problem size, _then_ try to optimize it. Isolate
> the critical parts, understand why it is slow, and try to improve
> stepwise. Stop as soon as you can solve your application problem. Go on
> only if you want to win the programming contest, or are learning
> algorithmics.
>
> Now as to your problem, we recently had a very similar one in this
> newsgroup "checking for overlap" ?? -- names, names, give your problem a
> good name!! -- the only difference is that now you specifiy the gaps
> that filter out your events instead of the intervals that should contain
> them.
>
> ===== a solution:
>
> First we define some test data
>
> In[1]:=
> (gaps0 = Flatten[
>         Table[Partition[Sort[Table[Random[Real, 100.], {10000}]], 2],
> {3}],
>         1]) // Length
> Out[1]= 15000
>
> I saw that some of your gap intervals were overlapping, therefore we
> make up three runs for (small) gaps of real numbers between 1 and 100,
> say.
>
> In[2]:= gaps1 = Interval @@ gaps0; // Timing
> Out[2]= {13.97 Second, Null}
> In[3]:= Length[gaps1]
> Out[3]= 3719
>
> Interval is a _Mathematica_ idiom (look it up in Help and in The
> Matematica Book).
>
> We randomly delete some gaps in gaps1:
>
> In[4]:=
> deletepos = Union[Table[{Random[Integer, {1, Length[gaps1]}]}, {750}]];
> In[5]:= Length[delp]
> Out[5]= 690
>
> In[6]:= gaps = Delete[gaps1, deletepos];
> In[7]:= Length[gaps]
> Out[7]= 3029
>
> That's about the size of your problem, now let's define 3000 events:
>
> In[8]:= events = Table[Random[Real, 100.], {3000}];
>
> The "most elegant" solution published in the newsgroup was:
>
> In[9]:=
> survivingEvents = Select[events, ! IntervalMemberQ[gaps, #] &]; //
> Timing
> Out[9]= {122.607 Second, Null}
> In[10]:= Length[survivingEvents]
> Out[10]= 853
>
> It takes about two minutes (on my 166 MHz Pentium, 256 MByte, under MS
> Windows NT 4.0) and you may be content with it. Yet for your instance
> you can do faster:
>
> In[11]:=
> (surviving =
>         Module[{selected = {}, t = 1},
>           With[{tmax = Length[events], events = Sort[events]},
>             Do[While[t <= tmax && events[[t]] < gaps[[i, 1]],
>                   AppendTo[selected, {i, t++}]];
>                While[t <= tmax && events[[t]] <= gaps[[i, 2]],
>                   t++],
>               {i, 1, Length[gaps]}]
>           ];
>           selected]); // Timing
> Out[11]= {2.303 Second, Null}
> In[12]:= Length[surviving]
> Out[12]= 853
>
> If m is the number of your (final, non-overlapping) gaps, and n is then
> number of your events, then the runtime of the last algorithm is O[n log
> n] (for sorting the events) + O[n + m] (for selecting); the runtime of
> the first algorithm is (I suppose so, at last it could be made) O[n log
> m]. And (for both) it takes some time to build the interval structure
> (essentially sorting and merging overlapping components), which I guess
> to be of order O[m log m] (depending somewhat on the degree and type of
> overlapping). So it is not clear from the beginning, what to do best.
>
>
> In[23]:= gaps = {{1, 5.5}, {2, 4.3}, {2.7, 3.1}, {3.002, 4.007},
> {10.001,
>         10.007}, {10.101, 11.001}, {11.007, 12.0}};
>
> In[24]:= Interval @@ gaps
> Out[24]= Interval[{1, 5.5}, {10.001, 10.007}, {10.101, 11.001}, {11.007,
> 12.}]
>
> In[25]:= events = {6.7, 8.9, 2.3, 2.789, 10, 11.002, 10.115, 3.02,
> 2.75};
> In[26]:= Select[events, ! IntervalMemberQ[%%, #] &]
> Out[26]= {6.7, 8.9, 10, 11.002}
>
> was that your expected outcome?
>
-------------------------------------------------------------------------------

Now, I have to make a ***correction*** to my program above: first I made
little error with cut and paste, so you won't get the right output,
second I made some more performance testing (defining the test cases
differently) which has shown up an ***error***.

These are my new test data:

In[2]:=
Function[{n},
gapsX[n] =
Module[{kappa = 250./n},
Table[With[{c = Random[Real, {0., 100.}],
d = kappa Exp[-Random[Real, {0., 10.}]]}, {c - d,
c + d}], {n}]];

{Timing[truegapsX[n] =
Interval @@
gapsX[n];]\[LeftDoubleBracket]1\[RightDoubleBracket],
Length[truegapsX[n]]}] /@ {100, 300, 1000, 3000, 10000}

Out[2]=
{{0. Second, 63}, {0.051 Second, 157}, {0.23 Second, 575}, {1.652
Second,
1843}, {19.938 Second, 6054}}

This is the time for Applying "Interval", which needs sorting and
merging (internally)

Function[{n}, events[n] = Table[Random[Real, 100.], {n}];] /@ {100, 300,
1000,
3000, 10000}

Now the ***correction*** to my proposal:

In[32]:=
Function[n,
Timing[ss[n] =
Module[{selected = {}, t = 1},
With[{tmax = Length[events[n]], events =
Sort[events[n]],
gaps = truegapsX[n]},
Do[While[
t <= tmax &&
events\[LeftDoubleBracket]t\[RightDoubleBracket]
<
gaps\[LeftDoubleBracket]i,
1\[RightDoubleBracket],
AppendTo[selected,

events\[LeftDoubleBracket]t++\[RightDoubleBracket]]];
While[t <= tmax &&
events\[LeftDoubleBracket]t\[RightDoubleBracket]
<=
gaps\[LeftDoubleBracket]i,
2\[RightDoubleBracket],
t++], {i, 1, Length[gaps]}];
Join[selected, Take[events, {t, tmax}]]]];
]\[LeftDoubleBracket]1\[RightDoubleBracket] // (Print[#]; #)
&] /@ \
{100, 300, 1000, 3000, 10000}
Out[32]=
{0.061 Second, 0.14 Second, 0.671 Second, 3.164 Second, 25.497 Second}

This function is the fastest one by far (even if you add the times above
for applying Interval for fair comparison) *and* it is O[n]!

The "most elegant" solution behaves (a little bit) differently:

In[14]:=
Function[n,
Timing[se[n] =
Select[events[
n], ! IntervalMemberQ[
truegapsX[
n], #]
&];]\[LeftDoubleBracket]1\[RightDoubleBracket] \
// (Print[#]; #) &] /@ {100, 300, 1000, 3000, 10000}

Out[14]=
{0.11 Second, 0.691 Second, 8.242 Second, 75.188 Second, 841.75 Second}

Allan Hayes function in principle follows the same idea, but as stated,
it is flawed:

In[58]:=
Function[n,
Timing[sebh[n] =
DeleteCases[events[n],
elmt_ /;
IntervalMemberQ[Interval @@ gapsX[n],
elmt]];]\[LeftDoubleBracket]1\[RightDoubleBracket]
// \
(Print[#]; #) &] /@ {100, 300, 1000, 3000}
Out[58]= \$Aborted

The problem is Applying Interval takes computation time (see above),
doing it inside the selection function means doing that over and over
again. When corrected,

Function[n,
Timing[seah[n] =
With[{gaps = Interval @@ gapsX[n]},
DeleteCases[events[n],
elmt_ /;
IntervalMemberQ[gaps,

elmt]]];]\[LeftDoubleBracket]1\[RightDoubleBracket] // \
(Print[#]; #) &] /@ {100, 300, 1000, 3000}

it shows:

{{0.12 Second, 100}, {0.731 Second, 300}, {8.432 Second,
1000}, {76.661 Second, 3000}, {"-not done-", 10000}}

Ranco Bojanic's version

In[72]:=
test[x_, y_] :=
Module[{len = Length[y], in = True},
For[j = 1, j <= len, j++,
If[y[[j, 1]] <= x && x <= y[[j, 2]], in = False; Break]];
Return[in]];
In[73]:=
Function[n,
Timing[serb[n] =
Select[events[n],
test[#,
gapsX[n]]
&];]\[LeftDoubleBracket]1\[RightDoubleBracket] \
// (Print[#]; #) &] /@ {100, 300}

Out[73]=
{1.833 Second, 18.036 Second}

is not usable for large input, since it compares all events with on the
average half of all (original, unconsolidated) gaps.

Atul Sharma published two funtions, the first one isn't as bad as he
claims:

Function[n,
Timing[seat[n] = With[{events = events[n], gaps = gapsX[n]}, tmp1 =
{};
eout = {};
max = Length[events];
Do[tmp1 = {};
ei = events[[i]];
tmp1 = Select[gaps, (#[[1]] <= ei && #[[2]] >= ei) &];
If[(tmp1 == {}), AppendTo[eout, ei]], {i, 1, max}];
eout];]\[LeftDoubleBracket]1\[RightDoubleBracket] // (Print[#]; #) &] /@
{100,
300, 1000}

{1.332 Second, 12.858 Second, 144.398 Second}

The second one doesn't scale well

In[82]:=
testQ[x_List] := Map[IntervalMemberQ[Apply[Interval, gaps], #] &, x]

In[84]:=
Function[n,
Block[{events = events[n], gaps = gapsX[n], keepEvents},
keepEvents[x_List] := Module[{newEvents, tq}, newEvents = {};
tq = testQ[events];
Do[If[! tq[[i]], AppendTo[newEvents, events[[i]]]], {i, 1,
Length[events]}];
newEvents];
Timing[seat2[n] =
keepEvents[
events];]\[LeftDoubleBracket]1\[RightDoubleBracket] //
\
(Print[#]; #) &]] /@ {100, 300, 1000}

Out[84]=
{1.002 Second, 11.907 Second, 242.829 Second}

One might also object at the programming style: the gaps are frozen into
the definition of "KeepEvents" and imported there through a global
variable.

I compared all results, they are all the same (except for order, my
(efficient) proposal returns the events sorted, which may be an
advantage or disadvantage. In the latter case one has to take provision
as to reorder the filtered events to original order, for which there is
plenty of time left.

My notebook for the comparison tests is available.

Kind regards, Hartmut

```

• Prev by Date: Re: How to Capture the contents of a X window larger then the screen
• Next by Date: Re: hexagon tiled torus
• Previous by thread: Re: Re: list manipulation
• Next by thread: Re: Re: Re: list manipulation