       Re: Re: list manipulation

• To: mathgroup at smc.vnet.net
• Subject: [mg20614] Re: [mg20601] Re: list manipulation
• From: "Wolf, Hartmut" <hwolf at debis.com>
• Date: Thu, 4 Nov 1999 02:13:30 -0500
• Organization: debis Systemhaus
• References: <199911020735.CAA26984@smc.vnet.net.>
• Sender: owner-wri-mathgroup at wolfram.com

```jim leddon schrieb:
>
> Hi Folks,
>
> The program below takes the each value of the list,"events" , finds
> which of these values falls within the intervals {x,y} which comprise
> the list, "gaps" , then removes these values from the events list and
> outputs this modified list which is called "eout". I'm getting an error
> for incomplete expression, although I'm not sure if the algorithm itself
> is doing the job. These reason why I wrote the loop to cue on the
> integer parts of both lists is because these lists will eventually be
> quite large, about 3000 elements each in which case I wanted to make the
> process more efficient.
>
> Thanks if anyone can help.
> Debbie L.
>
> gaps = {{1,5.5}, {2,4.3}, {2.7, 3.1}, {3.002, 4.007}, {10.001,
> 10.007}, {10.101, 11.001}, {11.007, 12.0}};
>
> events ={6.7, 8.9, 2.3, 2.789, 10, 11.002, 10.115, 3.02, 2.75};
---x---x---

Hello Debbie,

before I'll show you a solution to your problem, allow me to give you a
general hint, as how to tackle the problem: First, try to specify the
problem without thinking of any algorithm at all, second try to define
one or better some typical, yet small test cases, and do them by hand.
While doing so you only think in terms of your application which you
know best -- no programming is involved so far. Only after then try to
find an algorithm and if you have not already spotted elements for a
solution through prior experience, let be guided by your thinking of
solving the problem by hand; don't, don't think at performance
premature, think at correctness only. Implement the algorithm, test it
until you are convinced it is all right. Then make some tests with
longer inputs and _observe_ the performance (observing means measuring).
Only if it turns out to be too slow, i.e. it presumable would take days
or years for your real problem size, _then_ try to optimize it. Isolate
the critical parts, understand why it is slow, and try to improve
stepwise. Stop as soon as you can solve your application problem. Go on
only if you want to win the programming contest, or are learning
algorithmics.

Now as to your problem, we recently had a very similar one in this
newsgroup "checking for overlap" ?? -- names, names, give your problem a
good name!! -- the only difference is that now you specifiy the gaps
that filter out your events instead of the intervals that should contain
them.

===== a solution:

First we define some test data

In:=
(gaps0 = Flatten[
Table[Partition[Sort[Table[Random[Real, 100.], {10000}]], 2],
{3}],
1]) // Length
Out= 15000

I saw that some of your gap intervals were overlapping, therefore we
make up three runs for (small) gaps of real numbers between 1 and 100,
say.

In:= gaps1 = Interval @@ gaps0; // Timing
Out= {13.97 Second, Null}
In:= Length[gaps1]
Out= 3719

Interval is a _Mathematica_ idiom (look it up in Help and in The
Matematica Book).

We randomly delete some gaps in gaps1:

In:=
deletepos = Union[Table[{Random[Integer, {1, Length[gaps1]}]}, {750}]];
In:= Length[delp]
Out= 690

In:= gaps = Delete[gaps1, deletepos];
In:= Length[gaps]
Out= 3029

That's about the size of your problem, now let's define 3000 events:

In:= events = Table[Random[Real, 100.], {3000}];

The "most elegant" solution published in the newsgroup was:

In:=
survivingEvents = Select[events, ! IntervalMemberQ[gaps, #] &]; //
Timing
Out= {122.607 Second, Null}
In:= Length[survivingEvents]
Out= 853

It takes about two minutes (on my 166 MHz Pentium, 256 MByte, under MS
Windows NT 4.0) and you may be content with it. Yet for your instance
you can do faster:

In:=
(surviving =
Module[{selected = {}, t = 1},
With[{tmax = Length[events], events = Sort[events]},
Do[While[t <= tmax && events[[t]] < gaps[[i, 1]],
AppendTo[selected, {i, t++}]];
While[t <= tmax && events[[t]] <= gaps[[i, 2]],
t++],
{i, 1, Length[gaps]}]
];
selected]); // Timing
Out= {2.303 Second, Null}
In:= Length[surviving]
Out= 853

If m is the number of your (final, non-overlapping) gaps, and n is then
number of your events, then the runtime of the last algorithm is O[n log
n] (for sorting the events) + O[n + m] (for selecting); the runtime of
the first algorithm is (I suppose so, at last it could be made) O[n log
m]. And (for both) it takes some time to build the interval structure
(essentially sorting and merging overlapping components), which I guess
to be of order O[m log m] (depending somewhat on the degree and type of
overlapping). So it is not clear from the beginning, what to do best.

In:= gaps = {{1, 5.5}, {2, 4.3}, {2.7, 3.1}, {3.002, 4.007},
{10.001,
10.007}, {10.101, 11.001}, {11.007, 12.0}};

In:= Interval @@ gaps
Out= Interval[{1, 5.5}, {10.001, 10.007}, {10.101, 11.001}, {11.007,
12.}]

In:= events = {6.7, 8.9, 2.3, 2.789, 10, 11.002, 10.115, 3.02,
2.75};
In:= Select[events, ! IntervalMemberQ[%%, #] &]
Out= {6.7, 8.9, 10, 11.002}