Re: Re: Re: list manipulation
- To: mathgroup at smc.vnet.net
- Subject: [mg20681] Re: [mg20614] Re: [mg20601] Re: list manipulation
- From: "Wolf, Hartmut" <hwolf at debis.com>
- Date: Sun, 7 Nov 1999 02:10:13 -0500
- References: <199911020735.CAA26984@smc.vnet.net.> <199911040713.CAA02648@smc.vnet.net.>
- Sender: owner-wri-mathgroup at wolfram.com
Wolf, Hartmut schrieb: > > jim leddon schrieb: > > > > Hi Folks, > > > > The program below takes the each value of the list,"events" , finds > > which of these values falls within the intervals {x,y} which comprise > > the list, "gaps" , then removes these values from the events list and > > outputs this modified list which is called "eout". I'm getting an error > > for incomplete expression, although I'm not sure if the algorithm itself > > is doing the job. These reason why I wrote the loop to cue on the > > integer parts of both lists is because these lists will eventually be > > quite large, about 3000 elements each in which case I wanted to make the > > process more efficient. > > > > Thanks if anyone can help. > > Debbie L. > > > > gaps = {{1,5.5}, {2,4.3}, {2.7, 3.1}, {3.002, 4.007}, {10.001, > > 10.007}, {10.101, 11.001}, {11.007, 12.0}}; > > > > events ={6.7, 8.9, 2.3, 2.789, 10, 11.002, 10.115, 3.02, 2.75}; > ---x---x--- > > Hello Debbie, > > before I'll show you a solution to your problem, allow me to give you a > general hint, as how to tackle the problem: First, try to specify the > problem without thinking of any algorithm at all, second try to define > one or better some typical, yet small test cases, and do them by hand. > While doing so you only think in terms of your application which you > know best -- no programming is involved so far. Only after then try to > find an algorithm and if you have not already spotted elements for a > solution through prior experience, let be guided by your thinking of > solving the problem by hand; don't, don't think at performance > premature, think at correctness only. Implement the algorithm, test it > until you are convinced it is all right. Then make some tests with > longer inputs and _observe_ the performance (observing means measuring). > Only if it turns out to be too slow, i.e. it presumable would take days > or years for your real problem size, _then_ try to optimize it. Isolate > the critical parts, understand why it is slow, and try to improve > stepwise. Stop as soon as you can solve your application problem. Go on > only if you want to win the programming contest, or are learning > algorithmics. > > Now as to your problem, we recently had a very similar one in this > newsgroup "checking for overlap" ?? -- names, names, give your problem a > good name!! -- the only difference is that now you specifiy the gaps > that filter out your events instead of the intervals that should contain > them. > > ===== a solution: > > First we define some test data > > In[1]:= > (gaps0 = Flatten[ > Table[Partition[Sort[Table[Random[Real, 100.], {10000}]], 2], > {3}], > 1]) // Length > Out[1]= 15000 > > I saw that some of your gap intervals were overlapping, therefore we > make up three runs for (small) gaps of real numbers between 1 and 100, > say. > > In[2]:= gaps1 = Interval @@ gaps0; // Timing > Out[2]= {13.97 Second, Null} > In[3]:= Length[gaps1] > Out[3]= 3719 > > Interval is a _Mathematica_ idiom (look it up in Help and in The > Matematica Book). > > We randomly delete some gaps in gaps1: > > In[4]:= > deletepos = Union[Table[{Random[Integer, {1, Length[gaps1]}]}, {750}]]; > In[5]:= Length[delp] > Out[5]= 690 > > In[6]:= gaps = Delete[gaps1, deletepos]; > In[7]:= Length[gaps] > Out[7]= 3029 > > That's about the size of your problem, now let's define 3000 events: > > In[8]:= events = Table[Random[Real, 100.], {3000}]; > > The "most elegant" solution published in the newsgroup was: > > In[9]:= > survivingEvents = Select[events, ! IntervalMemberQ[gaps, #] &]; // > Timing > Out[9]= {122.607 Second, Null} > In[10]:= Length[survivingEvents] > Out[10]= 853 > > It takes about two minutes (on my 166 MHz Pentium, 256 MByte, under MS > Windows NT 4.0) and you may be content with it. Yet for your instance > you can do faster: > > In[11]:= > (surviving = > Module[{selected = {}, t = 1}, > With[{tmax = Length[events], events = Sort[events]}, > Do[While[t <= tmax && events[[t]] < gaps[[i, 1]], > AppendTo[selected, {i, t++}]]; > While[t <= tmax && events[[t]] <= gaps[[i, 2]], > t++], > {i, 1, Length[gaps]}] > ]; > selected]); // Timing > Out[11]= {2.303 Second, Null} > In[12]:= Length[surviving] > Out[12]= 853 > > If m is the number of your (final, non-overlapping) gaps, and n is then > number of your events, then the runtime of the last algorithm is O[n log > n] (for sorting the events) + O[n + m] (for selecting); the runtime of > the first algorithm is (I suppose so, at last it could be made) O[n log > m]. And (for both) it takes some time to build the interval structure > (essentially sorting and merging overlapping components), which I guess > to be of order O[m log m] (depending somewhat on the degree and type of > overlapping). So it is not clear from the beginning, what to do best. > > Finally your example gives > > In[23]:= gaps = {{1, 5.5}, {2, 4.3}, {2.7, 3.1}, {3.002, 4.007}, > {10.001, > 10.007}, {10.101, 11.001}, {11.007, 12.0}}; > > In[24]:= Interval @@ gaps > Out[24]= Interval[{1, 5.5}, {10.001, 10.007}, {10.101, 11.001}, {11.007, > 12.}] > > In[25]:= events = {6.7, 8.9, 2.3, 2.789, 10, 11.002, 10.115, 3.02, > 2.75}; > In[26]:= Select[events, ! IntervalMemberQ[%%, #] &] > Out[26]= {6.7, 8.9, 10, 11.002} > > was that your expected outcome? > ------------------------------------------------------------------------------- Now, I have to make a ***correction*** to my program above: first I made little error with cut and paste, so you won't get the right output, second I made some more performance testing (defining the test cases differently) which has shown up an ***error***. These are my new test data: In[2]:= Function[{n}, gapsX[n] = Module[{kappa = 250./n}, Table[With[{c = Random[Real, {0., 100.}], d = kappa Exp[-Random[Real, {0., 10.}]]}, {c - d, c + d}], {n}]]; {Timing[truegapsX[n] = Interval @@ gapsX[n];]\[LeftDoubleBracket]1\[RightDoubleBracket], Length[truegapsX[n]]}] /@ {100, 300, 1000, 3000, 10000} Out[2]= {{0. Second, 63}, {0.051 Second, 157}, {0.23 Second, 575}, {1.652 Second, 1843}, {19.938 Second, 6054}} This is the time for Applying "Interval", which needs sorting and merging (internally) Function[{n}, events[n] = Table[Random[Real, 100.], {n}];] /@ {100, 300, 1000, 3000, 10000} Now the ***correction*** to my proposal: In[32]:= Function[n, Timing[ss[n] = Module[{selected = {}, t = 1}, With[{tmax = Length[events[n]], events = Sort[events[n]], gaps = truegapsX[n]}, Do[While[ t <= tmax && events\[LeftDoubleBracket]t\[RightDoubleBracket] < gaps\[LeftDoubleBracket]i, 1\[RightDoubleBracket], AppendTo[selected, events\[LeftDoubleBracket]t++\[RightDoubleBracket]]]; While[t <= tmax && events\[LeftDoubleBracket]t\[RightDoubleBracket] <= gaps\[LeftDoubleBracket]i, 2\[RightDoubleBracket], t++], {i, 1, Length[gaps]}]; Join[selected, Take[events, {t, tmax}]]]]; ]\[LeftDoubleBracket]1\[RightDoubleBracket] // (Print[#]; #) &] /@ \ {100, 300, 1000, 3000, 10000} Out[32]= {0.061 Second, 0.14 Second, 0.671 Second, 3.164 Second, 25.497 Second} This function is the fastest one by far (even if you add the times above for applying Interval for fair comparison) *and* it is O[n]! The "most elegant" solution behaves (a little bit) differently: In[14]:= Function[n, Timing[se[n] = Select[events[ n], ! IntervalMemberQ[ truegapsX[ n], #] &];]\[LeftDoubleBracket]1\[RightDoubleBracket] \ // (Print[#]; #) &] /@ {100, 300, 1000, 3000, 10000} Out[14]= {0.11 Second, 0.691 Second, 8.242 Second, 75.188 Second, 841.75 Second} Allan Hayes function in principle follows the same idea, but as stated, it is flawed: In[58]:= Function[n, Timing[sebh[n] = DeleteCases[events[n], elmt_ /; IntervalMemberQ[Interval @@ gapsX[n], elmt]];]\[LeftDoubleBracket]1\[RightDoubleBracket] // \ (Print[#]; #) &] /@ {100, 300, 1000, 3000} Out[58]= $Aborted The problem is Applying Interval takes computation time (see above), doing it inside the selection function means doing that over and over again. When corrected, Function[n, Timing[seah[n] = With[{gaps = Interval @@ gapsX[n]}, DeleteCases[events[n], elmt_ /; IntervalMemberQ[gaps, elmt]]];]\[LeftDoubleBracket]1\[RightDoubleBracket] // \ (Print[#]; #) &] /@ {100, 300, 1000, 3000} it shows: {{0.12 Second, 100}, {0.731 Second, 300}, {8.432 Second, 1000}, {76.661 Second, 3000}, {"-not done-", 10000}} Ranco Bojanic's version In[72]:= test[x_, y_] := Module[{len = Length[y], in = True}, For[j = 1, j <= len, j++, If[y[[j, 1]] <= x && x <= y[[j, 2]], in = False; Break]]; Return[in]]; In[73]:= Function[n, Timing[serb[n] = Select[events[n], test[#, gapsX[n]] &];]\[LeftDoubleBracket]1\[RightDoubleBracket] \ // (Print[#]; #) &] /@ {100, 300} Out[73]= {1.833 Second, 18.036 Second} is not usable for large input, since it compares all events with on the average half of all (original, unconsolidated) gaps. Atul Sharma published two funtions, the first one isn't as bad as he claims: Function[n, Timing[seat[n] = With[{events = events[n], gaps = gapsX[n]}, tmp1 = {}; eout = {}; max = Length[events]; Do[tmp1 = {}; ei = events[[i]]; tmp1 = Select[gaps, (#[[1]] <= ei && #[[2]] >= ei) &]; If[(tmp1 == {}), AppendTo[eout, ei]], {i, 1, max}]; eout];]\[LeftDoubleBracket]1\[RightDoubleBracket] // (Print[#]; #) &] /@ {100, 300, 1000} {1.332 Second, 12.858 Second, 144.398 Second} The second one doesn't scale well In[82]:= testQ[x_List] := Map[IntervalMemberQ[Apply[Interval, gaps], #] &, x] In[84]:= Function[n, Block[{events = events[n], gaps = gapsX[n], keepEvents}, keepEvents[x_List] := Module[{newEvents, tq}, newEvents = {}; tq = testQ[events]; Do[If[! tq[[i]], AppendTo[newEvents, events[[i]]]], {i, 1, Length[events]}]; newEvents]; Timing[seat2[n] = keepEvents[ events];]\[LeftDoubleBracket]1\[RightDoubleBracket] // \ (Print[#]; #) &]] /@ {100, 300, 1000} Out[84]= {1.002 Second, 11.907 Second, 242.829 Second} One might also object at the programming style: the gaps are frozen into the definition of "KeepEvents" and imported there through a global variable. I compared all results, they are all the same (except for order, my (efficient) proposal returns the events sorted, which may be an advantage or disadvantage. In the latter case one has to take provision as to reorder the filtered events to original order, for which there is plenty of time left. My notebook for the comparison tests is available. Kind regards, Hartmut
- References:
- Re: list manipulation
- From: "jim leddon" <jleddon@home.com>
- Re: Re: list manipulation
- From: "Wolf, Hartmut" <hwolf@debis.com>
- Re: list manipulation