Re: Re: list manipulation
- To: mathgroup at smc.vnet.net
- Subject: [mg20614] Re: [mg20601] Re: list manipulation
- From: "Wolf, Hartmut" <hwolf at debis.com>
- Date: Thu, 4 Nov 1999 02:13:30 -0500
- Organization: debis Systemhaus
- References: <199911020735.CAA26984@smc.vnet.net.>
- Sender: owner-wri-mathgroup at wolfram.com
jim leddon schrieb: > > Hi Folks, > > The program below takes the each value of the list,"events" , finds > which of these values falls within the intervals {x,y} which comprise > the list, "gaps" , then removes these values from the events list and > outputs this modified list which is called "eout". I'm getting an error > for incomplete expression, although I'm not sure if the algorithm itself > is doing the job. These reason why I wrote the loop to cue on the > integer parts of both lists is because these lists will eventually be > quite large, about 3000 elements each in which case I wanted to make the > process more efficient. > > Thanks if anyone can help. > Debbie L. > > gaps = {{1,5.5}, {2,4.3}, {2.7, 3.1}, {3.002, 4.007}, {10.001, > 10.007}, {10.101, 11.001}, {11.007, 12.0}}; > > events ={6.7, 8.9, 2.3, 2.789, 10, 11.002, 10.115, 3.02, 2.75}; ---x---x--- Hello Debbie, before I'll show you a solution to your problem, allow me to give you a general hint, as how to tackle the problem: First, try to specify the problem without thinking of any algorithm at all, second try to define one or better some typical, yet small test cases, and do them by hand. While doing so you only think in terms of your application which you know best -- no programming is involved so far. Only after then try to find an algorithm and if you have not already spotted elements for a solution through prior experience, let be guided by your thinking of solving the problem by hand; don't, don't think at performance premature, think at correctness only. Implement the algorithm, test it until you are convinced it is all right. Then make some tests with longer inputs and _observe_ the performance (observing means measuring). Only if it turns out to be too slow, i.e. it presumable would take days or years for your real problem size, _then_ try to optimize it. Isolate the critical parts, understand why it is slow, and try to improve stepwise. Stop as soon as you can solve your application problem. Go on only if you want to win the programming contest, or are learning algorithmics. Now as to your problem, we recently had a very similar one in this newsgroup "checking for overlap" ?? -- names, names, give your problem a good name!! -- the only difference is that now you specifiy the gaps that filter out your events instead of the intervals that should contain them. ===== a solution: First we define some test data In[1]:= (gaps0 = Flatten[ Table[Partition[Sort[Table[Random[Real, 100.], {10000}]], 2], {3}], 1]) // Length Out[1]= 15000 I saw that some of your gap intervals were overlapping, therefore we make up three runs for (small) gaps of real numbers between 1 and 100, say. In[2]:= gaps1 = Interval @@ gaps0; // Timing Out[2]= {13.97 Second, Null} In[3]:= Length[gaps1] Out[3]= 3719 Interval is a _Mathematica_ idiom (look it up in Help and in The Matematica Book). We randomly delete some gaps in gaps1: In[4]:= deletepos = Union[Table[{Random[Integer, {1, Length[gaps1]}]}, {750}]]; In[5]:= Length[delp] Out[5]= 690 In[6]:= gaps = Delete[gaps1, deletepos]; In[7]:= Length[gaps] Out[7]= 3029 That's about the size of your problem, now let's define 3000 events: In[8]:= events = Table[Random[Real, 100.], {3000}]; The "most elegant" solution published in the newsgroup was: In[9]:= survivingEvents = Select[events, ! IntervalMemberQ[gaps, #] &]; // Timing Out[9]= {122.607 Second, Null} In[10]:= Length[survivingEvents] Out[10]= 853 It takes about two minutes (on my 166 MHz Pentium, 256 MByte, under MS Windows NT 4.0) and you may be content with it. Yet for your instance you can do faster: In[11]:= (surviving = Module[{selected = {}, t = 1}, With[{tmax = Length[events], events = Sort[events]}, Do[While[t <= tmax && events[[t]] < gaps[[i, 1]], AppendTo[selected, {i, t++}]]; While[t <= tmax && events[[t]] <= gaps[[i, 2]], t++], {i, 1, Length[gaps]}] ]; selected]); // Timing Out[11]= {2.303 Second, Null} In[12]:= Length[surviving] Out[12]= 853 If m is the number of your (final, non-overlapping) gaps, and n is then number of your events, then the runtime of the last algorithm is O[n log n] (for sorting the events) + O[n + m] (for selecting); the runtime of the first algorithm is (I suppose so, at last it could be made) O[n log m]. And (for both) it takes some time to build the interval structure (essentially sorting and merging overlapping components), which I guess to be of order O[m log m] (depending somewhat on the degree and type of overlapping). So it is not clear from the beginning, what to do best. Finally your example gives In[23]:= gaps = {{1, 5.5}, {2, 4.3}, {2.7, 3.1}, {3.002, 4.007}, {10.001, 10.007}, {10.101, 11.001}, {11.007, 12.0}}; In[24]:= Interval @@ gaps Out[24]= Interval[{1, 5.5}, {10.001, 10.007}, {10.101, 11.001}, {11.007, 12.}] In[25]:= events = {6.7, 8.9, 2.3, 2.789, 10, 11.002, 10.115, 3.02, 2.75}; In[26]:= Select[events, ! IntervalMemberQ[%%, #] &] Out[26]= {6.7, 8.9, 10, 11.002} was that your expected outcome? Kind regards, Hartmut
- Follow-Ups:
- Re: Re: Re: list manipulation
- From: "Wolf, Hartmut" <hwolf@debis.com>
- Re: Re: Re: list manipulation
- References:
- Re: list manipulation
- From: "jim leddon" <jleddon@home.com>
- Re: list manipulation