Re: More Efficient Method
- To: mathgroup at smc.vnet.net
- Subject: [mg105109] Re: [mg105076] More Efficient Method
- From: Leonid Shifrin <lshifr at gmail.com>
- Date: Sat, 21 Nov 2009 03:34:09 -0500 (EST)
- References: <200911201138.GAA03384@smc.vnet.net>
Hi Brian, Your solution will be reasonably efficient if you need just a couple of holes, but will become less and less efficient when you increase the number of holes. Perhaps you will get better /more concise solutions which will be using some built-in that I am not aware of, but here is a (rather verbose) solution that should be fast enough even for a large number of holes. Clear[bsearchMinCompMassive ]; bsearchMinCompMassive = Compile[{{list, _Real, 1}, {elems, _Real, 1}}, Module[{n0 = 1, n2 = 1, n1 = Length[list], res = Array[0 &, {Length[elems]}], m = 1}, For[n2 = 1, n2 <= Length[elems], n2++, n0 = 1; n1 = Length[list]; While[n0 <= n1, m = Floor[(n0 + n1)/2]; If[list[[m]] < elems[[n2]], n0 = m + 1, n1 = m - 1]]; res[[n2]] = If[list[[m]] < elems[[n2]], m, m - 1] ]; res]]; Clear[bsearchMaxCompMassive]; bsearchMaxCompMassive = Compile[{{list, _Real, 1}, {elems, _Real, 1}}, Module[{n0 = 1, n2 = 1, n1 = Length[list], res = Array[0 &, {Length[elems]}], m = 1}, For[n2 = 1, n2 <= Length[elems], n2++, n0 = 1; n1 = Length[list]; While[n0 <= n1, m = Floor[(n0 + n1)/2]; If[list[[m]] > elems[[n2]], n1 = m - 1, n0 = m + 1]]; res[[n2]] = If[list[[m]] > elems[[n2]], m, m + 1] ]; res]]; Clear[deleteRegions]; deleteRegions[x : {__?NumberQ}, regs : {{_, _} ..}] := Module[{sorted, ord, result, positions, sortedRegs = Sort@regs}, sorted = x[[ord = Ordering[x]]]; positions = Sort@ord[[Complement[Range[Length[sorted]], Apply[Sequence, Range @@@ Transpose[{bsearchMaxCompMassive[sorted, #1], bsearchMinCompMassive[sorted, #2]} & @@ Transpose[sortedRegs]]]]]]; result = x[[positions]]]; Here are some benchmarks: In[1]:= datalist=RandomSample[#,Length[#]]&@Range[11,100000,10]; In[2]:= targetlist={{40,1500},{1600,3300}}; In[3]:= targetList2 = Table[{10*i,10*i+50},{i,2,150}]; In[4]:= (resultdata10=subsets[datalist,targetlist])//Short//Timing Out[4]= {0.032,{88671,91041,81971,6041,86891,<<9673>>,55451,73361,28271,51601,72831}} In[5]:= (resultdata20 = deleteRegions[datalist,targetlist])//Short//Timing Out[5]= {0.,{88671,91041,81971,6041,86891,<<9673>>,55451,73361,28271,51601,72831}} In[6]:= resultdata10==resultdata20 Out[6]= True In[7]:= (resultdata1=subsets[datalist,targetList2])//Short//Timing Out[7]= {2.313,{88671,91041,81971,6041,86891,<<9836>>,55451,73361,28271,51601,72831}} In[8]:= (resultdata2 = deleteRegions[datalist,targetList2])//Short//Timing Out[8]= {0.016,{88671,91041,81971,6041,86891,<<9836>>,55451,73361,28271,51601,72831}} In[9]:= resultdata1==resultdata2 Out[9]= True Regards, Leonid On Fri, Nov 20, 2009 at 2:38 PM, blamm64 <blamm64 at charter.net> wrote: > I have a couple of functions designed to poke a single hole, and to > poke multiple holes, in a one-level list: > > We define a function which, given the imported pressure data, finds > the subset of that pressure data excluding the pressure data points > between "targetL " and "targetU". > > In[5]:= findsubset[data_?VectorQ,targetL_?NumericQ,targetU_? > NumericQ] := Select[data,(#<=targetL || #>=targetU &)] > > This function will pluck out multiple holes in the data list. > > In[6]:= subsets[data_?VectorQ,tarList_?ListQ]:=Module[{tmp,tmp1}, > tmp=data; > Do[tmp1=findsubset[tmp,tarList[[i,1]],tarList[[i,2]]];tmp=tmp1, > {i,Dimensions[tarList][[1]]}]; > tmp > ] > > The following works fine (big holes chosen not to give large result): > > In[7]:= datalist=Range[11,3411,10]; > > In[12]:= targetlist={{40, 1500},{1600,3300}}; > > In[13]:= resultdata=subsets[datalist,targetlist] > > Out[13]= > > {11,21,31,1501,1511,1521,1531,1541,1551,1561,1571,1581,1591,3301,3311,3321,3331,3341,3351,3361,3371,3381,3391,3401,3411} > > But if "datalist" happens to be very large, surely there is a (much) > more efficient method? > > I tried unsuccessfully to use pure functions with Select, but have a > somewhat nebulous feeling there's a pure function way of doing this > effectively much more efficiently. > > I know, I know: the above have no consistency checking. I also know > "subsets" could be used in place of "findsubset" just by replacing the > call of "findsubset" with the code of "findsubset" in "subsets". > > >From what I've seen on this forum there are some really experienced > people who might provide an efficient way of implementing the above. > > -Brian L. > >
- References:
- More Efficient Method
- From: blamm64 <blamm64@charter.net>
- More Efficient Method