MathGroup Archive: November 2009 [00686]

[Date Index] [Thread Index] [Author Index]

Re: More Efficient Method

To: mathgroup at smc.vnet.net
Subject: [mg105109] Re: [mg105076] More Efficient Method
From: Leonid Shifrin <lshifr at gmail.com>
Date: Sat, 21 Nov 2009 03:34:09 -0500 (EST)
References: <200911201138.GAA03384@smc.vnet.net>

Hi Brian,

Your solution will be reasonably efficient if you need just a couple of
holes, but will become less and less efficient when you increase the number
of holes. Perhaps you will get  better /more concise solutions  which will
be using some built-in that I am not aware of,  but here is a (rather
verbose) solution that should be fast enough even for a large number of
holes.

Clear[bsearchMinCompMassive ];
bsearchMinCompMassive =
  Compile[{{list, _Real, 1}, {elems, _Real, 1}},
   Module[{n0 = 1, n2 = 1, n1 = Length[list],
     res = Array[0 &, {Length[elems]}], m = 1},
    For[n2 = 1, n2 <= Length[elems], n2++,
     n0 = 1; n1 = Length[list];
     While[n0 <= n1,
      m = Floor[(n0 + n1)/2];
      If[list[[m]] < elems[[n2]], n0 = m + 1, n1 = m - 1]];
     res[[n2]] = If[list[[m]] < elems[[n2]], m, m - 1]
     ];
    res]];

Clear[bsearchMaxCompMassive];
bsearchMaxCompMassive =
  Compile[{{list, _Real, 1}, {elems, _Real, 1}},
   Module[{n0 = 1, n2 = 1, n1 = Length[list],
     res = Array[0 &, {Length[elems]}], m = 1},
    For[n2 = 1, n2 <= Length[elems], n2++,
     n0 = 1; n1 = Length[list];
     While[n0 <= n1,
      m = Floor[(n0 + n1)/2];
      If[list[[m]] >  elems[[n2]], n1 = m - 1, n0 = m + 1]];
     res[[n2]] = If[list[[m]] > elems[[n2]], m, m + 1]
     ];
    res]];

Clear[deleteRegions];
deleteRegions[x : {__?NumberQ}, regs : {{_, _} ..}] :=
  Module[{sorted, ord, result, positions, sortedRegs = Sort@regs},
   sorted = x[[ord = Ordering[x]]];
   positions =
    Sort@ord[[Complement[Range[Length[sorted]],
        Apply[Sequence,
         Range @@@
          Transpose[{bsearchMaxCompMassive[sorted, #1],
              bsearchMinCompMassive[sorted, #2]} & @@
            Transpose[sortedRegs]]]]]];
   result = x[[positions]]];


Here are some benchmarks:

In[1]:= datalist=RandomSample[#,Length[#]]&@Range[11,100000,10];
In[2]:= targetlist={{40,1500},{1600,3300}};
In[3]:= targetList2 = Table[{10*i,10*i+50},{i,2,150}];

In[4]:= (resultdata10=subsets[datalist,targetlist])//Short//Timing

Out[4]=
{0.032,{88671,91041,81971,6041,86891,<<9673>>,55451,73361,28271,51601,72831}}

In[5]:= (resultdata20 = deleteRegions[datalist,targetlist])//Short//Timing

Out[5]=
{0.,{88671,91041,81971,6041,86891,<<9673>>,55451,73361,28271,51601,72831}}

In[6]:= resultdata10==resultdata20

Out[6]= True

In[7]:= (resultdata1=subsets[datalist,targetList2])//Short//Timing

Out[7]=
{2.313,{88671,91041,81971,6041,86891,<<9836>>,55451,73361,28271,51601,72831}}

In[8]:= (resultdata2 = deleteRegions[datalist,targetList2])//Short//Timing

Out[8]=
{0.016,{88671,91041,81971,6041,86891,<<9836>>,55451,73361,28271,51601,72831}}

In[9]:= resultdata1==resultdata2

Out[9]= True


Regards,
Leonid



On Fri, Nov 20, 2009 at 2:38 PM, blamm64 <blamm64 at charter.net> wrote:

> I have a couple of functions designed to poke a single hole, and to
> poke multiple holes, in a one-level list:
>
> We define a function which, given the imported pressure data, finds
> the subset of that pressure data excluding the pressure data points
> between "targetL " and "targetU".
>
> In[5]:= findsubset[data_?VectorQ,targetL_?NumericQ,targetU_?
> NumericQ] := Select[data,(#<=targetL || #>=targetU &)]
>
> This function will pluck out multiple holes in the data list.
>
> In[6]:= subsets[data_?VectorQ,tarList_?ListQ]:=Module[{tmp,tmp1},
> tmp=data;
> Do[tmp1=findsubset[tmp,tarList[[i,1]],tarList[[i,2]]];tmp=tmp1,
> {i,Dimensions[tarList][[1]]}];
> tmp
> ]
>
> The following works fine (big holes chosen not to give large result):
>
> In[7]:= datalist=Range[11,3411,10];
>
> In[12]:= targetlist={{40, 1500},{1600,3300}};
>
> In[13]:= resultdata=subsets[datalist,targetlist]
>
> Out[13]=
>
> {11,21,31,1501,1511,1521,1531,1541,1551,1561,1571,1581,1591,3301,3311,3321,3331,3341,3351,3361,3371,3381,3391,3401,3411}
>
> But if "datalist" happens to be very large, surely there is a (much)
> more efficient method?
>
> I tried unsuccessfully to use pure functions with Select, but have a
> somewhat nebulous feeling there's a pure function way of doing this
> effectively much more efficiently.
>
> I know, I know: the above have no consistency checking.  I also know
> "subsets" could be used in place of "findsubset" just by replacing the
> call of "findsubset" with the code of "findsubset" in "subsets".
>
> >From what I've seen on this forum there are some really experienced
> people who might provide an efficient way of implementing the above.
>
> -Brian L.
>
>

References:
- More Efficient Method
  - From: blamm64 <blamm64@charter.net>

Prev by Date: Re: Re: I broke the sum into pieces

Next by Date: Finding multiple value in Complex Analysis

Previous by thread: Re: More Efficient Method

Next by thread: Re: More Efficient Method