MathGroup Archive: November 2009 [00681]

Re: More Efficient Method

To: mathgroup at smc.vnet.net
Subject: [mg105104] Re: [mg105076] More Efficient Method
From: Daniel Lichtblau <danl at wolfram.com>
Date: Sat, 21 Nov 2009 03:33:13 -0500 (EST)
References: <200911201138.GAA03384@smc.vnet.net>

blamm64 wrote:
> I have a couple of functions designed to poke a single hole, and to
> poke multiple holes, in a one-level list:
> 
> We define a function which, given the imported pressure data, finds
> the subset of that pressure data excluding the pressure data points
> between "targetL" and "targetU".
> 
> In[5]:= findsubset[data_?VectorQ,targetL_?NumericQ,targetU_?NumericQ] :=
>    Select[data,(#<=targetL || #>=targetU &)]
> 
> This function will pluck out multiple holes in the data list.
> 
> In[6]:= subsets[data_?VectorQ,tarList_?ListQ]:=Module[{tmp,tmp1},
> tmp=data;
> Do[tmp1=findsubset[tmp,tarList[[i,1]],tarList[[i,2]]];tmp=tmp1,
> {i,Dimensions[tarList][[1]]}];
> tmp
> ]
> 
> The following works fine (big holes chosen not to give large result):
> 
> In[7]:= datalist=Range[11,3411,10];
> 
> In[12]:= targetlist={{40, 1500},{1600,3300}};
> 
> In[13]:= resultdata=subsets[datalist,targetlist]
> 
> Out[13]=
> {11,21,31,1501,1511,1521,1531,1541,1551,1561,1571,1581,1591,3301,3311,3321,3331,3341,3351,3361,3371,3381,3391,3401,3411}
> 
> But if "datalist" happens to be very large, surely there is a (much)
> more efficient method?
> 
> I tried unsuccessfully to use pure functions with Select, but have a
> somewhat nebulous feeling there's a pure function way of doing this
> effectively much more efficiently.
> 
> I know, I know: the above have no consistency checking.  I also know
> "subsets" could be used in place of "findsubset" just by replacing the
> call of "findsubset" with the code of "findsubset" in "subsets".
> 
> From what I've seen on this forum there are some really experienced
> people who might provide an efficient way of implementing the above.
> 
> -Brian L.

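If you are working with integers then the method below should be fine. 
Otherwise you may need to "fuzzify" a bit differently. I use 
IntervalMemberQ to determine which elements in the data list to omit, 
and then do the selection using Select (I tried Pick, and it was 
perhaps half a hair slower). Interval is closed, so each pair of 
endpoints is moved inward by .5; for integer data this leaves targetL 
and targetU themselves outside the hole, as in findsubset.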

subsets2[data_?VectorQ,tarList_?ListQ] := Module[
   {intv=Apply[Interval,Map[#+{.5,-.5}&,tarList]]},
   Select[data, !IntervalMemberQ[intv,#]&]]
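
As an illustration (not part of the original message), here is what the 
interval looks like for the small targetlist from the question, and how 
the endpoints behave. The test values 40, 41, 1499 and 1500 are chosen 
here only to probe the boundary of the first hole.

targetlist = {{40, 1500}, {1600, 3300}};

(* two closed intervals, each shrunk by .5 at both ends *)
intv = Apply[Interval, Map[# + {.5, -.5} &, targetlist]];

(* True means the element lies in a hole and would be dropped *)
Map[IntervalMemberQ[intv, #] &, {40, 41, 1499, 1500}]
(* {False, True, True, False} *)

So 40 and 1500 are kept and everything strictly between them is 
removed, which is what the <= and >= tests in findsubset do.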

Here is a quick test on a somewhat larger input: 100000 random integers 
and 100 holes.

datalist = RandomInteger[11000,100000];
targetlist = Table[{n,n+20}, {n,100,10000,100}];

In[47]:= Timing[resultdata = subsets[datalist,targetlist];]
Out[47]= {14.4878, Null}

In[48]:= Timing[resultdata2 = subsets2[datalist,targetlist];]
Out[48]= {0.179973, Null}

In[49]:= resultdata === resultdata2
Out[49]= True

In[50]:= Length[resultdata2]
Out[50]= 82596
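
For data that is not integer valued, the .5 shift no longer matches 
findsubset. One possible way to handle that case, sketched here under 
the assumption that exact strict comparisons are what is wanted, is to 
test each element against every hole directly. The name subsetsReal is 
hypothetical and this version has not been timed; it scans all holes 
for each element, so it is meant to show the semantics rather than to 
be fast.

(* hypothetical helper: drop x when lo < x < hi for some {lo, hi} *)
subsetsReal[data_?VectorQ, tarList_?MatrixQ] :=
   Select[data,
      Function[x, Nor @@ Map[#[[1]] < x < #[[2]] &, tarList]]]

subsetsReal[{0.5, 1., 1.5, 2., 2.5}, {{1, 2}}]
(* {0.5, 1., 2., 2.5} *)

The endpoints 1. and 2. survive because the comparison is strict, 
which mirrors the original findsubset.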

Daniel Lichtblau
Wolfram Research

References:
- More Efficient Method
  - From: blamm64 <blamm64@charter.net>
