Re: Counting number of numbers in a large list between two valus
- To: mathgroup at smc.vnet.net
- Subject: [mg114547] Re: Counting number of numbers in a large list between two valus
- From: Ray Koopman <koopman at sfu.ca>
- Date: Tue, 7 Dec 2010 06:47:25 -0500 (EST)
- References: <idhjda$9f6$1@smc.vnet.net>
On Dec 5, 6:56 pm, Lyle <lgor... at gmail.com> wrote:
> Dear Listers,
>
> I have a large (5-20million) one dimensional list of real numbers and
> I want to count the number of entries in the list that lie between 2
> specific values (x1, x2). I need to run the function for a number of
> different ranges.
>
> ie. number of list entries (l), where x1 <= l <= x2
>
> I've tried:
>
> tallydata[{x1_, x2_}] := Count[data, x_ /; x1 <= x <= x2]
>
> that takes about 3-4 seconds
>
> and
>
> tallydata[{x1_, x2_}] := Length[Select[data, x1 <= # <= x2 &]]
>
> which takes a little bit longer.
>
> The best I've managed is (this last one might be off by 1 or 2 but
> this doesn't really matter to me):
>
> sorteddata = Sort[data];
> nf = Nearest[sorteddata];
> tallyrange[{x1_, x2_}] :=
> First[Position[sorteddata, First[nf[x2]]]] -
> First[Position[sorteddata, First[nf[x1]]]]
>
> which takes between 1 and 2 seconds but I was hoping there might be a
> faster way to do this?
>
> Any help would be great!
>
> Thanks,
> Lyle Gordon
>
> Northwestern University

Here are some oldies, plus improvements on them. Note that the
fastest two routines that use UnitStep will give wrong answers
if the data contain any values that equal max, and that the
routines that use Clip will give wrong answers if min <= 0 <= max
and the data contain any zeros.

data = RandomReal[1.,1*^7]; min = .2; max = .3;

Total@UnitStep[(data-min)*(max-data)] //Timing
{2.64,999208}

UnitStep[data-min].UnitStep[max-data] //Timing
{2.3,999208}

Total[ UnitStep[data-min]-UnitStep[data-max] ] //Timing
{2.23,999208}

Total@UnitStep[data-min] - Total@UnitStep[data-max] //Timing
{1.73,999208}

Total@Unitize@Clip[data,{min,max},{0,0}] //Timing
{0.91,999208}

SparseArray@Clip[data,{min,max},{0,0}] /.
  SparseArray[_,_,_,d_] :> d[[2,1,-1]] //Timing
{0.77,999208}
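Since the count is wanted for a number of different ranges, another
option is to pay for one Sort up front and then locate each bound by
binary search, so that every later query inspects only about
Log[2, Length[data]] elements. What follows is a minimal, untimed
sketch of that idea, not one of the routines timed above. The names
countLess, countLessEqual and tallysorted are hypothetical helpers
introduced here for illustration, not built-ins, and the sketch assumes
x1 <= x2 and that data, min and max are defined as above.

```mathematica
(* Assumption: data is the same list of reals as above. Sort once. *)
sorted = Sort[data];

(* Hypothetical helper: number of elements of the sorted list s that are < x. *)
countLess[s_, x_] := Module[{lo = 0, hi = Length[s], mid},
  (* invariant: the first lo elements are < x, everything after position hi is >= x *)
  While[lo < hi,
   mid = Quotient[lo + hi, 2] + 1;
   If[s[[mid]] < x, lo = mid, hi = mid - 1]];
  lo]

(* Hypothetical helper: number of elements of the sorted list s that are <= x. *)
countLessEqual[s_, x_] := Module[{lo = 0, hi = Length[s], mid},
  (* invariant: the first lo elements are <= x, everything after position hi is > x *)
  While[lo < hi,
   mid = Quotient[lo + hi, 2] + 1;
   If[s[[mid]] <= x, lo = mid, hi = mid - 1]];
  lo]

(* Number of entries with x1 <= entry <= x2, both endpoints included. *)
tallysorted[{x1_, x2_}] :=
  countLessEqual[sorted, x2] - countLess[sorted, x1]

tallysorted[{min, max}]
```

Because both endpoints are handled by explicit comparisons, this count
should agree with Count[data, x_ /; min <= x <= max] even when the data
contain zeros or values equal to min or max. Whether the one-time Sort
pays off depends on how many ranges are queried.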