Re: Counting number of numbers in a large list between two valus
- To: mathgroup at smc.vnet.net
- Subject: [mg114515] Re: Counting number of numbers in a large list between two valus
- From: Leonid Shifrin <lshifr at gmail.com>
- Date: Mon, 6 Dec 2010 06:13:27 -0500 (EST)
Hi Lyle, As far as I can tell, both your solutions are incorrect, besides being indeed not optimal. The first one is correct only for an ordered list, for which you can use binary search to have a much faster one. I fail to see how the second one can be correct (except possibly again for the case of ordered original list of data) , since you look at the distance in the sorted list, and make no effort to relate that to the distance between elements in the original list. Here is a different solution (based on a couple of functions posted a while ago by Norbert Pozar), which is using SparseArray and is both general and much faster: extractPositionFromSparseArray[ HoldPattern[SparseArray[u___]]] := {u}[[4, 2, 2]] positionExtr[x_List, n_] := Flatten@extractPositionFromSparseArray[ SparseArray[Unitize[x - n], Automatic, 1]] getDistanceBetweenElements[list_, el1_, el2_] := With[{p1 = positionExtr[list, el1], p2 = positionExtr[list, el2]}, First@p2 - First@p1 /; Length[p1] == Length[p2] == 1]; It will only return the result if both elements are present in the list exactly once, which is a condition for your problem to be well-formulated. Here we test it on a list of 20 million random numbers: tst = RandomReal[{1, 1000}, 20000000]; x1 = tst[[10000]]; x2 = tst[[10000000]]; In[30]:= getDistanceBetweenElements[tst, x1, x2] // Timing Out[30]= {0.578, 9990000} Hope this helps. Regards, Leonid On Mon, Dec 6, 2010 at 5:57 AM, Lyle <lgordon at gmail.com> wrote: > Dear Listers, > > I have a large (5-20million) one dimensional list of real numbers and > I want to count the number of entries in the list that lie between 2 > specific values (x1, x2). I need to run the function for a number of > different ranges. > > ie. number of list entries (l), where x1 <= l <= x2 > > I've tried: > > tallydata[{x1_, x2_}] := Count[data, x_ /; x1 <= x <= x2] > > that takes about 3-4 seconds > > and > > tallydata[{x1_, x2_}] := Length[Select[data, x1 <= # <= x2 &]] > > which takes a little bit longer. > > The best I've managed is (this last one might be off by 1 or 2 but > this doesn't really matter to me): > > sorteddata = Sort[data]; > nf = Nearest[sorteddata]; > tallyrange[{x1_, x2_}] := > First[Position[sorteddata, First[nf[x2]]]] - > First[Position[sorteddata, First[nf[x1]]]] > > which takes between 1 and 2 seconds but I was hoping there might be a > faster way to do this? > > Any help would be great! > > Thanks, > Lyle Gordon > > Northwestern University > >