MathGroup Archive 2010

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Counting number of numbers in a large list between two valus

  • To: mathgroup at smc.vnet.net
  • Subject: [mg114515] Re: Counting number of numbers in a large list between two valus
  • From: Leonid Shifrin <lshifr at gmail.com>
  • Date: Mon, 6 Dec 2010 06:13:27 -0500 (EST)

Hi Lyle,

As far as I can tell, both your solutions are incorrect, besides being
indeed not optimal.
The first one is correct only for an ordered list, for which you can use
binary search to
have a much faster one. I fail to see how the second one can be correct
(except possibly
again for the case of ordered original list of data) , since you look at the
distance in the
sorted list, and make no effort to relate that to the distance between
elements in the
original list.

Here is a different solution (based on a couple of functions posted a while
ago by
Norbert Pozar), which is using SparseArray and is both general and much
faster:

extractPositionFromSparseArray[
  HoldPattern[SparseArray[u___]]] := {u}[[4, 2, 2]]

positionExtr[x_List, n_] :=
 Flatten@extractPositionFromSparseArray[
   SparseArray[Unitize[x - n], Automatic, 1]]

getDistanceBetweenElements[list_, el1_, el2_] :=
  With[{p1 = positionExtr[list, el1], p2 = positionExtr[list, el2]},
   First@p2 - First@p1 /; Length[p1] == Length[p2] == 1];

It will only return the result if both elements are present in the list
exactly once,
which is a condition for your problem to be well-formulated. Here we test it
on
a list of 20 million random numbers:

tst = RandomReal[{1, 1000}, 20000000];

x1 = tst[[10000]];
x2 = tst[[10000000]];

In[30]:= getDistanceBetweenElements[tst, x1, x2] // Timing

Out[30]= {0.578, 9990000}

Hope this helps.

Regards,
Leonid



On Mon, Dec 6, 2010 at 5:57 AM, Lyle <lgordon at gmail.com> wrote:

> Dear Listers,
>
> I have a large (5-20million) one dimensional list of real numbers and
> I want to count the number of entries in the list that lie between 2
> specific values (x1, x2). I need to run the function for a number of
> different ranges.
>
> ie. number of list entries (l), where x1 <= l <= x2
>
> I've tried:
>
> tallydata[{x1_, x2_}] := Count[data, x_ /; x1 <= x <= x2]
>
> that takes about 3-4 seconds
>
> and
>
> tallydata[{x1_, x2_}] := Length[Select[data, x1 <= # <= x2 &]]
>
> which takes a little bit longer.
>
> The best I've managed is (this last one might be off by 1 or 2 but
> this doesn't really matter to me):
>
> sorteddata = Sort[data];
> nf = Nearest[sorteddata];
> tallyrange[{x1_, x2_}] :=
>  First[Position[sorteddata, First[nf[x2]]]] -
>  First[Position[sorteddata, First[nf[x1]]]]
>
> which takes between 1 and 2 seconds but I was hoping there might be a
> faster way to do this?
>
> Any help would be great!
>
> Thanks,
> Lyle Gordon
>
> Northwestern University
>
>


  • Prev by Date: Re: Re-virginating Manipulates?
  • Next by Date: Re: Why are my 3D plots blue?
  • Previous by thread: Re: Counting number of numbers in a large list between two valus
  • Next by thread: Re: Counting number of numbers in a large list between two valus