MathGroup Archive 2000

Re: breaking up lists into intervals

  • To: mathgroup at
  • Subject: [mg23100] Re: [mg23074] breaking up lists into intervals
  • From: BobHanlon at
  • Date: Sun, 16 Apr 2000 00:37:38 -0400 (EDT)
  • Sender: owner-wri-mathgroup at

Note that in your definition the bins are inclusive at both ends of each 
interval. This could result in some data points being counted in two bins.

First method: makes one full pass through all of the data for each bin

SetAttributes[binnedData1, HoldFirst];

(* bin k holds points with xmin + (k-1) dx <= x < xmin + k dx;
   each bin makes its own Select pass over the full data set *)
binnedData1[data_, {xmin_, xmax_, dx_}, pstn_Integer:1] := 
  Module[{nbrBins = Ceiling[(xmax - xmin)/dx], k}, 
      Table[Select[data, (dx >= (xmin + k*dx - #[[pstn]]) > 0) &], {k, 
          nbrBins}]] /; xmax > xmin && dx > 0
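
For example (an illustrative call, not from the original post), the Select 
condition makes each bin closed on the left and open on the right, so a point 
sitting exactly on a boundary lands in the upper bin:

```mathematica
(* bins are [0, 5) and [5, 10); the boundary point 5 goes into the second bin *)
binnedData1[{{0.5}, {5}, {7.2}}, {0, 10, 5}]

(* {{{0.5}}, {{5}, {7.2}}} *)
```

Note that with this convention a point exactly at xmax would fall into no bin.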

Second method: eliminates already-binned data from subsequent passes and stops 
when all of the data are binned

SetAttributes[binnedData2, HoldFirst];

(* sort by the binning coordinate, then drop each bin's points from the
   front of the list as they are selected; assumes all points lie in
   [xmin, xmax) *)
binnedData2[data_, {xmin_, xmax_, dx_}, pstn_Integer:1] := 
  Module[{sortedData = Sort[data, #1[[pstn]] < #2[[pstn]] &], 
        nbrBins = Ceiling[(xmax - xmin)/dx], k = 1, lr, result = {}}, 
      While[sortedData != {} && k <= nbrBins, 
        result = Append[result, lr = Select[sortedData, 
                (dx >= (xmin + k*dx - #[[pstn]]) > 0) &]]; 
        sortedData = Drop[sortedData, Length[lr]]; k++]; 
      result] /; xmax > xmin && dx > 0

Test data:

xmin = 0; xmax = 100; dx = 5;

data = Table[{(xmax - xmin)*Random[] + xmin, Random[], 
        Random[]}, {nbrDataPts = 100}];

Since the second method sorts the data, its bins are not necessarily in the 
same order as the first method's unless either the input or the output of the 
first method is sorted:

binnedData1[Sort[data], {xmin, xmax, dx}] == (Sort[#] & /@ 
      binnedData1[data, {xmin, xmax, dx}]) == 
  binnedData2[data, {xmin, xmax, dx}]


Checking the timing:

Length[Flatten[binnedData1[data, {xmin, xmax, dx}], 1]] == 
    nbrDataPts // Timing

{0.21666666666715173*Second, True}

Length[Flatten[binnedData2[data, {xmin, xmax, dx}], 1]] == 
    nbrDataPts // Timing

{0.1499999999996362*Second, True}

The second method is faster even though it spends time initially sorting the 
data.
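
A true one-pass variant along the lines Matt asks about is also possible 
(a sketch, not from the original post; binnedData3 is a hypothetical name): 
compute each point's bin index directly with Floor instead of scanning the 
data once per bin.

```mathematica
(* sketch: compute the bin index directly, visiting each point once;
   points outside [xmin, xmax) are silently dropped, matching the
   half-open bins of the Select-based methods *)
binnedData3[data_, {xmin_, xmax_, dx_}, pstn_Integer:1] := 
  Module[{nbrBins = Ceiling[(xmax - xmin)/dx], bins, k}, 
      bins = Table[{}, {nbrBins}]; 
      Scan[(k = Floor[(#[[pstn]] - xmin)/dx] + 1; 
            If[1 <= k <= nbrBins, AppendTo[bins[[k]], #]]) &, data]; 
      bins] /; xmax > xmin && dx > 0
```

For points inside the range this assigns the same bins as binnedData1 (a 
boundary point x == xmin + m dx gets Floor index m and so lands in bin m + 1, 
the upper bin, just as the Select condition does), but it visits each point 
only once regardless of the number of bins.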

Bob Hanlon

In a message dated 4/15/2000 3:35:42 AM, Matt.Johnson at writes:

>I have many large datasets of {x,y,z} data that I wish to break into small
>sets based on the value of x.  For example, the x value ranges from 0 to
>100 and I want to break up the data into 20 groups, from 0-5, 5-10, 10-15,
>etc.  There will be an unequal number of data points in each interval.  I
>have written a routine based on several Do loops to do this and it works
>satisfactorily.  However, I would think that there is a way to eliminate
>from the data set points that have already been placed in their appropriate
>intervals, or a routine that would "place" the point in the appropriate
>group, only having to go through the datasets once.  Either of these
>options would speed up the routine.  Currently the routine goes through
>each complete dataset as many times as there are intervals created.  Here
>is the current code:
>Do[Do[Do[
>     If[ i-0.5 di<=dataset[j][[k,1]]<=i+0.5 di,
>     AppendTo[group[j,i],dataset[j][[k]] ]],
>     {k, Length[dataset[j]]}],
>     {i, imin, imax, di}], {j,jmax}]
>There are j datasets with k points in each dataset.  i serves as the index
>for the intervals, according to the x value, with an interval size of di.
>It creates (imax-imin)/di intervals in each dataset.
