Re: breaking up lists into intervals
- To: mathgroup at smc.vnet.net
- Subject: [mg23100] Re: [mg23074] breaking up lists into intervals
- From: BobHanlon at aol.com
- Date: Sun, 16 Apr 2000 00:37:38 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Note that in your definition the bins are inclusive at each end of the
interval. This could result in some data points being included in two bins.

First method: each pass goes through all of the data.

SetAttributes[binnedData1, HoldFirst];

binnedData1[data_, {xmin_, xmax_, dx_}, pstn_Integer:1] :=
    Module[{nbrBins = Ceiling[(xmax - xmin)/dx], k},
      Table[Select[data, (dx >= (xmin + k*dx - #[[pstn]]) > 0) &],
        {k, nbrBins}]] /; xmax > xmin && dx > 0

Second method: eliminates selected data from subsequent passes and stops
when all of the data is binned.

SetAttributes[binnedData2, HoldFirst];

binnedData2[data_, {xmin_, xmax_, dx_}, pstn_Integer:1] :=
    Module[{sortedData = Sort[data, #1[[pstn]] < #2[[pstn]] &],
        nbrBins = Ceiling[(xmax - xmin)/dx], k = 1, lr, result = {}},
      While[sortedData != {} && k <= nbrBins,
        result = Append[result,
            lr = Select[sortedData, (dx >= (xmin + k*dx - #[[pstn]]) > 0) &]];
        sortedData = Drop[sortedData, Length[lr]];
        k++];
      result] /; xmax > xmin && dx > 0

Test data:

xmin = 0; xmax = 100; dx = 5;
data = Table[{(xmax - xmin)*Random[] + xmin, Random[], Random[]},
      {nbrDataPts = 100}];

Since the second method needs to sort the data, the results are not
necessarily in the same order unless either the input or the output of the
first method is sorted:

binnedData1[Sort[data], {xmin, xmax, dx}] ==
    (Sort[#] & /@ binnedData1[data, {xmin, xmax, dx}]) ==
    binnedData2[data, {xmin, xmax, dx}]

True

Checking the timing:

Length[Flatten[binnedData1[data, {xmin, xmax, dx}], 1]] == nbrDataPts // Timing

{0.21666666666715173*Second, True}

Length[Flatten[binnedData2[data, {xmin, xmax, dx}], 1]] == nbrDataPts // Timing

{0.1499999999996362*Second, True}

The second method is faster even though it spends time initially sorting
the data. (A single-pass variant that places each point directly into its
bin is sketched after the quoted message below.)

Bob Hanlon

In a message dated 4/15/2000 3:35:42 AM, Matt.Johnson at autolivasp.com writes:

>I have many large datasets of {x,y,z} data that I wish to break into small
>datasets based on the value of x. For example, the x value ranges from 0 to
>100 and I want to break up the data into 20 groups, from 0-5, 5-10, 10-15,
>etc. There will be an unequal number of data points in each interval. I have
>written a routine based on several Do loops to do this and it works
>satisfactorily. However, I would think that there is a way to eliminate from
>the data set the points that have already been placed in their appropriate
>intervals, or a routine that would "place" the point in the appropriate
>group, only having to go through the datasets once. Either of these options
>would speed up the process. Currently the routine goes through each complete
>dataset as many times as there are intervals created. Here is the current
>code:
>
>Do[Do[Do[
>    If[i - 0.5 di <= dataset[j][[k, 1]] <= i + 0.5 di,
>      AppendTo[group[j, i], dataset[j][[k]]]],
>    {k, Length[dataset[j]]}],
>  {i, imin, imax, di}], {j, jmax}]
>
>There are j datasets with k points in each dataset. i serves as the index
>for the intervals, according to the x value, with an interval size of di.
>It creates (imax-imin)/di intervals in each dataset.
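A minimal sketch of the single-pass approach asked about in the quoted
message, under the assumption that the same half-open bins
[xmin + (k-1) dx, xmin + k dx) used by binnedData1 and binnedData2 are
wanted. The name binnedData3 and the Floor-based index are assumptions,
not part of the original post; the idea is simply to compute each point's
bin index directly from its x value so the data is traversed only once,
regardless of the number of bins.

SetAttributes[binnedData3, HoldFirst];

binnedData3[data_, {xmin_, xmax_, dx_}, pstn_Integer:1] :=
    Module[{nbrBins = Ceiling[(xmax - xmin)/dx], bins, k},
      bins = Table[{}, {nbrBins}];  (* one empty bin per interval *)
      Scan[
        (* bin index: points with xmin + (k-1) dx <= x < xmin + k dx go
           into bin k; points outside [xmin, xmax) are dropped *)
        (k = Floor[(#[[pstn]] - xmin)/dx] + 1;
         If[1 <= k <= nbrBins, AppendTo[bins[[k]], #]]) &,
        data];
      bins] /; xmax > xmin && dx > 0

On the test data above, binnedData3[data, {xmin, xmax, dx}] should agree
with binnedData1[data, {xmin, xmax, dx}], since both keep the points within
each bin in their original order; results could differ only for points
landing exactly on a bin boundary, through floating-point rounding in the
division.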