Re: breaking up lists into intervals
- To: mathgroup at smc.vnet.net
- Subject: [mg23100] Re: [mg23074] breaking up lists into intervals
- From: BobHanlon at aol.com
- Date: Sun, 16 Apr 2000 00:37:38 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Note that in your definition the bins are inclusive at each end of the
interval. This could result in some data points being included in two bins.

First method: each pass goes through all of the data.

SetAttributes[binnedData1, HoldFirst];

binnedData1[data_, {xmin_, xmax_, dx_}, pstn_Integer:1] :=
    Module[{nbrBins = Ceiling[(xmax - xmin)/dx], k},
      Table[Select[data, (dx >= (xmin + k*dx - #[[pstn]]) > 0) &],
        {k, nbrBins}]] /; xmax > xmin && dx > 0

Second method: eliminates selected data from subsequent passes and stops
when all of the data is binned.

SetAttributes[binnedData2, HoldFirst];

binnedData2[data_, {xmin_, xmax_, dx_}, pstn_Integer:1] :=
    Module[{sortedData = Sort[data, #1[[pstn]] < #2[[pstn]] &],
        nbrBins = Ceiling[(xmax - xmin)/dx], k = 1, lr, result = {}},
      While[sortedData != {} && k <= nbrBins,
        result = Append[result,
            lr = Select[sortedData, (dx >= (xmin + k*dx - #[[pstn]]) > 0) &]];
        sortedData = Drop[sortedData, Length[lr]];
        k++];
      result] /; xmax > xmin && dx > 0

Test data:

xmin = 0; xmax = 100; dx = 5;
data = Table[{(xmax - xmin)*Random[] + xmin, Random[], Random[]},
      {nbrDataPts = 100}];

Since the second method needs to sort the data, the results are not
necessarily in the same order unless either the input or the output of the
first method is sorted:

binnedData1[Sort[data], {xmin, xmax, dx}] ==
    (Sort[#] & /@ binnedData1[data, {xmin, xmax, dx}]) ==
    binnedData2[data, {xmin, xmax, dx}]

True

Checking the timing:

Length[Flatten[binnedData1[data, {xmin, xmax, dx}], 1]] == nbrDataPts // Timing

{0.21666666666715173*Second, True}

Length[Flatten[binnedData2[data, {xmin, xmax, dx}], 1]] == nbrDataPts // Timing

{0.1499999999996362*Second, True}

The second method is faster even though it spends time initially sorting
the data. (A single-pass variant that places each point directly into its
bin is sketched after the quoted message below.)

Bob Hanlon

In a message dated 4/15/2000 3:35:42 AM, Matt.Johnson at autolivasp.com writes:

>I have many large datasets of {x,y,z} data that I wish to break into small
>datasets based on the value of x. For example, the x value ranges from 0 to
>100 and I want to break up the data into 20 groups, from 0-5, 5-10, 10-15,
>etc. There will be an unequal number of data points in each interval. I have
>written a routine based on several Do loops to do this and it works
>satisfactorily. However, I would think that there is a way to eliminate from
>the data set the points that have already been placed in their appropriate
>intervals, or a routine that would "place" the point in the appropriate
>group, only having to go through the datasets once. Either of these options
>would speed up the process. Currently the routine goes through each complete
>dataset as many times as there are intervals created. Here is the current
>code:
>
>Do[Do[Do[
>    If[i - 0.5 di <= dataset[j][[k, 1]] <= i + 0.5 di,
>      AppendTo[group[j, i], dataset[j][[k]]]],
>    {k, Length[dataset[j]]}],
>  {i, imin, imax, di}], {j, jmax}]
>
>There are j datasets with k points in each dataset. i serves as the index
>for the intervals, according to the x value, with an interval size of di.
>It creates (imax-imin)/di intervals in each dataset.
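A minimal sketch of the single-pass approach asked about in the quoted
message, under the assumption that the same half-open bins
[xmin + (k-1) dx, xmin + k dx) used by binnedData1 and binnedData2 are
wanted. The name binnedData3 and the Floor-based index are assumptions,
not part of the original post; the idea is simply to compute each point's
bin index directly from its x value so the data is traversed only once,
regardless of the number of bins.

SetAttributes[binnedData3, HoldFirst];

binnedData3[data_, {xmin_, xmax_, dx_}, pstn_Integer:1] :=
    Module[{nbrBins = Ceiling[(xmax - xmin)/dx], bins, k},
      bins = Table[{}, {nbrBins}];  (* one empty bin per interval *)
      Scan[
        (* bin index: points with xmin + (k-1) dx <= x < xmin + k dx go
           into bin k; points outside [xmin, xmax) are dropped *)
        (k = Floor[(#[[pstn]] - xmin)/dx] + 1;
         If[1 <= k <= nbrBins, AppendTo[bins[[k]], #]]) &,
        data];
      bins] /; xmax > xmin && dx > 0

On the test data above, binnedData3[data, {xmin, xmax, dx}] should agree
with binnedData1[data, {xmin, xmax, dx}], since both keep the points within
each bin in their original order; results could differ only for points
landing exactly on a bin boundary, through floating-point rounding in the
division.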