RE: grouping and averaging {x,y} pairs of data

*To*: mathgroup at smc.vnet.net
*Subject*: [mg37255] RE: [mg37202] grouping and averaging {x,y} pairs of data
*From*: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
*Date*: Fri, 18 Oct 2002 05:18:11 -0400 (EDT)
*Sender*: owner-wri-mathgroup at wolfram.com

>-----Original Message-----
>Sent: Wednesday, October 16, 2002 8:26 PM
>To: mathgroup at smc.vnet.net
>Subject: [mg37255] [mg37202] grouping and averaging {x,y} pairs of data
>
>Dear Fellows in MathGroup,
>
>I have a list of 17,000+ {x,y} pairs of data
>
>    each x value is a positive integer from 1 to 100+
>
>    each y value is a positive real number
>
>As a *short* example, let's consider:
>
>    data = {{3,1},{4,3},{3,2},{1,10},{4,2},{1,6},{5,2},{2,5},{7,1}}
>
>I want to group the data by the x value and report the
>arithmetic average
>of the y values in each group.
>
>For the example, i want to report:
>
>    output = {{1,8},{2,5},{3,1.5},{4,2.5},{5,2},{6,0},{7,1}}
>
>In this example, x=6 does not occur so i report the average y[6] = 0.
>
>Can anyone suggest a way to do this efficiently?
>
>many thanks
>dave
>
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>David E. Burmaster, Ph.D.
>Alceon Corporation
>POBox 382069 (new Box number effective 1 Sep 2001)
>Harvard Square Station
>Cambridge, MA 02238-2069 (new ZIP code effective 1 Sep 2001)
>
>Voice 617-864-4300
>
>Web http://www.Alceon.com
>Email deb at Alceon.com
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>

Dave,

my first attempt used the same reasoning as the one Daniel Lichtblau proposed (and came out only slightly faster than his). However, the Sort/Split idea brought forward by Bob Hanlon and Allan Hayes is much faster; Bobby Treat's version turns out to be slower (than Daniel's and mine).
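For readers who have not seen the Sort/Split idea, here is a minimal sketch of it on the short example from the question. The names `groups` and `means` are illustrative only and do not come from any of the posted solutions; the sketch also leaves out the missing x=6, which the timed code in this post fills in with a default of 0.

```mathematica
data = {{3,1},{4,3},{3,2},{1,10},{4,2},{1,6},{5,2},{2,5},{7,1}};

(* Sort orders the pairs by x (then y); Split cuts the sorted list
   into runs whose first elements are equal *)
groups = Split[Sort[data], First[#1] == First[#2] &]
(* {{{1,6},{1,10}}, {{2,5}}, {{3,1},{3,2}}, {{4,2},{4,3}}, {{5,2}}, {{7,1}}} *)

(* one {x, average of y} pair per run *)
means = {#[[1, 1]], (Plus @@ #[[All, 2]])/Length[#]} & /@ groups
(* {{1,8},{2,5},{3,3/2},{4,5/2},{5,2},{7,1}} *)
```

Because the example data are integers, the averages come out as exact rationals (3/2 rather than 1.5); with real-valued y they would be machine numbers.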
To reach comparable results, I slightly modified Allan's solution (which was fastest):

    data = Table[{Random[Integer, {1, 98}], Random[]}, {20000}];

    (f[x_] = 0;   (* default for x values that do not occur *)
     ((f[#1[[1, 1]]] = Plus @@ #1[[All, 2]]/Length[#1]) &) /@
       Split[Sort[data], #1[[1]] == #2[[1]] &];
     r4 = {#, f[#]} & /@ Range[98];) // Timing

    {3.045 Second, Null}

So I reconsidered that idea and found a solution which is nearly twice as fast:

    binnedAverage2[data_, max_] :=
      Module[{v, i, ix, ixx, ixxx},
        (* pad with one {k, 0} pair per bin, sort, separate x's and y's *)
        {i, v} = With[{rr = Range[max]},
           Transpose[Sort[Join[data, Transpose[{rr, rr - rr}]]]]];
        ix = Split[i];                                    (* runs of equal x *)
        ixx = FoldList[Plus[#1, Length[#2]] &, 0, ix];    (* run end positions *)
        ixxx = Transpose[Transpose[Partition[ixx, 2, 1]] + {1, 0}];
        (* the padding 0 does not change the sum, so divide by Length - 1 *)
        Transpose[{First /@ ix,
          Plus @@ #/Max[Length[#] - 1, 1] &[Take[v, #]] & /@ ixxx}]]

    (r7 = binnedAverage2[data, 98]); // Timing

    {1.612 Second, Null}

    r7 == r4

    True

--
Hartmut Wolf
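To see how the padding trick in binnedAverage2 works, it may help to trace the same steps on the short example from the question. This is only a sketch: the names `small`, `nmax`, `runs`, `ends`, `ranges` and `avg` are hypothetical and chosen for readability; they are not part of the posted function.

```mathematica
small = {{3,1},{4,3},{3,2},{1,10},{4,2},{1,6},{5,2},{2,5},{7,1}};
nmax = 7;

(* pad with one {k, 0} pair per bin so every x in 1..nmax occurs at least once *)
{i, v} = Transpose[Sort[Join[small, Table[{k, 0}, {k, nmax}]]]];
(* i = {1,1,1,2,2,3,3,3,4,4,4,5,5,6,7,7}
   v = {0,6,10,0,5,0,1,2,0,2,3,0,2,0,0,1} *)

runs = Split[i];
ends = FoldList[#1 + Length[#2] &, 0, runs]
(* {0,3,5,8,11,13,14,16} *)

(* {start, end} positions of each run within v *)
ranges = {#[[1]] + 1, #[[2]]} & /@ Partition[ends, 2, 1]
(* {{1,3},{4,5},{6,8},{9,11},{12,13},{14,14},{15,16}} *)

(* each run contains one padding 0, so divide by Length - 1, but never by 0 *)
avg[y_] := (Plus @@ y)/Max[Length[y] - 1, 1];
Transpose[{First /@ runs, avg[Take[v, #]] & /@ ranges}]
(* {{1,8},{2,5},{3,3/2},{4,5/2},{5,2},{6,0},{7,1}} *)
```

The empty bin x=6 consists of the padding pair alone, so its sum is 0 and the divisor is clamped to 1, which gives the requested average of 0 without a separate test for missing values.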