RE: grouping and averaging {x,y} pairs of data
- To: mathgroup at smc.vnet.net
- Subject: [mg37255] RE: [mg37202] grouping and averaging {x,y} pairs of data
- From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
- Date: Fri, 18 Oct 2002 05:18:11 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
>-----Original Message-----
>Sent: Wednesday, October 16, 2002 8:26 PM
>To: mathgroup at smc.vnet.net
>Subject: [mg37255] [mg37202] grouping and averaging {x,y} pairs of data
>
>
>Dear Fellows in MathGroup,
>
>I have a list of 17,000+ {x,y} pairs of data
>
> each x value is a positive integer from 1 to 100+
>
> each y value is a positive real number
>
>As a *short* example, let's consider:
>
> data = {{3,1},{4,3},{3,2},{1,10},{4,2},{1,6},{5,2},{2,5},{7,1}}
>
>I want to group the data by the x value and report the
>arithmetic average
>of the y values in each group.
>
>For the example, I want to report:
>
> output = {{1,8},{2,5},{3,1.5},{4,2.5},{5,2},{6,0},{7,1}}
>
>In this example, x=6 does not occur so I report the average y[6] = 0.
>
>Can anyone suggest a way to do this efficiently?
>
>many thanks
>dave
>
>
>
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>David E. Burmaster, Ph.D.
>Alceon Corporation
>POBox 382069 (new Box number effective 1 Sep 2001)
>Harvard Square Station
>Cambridge, MA 02238-2069 (new ZIP code effective 1 Sep 2001)
>
>Voice 617-864-4300
>
>Web http://www.Alceon.com
>Email deb at Alceon.com
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
Dave,
My first attempt used the same approach that Daniel Lichtblau proposed (and
came out only slightly faster than his). However, the Sort/Split idea put
forward by Bob Hanlon and Allan Hayes is much faster, while Bobby Treat's
version turns out to be slower than both Daniel's and mine. To get results in
the same form for comparison, I slightly modified Allan's solution (which was
the fastest):
data = Table[{Random[Integer, {1, 98}], Random[]}, {20000}];
(f[x_] = 0;
  ((f[#1[[1, 1]]] = Plus @@ #1[[All, 2]]/Length[#1]) &) /@
    Split[Sort[data], #1[[1]] == #2[[1]] &];
  r4 = {#, f[#]} & /@ Range[98];) // Timing
{3.045 Second, Null}
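To see what Sort and Split contribute, here is a small sketch on Dave's nine-pair example rather than on the random test data. The names small, groups and avg are only illustrative and do not appear in any of the posted solutions.

```mathematica
small = {{3,1},{4,3},{3,2},{1,10},{4,2},{1,6},{5,2},{2,5},{7,1}};

(* Sort orders the pairs by x; Split then gathers runs with equal x *)
groups = Split[Sort[small], #1[[1]] == #2[[1]] &]
(* {{{1,6},{1,10}}, {{2,5}}, {{3,1},{3,2}}, {{4,2},{4,3}}, {{5,2}}, {{7,1}}} *)

(* one {x, mean y} pair per group; x = 6 is still absent at this stage *)
avg = {#[[1, 1]], Plus @@ #[[All, 2]]/Length[#]} & /@ groups
(* {{1,8}, {2,5}, {3,3/2}, {4,5/2}, {5,2}, {7,1}} *)
```

Since the example y values are integers, the averages come out as exact rationals (3/2 rather than 1.5). The missing x = 6 is what the default definition f[x_] = 0 takes care of in the timed version above.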
So I reconsidered that idea and found a solution which is nearly twice as
fast:
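binnedAverage2[data_, max_] := Module[{v, i, ix, ixx, ixxx},
  (* append one {x, 0} pair per bin so that every x in 1..max occurs; sort *)
  {i, v} =
    With[{rr = Range[max]},
      Transpose[Sort[Join[data, Transpose[{rr, rr - rr}]]]]];
  (* runs of equal x *)
  ix = Split[i];
  (* cumulative run lengths, starting at 0 *)
  ixx = FoldList[Plus[#1, Length[#2]] &, 0, ix];
  (* {start, end} position of each run in v *)
  ixxx = Transpose[Transpose[Partition[ixx, 2, 1]] + {1, 0}];
  (* sum of y over each run, divided by the count without the added zero *)
  Transpose[{First /@ ix,
    Plus @@ #/Max[Length[#] - 1, 1] &[Take[v, #]] & /@ ixxx}]]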
(r7 = binnedAverage2[data, 98]); // Timing
{1.612 Second, Null}
r7 == r4
True
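For anyone who wants to follow the steps by hand, here is a sketch of the same padding idea on Dave's small example with 7 bins. It is a restatement with different helper names (small, padded, xs, ys, runs, starts, ends are all just for illustration), not the timed code above.

```mathematica
small = {{3,1},{4,3},{3,2},{1,10},{4,2},{1,6},{5,2},{2,5},{7,1}};
max = 7;

(* pad with one {x, 0} pair per bin so every x from 1 to max occurs *)
padded = Sort[Join[small, Table[{k, 0}, {k, max}]]];
{xs, ys} = Transpose[padded];

runs = Split[xs]
(* {{1,1,1}, {2,2}, {3,3,3}, {4,4,4}, {5,5}, {6}, {7,7}} *)

(* last and first position of each run within ys *)
ends = Rest[FoldList[Plus, 0, Length /@ runs]]
(* {3, 5, 8, 11, 13, 14, 16} *)
starts = ends - (Length /@ runs) + 1
(* {1, 4, 6, 9, 12, 14, 15} *)

(* sum each run of ys; divide by the count without the padding zero *)
MapThread[
  {First[#1], Plus @@ Take[ys, {#2, #3}]/Max[Length[#1] - 1, 1]} &,
  {runs, starts, ends}]
(* {{1,8}, {2,5}, {3,3/2}, {4,5/2}, {5,2}, {6,0}, {7,1}} *)
```

The padding zero adds nothing to the sum but one to the length of each run, hence the Length - 1 in the denominator. Max[..., 1] keeps an empty bin such as x = 6 from dividing by zero, so it reports 0 as Dave asked.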
--
Hartmut Wolf