MathGroup Archive 2002

RE: grouping and averaging {x,y} pairs of data

  • To: mathgroup at smc.vnet.net
  • Subject: [mg37255] RE: [mg37202] grouping and averaging {x,y} pairs of data
  • From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
  • Date: Fri, 18 Oct 2002 05:18:11 -0400 (EDT)
  • Sender: owner-wri-mathgroup at wolfram.com

>-----Original Message-----
>Sent: Wednesday, October 16, 2002 8:26 PM
>To: mathgroup at smc.vnet.net
>Subject: [mg37255] [mg37202] grouping and averaging {x,y} pairs of data
>
>
>Dear Fellows in MathGroup,
>
>I have a list of 17,000+ {x,y} pairs of data
>
>	each x value is a positive integer from 1 to 100+
>
>	each y value is a positive real number
>
>As a *short* example, let's consider:
>
> data = {{3,1},{4,3},{3,2},{1,10},{4,2},{1,6},{5,2},{2,5},{7,1}}
>
>I want to group the data by the x value and report the 
>arithmetic average
>of the y values in each group.
>
>For the example, I want to report:
>
> output = {{1,8},{2,5},{3,1.5},{4,2.5},{5,2},{6,0},{7,1}}
>
>In this example, x=6 does not occur so I report the average y[6] = 0.
>
>Can anyone suggest a way to do this efficiently?
>
>many thanks
>dave
>
>
>
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>David E. Burmaster, Ph.D.
>Alceon Corporation
>POBox 382069                 (new Box number effective 1 Sep 2001)
>Harvard Square Station
>Cambridge, MA 02238-2069     (new ZIP code effective 1 Sep 2001)
>
>Voice	617-864-4300
>
>Web	http://www.Alceon.com
>Email	deb at Alceon.com
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Dave,

My first attempt used the same reasoning that Daniel Lichtblau proposed (and
came out only slightly faster than his). However, the Sort/Split idea brought
forward by Bob Hanlon and Allan Hayes is much faster; Bobby Treat's version
turns out to be slower than both Daniel's and mine. To get results that can be
compared directly, I slightly modified Allan's solution (which was the fastest):


data = Table[{Random[Integer, {1, 98}], Random[]}, {20000}];


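(* default value for x values that never occur in data *)
(f[x_] = 0;
(* for each run of equal x, define f[x] as the mean of its y values *)
((f[#1[[1, 1]]] = Plus @@ #1[[All, 2]]/Length[#1]) &) /@ 
      Split[Sort[data], #1[[1]] == #2[[1]] &];
r4 = {#, f[#]} & /@ Range[98];) // Timing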

{3.045 Second, Null}
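
To make the Sort/Split step concrete, here is the same idea traced on Dave's
short example. This is only a sketch: the names small, groups and avg are made
up for the illustration and do not appear in any of the posted solutions.

```mathematica
(* Dave's short example, under a separate name so the timing data is untouched *)
small = {{3,1},{4,3},{3,2},{1,10},{4,2},{1,6},{5,2},{2,5},{7,1}};

(* sort by x, then split into runs that share the same x *)
groups = Split[Sort[small], #1[[1]] == #2[[1]] &]
(* {{{1,6},{1,10}}, {{2,5}}, {{3,1},{3,2}}, {{4,2},{4,3}}, {{5,2}}, {{7,1}}} *)

(* avg is a hypothetical helper: 0 by default, overridden once per run *)
Clear[avg];
avg[_] = 0;
Scan[(avg[#[[1, 1]]] = Plus @@ #[[All, 2]]/Length[#]) &, groups];

(* report every x from 1 to 7, including the missing x = 6 *)
Table[{k, avg[k]}, {k, 7}]
(* {{1,8}, {2,5}, {3,3/2}, {4,5/2}, {5,2}, {6,0}, {7,1}} *)
```

Because the y values in the short example are integers, the averages come out
as exact rationals (3/2 rather than 1.5); wrapping the result in N gives the
decimal form Dave asked for.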


So I reconsidered that idea and found a solution which is nearly twice as
fast:

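binnedAverage2[data_, max_] := Module[{v, i, ix, ixx, ixxx},
    (* pad with one {k, 0} per bin so empty bins appear, sort, separate x and y *)
    {i, v} = 
      With[{rr = Range[max]}, 
        Transpose[Sort[Join[data, Transpose[{rr, rr - rr}]]]]];
    ix = Split[i];                                  (* runs of equal x *)
    ixx = FoldList[Plus[#1, Length[#2]] &, 0, ix];  (* cumulative run lengths *)
    (* {first, last} position of each run within v *)
    ixxx = Transpose[Transpose[Partition[ixx, 2, 1]] + {1, 0}];
    (* sum each run of y; Length - 1 discounts the padding entry *)
    Transpose[{First /@ ix, 
        Plus @@ #/Max[Length[#] - 1, 1] &[Take[v, #]] & /@ ixxx}]]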

(r7 = binnedAverage2[data, 98]); // Timing
{1.612 Second, Null}

r7 == r4
True
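
For readers who want to see what the intermediate lists in binnedAverage2 look
like, the steps can be traced by hand on Dave's short example with 7 bins. The
names below (small, nmax, xs, ys, runs, ends, spans) are illustrative
stand-ins for the local variables, not part of the function itself.

```mathematica
small = {{3,1},{4,3},{3,2},{1,10},{4,2},{1,6},{5,2},{2,5},{7,1}};
nmax = 7;

(* add one {k, 0} entry per bin, sort, and separate the x and y columns *)
{xs, ys} = Transpose[Sort[Join[small, Transpose[{Range[nmax], 0 Range[nmax]}]]]];
xs
(* {1,1,1,2,2,3,3,3,4,4,4,5,5,6,7,7} *)
ys
(* {0,6,10,0,5,0,1,2,0,2,3,0,2,0,0,1} *)

(* runs of equal x, and the cumulative positions where each run ends *)
runs = Split[xs];
ends = FoldList[Plus[#1, Length[#2]] &, 0, runs]
(* {0,3,5,8,11,13,14,16} *)

(* {first, last} position of each run in ys *)
spans = Transpose[Transpose[Partition[ends, 2, 1]] + {1, 0}]
(* {{1,3},{4,5},{6,8},{9,11},{12,13},{14,14},{15,16}} *)

(* average each run, not counting the padding zero; an empty bin gives 0 *)
Transpose[{First /@ runs,
  (With[{t = Take[ys, #]}, Plus @@ t/Max[Length[t] - 1, 1]] &) /@ spans}]
(* {{1,8}, {2,5}, {3,3/2}, {4,5/2}, {5,2}, {6,0}, {7,1}} *)
```

The padding entry sorts to the front of each run because every y value is
positive, and it adds nothing to the sum, so only the divisor has to be
corrected. Split is called here without a comparison function and on a flat
list of integers, which is presumably where much of the speed-up comes from.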

--
Hartmut Wolf


