David,

You will probably get a lot of answers for this. Here is my entry.

data = {{3, 1}, {4, 3}, {3, 2}, {1, 10}, {4, 2}, {1, 6}, {5, 2}, {2, 5},
    {7, 1}};

First I will show it step by step.

nmax = 10;
(* pad with {i, 0} so that every x from 1 to nmax has a group;
   Sort, unlike Union, keeps repeated pairs *)
Sort[Join[data, Table[{i, 0}, {i, 1, nmax}]]]
Split[%, #1[[1]] == #2[[1]] & ]
Map[Last, %, {2}]
(* the padding 0 sorts first in each group because the y values are
   positive, so Rest drops it; an x with only the padding averages to 0 *)
(If[Length[#1] == 1, 0, Plus @@ Rest[#1]/(Length[#1] - 1)] & ) /@ %
Transpose[{Range[nmax], %}]

giving

{{1, 0}, {1, 6}, {1, 10}, {2, 0}, {2, 5}, {3, 0}, {3, 1}, {3, 2}, {4, 0},
  {4, 2}, {4, 3}, {5, 0}, {5, 2}, {6, 0}, {7, 0}, {7, 1}, {8, 0}, {9, 0},
  {10, 0}}

{{{1, 0}, {1, 6}, {1, 10}}, {{2, 0}, {2, 5}}, {{3, 0}, {3, 1}, {3, 2}},
  {{4, 0}, {4, 2}, {4, 3}}, {{5, 0}, {5, 2}}, {{6, 0}}, {{7, 0}, {7, 1}},
  {{8, 0}}, {{9, 0}}, {{10, 0}}}

{{0, 6, 10}, {0, 5}, {0, 1, 2}, {0, 2, 3}, {0, 2}, {0}, {0, 1}, {0}, {0},
  {0}}

{8, 5, 3/2, 5/2, 2, 0, 1, 0, 0, 0}

{{1, 8}, {2, 5}, {3, 3/2}, {4, 5/2}, {5, 2}, {6, 0}, {7, 1}, {8, 0},
  {9, 0}, {10, 0}}

This wraps it into one statement.

nmax = 10;
Transpose[{Range[nmax],
    (If[Length[#1] == 1, 0, Plus @@ Rest[#1]/(Length[#1] - 1)] & ) /@
      Map[Last,
        Split[Sort[Join[data, Table[{i, 0}, {i, 1, nmax}]]],
          #1[[1]] == #2[[1]] & ], {2}]}]

{{1, 8}, {2, 5}, {3, 3/2}, {4, 5/2}, {5, 2}, {6, 0}, {7, 1}, {8, 0},
  {9, 0}, {10, 0}}
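In Mathematica 10 or later, GroupBy and Lookup can express the same grouping without the padding pairs and without depending on where the padding 0 sorts. This is a minimal sketch assuming version 10+; avgTable is a name introduced here for illustration, and data and nmax are redefined so the block runs on its own.

```mathematica
data = {{3, 1}, {4, 3}, {3, 2}, {1, 10}, {4, 2}, {1, 6}, {5, 2}, {2, 5},
    {7, 1}};
nmax = 10;
(* group pairs by x (First), keep each pair's y (Last), average each group *)
means = GroupBy[data, First -> Last, Mean];
(* Lookup's third argument supplies 0 for any x that never occurs *)
avgTable = Table[{x, Lookup[means, x, 0]}, {x, 1, nmax}]
```

With the example data this gives the same table as the Split version: {{1, 8}, {2, 5}, {3, 3/2}, {4, 5/2}, {5, 2}, {6, 0}, {7, 1}, {8, 0}, {9, 0}, {10, 0}}.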

This times the Split version with 20,000 random pairs on an 800 MHz machine.

data2 = Table[{Random[Integer, {1, 100}], Random[Real, {0, 5}]}, {20000}];
nmax = 100;
data = data2;
Timing[Transpose[{Range[nmax],
      (Plus @@ #1/Length[#1] & ) /@
        Map[Last,
          Split[Union[Join[data, Table[{i, 0}, {i, 1, nmax}]]],
            #1[[1]] == #2[[1]] & ], {2}]}];
  ]

{0.55 Second, Null}
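The Split versions sort the whole padded list before grouping. A single pass that keeps a running sum and count for each x value avoids the sort entirely. The sketch below assumes every x is an integer from 1 to nmax; groupMeans is a hypothetical helper name, and it has not been timed against the version above.

```mathematica
(* hypothetical helper: one pass over the pairs, accumulating per-x sums
   and counts; assumes every x is an integer in 1..nmax *)
groupMeans[pairs_List, nmax_Integer] :=
  Module[{sums = Table[0, {nmax}], counts = Table[0, {nmax}]},
    Do[
      sums[[pairs[[k, 1]]]] += pairs[[k, 2]];
      counts[[pairs[[k, 1]]]] += 1,
      {k, Length[pairs]}];
    (* an x that never occurs has count 0 and is reported as 0 *)
    Transpose[{Range[nmax],
      MapThread[If[#2 == 0, 0, #1/#2] &, {sums, counts}]}]]

groupMeans[{{3, 1}, {4, 3}, {3, 2}, {1, 10}, {4, 2}, {1, 6}, {5, 2},
    {2, 5}, {7, 1}}, 10]
```

On the small example this returns {{1, 8}, {2, 5}, {3, 3/2}, {4, 5/2}, {5, 2}, {6, 0}, {7, 1}, {8, 0}, {9, 0}, {10, 0}}. The loop runs in the interpreter, so whether it beats the sort-based version on 20,000 pairs would need to be measured.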

David Park
djmp at earthlink.net
http://home.earthlink.net/~djmp/


From: David E. Burmaster [mailto:deb at alceon.com]
To: mathgroup at smc.vnet.net

Dear Fellows in MathGroup,

I have a list of 17,000+ {x,y} pairs of data
  each x value is a positive integer from 1 to 100+
  each y value is a positive real number

As a *short* example, let's consider:

data = {{3,1},{4,3},{3,2},{1,10},{4,2},{1,6},{5,2},{2,5},{7,1}}

I want to group the data by the x value and report the arithmetic average of
the y values in each group.

For this example, I want to report:

output = {{1,8},{2,5},{3,1.5},{4,2.5},{5,2},{6,0},{7,1}}

In this example, x = 6 does not occur, so I report the average y[6] = 0.
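Stated as a formula (the symbols $n_k$ and $\bar{y}_k$ are introduced here for illustration), the requested value for each integer $x = k$ is the mean of the $y_i$ whose $x_i$ equals $k$, with 0 when no pair has that $x$:

$$
n_k = \#\{\, i : x_i = k \,\}, \qquad
\bar{y}_k =
\begin{cases}
\dfrac{1}{n_k} \displaystyle\sum_{i:\, x_i = k} y_i, & n_k > 0,\\[6pt]
0, & n_k = 0.
\end{cases}
$$

For the example, $n_1 = 2$ and $\bar{y}_1 = (10 + 6)/2 = 8$, while $n_6 = 0$ gives $\bar{y}_6 = 0$.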

Can anyone suggest a way to do this efficiently?

Many thanks,

dave