MathGroup Archive 2010

Re: Binning a dependent variable

  • To: mathgroup at smc.vnet.net
  • Subject: [mg111973] Re: Binning a dependent variable
  • From: Bill Rowe <readnews at sbcglobal.net>
  • Date: Sun, 22 Aug 2010 08:11:59 -0400 (EDT)

On 8/21/10 at 4:18 AM, max_kip at hotmail.com (northern_flicker) wrote:

>I have data with x and y values, i.e., temperature and a variable
>dependent on the temperature, e.g.,

>{{30, 0.0019}, {31, 0.0024}, {32,
>0.0026}...{998,0.000005},{999,0.000004},{1000,0.000003}}, with, for
>example, 3000 measurements in total.

>I would like to:

>1) Bin the temperature data (x) into bins of any desired width, plus
>2) specify the temperature range over which to bin the data and
>3) bin the corresponding dependent variable (y), so that I can
>calculate the corresponding mean y for the temperature bin, ultimately
>resulting in:

>{{31,0.00793}...{999,0.0000004}},

>where x is now the mean temperature and y is the mean dependent
>variable for that temperature bin.

>I tried to manipulate BinLists and BinCounts to solve this problem,
>but ran into difficulties.

Here is one approach.

First, to generate some data:

data = Transpose[{Sort@RandomInteger[{20, 40}, 1000],
     RandomReal[NormalDistribution[], 1000] + Range[1000]}];

Here, I've used RandomInteger over a fairly narrow range to
ensure there would be repeated x values in the data list. I've
sorted the x values so that there would be an approximately
linear dependence of the y values on the x values.

Now, I can group the data using GatherBy and apply whatever
statistics I like to each group. For example, Median applied to
a group of {x, y} pairs gives the median x and the median y:

In[20]:= Median /@ GatherBy[data, Round[First@#, 2] &]

Out[20]= {{21., 42.19418297716063}, {22., 103.06772978507512},
    {24., 202.709609270177}, {26., 308.91049052795177},
    {28., 409.63095885047176}, {30., 508.58960769955314},
    {32., 607.3781326212272}, {34., 701.0710633400882},
    {36., 798.2098786948368}, {38., 895.8257388028434},
    {40., 958.28139756248}}

Or using a different bin width

In[22]:= Median /@ GatherBy[data, Round[First@#, 5] &]

Out[22]= {{21., 61.599426791699685}, {25., 256.65805481161476},
    {30., 509.65963901231527}, {35., 754.1388429566833},
    {39., 939.7722174247823}}

Strictly speaking, using Round[number, a] as the grouping
criterion is not the same as binning the data with a fixed bin
width. This can be seen by doing:

In[24]:= Tally[Round[Range[10], 2]]

Out[24]= {{0, 1}, {2, 1}, {4, 3}, {6, 1}, {8, 3}, {10, 1}}
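
The uneven counts come from how Round treats ties. A small check,
consistent with the tally above, shows each odd integer (which sits
exactly half-way between two multiples of 2) landing on the multiple
whose quotient is even:

```mathematica
(* Odd integers are ties between neighbouring multiples of 2 *)
Round[{1, 3, 5, 7, 9}, 2]
(* {0, 4, 4, 8, 8} *)
```

So 3 and 5 join 4, while 1 goes to 0, which is why some groups hold
three integers and others only one.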

As you can see, using Round to group a set of increasing
integers doesn't give uniform groupings. If it is essential to
have uniform widths then Round[First@#, 2] in the code above
could be replaced with Floor[First[#]/2]. As you can see by:

In[25]:= Tally[Floor[Range[10]/2]]

Out[25]= {{0, 1}, {1, 2}, {2, 2}, {3, 2}, {4, 2}, {5, 1}}

uniform widths are created by using Floor. Which is better will
depend on exactly what it is you ultimately want to accomplish.
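
As a rough sketch of how the pieces above might be combined to cover
the original request (a chosen bin width, a restricted temperature
range, and per-bin means), something like the following could work.
The name binMeans and its argument layout are assumptions made for
illustration, not a built-in function. Treating each bin as a
half-open interval starting at xmin is likewise one choice among
several.

```mathematica
(* Hypothetical helper, not part of Mathematica: returns
   {mean x, mean y} for each fixed-width bin inside [xmin, xmax) *)
binMeans[data_, {xmin_, xmax_}, width_] :=
  Module[{inRange},
    (* keep only the points whose x value lies in the requested range *)
    inRange = Select[data, xmin <= First[#] < xmax &];
    (* the Floor key gives every bin the same width; Mean of a list
       of {x, y} pairs averages each column separately *)
    Sort[Mean /@ GatherBy[inRange, Floor[(First[#] - xmin)/width] &]]]

(* Example with made-up points: bins cover x = 30-31 and x = 32-33,
   and the point at x = 40 falls outside the range and is dropped *)
binMeans[{{30, 1.}, {31, 3.}, {32, 5.}, {33, 7.}, {40, 9.}}, {30, 34}, 2]
```

Replacing Mean with Median, or any other function that accepts a list
of pairs, changes the statistic without touching the grouping.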


