Re: Binning a dependent variable
*To*: mathgroup at smc.vnet.net
*Subject*: [mg111973] Re: Binning a dependent variable
*From*: Bill Rowe <readnews at sbcglobal.net>
*Date*: Sun, 22 Aug 2010 08:11:59 -0400 (EDT)
On 8/21/10 at 4:18 AM, max_kip at hotmail.com (northern_flicker) wrote:
>I have data with x and y values, i.e., temperature and a variable
>dependent on the temperature, e.g.,
>{{30, 0.0019}, {31, 0.0024}, {32,
>0.0026}...{998,0.000005},{999,0.000004},{1000,0.000003}}, with, for
>example, 3000 measurements in total.
>I would like to:
>1) Bin the temperature data (x) into bins of any desired width, plus
>2) specify the temperature range over which to bin the data and
>3) bin the corresponding dependent variable (y), so that I can
>calculate the corresponding mean y for the temperature bin, ultimately
>resulting in:
>{{31,0.00793}...{999,0.0000004}},
>where x is now the mean temperature and y is the mean dependent
>variable for that temperature bin.
>I tried to manipulate BinLists and BinCounts to solve this problem,
>but ran into difficulties.
Here is one approach.
First, generate some data:
data = Transpose[{Sort@RandomInteger[{20, 40}, 1000],
RandomReal[NormalDistribution[], 1000] + Range[1000]}];
Here, I've used RandomInteger over a fairly narrow range to
ensure there would be repeated x values in the data list. I've
sorted the x values so that there would be an approximately linear
dependence of the y values on the x values.
Now, I can group the data using GatherBy and apply whatever
statistics I like to each group. For example
In[20]:= Median /@ GatherBy[data, Round[First@#, 2] &]
Out[20]= {{21., 42.19418297716063}, {22., 103.06772978507512},
{24., 202.709609270177}, {26., 308.91049052795177},
{28., 409.63095885047176}, {30., 508.58960769955314},
{32., 607.3781326212272}, {34., 701.0710633400882},
{36., 798.2098786948368}, {38., 895.8257388028434},
{40., 958.28139756248}}
Or using a different bin width
In[22]:= Median /@ GatherBy[data, Round[First@#, 5] &]
Out[22]= {{21., 61.599426791699685}, {25., 256.65805481161476},
{30., 509.65963901231527}, {35., 754.1388429566833},
{39., 939.7722174247823}}
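The original question asked for the mean rather than the median, and for a restricted temperature range. A minimal sketch of that variation, reusing the data list generated above, might look like the following. The names xmin, xmax and subset, and the limits 25 and 35, are illustrative assumptions and do not come from the question.

```mathematica
(* illustrative limits only; choose whatever range suits the data *)
xmin = 25; xmax = 35;

(* keep only the points whose x value lies within the chosen range;
   data is the list of {x, y} pairs generated earlier in this post *)
subset = Select[data, xmin <= First[#] <= xmax &];

(* Mean applied to a group of {x, y} pairs averages each column,
   giving {mean x, mean y} for that group *)
Mean /@ GatherBy[subset, Round[First@#, 2] &]
```

Any other statistic that acts column-wise on a list of pairs can be substituted for Mean in the same way.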
Strictly speaking, using Round[number, a] as the grouping
criterion is not the same as binning the data with a fixed bin
width. This can be seen by doing:
In[24]:= Tally[Round[Range[10], 2]]
Out[24]= {{0, 1}, {2, 1}, {4, 3}, {6, 1}, {8, 3}, {10, 1}}
As you can see, using Round to group a set of increasing
integers doesn't give uniform groupings. If it is essential to
have uniform widths then Round[First@#, 2] in the code above
could be replaced with Floor[First@#/2]. As you can see by:
In[25]:= Tally[Floor[Range[10]/2]]
Out[25]= {{0, 1}, {1, 2}, {2, 2}, {3, 2}, {4, 2}, {5, 1}}
uniform widths are created by using Floor. Which is better will
depend on exactly what it is you ultimately want to accomplish.
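Putting the pieces together, the Floor-based grouping can be wrapped in a small helper that takes the range and the bin width as arguments. This is only a sketch under stated assumptions: binMeans is a hypothetical name rather than a built-in function, the bins are taken to be half-open intervals starting at xmin, and the sample values are made up for illustration.

```mathematica
(* binMeans is a hypothetical helper, not a built-in function *)
binMeans[data_, {xmin_, xmax_, dx_}] := Module[{inRange},
  (* keep the points with xmin <= x < xmax *)
  inRange = Select[data, xmin <= First[#] < xmax &];
  (* the bin index is Floor[(x - xmin)/dx]; sorting by x first makes
     the bins come out in order of increasing x *)
  Mean /@ GatherBy[SortBy[inRange, First],
    Floor[(First[#] - xmin)/dx] &]
]

(* small made-up sample, for illustration only *)
sample = {{30, 1.}, {31, 3.}, {32, 5.}, {33, 7.}, {40, 9.}};
binMeans[sample, {30, 34, 2}]
```

With these arguments the points at x = 30 and 31 fall in the first bin, those at x = 32 and 33 in the second, and the point at x = 40 is excluded because it lies outside the requested range. Bins that contain no points simply do not appear in the result.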