Re: Binning a dependent variable
- To: mathgroup at smc.vnet.net
- Subject: [mg111973] Re: Binning a dependent variable
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Sun, 22 Aug 2010 08:11:59 -0400 (EDT)
On 8/21/10 at 4:18 AM, max_kip at hotmail.com (northern_flicker) wrote:

>I have data with x and y values, i.e., temperature and a variable
>dependent on the temperature, e.g.,
>{{30, 0.0019}, {31, 0.0024}, {32,
>0.0026}...{998,0.000005},{999,0.000004},{1000,0.000003}}, with, for
>example, 3000 measurements in total.
>I would like to:
>1) Bin the temperature data (x) into bins of any desired width, plus
>2) specify the temperature range over which to bin the data and
>3) bin the corresponding dependent variable (y), so that I can
>calculate the corresponding mean y for the temperature bin, ultimately
>resulting in:
>{{31,0.00793}...{999,0.0000004}},
>where x is now the mean temperature and y is the mean dependent
>variable for that temperature bin.
>I tried to manipulate BinLists and BinCounts to solve this problem,
>but ran into difficulties.

Here is one approach. First, generate some data:

data = Transpose[{Sort@RandomInteger[{20, 40}, 1000],
    RandomReal[NormalDistribution[], 1000] + Range[1000]}];

Here, I've used RandomInteger over a fairly narrow range to ensure
there would be repeated x values in the data list. I've sorted the x
values so that there would be an approximately linear dependence of
the y values on the x values.

Now, I can group the data using GatherBy and apply whatever
statistic I like to each group.
For example, taking the median of each group:

In[20]:= Median /@ GatherBy[data, Round[First@#, 2] &]

Out[20]= {{21., 42.19418297716063}, {22., 103.06772978507512},
   {24., 202.709609270177}, {26., 308.91049052795177},
   {28., 409.63095885047176}, {30., 508.58960769955314},
   {32., 607.3781326212272}, {34., 701.0710633400882},
   {36., 798.2098786948368}, {38., 895.8257388028434},
   {40., 958.28139756248}}

Or using a different bin width:

In[22]:= Median /@ GatherBy[data, Round[First@#, 5] &]

Out[22]= {{21., 61.599426791699685}, {25., 256.65805481161476},
   {30., 509.65963901231527}, {35., 754.1388429566833},
   {39., 939.7722174247823}}

Strictly speaking, using Round[number, a] as the grouping criterion
is not the same as binning the data with a fixed bin width. This can
be seen by doing:

In[24]:= Tally[Round[Range[10], 2]]

Out[24]= {{0, 1}, {2, 1}, {4, 3}, {6, 1}, {8, 3}, {10, 1}}

As you can see, using Round to group a set of increasing integers
doesn't give uniform groupings (the odd integers are ties, and Round
does not break them all in the same direction). If it is essential to
have uniform widths, then Round[First@#, 2] in the code above could be
replaced with Floor[First@#/2]. As you can see from:

In[25]:= Tally[Floor[Range[10]/2]]

Out[25]= {{0, 1}, {1, 2}, {2, 2}, {3, 2}, {4, 2}, {5, 1}}

uniform widths are created by using Floor (only the first and last
bins are partly filled, because Range[10] starts at 1 and stops at
10). Which is better will depend on exactly what it is you ultimately
want to accomplish.
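
To tie the Floor-based grouping back to the three requirements in the original question (a chosen bin width, a chosen temperature range, and the mean of each bin), it could be wrapped in a small function along the following lines. This is only a sketch under stated assumptions: binnedMeans, tmin, tmax and width are hypothetical names rather than anything built in, the bins are assumed to start at tmin, and the range is assumed to be half-open, tmin <= x < tmax.

```mathematica
(* Hypothetical helper, not part of the original reply. *)
(* Assumes data is a list of {x, y} pairs, bins start at tmin, *)
(* and the range is half-open: tmin <= x < tmax. *)
binnedMeans[data_, {tmin_, tmax_}, width_] :=
 Module[{inRange},
  (* keep only the points whose x value lies inside the range *)
  inRange = Select[data, tmin <= First[#] < tmax &];
  (* group by bin index, average x and y within each bin, *)
  (* convert to machine numbers, and order the bins by mean x *)
  SortBy[
   N[Mean /@ GatherBy[inRange, Floor[(First[#] - tmin)/width] &]],
   First]
 ]

sample = {{30, 1.}, {31, 3.}, {32, 5.}, {33, 7.}, {34, 9.}, {50, 100.}};

binnedMeans[sample, {30, 36}, 2]
(* {{30.5, 2.}, {32.5, 6.}, {34., 9.}} *)
```

In the small example, the point at x = 50 falls outside the range and is dropped, and each remaining bin of width 2 is reduced to its mean x and mean y. Replacing Mean with Median, or any other function that accepts a list of pairs, changes the statistic without touching the grouping.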