Re: Mathematica calculates RSquared wrongly?
- To: mathgroup at smc.vnet.net
- Subject: [mg112798] Re: Mathematica calculates RSquared wrongly?
- From: Darren Glosemeyer <darreng at wolfram.com>
- Date: Fri, 1 Oct 2010 05:40:08 -0400 (EDT)
On 9/30/2010 9:55 AM, Darren Glosemeyer wrote:
> On 9/30/2010 3:53 AM, Ray Koopman wrote:
>> On Sep 29, 7:48 am, Darren Glosemeyer <darreng at wolfram.com> wrote:
>>> On 9/29/2010 3:15 AM, Ray Koopman wrote:
>>>> On Sep 28, 3:09 am, Darren Glosemeyer <darreng at wolfram.com> wrote:
>>>>> On 9/27/2010 4:47 AM, Lawrence Teo wrote:
>>>>>> [...]
>>>>>> nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
>>>>>> nlm["RSquared"]
>>>>>>
>>>>>> The RSquared by Mathematica is 0.963173
>>>>>> Meanwhile, Excel and manual hand calculation show that R^2 should
>>>>>> be equal to 0.7622.
>>>>>>
>>>>>> Is Mathematica wrong? Thanks!
>>>>> This is as designed. For nonlinear models, the corrected (i.e. with
>>>>> the mean subtracted out) sum of squares is sometimes used. This is
>>>>> consistent with comparing to a constant model, but most nonlinear
>>>>> models do not include a constant in an additive way. For this reason,
>>>>> NonlinearModelFit uses the uncorrected (i.e. without subtracting out
>>>>> the mean) sum of squares.
>>>> This information should be included in the "Goodness-of-Fit Measures"
>>>> section of the NonlinearModelFit documentation, which should also
>>>> point out that RSquared is computed as 1 - (Residual SS)/(Total SS),
>>>> and that in nonlinear models this is generally different from the
>>>> ratio (Model SS)/(Total SS) that is sometimes cited -- e.g.,
>>>> http://reference.wolfram.com/mathematica/RegressionCommon/ref/RSquared.html
>>>>
>>>> -- as the definition of RSquared.
>>> The RegressionCommon documentation is for a now obsolete standard
>>> package. The "RSquared" property for nonlinear models is described
>>> near the bottom of
>>>
>>> http://reference.wolfram.com/mathematica/tutorial/StatisticalModelAnalysis.html
>>>
>>>
>>> The current statement is:
>>>
>>> "The coefficient of determination "RSquared" is the ratio of the model
>>> sum of squares to the total sum of squares."
>>>
>>> I will modify this to mention that the total is the uncorrected total
>>> for the next version.
>> Also, the n-1 in the formula for AdjustedRSquared should be n,
>> because the total sum of squares is uncorrected.
>>
>> However, all that misses the main point I was trying to make, which
>> is that simply changing from corrected to uncorrected sums of squares
>> will not give 1 - SS_res/SS_tot, which is how NonlinearModelFit
>> calculates RSquared. The reason is that the residuals are not
>> generally orthogonal to the fitted values, so the decomposition
>> SS_tot = SS_fit + SS_res that holds for linear models does not
>> generally hold for nonlinear models.
>>
>> For instance, using the data and model from the "Goodness-of-Fit
>> Measures" section of the NonlinearModelFit documentation,
>> the fitted values are
>>
>> {13.658, 2.00568, 1.48485, 14.8951, 5.6088, 10.1695,
>> 11.0627, 5.77841, 4.51702, 5.67666, 13.4947, 11.4323},
>>
>> and the residuals are
>>
>> {0.742037, 9.09432, 5.01515, -3.79512, 1.1912, 0.930521,
>> 1.3373, 3.12159, 4.08298, 5.72334, -1.69468, -0.532303}.
>>
>> Their uncentered inner product is 50.2468; centering gives -159.435.
>>
>
> Thanks for catching the AdjustedRSquared typo. The code is
> (effectively) using n. I've corrected the docs.
>
> I see your point about the orthogonality now. I missed it in the
> original example because the original example was actually a linear
> model. I'll have to take a closer look and decide if the code or the
> docs need to be corrected.
>
> Darren Glosemeyer
> Wolfram Research

I have corrected the documentation (for the next version) for the
nonlinear "RSquared" property to state that it is 1 - SS_res/SS_tot.

Darren Glosemeyer
Wolfram Research
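For readers who want to check Ray's orthogonality argument numerically, here is a minimal sketch in Python rather than Mathematica, so nothing in it calls NonlinearModelFit. It uses only the fitted values and residuals quoted in the thread and assumes the observed values are fitted + residual; all function names are hypothetical. Because y_i = yhat_i + e_i, the uncorrected sums satisfy SS_tot = SS_fit + SS_res + 2 * sum(yhat_i * e_i), so 1 - SS_res/SS_tot and SS_fit/SS_tot agree only when that cross term is zero. The adjusted formula uses n in place of n - 1, as discussed in the thread; taking p to be the number of model parameters is an assumption.

```python
# Fitted values and residuals as quoted in the thread (already rounded).
fitted = [13.658, 2.00568, 1.48485, 14.8951, 5.6088, 10.1695,
          11.0627, 5.77841, 4.51702, 5.67666, 13.4947, 11.4323]
residuals = [0.742037, 9.09432, 5.01515, -3.79512, 1.1912, 0.930521,
             1.3373, 3.12159, 4.08298, 5.72334, -1.69468, -0.532303]


def inner(u, v):
    """Uncentered inner product: sum of u_i * v_i."""
    return sum(a * b for a, b in zip(u, v))


def centered_inner(u, v):
    """Inner product after subtracting each vector's own mean."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def sums_of_squares(fitted, residuals):
    """Uncorrected (mean NOT subtracted) total, fit and residual sums of squares."""
    observed = [f + r for f, r in zip(fitted, residuals)]  # assumes y = fit + residual
    return inner(observed, observed), inner(fitted, fitted), inner(residuals, residuals)


def rsquared_residual_form(fitted, residuals):
    """1 - SS_res/SS_tot, the form the thread says NonlinearModelFit uses."""
    ss_tot, _, ss_res = sums_of_squares(fitted, residuals)
    return 1 - ss_res / ss_tot


def rsquared_ratio_form(fitted, residuals):
    """SS_fit/SS_tot, the 'model SS over total SS' wording."""
    ss_tot, ss_fit, _ = sums_of_squares(fitted, residuals)
    return ss_fit / ss_tot


def adjusted_rsquared(r2, n, p):
    """Adjusted R^2 with n (not n - 1) since SS_tot is uncorrected; p = parameter count (assumed)."""
    return 1 - (1 - r2) * n / (n - p)


if __name__ == "__main__":
    ss_tot, ss_fit, ss_res = sums_of_squares(fitted, residuals)
    cross = inner(fitted, residuals)
    print("uncentered inner product:", cross)
    print("centered inner product:  ", centered_inner(fitted, residuals))
    # The exact expansion leaves no gap (up to floating-point error) ...
    print("gap in SS_tot = SS_fit + SS_res + 2*cross:", ss_tot - (ss_fit + ss_res + 2 * cross))
    # ... so the two R^2 forms differ by 2*cross/SS_tot.
    print("1 - SS_res/SS_tot:", rsquared_residual_form(fitted, residuals))
    print("SS_fit/SS_tot:    ", rsquared_ratio_form(fitted, residuals))
```

Because the quoted inputs are rounded, the two inner products should land close to, but not necessarily exactly on, the 50.2468 and -159.435 given in the thread.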