Re: Mathematica calculates RSquared wrongly?

To: mathgroup at smc.vnet.net
Subject: [mg112795] Re: Mathematica calculates RSquared wrongly?
From: Darren Glosemeyer <darreng at wolfram.com>
Date: Fri, 1 Oct 2010 05:39:35 -0400 (EDT)

On 9/30/2010 3:53 AM, Ray Koopman wrote:
> On Sep 29, 7:48 am, Darren Glosemeyer <darreng at wolfram.com> wrote:
>> On 9/29/2010 3:15 AM, Ray Koopman wrote:
>>> On Sep 28, 3:09 am, Darren Glosemeyer <darreng at wolfram.com> wrote:
>>>> On 9/27/2010 4:47 AM, Lawrence Teo wrote:
>>>>> [...]
>>>>> nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
>>>>> nlm["RSquared"]
>>>>>
>>>>> The RSquared by Mathematica is 0.963173
>>>>> Meanwhile, Excel and manual hand calculation show that R^2 should
>>>>> be equal to 0.7622.
>>>>>
>>>>> Is Mathematica wrong? Thanks!
>>>> This is as designed. For nonlinear models, the corrected (i.e. with
>>>> the mean subtracted out) sum of squares is sometimes used. This is
>>>> consistent with comparing to a constant model, but most nonlinear
>>>> models do not include a constant in an additive way. For this reason,
>>>> NonlinearModelFit uses the uncorrected (i.e. without subtracting out
>>>> the mean) sum of squares.
>>> This information should be included in the "Goodness-of-Fit Measures"
>>> section of the NonlinearModelFit documentation, which should also
>>> point out that RSquared is computed as 1 - (Residual SS)/(Total SS),
>>> and that in nonlinear models this is generally different from the
>>> ratio (Model SS)/(Total SS) that is sometimes cited -- e.g.,
>>> http://reference.wolfram.com/mathematica/RegressionCommon/ref/RSquared.html
>>> -- as the definition of RSquared.
>> The RegressionCommon documentation is for a now obsolete standard
>> package. The "RSquared" property for nonlinear models is described
>> near the bottom of
>>
>> http://reference.wolfram.com/mathematica/tutorial/StatisticalModelAnalysis.html
>>
>> The current statement is:
>>
>> "The coefficient of determination "RSquared" is the ratio of the model
>> sum of squares to the total sum of squares."
>>
>> I will modify this to mention that the total is the uncorrected total
>> for the next version.
> Also, the n-1 in the formula for AdjustedRSquared should be n,
> because the total sum of squares is uncorrected.
>
> However, all that misses the main point I was trying to make, which
> is that simply changing from corrected to uncorrected sums of squares
> will not give 1 - SS_res/SS_tot, which is how NonlinearModelFit
> calculates RSquared. The reason is that the residuals are not
> generally orthogonal to the fitted values, so the decomposition
> SS_tot = SS_fit + SS_res that holds for linear models does not
> generally hold for nonlinear models.
>
> For instance, using the data and model from the "Goodness-of-Fit
> Measures" section of the NonlinearModelFit documentation,
> the fitted values are
>
> {13.658, 2.00568, 1.48485, 14.8951, 5.6088, 10.1695,
> 11.0627, 5.77841, 4.51702, 5.67666, 13.4947, 11.4323},
>
> and the residuals are
>
> {0.742037, 9.09432, 5.01515, -3.79512, 1.1912, 0.930521,
> 1.3373, 3.12159, 4.08298, 5.72334, -1.69468, -0.532303}.
>
> Their uncentered inner product is 50.2468; centering gives -159.435.

Thanks for catching the AdjustedRSquared typo. The code is (effectively) using n. I've corrected the docs.

I see your point about the orthogonality now. I missed it in the original example because the original example was actually a linear model (linear in its parameters). I'll have to take a closer look and decide if the code or the docs need to be corrected.

Darren Glosemeyer
Wolfram Research
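To make the orthogonality point concrete, here is a minimal Python sketch (not Wolfram Language code; all helper names are hypothetical) that rechecks the arithmetic using the fitted values and residuals quoted in the thread. It assumes the observations can be reconstructed as fitted + residual, so results inherit the roughly 6-digit rounding of the quoted numbers. Since y_i = fit_i + res_i, the uncorrected total satisfies SS_tot = SS_fit + SS_res + 2*sum(fit_i * res_i). As a result, 1 - SS_res/SS_tot and SS_fit/SS_tot coincide only when the residuals are orthogonal to the fitted values. The adjusted_r_squared helper assumes the common form 1 - (1 - R^2)*df_tot/(n - p), using df_tot = n for an uncorrected total as suggested above. The number of fitted parameters p is an input here, not something taken from the documentation example.

```python
"""Sketch: uncorrected sums of squares and RSquared for a nonlinear fit.

Helper names are hypothetical; data are the rounded values quoted in the thread.
"""

# Fitted values and residuals as quoted (rounded to about 6 significant digits).
FITTED = [13.658, 2.00568, 1.48485, 14.8951, 5.6088, 10.1695,
          11.0627, 5.77841, 4.51702, 5.67666, 13.4947, 11.4323]
RESIDUALS = [0.742037, 9.09432, 5.01515, -3.79512, 1.1912, 0.930521,
             1.3373, 3.12159, 4.08298, 5.72334, -1.69468, -0.532303]


def mean(v):
    return sum(v) / len(v)


def inner(u, v):
    """Uncentered inner product: sum of u_i * v_i."""
    return sum(a * b for a, b in zip(u, v))


def centered_inner(u, v):
    """Inner product after subtracting each vector's own mean."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def sums_of_squares(fitted, residuals):
    """Return uncorrected (SS_tot, SS_fit, SS_res), with y rebuilt as fitted + residual."""
    y = [f + r for f, r in zip(fitted, residuals)]
    return inner(y, y), inner(fitted, fitted), inner(residuals, residuals)


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 assuming an uncorrected total: n (not n - 1) total degrees of freedom."""
    return 1 - (1 - r2) * n / (n - p)


if __name__ == "__main__":
    ss_tot, ss_fit, ss_res = sums_of_squares(FITTED, RESIDUALS)
    print("uncentered inner product:", round(inner(FITTED, RESIDUALS), 4))
    print("centered inner product:  ", round(centered_inner(FITTED, RESIDUALS), 3))
    # The cross term 2 * sum(fit_i * res_i) is what breaks SS_tot = SS_fit + SS_res.
    print("SS_tot - (SS_fit + SS_res):", round(ss_tot - ss_fit - ss_res, 4))
    print("1 - SS_res/SS_tot:", round(1 - ss_res / ss_tot, 6))
    print("SS_fit/SS_tot:    ", round(ss_fit / ss_tot, 6))
```

Running it should reproduce the two inner products quoted above to within rounding. It should also show the two RSquared ratios differing by 2*sum(fit_i * res_i)/SS_tot.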