Re: Mathematica calculates RSquared wrongly?
- To: mathgroup at smc.vnet.net
- Subject: [mg112787] Re: Mathematica calculates RSquared wrongly?
- From: Ray Koopman <koopman at sfu.ca>
- Date: Thu, 30 Sep 2010 04:53:00 -0400 (EDT)
On Sep 29, 7:48 am, Darren Glosemeyer<darreng at wolfram.com> wrote:
> On 9/29/2010 3:15 AM, Ray Koopman wrote:
>> On Sep 28, 3:09 am, Darren Glosemeyer<darreng at wolfram.com> wrote:
>>> On 9/27/2010 4:47 AM, Lawrence Teo wrote:
>>>> [...]
>>>> nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
>>>> nlm["RSquared"]
>>>>
>>>> The RSquared by Mathematica is 0.963173
>>>> Meanwhile, Excel and manual hand calculation show that R^2 should
>>>> be equal to 0.7622.
>>>>
>>>> Is Mathematica wrong? Thanks!
>>>
>>> This is as designed. For nonlinear models, the corrected (i.e. with
>>> the mean subtracted out) sum of squares is sometimes used. This is
>>> consistent with comparing to a constant model, but most nonlinear
>>> models do not include a constant in an additive way. For this reason,
>>> NonlinearModelFit uses the uncorrected (i.e. without subtracting out
>>> the mean) sum of squares.
>>
>> This information should be included in the "Goodness-of-Fit Measures"
>> section of the NonlinearModelFit documentation, which should also
>> point out that RSquared is computed as 1 - (Residual SS)/(Total SS),
>> and that in nonlinear models this is generally different from the
>> ratio (Model SS)/(Total SS) that is sometimes cited -- e.g.,
>> http://reference.wolfram.com/mathematica/RegressionCommon/ref/RSquared.html
>> -- as the definition of RSquared.
>
> The RegressionCommon documentation is for a now obsolete standard
> package. The "RSquared" property for nonlinear models is described
> near the bottom of
>
> http://reference.wolfram.com/mathematica/tutorial/StatisticalModelAnalysis.html
>
> The current statement is:
>
> "The coefficient of determination "RSquared" is the ratio of the model
> sum of squares to the total sum of squares."
>
> I will modify this to mention that the total is the uncorrected total
> for the next version.

Also, the n-1 in the formula for AdjustedRSquared should be n, because the total sum of squares is uncorrected.
However, all that misses the main point I was trying to make: simply changing the sums of squares in the documented ratio (Model SS)/(Total SS) from corrected to uncorrected will not give 1 - SS_res/SS_tot, which is how NonlinearModelFit calculates RSquared. The reason is that the residuals are not generally orthogonal to the fitted values, so the decomposition SS_tot = SS_fit + SS_res that holds for linear models does not generally hold for nonlinear models. In general there is a cross term: SS_tot = SS_fit + SS_res + 2*(fitted . residuals).

For instance, using the data and model from the "Goodness-of-Fit Measures" section of the NonlinearModelFit documentation, the fitted values are

{13.658, 2.00568, 1.48485, 14.8951, 5.6088, 10.1695, 11.0627, 5.77841, 4.51702, 5.67666, 13.4947, 11.4323},

and the residuals are

{0.742037, 9.09432, 5.01515, -3.79512, 1.1912, 0.930521, 1.3373, 3.12159, 4.08298, 5.72334, -1.69468, -0.532303}.

Their uncentered inner product is 50.2468; centering both vectors gives -159.435. Neither is zero, so the two definitions of RSquared disagree whether or not the mean is subtracted out.
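For anyone who wants to check the arithmetic without Mathematica, here is a minimal sketch in plain Python. It uses only the fitted values and residuals listed above, and rebuilds the observations as fitted + residual. The helper names (dot, centered) and variable names are mine, not anything from NonlinearModelFit. Since the inputs are the rounded values printed here, the results agree with the figures quoted above only to about four significant digits.

```python
# Fitted values and residuals copied from the post (rounded as printed there).
fitted = [13.658, 2.00568, 1.48485, 14.8951, 5.6088, 10.1695,
          11.0627, 5.77841, 4.51702, 5.67666, 13.4947, 11.4323]
resid = [0.742037, 9.09432, 5.01515, -3.79512, 1.1912, 0.930521,
         1.3373, 3.12159, 4.08298, 5.72334, -1.69468, -0.532303]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def centered(u):
    """Return u with its mean subtracted from every element."""
    m = sum(u) / len(u)
    return [a - m for a in u]


# Observations reconstructed as fitted + residual (an assumption of this sketch).
y = [f + e for f, e in zip(fitted, resid)]

cross_uncentered = dot(fitted, resid)
cross_centered = dot(centered(fitted), centered(resid))

# Uncorrected sums of squares.
ss_tot = dot(y, y)
ss_fit = dot(fitted, fitted)
ss_res = dot(resid, resid)

# The two candidate definitions of RSquared.
r2_residual_form = 1 - ss_res / ss_tot  # 1 - SS_res/SS_tot
r2_ratio_form = ss_fit / ss_tot         # SS_fit/SS_tot

print("uncentered inner product:", cross_uncentered)
print("centered inner product:  ", cross_centered)
print("1 - SS_res/SS_tot:       ", r2_residual_form)
print("SS_fit/SS_tot:           ", r2_ratio_form)
print("difference:              ", r2_residual_form - r2_ratio_form)
print("2*cross/SS_tot:          ", 2 * cross_uncentered / ss_tot)
```

The last two printed lines should agree: the gap between the two definitions is exactly 2*(fitted . residuals)/SS_tot, so it vanishes only when the residuals are orthogonal to the fitted values.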