Re: Mathematica calculates RSquared wrongly?
- To: mathgroup at smc.vnet.net
- Subject: [mg112719] Re: Mathematica calculates RSquared wrongly?
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Tue, 28 Sep 2010 06:04:33 -0400 (EDT)
On 9/27/10 at 5:47 AM, lawrenceteo at yahoo.com (Lawrence Teo) wrote:

>sbbBN = {{-0.582258428`, 0.49531889`}, {-2.475512593`,
>0.751434565`}, {-1.508540016`, 0.571212292`}, {2.004747546`,
>0.187621117`}, {1.139972167`, 0.297735572`}, {-0.724053077`,
>0.457858443`}, {-0.830992757`, 0.313642502`}, {-3.830561204`,
>0.81639874`}, {-2.357296433`, 0.804397821`}, {0.986610836`,
>0.221932888`}, {-0.513640368`, 0.704999208`}, {-1.508540016`,
>0.798426867`}};
>nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
>nlm["RSquared"]

>The RSquared by Mathematica is 0.963173 Meanwhile, Excel and manual
>hand calculation show that R^2 should be equal to 0.7622.

>Is Mathematica wrong?

Whenever Mathematica and Excel disagree it is almost certain the problem lies with Excel. Simply put, the current versions of Excel should never be relied upon for any serious statistical analysis. Do a Google search on Excel and you can find several sites saying essentially the same thing as I just said here.

But this case seems to be the exception. There is a more subtle issue in play. The problem you are solving is not a non-linear problem. Linear versus non-linear in model fitting refers to the way the unknown parameters enter the model, not to the functions of x used in the model. Your model a*x^2 + b*x + c is linear in a, b and c, so it is a linear regression problem. Consider:

In[20]:= m = LinearModelFit[sbbBN, {1, x, x^2}, x];

In[21]:= m@"RSquared"

Out[21]= 0.762242

which is the result returned by Excel. So, in this case it is clear Excel is solving the linear regression problem and computing RSquared for that problem correctly.

In general, you never want to use NonlinearModelFit for a linear problem that can be handled by LinearModelFit.

Note, R is the *linear* correlation coefficient. To compute something equivalent to R for a non-linear problem you have to generalize the definition of R in some manner. I don't know how this is being done in NonlinearModelFit.
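One way to probe this, offered only as a sketch: assume (this is an assumption, not something established in this thread) that NonlinearModelFit measures the total sum of squares about zero rather than about the mean of y, since a general non-linear model need not contain a constant term. The code below computes both variants from the same residuals. The names y, sse, sstCorrected and sstUncorrected are introduced here for illustration.

```mathematica
(* data from the original post *)
sbbBN = {{-0.582258428, 0.49531889}, {-2.475512593, 0.751434565},
   {-1.508540016, 0.571212292}, {2.004747546, 0.187621117},
   {1.139972167, 0.297735572}, {-0.724053077, 0.457858443},
   {-0.830992757, 0.313642502}, {-3.830561204, 0.81639874},
   {-2.357296433, 0.804397821}, {0.986610836, 0.221932888},
   {-0.513640368, 0.704999208}, {-1.508540016, 0.798426867}};

y = sbbBN[[All, 2]];

(* the quadratic fit, done as a linear regression *)
m = LinearModelFit[sbbBN, {1, x, x^2}, x];

(* residual sum of squares of the fit *)
sse = Total[m["FitResiduals"]^2];

(* corrected total sum of squares: deviations of y from its mean *)
sstCorrected = Total[(y - Mean[y])^2];

(* ASSUMPTION: uncorrected total sum of squares, deviations of y from zero *)
sstUncorrected = Total[y^2];

1 - sse/sstCorrected     (* compare with m["RSquared"] *)
1 - sse/sstUncorrected   (* compare with nlm["RSquared"] from the original post *)
```

If the assumption holds, the first expression should agree with m["RSquared"] and the second with the value NonlinearModelFit reported, which would make the two numbers different definitions of R^2 rather than a miscalculation.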
How NonlinearModelFit generalizes R is the detail needed to determine whether the result it returns for RSquared is incorrect or not.

One final comment. Using powers of x as your set of basis functions is OK for powers less than 2 and possibly OK for powers up to 3. But this is definitely not a good idea for any higher powers of x. The problem is that the powers of x do not form an orthogonal basis set. Also, and perhaps even more important, the matrices used to solve the linear regression problem become increasingly ill-conditioned as the powers of x increase. If you need to fit a high degree polynomial to your data, you should use Chebyshev polynomials as the basis functions rather than powers of x.
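The conditioning point can be illustrated with a small sketch. It builds the design matrix for a degree-10 fit twice, once with powers of x and once with ChebyshevT, and compares condition numbers. The sample points xs, the degree deg and the helper cond are hypothetical choices made for this illustration and are not taken from the data in this thread. The points are placed on [-1, 1], the interval on which the Chebyshev polynomials are defined, so real data would first have to be rescaled to that interval.

```mathematica
(* hypothetical sample points on [-1, 1]; not data from this thread *)
xs = Table[N[-1 + 2 i/50], {i, 0, 50}];
deg = 10;

(* design matrices: one row per sample point, one column per basis function *)
(* the constant column is prepended explicitly to avoid evaluating 0.^0 *)
powerBasis = Table[Prepend[x^Range[deg], 1], {x, xs}];
chebBasis = Table[ChebyshevT[k, x], {x, xs}, {k, 0, deg}];

(* condition number: ratio of largest to smallest singular value *)
(* Tolerance -> 0 keeps the small singular values that are dropped by default *)
cond[mat_] := With[{s = SingularValueList[mat, Tolerance -> 0]},
   First[s]/Last[s]];

{cond[powerBasis], cond[chebBasis]}
```

The first number should come out much larger than the second, and the gap should widen as deg is increased. A fit in the Chebyshev basis can then be set up with something like LinearModelFit[data, Table[ChebyshevT[k, x], {k, 0, deg}], x] once the x values have been rescaled.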