Re: Re: standard errors and confidence intervals in NonlinearRegress
- To: mathgroup at smc.vnet.net
- Subject: [mg67368] Re: [mg67331] Re: standard errors and confidence intervals in NonlinearRegress
- From: Darren Glosemeyer <darreng at wolfram.com>
- Date: Tue, 20 Jun 2006 02:15:05 -0400 (EDT)
- References: <e70f49$rg4$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
At 05:13 AM 6/18/2006 -0400, Jason Quinn wrote:
>Darren Glosemeyer wrote:
> > The standard errors and confidence intervals in nonlinear regression are
> > based on asymptotic normality.
>
> While the topic is up, I have a similar question about the
>standard error (SE), covariance matrix, and correlation matrix reported
>by Regress. Do you know how they are calculated? I've been using
>Bevington and Robinson's "Data Reduction and Error Analysis for the
>Physical Sciences" to calculate them and I cannot reproduce any of the
>values given by Mathematica (with or without weighting). For instance,
>Bevington says that the errors on the fitting parameters for linear
>regression are the square roots of the entries in the inverse of the
>following matrix (the curvature matrix):
>
>A_jk = Sum_i[ w_i * f_j(x_i) * f_k(x_i) ].
>
>Here w_i is the weight of the i-th datum, f_k is the k-th basis
>function, and the i-th measurement of the independent variables are
>collectively called x_i. The "errors" generated using this formula do
>not agree with the SE values reported by Regress (ignoring me doing
>something totally stupid, of course). Similarly with what he calls the
>covariance matrix. I've tried working in factors of Sqrt(N), etc.,
>thinking it was some parent-versus-sample problem, but to no avail. I just don't
>know what is being reported by Regress and the documentation doesn't
>specify in detail.
>
>Thanks for any insight anybody can give,
>Jason Quinn
The difference you observe is the difference between weighted regression and
model fitting with weights derived from assumed known measurement errors.
In unweighted regression, responses (measurements without assigned errors)
are assumed to follow a specified model and have normally distributed
errors with some common variance sigma^2 that is estimated from the
fit. In weighted regression, the responses are assumed to have
normally distributed errors, but the individual responses are not assumed
to have the same variance. The assumed variance for response i is
sigma^2/w_i, where again sigma^2 is an unknown quantity to be estimated
from the fitted model and w_i is the weight of the ith datum. This
weighting makes it possible to incorporate responses with different amounts
of variability: responses with assumed high variability have less effect on
the parameter estimates, and responses with assumed low variability have
more.
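In matrix terms (a sketch of the standard formula, not necessarily the
internal code path Regress uses): with design matrix X, weights w_i on the
diagonal of W, and response vector y, the weighted least-squares estimates
solve (X^T W X) b = X^T W y. For example, with hypothetical toy values:

    xm = {{1, 0.}, {1, 1.}, {1, 2.}, {1, 3.}};  (* toy design matrix, basis {1, x} *)
    yv = {0.1, 1.1, 1.8, 3.2};                  (* toy responses *)
    w = {1., 2., 4., 1.};                       (* toy weights w_i *)
    (* w*xm scales row i of xm by w_i, so these products are X^T W X and X^T W y *)
    b = Inverse[Transpose[xm].(w*xm)].(Transpose[xm].(w*yv))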
Often in the physical sciences, measurement errors serve two purposes: as
weights for the model fitting and as assumed known standard errors. The
parameter estimates will be the same in the weighted
regression and in the model fitting with assumed known standard errors
because the weights are the same. The estimation of the standard errors
for parameter values, however, is different.
In the weighted regression model there is still a common variance sigma^2
that needs to be estimated. Under the additional assumption that the
weights come from measurement errors and hence already contain the
information about the variation in the data, sigma^2 is effectively 1 and
the additional variance estimate is not needed.
Assuming this is the difference you are noting, dividing the
CovarianceMatrix by the EstimatedVariance and dividing the SEs by the
square root of the EstimatedVariance should give the results you are
looking for.
As a short example, consider the following using data and a model from the
documentation.
In[1]:= <<Statistics`
In[2]:= data = {{0.055, 90}, {0.091, 97}, {0.138, 107}, {0.167, 124},
          {0.182, 142}, {0.211, 150}, {0.232, 172}, {0.248, 189},
          {0.284, 209}, {0.351, 253}};
Here are the estimated variance and the covariance matrix returned by Regress.
In[3]:= {estvar, covmat} = ({EstimatedVariance, CovarianceMatrix} /.
          Regress[data, {1, x^2}, x,
            RegressionReport -> {EstimatedVariance, CovarianceMatrix}] /.
          MatrixForm -> Identity)
Out[3]= {64.9129, {{17.7381, -247.146}, {-247.146, 5430.96}}}
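(As a quick check on where this number comes from, here is a sketch using
the hypothetical names xm, yv, b and rv: the EstimatedVariance is the
residual sum of squares from the fit divided by the residual degrees of
freedom, here 10 - 2 = 8.)

    xm = DesignMatrix[data, {1, x^2}, x];             (* design matrix, as in In[4] below *)
    yv = Transpose[data][[2]];                        (* the responses *)
    b = Inverse[Transpose[xm].xm].(Transpose[xm].yv); (* least-squares estimates *)
    rv = yv - xm.b;                                   (* residuals *)
    rv.rv/(Length[data] - 2)                          (* should reproduce 64.9129 *)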
In this case the weights are all 1, so the inverse of A_jk is obtained as
follows.
In[4]:= xmat = DesignMatrix[data, {1, x^2}, x]
Out[4]= {{1, 0.003025}, {1, 0.008281}, {1, 0.019044}, {1, 0.027889},
         {1, 0.033124}, {1, 0.044521}, {1, 0.053824}, {1, 0.061504},
         {1, 0.080656}, {1, 0.123201}}
In[5]:= Inverse[Transpose[xmat].xmat]
Out[5]= {{0.273261, -3.80735}, {-3.80735, 83.6654}}
If we multiply this by the estimated variance, the result is the covariance
matrix given above.
In[6]:= estvar*%
Out[6]= {{17.7381, -247.146}, {-247.146, 5430.96}}
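The standard errors Regress reports are the square roots of the diagonal of
this covariance matrix, so the prescription above amounts to the following
(a sketch; the second line should give the square roots of the diagonal of
Out[5]):

    se = Sqrt[{covmat[[1, 1]], covmat[[2, 2]]}]  (* SEs as reported by Regress *)
    se/Sqrt[estvar]                              (* measurement-error-based errors *)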
As an example with weights other than 1, consider Weights->Range[Length[data]].
In[7]:= {estvar2, covmat2} = ({EstimatedVariance, CovarianceMatrix} /.
          Regress[data, {1, x^2}, x,
            RegressionReport -> {EstimatedVariance, CovarianceMatrix},
            Weights -> Range[Length[data]]] /.
          MatrixForm -> Identity)
Out[7]= {410.877, {{31.7592, -387.466}, {-387.466, 6181.06}}}
Here the weights are incorporated in the A_jk matrix.
In[8]:= Inverse[Transpose[xmat].(Range[Length[data]]*xmat)]
Out[8]= {{0.0772961, -0.943021}, {-0.943021, 15.0436}}
Multiplying by the estimated variance again gives the covariance matrix.
In[9]:= estvar2*%
Out[9]= {{31.7592, -387.466}, {-387.466, 6181.06}}
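The same adjustment applies here (a sketch): dividing covmat2 by estvar2
should recover the measurement-error-based covariance matrix in Out[8], and
the square roots of its diagonal give the corresponding parameter errors.

    covmat2/estvar2                                    (* should match Out[8] *)
    Sqrt[{covmat2[[1, 1]], covmat2[[2, 2]]}/estvar2]   (* Bevington-style errors *)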
Darren Glosemeyer
Wolfram Research