Re: linear regression with errors in both variables
- To: mathgroup at smc.vnet.net
- Subject: [mg96538] Re: linear regression with errors in both variables
- From: Mark Fisher <particlefilter at gmail.com>
- Date: Mon, 16 Feb 2009 06:54:56 -0500 (EST)
- References: <gmrmga$a1k$1@smc.vnet.net>
On Feb 10, 5:56 am, Joerg <scha... at biologie.hu-berlin.de> wrote:

> Hi,
>
> I want to test the hypothesis that my data
> follows a known simple linear relationship,
> y = a + bx. However, I have (known) measurement
> errors in both the y and the x values.
>
> I suppose just a simple linear regression
> does not do here.
>
> Any suggestions on how to test this correctly?
>
> Thanks,
>
> joerg

Hi all,

The classic "errors-in-variables" problem in econometrics has the
following structure:

    (1) yt = a + b*zt + et

The parameters of interest are (a, b). The problem is that we don't
observe zt. Instead we observe xt, which equals zt plus a measurement
error:

    (2) xt = zt + ut

In order to proceed, we need to model the unobserved zt. Let

    (3) zt = m + vt

Assume (et, ut, vt) are iid N(0, V), where V = diag(se^2, su^2, sv^2).
Integrating zt out of (1)-(3) using

    Integrate[
      PDF[NormalDistribution[a + b*zt, se], yt] *
      PDF[NormalDistribution[zt, su], xt] *
      PDF[NormalDistribution[m, sv], zt],
      {zt, -Infinity, Infinity},
      GenerateConditions -> False]

produces (after rearrangement)

    (4) PDF[NormalDistribution[a1 + b1*xt, se1], yt] *
        PDF[NormalDistribution[m, su1], xt]

which can be expressed as

    (5) yt = a1 + b1*xt + e1t
    (6) xt = m + u1t

where

    (7)  a1 = a + b*m*su^2/(su^2 + sv^2)
    (8)  b1 = b*sv^2/(su^2 + sv^2)
    (9)  se1^2 = se^2 + b^2*su^2*sv^2/(su^2 + sv^2)
    (10) su1^2 = su^2 + sv^2

The likelihood for the parameters, theta = (a, b, m, se, su, sv), is
given by

    likelihood = Product[fun[data[[t]]], {t, T}]

where data is a list of the T observed {xt, yt} pairs and
fun[{xt, yt}] is the density in (4) after making the substitutions
(7)-(10).

A Bayesian approach specifies a prior distribution for theta,
p(theta), and then uses an MCMC algorithm (such as the random-walk
Metropolis algorithm) to draw from the posterior distribution,
p(theta | data). [The posterior distribution is proportional to
likelihood * p(theta).] A rough sketch of both steps is in the P.S.
below.

Note that (xt, yt) has a bivariate normal distribution, which is
characterized by 5 parameters. However, there are 6 parameters in
theta; this suggests a lack of identification. Fortunately, this
presents no particular problem for the Bayesian approach, even with a
flat prior for theta.

--Mark
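
P.S. For concreteness, here is a minimal sketch of the likelihood in
Mathematica. The names fun and loglike are just my labels, and I've
added theta as an explicit second argument to fun; the formulas are
(4) and (7)-(10) above. Working on the log scale avoids the underflow
you'd get from multiplying many small densities.

    (* marginal density of one observation {xt, yt}, per (4) *)
    fun[{xt_, yt_}, {a_, b_, m_, se_, su_, sv_}] :=
      Module[{a1, b1, se1, su1},
        a1 = a + b*m*su^2/(su^2 + sv^2);                 (* eq. (7) *)
        b1 = b*sv^2/(su^2 + sv^2);                       (* eq. (8) *)
        se1 = Sqrt[se^2 + b^2*su^2*sv^2/(su^2 + sv^2)];  (* eq. (9) *)
        su1 = Sqrt[su^2 + sv^2];                         (* eq. (10) *)
        PDF[NormalDistribution[a1 + b1*xt, se1], yt] *
          PDF[NormalDistribution[m, su1], xt]]

    (* log-likelihood of the whole data set *)
    loglike[theta_, data_] := Total[Log[fun[#, theta] & /@ data]]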
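And a bare-bones random-walk Metropolis sampler under a flat prior
(restricted to positive standard deviations). Again, metropolis is my
own name; the step size scale and chain length n are tuning choices,
not part of the method.

    metropolis[data_, theta0_, scale_, n_] :=
      Module[{theta = theta0, ll, prop, llProp, draws = {}},
        ll = loglike[theta, data];
        Do[
          (* propose a normal step around the current point *)
          prop = theta +
            scale*RandomReal[NormalDistribution[0, 1], Length[theta]];
          (* the flat prior is zero where se, su, or sv <= 0 *)
          llProp = If[Min[prop[[4 ;; 6]]] <= 0, -Infinity,
            loglike[prop, data]];
          (* accept with probability Min[1, Exp[llProp - ll]] *)
          If[Log[RandomReal[]] < llProp - ll,
            theta = prop; ll = llProp];
          AppendTo[draws, theta],
          {n}];
        draws]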
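To try it out, one can simulate data from (1)-(3) under made-up
parameter values (starting the chain at the truth just to keep the
example short):

    T = 200;
    truth = {1., 2., 0., .5, .3, 1.};  (* {a, b, m, se, su, sv} *)
    z = RandomReal[NormalDistribution[truth[[3]], truth[[6]]], T];
    x = z + RandomReal[NormalDistribution[0, truth[[5]]], T];
    y = truth[[1]] + truth[[2]]*z +
        RandomReal[NormalDistribution[0, truth[[4]]], T];
    draws = metropolis[Transpose[{x, y}], truth, .05, 5000];
    Mean[draws]  (* posterior means, one per parameter *)

Given the identification issue noted above, expect some individual
parameters (e.g., su versus sv) to be much less precisely determined
than the identified combinations (7)-(10).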