       Re: linear regression with errors in both variables

• To: mathgroup at smc.vnet.net
• Subject: [mg96421] Re: linear regression with errors in both variables
• From: Daniel Lichtblau <danl at wolfram.com>
• Date: Fri, 13 Feb 2009 03:43:34 -0500 (EST)
• References: <gmrmga\$a1k\$1@smc.vnet.net>

```On Feb 10, 4:56 am, Joerg <scha... at biologie.hu-berlin.de> wrote:
> Hi,
>
> I want to test the hypothesis that my data
> follows a known simple linear relationship,
> y = a + bx. However, I have (known) measurements
> errors in both the y and the x values.
>
> I suppose just a simple linear regression
> does not do here.
>
> Any suggestions how do test this correctly?
>
> Thanks,
>
> joerg

Assuming errors in both coordinates are comparable, you can do a total
least squares fit with a singular values decomposition (there also
exist weighted variants; not sure what will be suitable for your
needs). I'll illustrate a step-by-step example of the basic case. It
is set up so that the slope is around -2.7, and y intercept around
3.3. We will recover approximations to these values.

m = 20;
xvals = Range[m] + RandomReal[{-1, 1}, {m}];
yvals = 3.3 - 2.7*xvals + RandomReal[{-1, 1}, {m}];

We compute the means, which we will subtract so as to get data
centered at the origin.

means = Mean /@ {xvals, yvals};
shiftedvals = Transpose[{xvals, yvals} - means];

Now compute the SVD of the transposed, shifted data matrix.

{u, w, v} = SingularValueDecomposition[shiftedvals];

The vector perpendicular to the best fitting line is given as the
product of the right-side orthogonal matrix in the SVD, with a unit
vector in the last (y, in this case) direction.

e[j_, n_] := Module[{u = Table[0, {n}]}, u[[j]] = 1; u]
perp = v.e[2, 2];

Now form the linear polynomial that defines our approximated line.

In:= linearpoly = ({x, y} - means).perp;
Expand[linearpoly/Coefficient[linearpoly, y]]

Out= -3.18513 + 2.69773 x + 1. y

Daniel Lichtblau
Wolfram Research

```

• Prev by Date: Re: very slow Save[]s on Mathematica V7??
• Next by Date: Re: Definition of the similarity in a set of integers
• Previous by thread: Re: linear regression with errors in both variables
• Next by thread: Re: linear regression with errors in both variables