Re: Fitting data on a vertical line
- To: mathgroup at christensen.cybernetics.net
- Subject: [mg1560] Re: Fitting data on a vertical line
- From: withoff (David Withoff)
- Date: Sat, 24 Jun 1995 05:49:07 -0400
- Organization: Wolfram Research, Inc.
In article <3sd9dm$n5n at news0.cybernetics.net> phpull at unix1.sncc.lsu.edu (Joe Wade Pulley) writes:
>Hello,
> I have recently accidentally asked Mathematica to do a linear
>least squares fit to a set of data which were exactly vertically
>placed. Instead of giving me an error or an infinite slope,
>Mathematica spits out some sort of fit which is totally wrong.
>For example, if I make up a list of data which is similar to mine, I
>get the following results.
>
>
>
>In[1]:=
>ls={{2.1,3},{2.1,4},{2.1,5},{2.1,6},{2.1,7}}
>
>Out[1]=
>{{2.1, 3}, {2.1, 4}, {2.1, 5}, {2.1, 6}, {2.1, 7}}
>
>In[2]:=
>ft=Fit[ls,{1,x},x]
>
>Out[2]=
>0.924214 + 1.94085 x
>
>Obviously, this equation does not "fit" the data I have given. The
>equation x=2.1 would. Can anyone explain this very unusual behavior.
>--
>Joe Wade Pulley Department of Physics and Astronomy
>Louisiana State University
>Baton Rouge, LA 70803 phpull at unix1.sncc.lsu.edu
Actually, although this result isn't what you expected, it is
mathematically correct. If you consider fitting the function
p + q x to the data
ls={{2.1,3},{2.1,4},{2.1,5},{2.1,6},{2.1,7}}
you will find that the sum of the squared deviations between
the model and the data reaches a minimum of 10 for any values
of p and q such that p + 2.1 q == 5. Every data point has the same
first coordinate, so the model can only predict a single value there,
and the least-squares choice for that value is 5, the average of the
response coordinates. If you evaluate the result of Fit when the
first coordinate is 2.1
In[2]:= Fit[ls, {1, x}, x]
Out[2]= 0.924214 + 1.94085 x
In[3]:= % /. x -> 2.1
Out[3]= 5.
the result will also be 5. This is the best possible fit, so the
result from Fit is correct. If you try the same example using the
Regress function from the Statistics`LinearRegression` package you
will get a warning message
In[4]:= Regress[ls, {1, x}, x]
Regress::notdep:
The fit is numerically independent of a linear combination of
the basis functions.
indicating that the basis functions 1 and x are not linearly
independent on this data (at x = 2.1 both are constants, and x is
just 2.1 times the basis function 1), but the result will be the same.
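The minimum of 10 quoted above can be checked by hand. Writing
c for the single value p + 2.1 q that the model predicts at
x = 2.1 (c is just shorthand introduced here), the sum of squared
deviations over the responses 3, 4, 5, 6, 7 is

```latex
\begin{align*}
S(c) &= \sum_{y=3}^{7} (c - y)^2 \\
     &= 5c^2 - 50c + 135 \\
     &= 5\,(c - 5)^2 + 10
\end{align*}
```

so S is smallest, and equal to 10, exactly when c = 5, and it depends
on p and q only through the combination p + 2.1 q.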
Another way of looking at this problem is to use FindMinimum
to minimize the sum of the squared errors directly.
In[5]:= se = Plus @@ Apply[(p + q #1 - #2)^2 &, ls, {1}]
Out[5]= (-7 + p + 2.1 q)^2 + (-6 + p + 2.1 q)^2 + (-5 + p + 2.1 q)^2 +
        (-4 + p + 2.1 q)^2 + (-3 + p + 2.1 q)^2
In[6]:= FindMinimum[se, {p, 1}, {q, 1}]
Out[6]= {10., {p -> 1.3512, q -> 1.73752}}
This yields yet another set of values for p and q, again giving
a fit that is mathematically correct, but that is not the same
as the fit generated by Fit or Regress.
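One way to see how the two answers relate is sketched below. It
assumes nothing about how Fit is implemented; it only compares numbers.
The names m and y are introduced here for illustration. The
minimum-norm least-squares solution works out by hand to
5/5.41 {1, 2.1}, about {0.924214, 1.94085}, which agrees with the
coefficients Fit printed to the digits shown.

```mathematica
ls = {{2.1, 3}, {2.1, 4}, {2.1, 5}, {2.1, 6}, {2.1, 7}};

(* design matrix with one row {1, x} per data point, and the responses *)
m = Map[{1, First[#]} &, ls];
y = Map[Last, ls];

(* minimum-norm least-squares coefficients {p, q} *)
PseudoInverse[m] . y

(* fitted value at x = 2.1 for the Fit and FindMinimum coefficients;
   both should be close to 5 *)
p + 2.1 q /. {{p -> 0.924214, q -> 1.94085}, {p -> 1.3512, q -> 1.73752}}
```

The FindMinimum values lie on the same line p + 2.1 q == 5, just at a
different point, which is presumably determined by the starting values
{p, 1} and {q, 1}.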
There is still the user-interface question of whether the Fit
function should generate a message such as "Warning: the input is
unusual and the result probably won't be what you want." In examples
where the basis functions are exactly linearly dependent this
is a reasonable thing to do (as is done by the Regress function),
but since, at least in my experience, such examples are entered
deliberately by users who know what they are doing, it hasn't
seemed that the message was worth the extra time and memory.
Perhaps this is wrong, and the message should be added to catch
cases where the linear dependence isn't quite so obvious.
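Users who want such a check can do it themselves before calling Fit.
The following is a minimal sketch, not something Fit or Regress
provides: dependentBasisQ is a hypothetical helper name, and it only
catches dependence that NullSpace detects within its default
numerical tolerance.

```mathematica
(* dependentBasisQ is a hypothetical helper, not part of Fit or Regress. *)
(* It evaluates the basis functions at each data point and reports True *)
(* when the resulting design matrix has a nontrivial null space. *)
dependentBasisQ[data_, basis_, var_] :=
  NullSpace[Map[(basis /. var -> First[#]) &, data]] =!= {}

ls = {{2.1, 3}, {2.1, 4}, {2.1, 5}, {2.1, 6}, {2.1, 7}};

(* every row of the design matrix is {1, 2.1}, so a null vector exists *)
dependentBasisQ[ls, {1, x}, x]
```

For data that is only nearly vertical, the columns are nearly but not
exactly dependent, and a check along these lines would need an
explicit tolerance to be useful.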
Dave Withoff
Research and Development
Wolfram Research