Re: Fitting data on a vertical line

*To*: mathgroup at christensen.cybernetics.net
*Subject*: [mg1560] Re: Fitting data on a vertical line
*From*: withoff (David Withoff)
*Date*: Sat, 24 Jun 1995 05:49:07 -0400
*Organization*: Wolfram Research, Inc.

In article <3sd9dm$n5n at news0.cybernetics.net> phpull at unix1.sncc.lsu.edu (Joe Wade Pulley) writes:

>Hello,
>  I have recently accidentally asked Mathematica to do a linear
>least squares fit to a set of data which were exactly vertically
>placed.  Instead of giving me an error or an infinite slope,
>Mathematica spits out some sort of fit which is totally wrong.
>For example, if I make up a list of data which is similar to mine, I
>get the following results.
>
>In[1]:=
>ls={{2.1,3},{2.1,4},{2.1,5},{2.1,6},{2.1,7}}
>
>Out[1]=
>{{2.1, 3}, {2.1, 4}, {2.1, 5}, {2.1, 6}, {2.1, 7}}
>
>In[2]:=
>ft=Fit[ls,{1,x},x]
>
>Out[2]=
>0.924214 + 1.94085 x
>
>Obviously, this equation does not "fit" the data I have given.  The
>equation x=2.1 would.  Can anyone explain this very unusual behavior.
>--
>Joe Wade Pulley                 Department of Physics and Astronomy
>Louisiana State University
>Baton Rouge, LA  70803          phpull at unix1.sncc.lsu.edu

Actually, although this result isn't what you expected, it is
mathematically correct.  If you consider fitting the function
p + q x to the data

    ls={{2.1,3},{2.1,4},{2.1,5},{2.1,6},{2.1,7}}

you will find that the sum of the squared deviations between the
model and the data reaches a minimum of 10 for any values of p and q
such that p + 2.1 q == 5.  The value 5 is the average of the response
coordinates, which is the best that can be done in fitting this data.
If you evaluate the result of Fit when the first coordinate is 2.1

    In[2]:= Fit[ls, {1, x}, x]

    Out[2]= 0.924214 + 1.94085 x

    In[3]:= % /. x -> 2.1

    Out[3]= 5.

the result will also be 5.  This is the best possible fit, so the
result from Fit is correct.

If you try the same example using the Regress function from the
Statistics`LinearRegression` package you will get a warning message

    In[4]:= Regress[ls, {1, x}, x]

    Regress::notdep:
       The fit is numerically independent of a linear combination of
       the basis functions.

indicating that the basis functions 1 and x are not linearly
independent over the range of the data (they are both constants),
but the result will be the same.

Another way of looking at this problem is to use FindMinimum to
minimize the sum of the squared errors directly.

    In[5]:= se = Plus @@ Apply[(p + q #1 - #2)^2 &, ls, {1}]

    Out[5]= (-7 + p + 2.1 q)^2 + (-6 + p + 2.1 q)^2 + (-5 + p + 2.1 q)^2 +
            (-4 + p + 2.1 q)^2 + (-3 + p + 2.1 q)^2

    In[6]:= FindMinimum[se, {p, 1}, {q, 1}]

    Out[6]= {10., {p -> 1.3512, q -> 1.73752}}

This yields yet another set of values for p and q, again giving a
fit that is mathematically correct, but that is not the same as the
fit generated by Fit or Regress.

There is still the user-interface question of whether the Fit function
should generate a message such as "Warning: the input is unusual and
the result probably won't be what you want."  In examples where the
basis functions are exactly linearly dependent this is a reasonable
thing to do (as is done by the Regress function), but since, at least
in my experience, such examples are entered deliberately by users who
know what they are doing, it hasn't seemed that the message was worth
the extra time and memory.  Perhaps this is wrong, and the message
should be added to catch cases where the linear dependence isn't quite
so obvious.

Dave Withoff
Research and Development
Wolfram Research
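
The statement that every p and q with p + 2.1 q == 5 gives the same
minimum of 10 can be checked by hand.  The following is a short sketch
of that calculation; the symbols S, c, and y_i are introduced here only
for illustration and do not appear in the post itself.  Here c stands
for the common fitted value p + 2.1 q, and y_i runs over the response
values 3, 4, 5, 6, 7.

```latex
\begin{align*}
S(p,q) &= \sum_{i=1}^{5} \bigl(p + 2.1\,q - y_i\bigr)^2
        = \sum_{i=1}^{5} \bigl((c - 5) - (y_i - 5)\bigr)^2,
        \qquad c = p + 2.1\,q \\
       &= 5\,(c - 5)^2 \;-\; 2\,(c - 5)\sum_{i=1}^{5} (y_i - 5)
          \;+\; \sum_{i=1}^{5} (y_i - 5)^2 \\
       &= 5\,(c - 5)^2 + 10,
\end{align*}
% using \sum (y_i - 5) = 0 and \sum (y_i - 5)^2 = 4 + 1 + 0 + 1 + 4 = 10.
```

So the sum of squares depends on p and q only through c, and the
minimum value 10 is reached everywhere on the line p + 2.1 q == 5.
Both results quoted in the session lie on that line: 0.924214 +
2.1 * 1.94085 and 1.3512 + 2.1 * 1.73752 each come to 5 within
rounding.  As a side observation, the values returned by Fit agree
numerically with the point on that line closest to the origin,
p = 5/5.41 and q = 2.1 p; whether Fit is designed to return that
particular point is not stated in the post.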