MathGroup Archive: July 1995 [00000]

Re: Fitting data on a vertical line

To: mathgroup at christensen.cybernetics.net
Subject: [mg1569] Re: Fitting data on a vertical line
From: harrison at helios.physics.utoronto.ca (David Harrison)
Date: Sat, 1 Jul 1995 01:48:02 -0400
Organization: University of Toronto - Dept. of Physics

In article <3sg3qt$8ji at news0.cybernetics.net>,
David Withoff <withoff at wri.com> wrote:
>In article <3sd9dm$n5n at news0.cybernetics.net> phpull at unix1.sncc.lsu.edu
>(Joe Wade Pulley) writes:
>>
>>In[1]:=
>>ls={{2.1,3},{2.1,4},{2.1,5},{2.1,6},{2.1,7}}
>>
>>In[2]:=
>>ft=Fit[ls,{1,x},x]
>>
>>Out[2]=
>>0.924214 + 1.94085 x
>
>Actually, although this result isn't what you expected, it is
>mathematically correct. ...
>This is the best possible fit, so the result from Fit is correct.
>
>There is still the user-interface question of whether the Fit
>function should generate a message such as "Warning: the input is
>unusual and the result probably won't be what you want."

David is, as usual, correct: the least-squares technique doesn't always
do what we expect.

Although "everybody knows" this, we usually ignore it.  One habit that
often keeps us from being misled by a fitter doing something
inappropriate is to *always* plot the data together with the results of
the fit.

One of my favorite examples involves some made-up data by Anscombe
(American Statistician 27 (Feb. 1973), p. 17).  Fitting this data
is also discussed in Shaw & Tigg if you have a copy handy (I don't have
my copy with me at the moment so can't supply a page number).

Here is the data:

AnscombeData = {{{10., 8.04}, {8., 6.95}, {13., 7.58}, {9., 8.81},
   {11., 8.33}, {14., 9.96}, {6., 7.24}, {4., 4.26},
   {12., 10.84}, {7., 4.82}, {5., 5.68}},
  {{10., 9.14}, {8., 8.14}, {13., 8.74}, {9., 8.77}, {11., 9.26},
   {14., 8.1}, {6., 6.13}, {4., 3.1}, {12., 9.13}, {7., 7.26},
   {5., 4.74}}, {{10., 7.46}, {8., 6.77}, {13., 12.74},
   {9., 7.11}, {11., 7.81}, {14., 8.84}, {6., 6.08},
   {4., 5.39}, {12., 8.15}, {7., 6.42}, {5., 5.73}},
  {{8., 6.58}, {8., 5.76}, {8., 7.71}, {8., 8.84}, {8., 8.47}, {8., 7.04},
   {8., 5.25}, {19., 12.5}, {8., 5.56}, {8., 7.91}, {8., 6.89}}};

If you fit AnscombeData[[1]], AnscombeData[[2]], AnscombeData[[3]] and
AnscombeData[[4]] each to a straight line you will get virtually the
same result for all four.  You can go a bit further than just the output
returned by Fit and discover that all four fits have essentially
identical sums of squares and covariance matrices.  ListPlotting the
data will show at a glance that three of the four fits are totally
ridiculous.
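
As a rough sketch of that comparison (the names fits and rss are mine,
introduced only for illustration, and this assumes AnscombeData is
defined as above):

    (* straight-line fit for each of the four data sets *)
    fits = Fit[#, {1, x}, x] & /@ AnscombeData

    (* residual sum of squares of one data set about its fitted line *)
    rss[data_, fit_] :=
      Plus @@ ((#[[2]] - (fit /. x -> #[[1]]))^2 & /@ data)
    MapThread[rss, {AnscombeData, fits}]

    (* each data set drawn together with its own fitted line *)
    Table[
      Show[ListPlot[AnscombeData[[i]]],
           Plot[Evaluate[fits[[i]]], {x, 0, 20}]],
      {i, 4}]

All four entries of fits should come out close to 3 + 0.5 x, and the
four residual sums should nearly agree; only the plots show how
different the data sets really are.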

The question of what to do when the standard least-squares algorithm
does something stupid is probably beyond the scope of this newsgroup.
However, in the case that began this discussion one thing to do is swap
the two coordinates of each point, so that the first coordinate is
fitted as a function of the second:

In[2]:=  Fit[Reverse /@ ls, {1,x}, x]

                         -16
Out[2]=  2.1 - 3.33067 10    x
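
In that output x stands for the second coordinate of each original
point, and the coefficient of x is only floating-point roundoff, so the
fit amounts to saying that the first coordinate is constant at 2.1,
which is the vertical line itself.  A small sketch, assuming ls is
defined as in the quoted post, that uses Chop to discard the roundoff
term:

    (* Chop replaces approximate numbers near zero with exact 0 *)
    Chop[Fit[Reverse /@ ls, {1, x}, x]]

The exact size of the roundoff term may differ between versions and
machines, so the unchopped output need not match the one shown above
digit for digit.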

-- 
David Harrison                             | "The senses do not lie, only 
Dept. of Physics, Univ. of Toronto         |  they do not tell the truth."
Inet: harrison at faraday.physics.utoronto.ca |              -- Mach
Tel: 416-978-2977  Fax: 416-978-5848       |
