MathGroup Archive 2005


Re: LinearRegression

  • To: mathgroup at smc.vnet.net
  • Subject: [mg60043] Re: LinearRegression
  • From: Peter Pein <petsie at dordos.net>
  • Date: Tue, 30 Aug 2005 04:43:01 -0400 (EDT)
  • References: <dehjfi$c4p$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Tom De Vries wrote:
> Hello! I am trying to construct a list of "flawed" data points that would
> fit a linear model but have a specific correlation coefficient (r).
> 
> My feeble attempts can produce a nice set of somewhat messed up data for a
> scatter plot...
> 
> For example
> 
> m = 0.5; b = 2.7;
> 
> fudge := Random[Real, {-1, 1}];
> 
> data = Table[{i, fudge  + (m i +  b)}, {i, 1, 15, 1}];
> 
> ListPlot[data, PlotStyle -> {Hue[0.78]}];
> 
> 
> If I analyze the data...
> 
> << Statistics`MultiDescriptiveStatistics`
> 
> {xlist, ylist} = Transpose[data];
> 
> Correlation[xlist, ylist]
> 
> I can get r.
> 
> I've never studied statistics, so I apologize if this is a really obvious
> question. Can I do the reverse? Is it possible to produce a set of data
> that would have a given r?
> 
> The application of this is to produce sets of data as examples and questions
> for simple linear regression in a high school math class.
> 
> Thank you for any help you can provide on this.
> 
> Sincerely,
> 
> Tom De Vries
> 
> 
Hi Tom,

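if you want an exact value of r, start with randomly perturbed data and then adjust the y values. (As in your code, the session below assumes Statistics`MultiDescriptiveStatistics` is loaded for Correlation; in Mathematica 5 the plot at the end also needs Graphics`Graphics` for DisplayTogether and Graphics`Colors` for Red and Blue.)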
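In[1]:=
Off[General::spell1]; (* suppress "possible spelling error" warnings for similar names such as xvec, yvec *)
SeedRandom[1]; (* reproducible random numbers *)
n = 15; m = 0.5; b = 2.7; fudge := 2*Random[] - 1; (* fudge: uniform on [-1, 1] *)
xvec = Range[n] + Table[fudge/2, {n}]; yvec = y /@ Range[n]; (* jittered x values; unknowns y[1], ..., y[n] *)
y0 = m*xvec + b + Table[fudge, {n}]; (* starting y values: the line plus noise *)
rsquared = Together[Correlation[xvec, yvec]^2];
(* squaring speeds up the calculations below *)
Correlation[xvec, y0]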
Out[7]=
0.9604020383011801

Well, that's not what we want. Say we want r = 0.98765
(mind the power 2, since the constraint is written in terms of r^2!):

In[8]:=
target = 0.98765^2;
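For reference, these are the standard definitions of the sample (Pearson) correlation coefficient that Correlation returns, and of its square, for n points. The square has no square root in the denominator, so with the symbolic unknowns y[i] it is a rational function that Together can put over a single denominator; presumably that is why squaring speeds up the calculation. Squaring also drops the sign: r^2 == target is satisfied by r = -0.98765 as well, so it is worth confirming afterwards that the computed r is positive.

```latex
r(x,y)=\frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}
            {\sqrt{\sum_{i=1}^{n}(x_i-\bar x)^2\,\sum_{i=1}^{n}(y_i-\bar y)^2}},
\qquad
r(x,y)^2=\frac{\Bigl(\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)\Bigr)^2}
              {\sum_{i=1}^{n}(x_i-\bar x)^2\,\sum_{i=1}^{n}(y_i-\bar y)^2}
```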

The solution should not differ too much from the starting values.
Therefore I let Mathematica minimize the sum of squared differences
(yvec - y0).(yvec - y0) subject to the constraint Correlation^2 == target:

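In[9]:=
(* minimize the squared distance to y0 subject to r^2 == target;
   NMinimize returns {minimum, rules}, and Last[...] extracts the rules *)
ysol = yvec /. Last[NMinimize[{(#1 . #1 & )[yvec - y0],
  rsquared == target}, yvec]]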
Out[9]=
{3.4189904161075724, 4.308217962745214, 4.027884962099622,
  4.973223521194384, 5.977666767464987, 6.30055842370482,
  6.123658489301134, 7.248069404869771, 6.826104457810174,
  7.537551447099687, 8.30583322738566, 9.413350135809361,
  8.921393700729098, 9.415907295717503, 10.698605612874013}

Now let's check the correlation:

In[10]:=
Correlation[xvec, ysol]
Out[10]=
0.98765

yepp!
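The same least-squares problem appears to have a closed-form solution, which would avoid the numerical search. The sketch below is a derivation under stated assumptions, not part of the original post: the symbols u, v, a, p, s and rho are introduced here for illustration, 1 is the all-ones vector, and the target rho is assumed positive. Because r does not change when y is shifted by a constant or scaled by a positive factor, any y with r(x, y) = rho can be written as a mean plus s(rho u + sqrt(1 - rho^2) w), with s >= 0 and w a unit vector orthogonal to both u and 1. The objective then splits into a mean part and a part depending on s and w, and each is minimized separately:

```latex
\begin{aligned}
&\tilde x = x-\bar x\,\mathbf 1,\quad u=\tilde x/\lVert\tilde x\rVert,\quad
 \tilde y_0=y_0-\bar y_0\,\mathbf 1,\\
&a=u\cdot\tilde y_0,\quad p=\tilde y_0-a\,u,\quad v=p/\lVert p\rVert\quad(p\neq 0),\\
&\min_{y}\ \lVert y-y_0\rVert^2\quad\text{subject to}\quad r(x,y)=\rho,\\
&\lVert y-y_0\rVert^2 = n(\bar y-\bar y_0)^2 + s^2
 - 2s\bigl(\rho\,a+\sqrt{1-\rho^2}\;w\cdot p\bigr) + \lVert\tilde y_0\rVert^2,\\
&y^\ast=\bar y_0\,\mathbf 1+s^\ast\bigl(\rho\,u+\sqrt{1-\rho^2}\,v\bigr),\qquad
 s^\ast=\rho\,a+\sqrt{1-\rho^2}\,\lVert p\rVert\quad(s^\ast>0),\\
&r(x,y^\ast)=\frac{u\cdot s^\ast\bigl(\rho\,u+\sqrt{1-\rho^2}\,v\bigr)}{s^\ast}=\rho .
\end{aligned}
```

The mean term vanishes for ybar = ybar0, the dot product w.p is largest for w = v, and the remaining quadratic in s is smallest at s = s*. As a sanity check, if rho already equals r(x, y0), then s* equals the norm of the centered y0 and y* = y0. For data that already trend upward (a > 0), this positive branch should also be the best point for the squared constraint used with NMinimize, so, assuming NMinimize found the global minimum, ysol should agree with y* up to numerical tolerance.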

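In[11]:=
DisplayTogether[
  (* adjusted data ysol, red *)
  ListPlot[Transpose[{xvec, ysol}],
    PlotStyle -> {Red, AbsolutePointSize[5]}],
  (* original data y0 *)
  ListPlot[Transpose[{xvec, y0}],
    PlotStyle -> {Hue[0.5, 0.5, 0.7], AbsolutePointSize[4]}],
  (* the underlying line y = m x + b *)
  Plot[b + m*x, {x, Min[xvec], Max[xvec]}],
  (* blue segments join each original point to its adjusted point *)
  Graphics[{Blue, Line /@
    Transpose[Transpose /@ {{xvec, y0}, {xvec, ysol}}]}]
];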

-- 
Peter Pein, Berlin
GnuPG Key ID: 0xA34C5A82
http://people.freenet.de/Peter_Berlin/

