Services & Resources / Wolfram Forums / MathGroup Archive
-----

MathGroup Archive 2008

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Estimating slope from noisy data

  • To: mathgroup at smc.vnet.net
  • Subject: [mg89619] Re: Estimating slope from noisy data
  • From: Jean-Marc Gulliet <jeanmarc.gulliet at gmail.com>
  • Date: Sun, 15 Jun 2008 06:13:07 -0400 (EDT)
  • Organization: The Open University, Milton Keynes, UK
  • References: <g3034r$mbl$1@smc.vnet.net>

andreas.kohlmajer at gmx.de wrote:

> I have difficulties to estimate the correct slope from noisy data.
> This is the code to generate the noisy data:
> 
> Needs["LinearRegression`"];
> slope = 1.0;
> sigma = 0.5;
> xrange = 1.0;
> 
> SeedRandom[123]; (* initialize random generator *)
> rnd = {#, #*slope + RandomReal[NormalDistribution[0, sigma]]} &;
> 
> (* generate 2000 data points *)
> data = Table[
>    rnd[RandomReal[NormalDistribution[0, xrange/3.0]]], {2000}];
> 
> subset = Take[data, 8];
> ListPlot[subset, PlotRange -> {{-3, 3}, {-3, 3}},
>  PlotStyle -> PointSize[.025]]
> fit = Regress[subset, x, x, IncludeConstant -> False,
>   RegressionReport -> {SummaryReport, ParameterCITable}]
> 
> The correct slope is exactly 1. As the data is quite noisy, the CI of
> the slope is very big. The estimated slope is far to big (1.947). If I
> use more data points, the estimation gets better; I could also use a
> wider x-range, to get a better estimate for the slope. However, I'm
> quite limited in the x-range, so using a wider x-range is no option
> for me.
> 
> I could check the RSquared for significance (If[Abs[r*Sqrt[n - 2]/
> Sqrt[1 - r^2]] >=
>   Quantile[StudentTDistribution[n - 2], 1 - 0.05], r, 0] (*
> significance of 95% *)). I this case, it is significant.
> 
> Is there any other way to get a good estimate for the slope, without
> using too many data points?
 >
 > (Keywords: fit, regression, slope, noisy, rsquared, limited data)


You could try using *perpendicular offsets* rather
than vertical offsets. Among many others, see [1, 2, 3] for instance. 
(Note that [2] has also some Mathematica code.)

Also, you could weight your data points either by giving a list of 
explicit numbers or by a weighting function (See the option *Weights* 
for *Regress* as illustrated below.)

[1] Weisstein, Eric W. "Least Squares Fitting--Perpendicular Offsets." 
 From MathWorld--A Wolfram Web Resource. 
http://mathworld.wolfram.com/LeastSquaresFittingPerpendicularOffsets.html

[2] Sardelis, D. and Valahas, T. "Least Squares Fitting-Perpendicular 
Offsets." http://library.wolfram.com/infocenter/MathSource/5292/

[3] József Varga, and Zsolt Szabo, "Modified Regression Model for the 
Logan Plot," _Journal of Cerebral Blood Flow & Metabolism_ (2002) 22, 
240â??244. Available at 
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid12949



Needs["LinearRegression`"];
slope = 1.0;
sigma = 0.5;
xrange = 1.0;

SeedRandom[123];(*initialize random generator*)rnd = {#, #*slope +
     RandomReal[NormalDistribution[0, sigma]]} &;
(*generate 2000 data points*)

data = Table[
    rnd[RandomReal[NormalDistribution[0, xrange/3.0]]], {2000}];

subset = Take[data, 8];
fit = Regress[subset, x, x, Weights -> (1/#^2 &),
   IncludeConstant -> False,
   RegressionReport -> {BestFit, SummaryReport, ParameterCITable}]
Show[ListPlot[subset, PlotRange -> {{-3, 3}, {-3, 3}},
   PlotStyle -> PointSize[.025], AspectRatio -> 1,
   Epilog -> {Red, Thick, Line[{{-3, -3}, {3, 3}}]}],
  Plot[BestFit /. fit, {x, -3, 3}, PlotStyle -> Thick]]


{BestFit -> 1.55639 x, ParameterCITable ->

        Estimate   SE         CI                ,
    x   1.55639    0.360525   {0.703884, 2.4089}

   ParameterTable ->     Estimate   SE         TStat     PValue    ,
                     x   1.55639    0.360525   4.31701   0.00349295

   RSquared -> 0.726953, AdjustedRSquared -> 0.687946,

   EstimatedVariance -> 0.312054,

   ANOVATable ->           DF   SumOfSq   MeanSq     FRatio    PValue    }
                 Model     1    5.81562   5.81562    18.6366   0.00349295

                 Error     7    2.18438   0.312054

                 U Total   8    8.

HTH,
-- Jean-Marc


  • Prev by Date: Re: Estimating slope from noisy data
  • Next by Date: citing mathematica
  • Previous by thread: Re: Estimating slope from noisy data
  • Next by thread: Problem downloading from UK Wolfram site