MathGroup Archive: June 2008 [00423]

Re: Estimating slope from noisy data

To: mathgroup at smc.vnet.net
Subject: [mg89650] Re: [mg89597] Estimating slope from noisy data
From: mante <claude.mante at univmed.fr>
Date: Mon, 16 Jun 2008 06:40:54 -0400 (EDT)
References: <200806140929.FAA22830@smc.vnet.net>

andreas.kohlmajer at gmx.de wrote:
> Hi!
>
> I have difficulty estimating the correct slope from noisy data.
> This is the code to generate the noisy data:
>
> Needs["LinearRegression`"];
> slope = 1.0;
> sigma = 0.5;
> xrange = 1.0;
>
> SeedRandom[123]; (* initialize random generator *)
> rnd = {#, #*slope + RandomReal[NormalDistribution[0, sigma]]} &;
>
> (* generate 2000 data points *)
> data = Table[
>    rnd[RandomReal[NormalDistribution[0, xrange/3.0]]], {2000}];
>
> subset = Take[data, 8];
> ListPlot[subset, PlotRange -> {{-3, 3}, {-3, 3}},
>  PlotStyle -> PointSize[.025]]
> fit = Regress[subset, x, x, IncludeConstant -> False,
>   RegressionReport -> {SummaryReport, ParameterCITable}]
>
> The correct slope is exactly 1. As the data is quite noisy, the CI of
> the slope is very big. The estimated slope is far too big (1.947). If I
> use more data points, the estimation gets better; I could also use a
> wider x-range, to get a better estimate for the slope. However, I'm
> quite limited in the x-range, so using a wider x-range is no option
> for me.
>
> I could check the RSquared for significance (If[Abs[r*Sqrt[n - 2]/
> Sqrt[1 - r^2]] >=
>   Quantile[StudentTDistribution[n - 2], 1 - 0.05], r, 0] (*
> significance of 95% *)). In this case, it is significant.
>
> Is there any other way to get a good estimate for the slope, without
> using too many data points?
>
>
> (Keywords: fit, regression, slope, noisy, rsquared, limited data)
>
>   
Hello,
       I think your problem is due to the way you generate the 
independent variable x. Notice first that regression is *not designed* 
for solving the problem A.x=y when x is random (this problem is called 
"errors-in-variables modeling").
So you should use *deterministic* values of the independent variable 
(Quasi-Monte Carlo abscissas, for instance).
    But the main flaw in your code lies in the distribution of x: since 
it is Gaussian, the region around the mean is over-sampled, and points 
near x = 0 carry almost no information about the slope! Using a 
uniform design (see the mail of Bill Rowe), you will obtain better results.
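
    To make the design argument concrete, here is a minimal sketch (in 
Python rather than Mathematica, standard library only, and not a 
translation of the code above). It assumes the no-intercept model 
y = b*x with known noise sigma, for which the standard deviation of the 
fitted slope at fixed abscissas is sigma/Sqrt[Sum x_i^2]. The function 
names and the equally spaced grid are illustrative choices, and the 
Python seed does not reproduce SeedRandom[123].

```python
import math
import random


def slope_through_origin(points):
    """Least-squares slope b for the no-intercept model y = b*x."""
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / sxx


def slope_std_error(xs, sigma):
    """Std. deviation of the fitted slope for fixed xs and known noise sigma."""
    return sigma / math.sqrt(sum(x * x for x in xs))


def uniform_grid(n, half_width):
    """n equally spaced, deterministic abscissas on [-half_width, half_width]."""
    return [-half_width + 2.0 * half_width * i / (n - 1) for i in range(n)]


if __name__ == "__main__":
    # Same parameter values as in the quoted code; names are illustrative.
    slope, sigma, x_range, n = 1.0, 0.5, 1.0, 8
    rng = random.Random(123)  # a different stream than Mathematica's SeedRandom

    gauss_xs = [rng.gauss(0.0, x_range / 3.0) for _ in range(n)]
    grid_xs = uniform_grid(n, x_range)

    for name, xs in (("gaussian x", gauss_xs), ("uniform grid", grid_xs)):
        pts = [(x, slope * x + rng.gauss(0.0, sigma)) for x in xs]
        print(name,
              "fitted slope:", slope_through_origin(pts),
              "slope std. error:", slope_std_error(xs, sigma))
```

With 8 points, the grid on [-1, 1] gives a slope standard error of about 
0.27. For the Gaussian design the expected Sum x_i^2 is only n*(xrange/3)^2 
= 8/9, which corresponds to roughly 0.53, so estimates as far off as the 
one reported are not surprising.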
    Regards,
       Claude

-- 
*********************************

    Claude Manté

UMR CNRS 6117 LMGEM
http://www.com.univ-mrs.fr/LMGEM/

Centre d'Océanologie de Marseille
Campus de Luminy, Case 901
13288 MARSEILLE Cedex 09
tel : (+33) 491 829 127
fax : (+33) 491 829 119


*********************************

References:
- Estimating slope from noisy data
  - From: andreas.kohlmajer@gmx.de
