Re: Estimating slope from noisy data

*To*: mathgroup at smc.vnet.net
*Subject*: [mg89650] Re: [mg89597] Estimating slope from noisy data
*From*: mante <claude.mante at univmed.fr>
*Date*: Mon, 16 Jun 2008 06:40:54 -0400 (EDT)
*References*: <200806140929.FAA22830@smc.vnet.net>

andreas.kohlmajer at gmx.de wrote:

> Hi!
>
> I have difficulties to estimate the correct slope from noisy data.
> This is the code to generate the noisy data:
>
> Needs["LinearRegression`"];
> slope = 1.0;
> sigma = 0.5;
> xrange = 1.0;
>
> SeedRandom[123]; (* initialize random generator *)
> rnd = {#, #*slope + RandomReal[NormalDistribution[0, sigma]]} &;
>
> (* generate 2000 data points *)
> data = Table[
>   rnd[RandomReal[NormalDistribution[0, xrange/3.0]]], {2000}];
>
> subset = Take[data, 8];
> ListPlot[subset, PlotRange -> {{-3, 3}, {-3, 3}},
>   PlotStyle -> PointSize[.025]]
> fit = Regress[subset, x, x, IncludeConstant -> False,
>   RegressionReport -> {SummaryReport, ParameterCITable}]
>
> The correct slope is exactly 1. As the data is quite noisy, the CI of
> the slope is very big. The estimated slope is far to big (1.947). If I
> use more data points, the estimation gets better; I could also use a
> wider x-range, to get a better estimate for the slope. However, I'm
> quite limited in the x-range, so using a wider x-range is no option
> for me.
>
> I could check the RSquared for significance (If[Abs[r*Sqrt[n - 2]/
> Sqrt[1 - r^2]] >=
> Quantile[StudentTDistribution[n - 2], 1 - 0.05], r, 0] (*
> significance of 95% *)). I this case, it is significant.
>
> Is there any other way to get a good estimate for the slope, without
> using too many data points?
>
>
> (Keywords: fit, regression, slope, noisy, rsquared, limited data)
>
>

Hello,

I think your problem is due to the way you generate the independent variable x. Notice first that regression is *not designed* for solving the problem A.x=y when x is random (this problem is called "errors-in-variables modeling"). So you should use *deterministic* values of the independent variable (Quasi-Monte Carlo abscissas, for instance). But the main flaw in your code lies in the distribution of x: since it is Gaussian, the region around the mean is over-sampled!
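
A rough way to see why the design matters, assuming the no-intercept model $y_i = \beta x_i + \varepsilon_i$ with independent errors of variance $\sigma^2$ and the $x_i$ treated as fixed (the symbols $\beta$, $\varepsilon_i$ and $n$ are notation introduced here, not taken from the code above):

$$
\hat\beta = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2},
\qquad
\operatorname{Var}(\hat\beta) = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2}.
$$

Only $\sum x_i^2$ enters the variance, so points close to zero add almost nothing to the precision of the slope. With $x_i$ drawn from a Gaussian of standard deviation xrange/3 = 1/3, $\sum x_i^2$ is about $n/9$ on average; with $x_i$ spread uniformly over $[-1, 1]$ (an assumed interval, taken here as $\pm$xrange) it is about $n/3$. For $n = 8$ and $\sigma = 0.5$ this suggests a standard error of roughly $0.5/\sqrt{8/9} \approx 0.53$ for the Gaussian design against $0.5/\sqrt{8/3} \approx 0.31$ for the uniform one, a gain of about a factor $\sqrt{3}$.
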
Using a uniform design (see the mail of Bill Rowe), you will obtain better results.

Regards,
Claude

--
*********************************
Claude Manté
UMR CNRS 6117 LMGEM
http://www.com.univ-mrs.fr/LMGEM/
Centre d'Océanologie de Marseille
Campus de Luminy, Case 901
13288 MARSEILLE Cedex 09
tel : (+33) 491 829 127
fax : (+33) 491 829 119
*********************************

**References**:
**Estimating slope from noisy data**
*From:* andreas.kohlmajer@gmx.de