Re: statistical comparison of parameters from two applications of NonlinearModelFit
- To: mathgroup at smc.vnet.net
- Subject: [mg116139] Re: statistical comparison of parameters from two applications of NonlinearModelFit
- From: DrMajorBob <btreat1 at austin.rr.com>
- Date: Thu, 3 Feb 2011 05:30:12 -0500 (EST)
For nonlinear models, it seems rash beyond words, to me, to assume
"normality, homoscedasticity, and independence". Hence, I doubt most of
the statistics returned mean much of anything. The same is true in
linear modeling, but there you have a better chance of testing the
assumptions.
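That said, a couple of rough checks are easy even here. Using the nlm1
fit from dantimatter's message below, the residuals can at least be
eyeballed for trends or changing spread (with only 9 points this is a
sanity check, not a test):

  res1 = nlm1["FitResiduals"];            (* observed minus fitted *)
  ListPlot[Transpose[{time1, res1}],
    Filling -> Axis, AxesLabel -> {"time", "residual"}]

And Ray's t statistic for b1 versus b2 (below) can be computed directly
from the two fitted models. Something like the following sketch; the
Welch-Satterthwaite degrees of freedom used here are just one
reasonable choice, not the only one:

  {b1est, b2est} = {b1 /. nlm1["BestFitParameters"],
                    b2 /. nlm2["BestFitParameters"]};
  {seB1, seB2} = {nlm1["ParameterErrors"][[2]],     (* error for b1 *)
                  nlm2["ParameterErrors"][[2]]};    (* error for b2 *)
  t = (b1est - b2est)/Sqrt[seB1^2 + seB2^2]         (* about -0.17 *)
  df = (seB1^2 + seB2^2)^2/
       (seB1^4/(Length[data1] - 3) + seB2^4/(Length[data2] - 3))
  2 (1 - CDF[StudentTDistribution[df], Abs[t]])     (* two-sided p-value *)

With a t that small the p-value is large no matter what the degrees of
freedom are, which is Ray's point below: these data cannot distinguish
b1 from b2.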
Bobby

On Wed, 02 Feb 2011 05:08:31 -0600, Ray Koopman <koopman at sfu.ca> wrote:

> On Feb 1, 3:54 am, dantimatter <goo... at dantimatter.com> wrote:
>> Hello Everyone,
>>
>> First, I must apologize if the following questions are silly;
>> I am neither a statistician nor a particularly good Mathematica
>> user (although I do occasionally pretend to be one). And now,
>> on to the question:
>>
>> I've got some data that I try to fit with NonlinearModelFit:
>>
>> time1 = {0, 100, 200, 300, 400, 500, 600, 700, 800};
>> data1 = {1, 0.92, 0.79, 0.81, 0.81, 0.78, 0.72, 0.75, 0.68};
>>
>> nlm1 = NonlinearModelFit[Transpose[{time1, data1}],
>>   a1*Exp[-b1*x] + c1, {{a1, .5}, {b1, 0.001}, {c1, 0.5}}, x,
>>   ConfidenceLevel -> .95]
>>
>> with results:
>>
>> In[11]:= nlm1["BestFitParameters"]
>> nlm1["ParameterConfidenceIntervals"]
>> nlm1["ParameterErrors"]
>>
>> Out[11] = {a1 -> 0.295575, b1 -> 0.00334141, c1 -> 0.696761}
>> Out[12] = {{0.179807, 0.411344}, {-0.000284765, 0.00696758},
>>   {0.582507, 0.811014}}
>> Out[13] = {0.0473121, 0.00148194, 0.0466929}
>>
>> I then do the same thing on a second data set:
>>
>> time2 = {0, 100, 200, 300, 400, 500};
>> data2 = {0.8, 0.73, 0.65, 0.61, 0.58, 0.57};
>>
>> nlm2 = NonlinearModelFit[Transpose[{time2, data2}],
>>   a2*Exp[-b2*x] + c2, {{a2, .5}, {b2, 0.001}, {c2, 0.5}}, x,
>>   ConfidenceLevel -> .95]
>>
>> In[15]:= nlm2["BestFitParameters"]
>> nlm2["ParameterConfidenceIntervals"]
>> nlm2["ParameterErrors"]
>>
>> Out[15] = {a2 -> 0.287834, b2 -> 0.00362383, c2 -> 0.516796}
>> Out[16] = {{0.204254, 0.371413}, {0.00121124, 0.00603641},
>>   {0.427572, 0.60602}}
>> Out[17] = {0.0262627, 0.000758091, 0.0280362}
>>
>> Based on our understanding of the real system, we actually expect
>> b2 <= b1, but this is not what the fitting tells us (of course,
>> there's a bit of error in the extracted parameters).
>>
>> So what I'd really like to know is:
>>
>> (1) how can I make sense of these "ParameterConfidenceIntervals"
>> and "ParameterErrors" (how are they determined, and what is meant
>> by them, beyond their brief description in
>> http://reference.wolfram.com/mathematica/tutorial/StatisticalModelAnalysis.html
>> )?
>>
>> and
>>
>> (2) is there a way to establish some kind of upper limit on the
>> detectability of the difference between b2 and b1?
>>
>> Any thoughts you might have would be most appreciated.
>>
>> Many thanks!
>
> The parameter estimates are the values of the parameters (a, b, c)
> that minimize the sum of squared differences between the observed
> data and the values predicted from the model.
>
> Each ParameterError is an estimate of the standard error of
> the corresponding parameter estimate.
>
> Each ParameterConfidenceInterval is an interval that you are
> (100*ConfidenceLevel)% certain contains the true value of the
> corresponding parameter *if the model is complete (all the variables
> that matter are included), the form of the model function is correct,
> and the distributional assumptions about the errors (normality,
> homoscedasticity, independence) are correct*.
>
> Each confidence interval is constructed as
>   estimated_parameter +/- ( estimated_standard_error *
>     Quantile[StudentTDistribution[degrees_of_freedom],
>       (1 + ConfidenceLevel)/2] ),
> where degrees_of_freedom =
>   #_of_data_points - #_of_estimated_parameters.
>
> Each estimated standard error is the square root of the product of
> two factors -- one that is common to all the estimates, and one that
> is specific to each estimate.
>
> The common factor is the mean square error of approximation:
> the sum of squares that the parameter estimates minimize,
> divided by the degrees of freedom.
>
> The specific factors are more complicated, so I'll say only that:
> 1. They do not directly reflect the size of the data-approximation
>    errors;
> 2. They are strongly tied to the values of the independent variable
>    (your 'time');
> 3. Ceteris paribus, they are inversely proportional to the number of
>    data points.
>
> Now we're ready to talk about detecting a difference between b1 and
> b2. The formal test of the hypothesis that there is no difference
> is a student t: (b1 - b2)/Sqrt[ASE[b1]^2 + ASE[b2]^2] = -.170 .
> The degrees of freedom are data dependent, but in your case the t
> is so small that it will never be significant, no matter what the
> degrees of freedom are.
>
> To achieve significance, the t would have to be at least 12 times
> its current value, which means you would need 144 times as many
> sample points as you currently have. (Note: I've cut a lot of
> corners here. A proper analysis would be much messier but would
> lead to the same conclusion -- you need a whole lot more data,
> on the order of 200 times as much as you have shown us.)
>
> Finally, here's some code that should reproduce some of the
> NonlinearModelFit output:
>
> r1 = FindFit[Transpose@{time1,data1}, a1*Exp[-b1*x]+c1,
>        {{a1,.5},{b1,.001},{c1,.5}}, x]
> f1 = a1*Exp[-b1*time1]+c1;
> e1 = f1 - data1;
> df1 = Length@data1 - 3;
> var1 = (e1.e1/.r1)/df1                            (* error variance *)
> cov1 = var1*Inverse[Transpose@#.#]&@Outer[D[##]&,f1,{a1,b1,c1}]/.r1;
> ase1 = Sqrt@Diagonal@cov1                         (* ASEs *)
> cov1/Outer[Times,ase1,ase1]                       (* correlations *)
> ({a1,b1,c1}/.r1) + Outer[Times,ase1,              (* CIs *)
>   {-1,1}]*Quantile[StudentTDistribution[df1],.975]

--
DrMajorBob at yahoo.com