MathGroup Archive: April 2006 [00011]

[Date Index] [Thread Index] [Author Index]

Re: simultaneous nonlinear regression of a lot of data

To: mathgroup at smc.vnet.net
Subject: [mg65441] Re: [mg65427] simultaneous nonlinear regression of a lot of data
From: Darren Glosemeyer <darreng at wolfram.com>
Date: Sat, 1 Apr 2006 05:39:01 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com

One possibility is to construct the model as a Piecewise function using an 
index variable to determine which model to use for each data point.  As a 
shortened example, here is some data and a model comprised of 5 sub data 
sets and models.


In[1]:= g[nn_, y_, m1_, m2_, t_] = 1/nn(y / (1 + t/m1) + (1 - y) / (1 + 
t/m2) );

In[2]:= datafun[y_, nn_, m1_, m2_] := Table[{y, t, g[nn, y, m1, m2, t]}, 
{t, 1, 6}]

In[3]:= {modelfuns, data} =
             With[{datasets =
                   Table[{g[nn = Random[Integer, {5, 10}], y, 1.3, m2, t],
                       datafun[y, nn, 1.3, 5 + Random[Real, {-.1, .1}]]}, 
{y, 5}]},
               {datasets[[All, 1]], Flatten[datasets[[All, 2]], 1]}];


This is the model we will fit.

In[4]:= InputForm[
           model[m2_, i_, t_] =
             Piecewise[Transpose[{modelfuns, Thread[i == 
Range[Length[modelfuns]]]}]]]

Out[4]//InputForm=
Piecewise[{{1/(5*(1 + 0.7692307692307692*t)), i == 1},
   {(2/(1 + 0.7692307692307692*t) - (1 + t/m2)^(-1))/6, i == 2},
   {(3/(1 + 0.7692307692307692*t) - 2/(1 + t/m2))/9, i == 3},
   {(4/(1 + 0.7692307692307692*t) - 3/(1 + t/m2))/7, i == 4},
   {(5/(1 + 0.7692307692307692*t) - 4/(1 + t/m2))/10, i == 5}}, 0]



For a problem of this size, FindFit will work just fine for computing 
m2.  NonlinearRegress could also be used in this case.


In[5]:= FindFit[data, model[m2, i, t], {{m2, 4.5, 5.5}}, {i, t}]

Out[5]= {m2 -> 5.00695}



For a problem of the magnitude you described, the approach above will be 
very time and memory consuming.  A better approach is to construct the 
error sum of squares and minimize it using FindMinimum.  Here we construct 
a data set and model based on 100 sub models and sub data sets containing 
200 points each.



In[6]:= datafun2[y_, nn_, m1_, m2_] :=
           Table[{y, t, g[nn, y, m1, m2, t]}, {t, 1, 200}]

In[7]:= {modelfuns, data} =
             With[{datasets =
                   Table[{g[nn = Random[Integer, {5, 10}], y, 1.3, m2, t],
                       datafun2[y, nn, 1.3, 5 + Random[Real, {-.1, .1}]]}, 
{y, 100}]},
               {datasets[[All, 1]], Flatten[datasets[[All, 2]], 1]}];

In[8]:= model2[m2_, i_, t_] =
             Piecewise[
               Transpose[{modelfuns, Thread[i == Range[Length[modelfuns]] + 
1]}]];



The following constructs the sum of squared errors, and shows the optimum 
value for m2 obtaining by minimizing the sum of squared errors.



In[9]:= ssq = Total[Map[(#[[-1]] - model2[m2, #[[1]], #[[2]]]) &, data]^2];

In[10]:= FindMinimum[ssq, {m2, 5}] // Timing

Out[10]= {12.187 Second, {914.043, {m2 -> 4.87829}}}



Darren Glosemeyer
Wolfram Research



At 06:09 AM 3/31/2006 -0500, dantimatter wrote:
>Hello all,
>
>I have a question about using the nonlinear regression function on a
>large data set.  Perhaps some of you have suggestions, and can point me
>in another direction if this is not the best way to solve this problem.
>
>
>Basically I have ~100 data sets of ~200 points each, and I'd like to
>fit each set to the following function:
>
>G(t) = 1/N * [ y / (1+t/m1) + (1-y) / (1+t/m2) ]
>
>For each data set, the numbers N and y are different, but the numbers
>m1 and m2 are the same for all data sets.  The problem is that I only
>know m1, and not m2.  I am hoping to simultaneously solve all these
>data sets to come up with a value for m2, but I'm not entirely sure how
>to code it.  I can come up with a reasonable m2 to start any
>regression.
>
>Any thoughts?
>
>Thanks!

Prev by Date: Re: A test with options

Next by Date: Re: BasicInput Palette - Restoring Original Size

Previous by thread: Re: simultaneous nonlinear regression of a lot of data

Next by thread: Re: simultaneous nonlinear regression of a lot of data