Re: simultaneous nonlinear regression of a lot of data
- To: mathgroup at smc.vnet.net
 - Subject: [mg65441] Re: [mg65427] simultaneous nonlinear regression of a lot of data
 - From: Darren Glosemeyer <darreng at wolfram.com>
 - Date: Sat, 1 Apr 2006 05:39:01 -0500 (EST)
 - Sender: owner-wri-mathgroup at wolfram.com
 
One possibility is to construct the model as a Piecewise function using an 
index variable to determine which model to use for each data point.  As a 
shortened example, here is some data and a model comprised of 5 sub data 
sets and models.
In[1]:= g[nn_, y_, m1_, m2_, t_] = 1/nn(y / (1 + t/m1) + (1 - y) / (1 + 
t/m2) );
In[2]:= datafun[y_, nn_, m1_, m2_] := Table[{y, t, g[nn, y, m1, m2, t]}, 
{t, 1, 6}]
In[3]:= {modelfuns, data} =
             With[{datasets =
                   Table[{g[nn = Random[Integer, {5, 10}], y, 1.3, m2, t],
                       datafun[y, nn, 1.3, 5 + Random[Real, {-.1, .1}]]}, 
{y, 5}]},
               {datasets[[All, 1]], Flatten[datasets[[All, 2]], 1]}];
This is the model we will fit.
In[4]:= InputForm[
           model[m2_, i_, t_] =
             Piecewise[Transpose[{modelfuns, Thread[i == 
Range[Length[modelfuns]]]}]]]
Out[4]//InputForm=
Piecewise[{{1/(5*(1 + 0.7692307692307692*t)), i == 1},
   {(2/(1 + 0.7692307692307692*t) - (1 + t/m2)^(-1))/6, i == 2},
   {(3/(1 + 0.7692307692307692*t) - 2/(1 + t/m2))/9, i == 3},
   {(4/(1 + 0.7692307692307692*t) - 3/(1 + t/m2))/7, i == 4},
   {(5/(1 + 0.7692307692307692*t) - 4/(1 + t/m2))/10, i == 5}}, 0]
For a problem of this size, FindFit will work just fine for computing 
m2.  NonlinearRegress could also be used in this case.
In[5]:= FindFit[data, model[m2, i, t], {{m2, 4.5, 5.5}}, {i, t}]
Out[5]= {m2 -> 5.00695}
For a problem of the magnitude you described, the approach above will be 
very time and memory consuming.  A better approach is to construct the 
error sum of squares and minimize it using FindMinimum.  Here we construct 
a data set and model based on 100 sub models and sub data sets containing 
200 points each.
In[6]:= datafun2[y_, nn_, m1_, m2_] :=
           Table[{y, t, g[nn, y, m1, m2, t]}, {t, 1, 200}]
In[7]:= {modelfuns, data} =
             With[{datasets =
                   Table[{g[nn = Random[Integer, {5, 10}], y, 1.3, m2, t],
                       datafun2[y, nn, 1.3, 5 + Random[Real, {-.1, .1}]]}, 
{y, 100}]},
               {datasets[[All, 1]], Flatten[datasets[[All, 2]], 1]}];
In[8]:= model2[m2_, i_, t_] =
             Piecewise[
               Transpose[{modelfuns, Thread[i == Range[Length[modelfuns]] + 
1]}]];
The following constructs the sum of squared errors, and shows the optimum 
value for m2 obtaining by minimizing the sum of squared errors.
In[9]:= ssq = Total[Map[(#[[-1]] - model2[m2, #[[1]], #[[2]]]) &, data]^2];
In[10]:= FindMinimum[ssq, {m2, 5}] // Timing
Out[10]= {12.187 Second, {914.043, {m2 -> 4.87829}}}
Darren Glosemeyer
Wolfram Research
At 06:09 AM 3/31/2006 -0500, dantimatter wrote:
>Hello all,
>
>I have a question about using the nonlinear regression function on a
>large data set.  Perhaps some of you have suggestions, and can point me
>in another direction if this is not the best way to solve this problem.
>
>
>Basically I have ~100 data sets of ~200 points each, and I'd like to
>fit each set to the following function:
>
>G(t) = 1/N * [ y / (1+t/m1) + (1-y) / (1+t/m2) ]
>
>For each data set, the numbers N and y are different, but the numbers
>m1 and m2 are the same for all data sets.  The problem is that I only
>know m1, and not m2.  I am hoping to simultaneously solve all these
>data sets to come up with a value for m2, but I'm not entirely sure how
>to code it.  I can come up with a reasonable m2 to start any
>regression.
>
>Any thoughts?
>
>Thanks!