Re: simultaneous nonlinear regression of a lot of data

*To*: mathgroup at smc.vnet.net
*Subject*: [mg65441] Re: [mg65427] simultaneous nonlinear regression of a lot of data
*From*: Darren Glosemeyer <darreng at wolfram.com>
*Date*: Sat, 1 Apr 2006 05:39:01 -0500 (EST)
*Sender*: owner-wri-mathgroup at wolfram.com

One possibility is to construct the model as a Piecewise function using an index variable to determine which model to use for each data point. As a shortened example, here is some data and a model comprised of 5 sub data sets and models.

In[1]:= g[nn_, y_, m1_, m2_, t_] = 1/nn (y/(1 + t/m1) + (1 - y)/(1 + t/m2));

In[2]:= datafun[y_, nn_, m1_, m2_] := Table[{y, t, g[nn, y, m1, m2, t]}, {t, 1, 6}]

In[3]:= {modelfuns, data} =
          With[{datasets =
             Table[{g[nn = Random[Integer, {5, 10}], y, 1.3, m2, t],
               datafun[y, nn, 1.3, 5 + Random[Real, {-.1, .1}]]}, {y, 5}]},
           {datasets[[All, 1]], Flatten[datasets[[All, 2]], 1]}];

This is the model we will fit.

In[4]:= InputForm[
          model[m2_, i_, t_] =
           Piecewise[Transpose[{modelfuns, Thread[i == Range[Length[modelfuns]]]}]]]

Out[4]//InputForm=
Piecewise[{{1/(5*(1 + 0.7692307692307692*t)), i == 1},
  {(2/(1 + 0.7692307692307692*t) - (1 + t/m2)^(-1))/6, i == 2},
  {(3/(1 + 0.7692307692307692*t) - 2/(1 + t/m2))/9, i == 3},
  {(4/(1 + 0.7692307692307692*t) - 3/(1 + t/m2))/7, i == 4},
  {(5/(1 + 0.7692307692307692*t) - 4/(1 + t/m2))/10, i == 5}}, 0]

For a problem of this size, FindFit will work just fine for computing m2. NonlinearRegress could also be used in this case.

In[5]:= FindFit[data, model[m2, i, t], {{m2, 4.5, 5.5}}, {i, t}]

Out[5]= {m2 -> 5.00695}

For a problem of the magnitude you described, the approach above will be very time and memory consuming. A better approach is to construct the error sum of squares and minimize it using FindMinimum. Here we construct a data set and model based on 100 sub models and sub data sets containing 200 points each.

In[6]:= datafun2[y_, nn_, m1_, m2_] := Table[{y, t, g[nn, y, m1, m2, t]}, {t, 1, 200}]

In[7]:= {modelfuns, data} =
          With[{datasets =
             Table[{g[nn = Random[Integer, {5, 10}], y, 1.3, m2, t],
               datafun2[y, nn, 1.3, 5 + Random[Real, {-.1, .1}]]}, {y, 100}]},
           {datasets[[All, 1]], Flatten[datasets[[All, 2]], 1]}];

In[8]:= model2[m2_, i_, t_] =
          Piecewise[Transpose[{modelfuns, Thread[i == Range[Length[modelfuns]]]}]];

The following constructs the sum of squared errors and shows the optimum value for m2 obtained by minimizing it. Each data point has the form {i, t, value}, so the first element selects the sub model, the second is t, and the last is the observed value.

In[9]:= ssq = Total[Map[(#[[-1]] - model2[m2, #[[1]], #[[2]]]) &, data]^2];

In[10]:= FindMinimum[ssq, {m2, 5}] // Timing

Out[10]= {12.187 Second, {914.043, {m2 -> 4.87829}}}

Darren Glosemeyer
Wolfram Research

At 06:09 AM 3/31/2006 -0500, dantimatter wrote:
>Hello all,
>
>I have a question about using the nonlinear regression function on a
>large data set. Perhaps some of you have suggestions, and can point me
>in another direction if this is not the best way to solve this problem.
>
>
>Basically I have ~100 data sets of ~200 points each, and I'd like to
>fit each set to the following function:
>
>G(t) = 1/N * [ y / (1+t/m1) + (1-y) / (1+t/m2) ]
>
>For each data set, the numbers N and y are different, but the numbers
>m1 and m2 are the same for all data sets. The problem is that I only
>know m1, and not m2. I am hoping to simultaneously solve all these
>data sets to come up with a value for m2, but I'm not entirely sure how
>to code it. I can come up with a reasonable m2 to start any
>regression.
>
>Any thoughts?
>
>Thanks!
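
A possible variant of the sum-of-squares approach described in the reply, offered only as a rough sketch: if N and y are known for every sub data set, the Piecewise model can be skipped and the residuals computed directly from g on whole columns of data. The names data3, nnCol, yCol, tCol, obsCol and ssq3, the row layout {nn, y, t, observed}, and the noise-free synthetic data are all assumptions made for illustration and do not come from the original post. It also assumes a Mathematica version in which FindMinimum accepts a numeric-only (black-box) objective.

```mathematica
(* Sketch only: data3, ssq3 and the row layout {nn, y, t, observed} are
   hypothetical names and conventions, not taken from the original post. *)

(* Same model as in In[1]; ordinary arithmetic threads over lists *)
g[nn_, y_, m1_, m2_, t_] := 1/nn (y/(1 + t/m1) + (1 - y)/(1 + t/m2));

(* Noise-free synthetic data: 100 sub data sets of 200 points each,
   generated with m1 = 1.3 and a "true" m2 = 5 *)
data3 = Flatten[
   Table[{5 + Mod[y, 6], y, t, g[5 + Mod[y, 6], y, 1.3, 5., t]},
    {y, 1, 100}, {t, 1, 200}], 1];

(* Extract the columns once so each evaluation works on whole lists *)
{nnCol, yCol, tCol, obsCol} = N[Transpose[data3]];

(* Sum of squared errors, evaluated only for numeric m2 *)
ssq3[m2_?NumericQ] := Total[(obsCol - g[nnCol, yCol, 1.3, m2, tCol])^2];

(* Minimize over m2 from a reasonable starting value *)
FindMinimum[ssq3[m2], {m2, 4.5}]
```

Because the synthetic data here contain no noise, the minimizer would be expected to land close to m2 = 5. With real data, the four columns would instead be filled from the measured sets and their known N and y values.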