Re: simultaneous nonlinear regression of a lot of data
- To: mathgroup at smc.vnet.net
- Subject: [mg65441] Re: [mg65427] simultaneous nonlinear regression of a lot of data
- From: Darren Glosemeyer <darreng at wolfram.com>
- Date: Sat, 1 Apr 2006 05:39:01 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
One possibility is to construct the model as a Piecewise function using an
index variable to determine which model to use for each data point. As a
shortened example, here is some data and a model comprised of 5 sub data
sets and models.
In[1]:= g[nn_, y_, m1_, m2_, t_] = 1/nn(y / (1 + t/m1) + (1 - y) / (1 +
t/m2) );
In[2]:= datafun[y_, nn_, m1_, m2_] := Table[{y, t, g[nn, y, m1, m2, t]},
{t, 1, 6}]
In[3]:= {modelfuns, data} =
With[{datasets =
Table[{g[nn = Random[Integer, {5, 10}], y, 1.3, m2, t],
datafun[y, nn, 1.3, 5 + Random[Real, {-.1, .1}]]},
{y, 5}]},
{datasets[[All, 1]], Flatten[datasets[[All, 2]], 1]}];
This is the model we will fit.
In[4]:= InputForm[
model[m2_, i_, t_] =
Piecewise[Transpose[{modelfuns, Thread[i ==
Range[Length[modelfuns]]]}]]]
Out[4]//InputForm=
Piecewise[{{1/(5*(1 + 0.7692307692307692*t)), i == 1},
{(2/(1 + 0.7692307692307692*t) - (1 + t/m2)^(-1))/6, i == 2},
{(3/(1 + 0.7692307692307692*t) - 2/(1 + t/m2))/9, i == 3},
{(4/(1 + 0.7692307692307692*t) - 3/(1 + t/m2))/7, i == 4},
{(5/(1 + 0.7692307692307692*t) - 4/(1 + t/m2))/10, i == 5}}, 0]
For a problem of this size, FindFit will work just fine for computing
m2. NonlinearRegress could also be used in this case.
In[5]:= FindFit[data, model[m2, i, t], {{m2, 4.5, 5.5}}, {i, t}]
Out[5]= {m2 -> 5.00695}
For a problem of the magnitude you described, the approach above will be
very time and memory consuming. A better approach is to construct the
error sum of squares and minimize it using FindMinimum. Here we construct
a data set and model based on 100 sub models and sub data sets containing
200 points each.
In[6]:= datafun2[y_, nn_, m1_, m2_] :=
Table[{y, t, g[nn, y, m1, m2, t]}, {t, 1, 200}]
In[7]:= {modelfuns, data} =
With[{datasets =
Table[{g[nn = Random[Integer, {5, 10}], y, 1.3, m2, t],
datafun2[y, nn, 1.3, 5 + Random[Real, {-.1, .1}]]},
{y, 100}]},
{datasets[[All, 1]], Flatten[datasets[[All, 2]], 1]}];
In[8]:= model2[m2_, i_, t_] =
Piecewise[
Transpose[{modelfuns, Thread[i == Range[Length[modelfuns]] +
1]}]];
The following constructs the sum of squared errors, and shows the optimum
value for m2 obtaining by minimizing the sum of squared errors.
In[9]:= ssq = Total[Map[(#[[-1]] - model2[m2, #[[1]], #[[2]]]) &, data]^2];
In[10]:= FindMinimum[ssq, {m2, 5}] // Timing
Out[10]= {12.187 Second, {914.043, {m2 -> 4.87829}}}
Darren Glosemeyer
Wolfram Research
At 06:09 AM 3/31/2006 -0500, dantimatter wrote:
>Hello all,
>
>I have a question about using the nonlinear regression function on a
>large data set. Perhaps some of you have suggestions, and can point me
>in another direction if this is not the best way to solve this problem.
>
>
>Basically I have ~100 data sets of ~200 points each, and I'd like to
>fit each set to the following function:
>
>G(t) = 1/N * [ y / (1+t/m1) + (1-y) / (1+t/m2) ]
>
>For each data set, the numbers N and y are different, but the numbers
>m1 and m2 are the same for all data sets. The problem is that I only
>know m1, and not m2. I am hoping to simultaneously solve all these
>data sets to come up with a value for m2, but I'm not entirely sure how
>to code it. I can come up with a reasonable m2 to start any
>regression.
>
>Any thoughts?
>
>Thanks!