Re: simultaneous nonlinear regression of a lot of data

• To: mathgroup at smc.vnet.net
• Subject: [mg65441] Re: [mg65427] simultaneous nonlinear regression of a lot of data
• From: Darren Glosemeyer <darreng at wolfram.com>
• Date: Sat, 1 Apr 2006 05:39:01 -0500 (EST)
• Sender: owner-wri-mathgroup at wolfram.com

```
One possibility is to construct the model as a Piecewise function, using an
index variable to determine which sub model applies to each data point.  As a
scaled-down example, here are some data and a model built from 5 sub data
sets and sub models.

In:= g[nn_, y_, m1_, m2_, t_] = 1/nn (y/(1 + t/m1) + (1 - y)/(1 + t/m2));

In:= datafun[y_, nn_, m1_, m2_] :=
       Table[{y, t, g[nn, y, m1, m2, t]}, {t, 1, 6}]

In:= {modelfuns, data} =
       With[{datasets =
           Table[{g[nn = Random[Integer, {5, 10}], y, 1.3, m2, t],
                  datafun[y, nn, 1.3, 5 + Random[Real, {-.1, .1}]]},
             {y, 5}]},
         {datasets[[All, 1]], Flatten[datasets[[All, 2]], 1]}];

This is the model we will fit.

In:= InputForm[
       model[m2_, i_, t_] =
         Piecewise[Transpose[{modelfuns,
             Thread[i == Range[Length[modelfuns]]]}]]]

Out//InputForm=
Piecewise[{{1/(5*(1 + 0.7692307692307692*t)), i == 1},
{(2/(1 + 0.7692307692307692*t) - (1 + t/m2)^(-1))/6, i == 2},
{(3/(1 + 0.7692307692307692*t) - 2/(1 + t/m2))/9, i == 3},
{(4/(1 + 0.7692307692307692*t) - 3/(1 + t/m2))/7, i == 4},
{(5/(1 + 0.7692307692307692*t) - 4/(1 + t/m2))/10, i == 5}}, 0]

For a problem of this size, FindFit will work just fine for computing
m2.  NonlinearRegress could also be used in this case.

In:= FindFit[data, model[m2, i, t], {{m2, 4.5, 5.5}}, {i, t}]

Out= {m2 -> 5.00695}

For a problem of the size you described, the approach above will be very
slow and memory-intensive.  A better approach is to construct the error sum
of squares directly and minimize it with FindMinimum.  Here we construct a
data set and model based on 100 sub models and sub data sets of 200 points
each.

In:= datafun2[y_, nn_, m1_, m2_] :=
       Table[{y, t, g[nn, y, m1, m2, t]}, {t, 1, 200}]

In:= {modelfuns, data} =
       With[{datasets =
           Table[{g[nn = Random[Integer, {5, 10}], y, 1.3, m2, t],
                  datafun2[y, nn, 1.3, 5 + Random[Real, {-.1, .1}]]},
             {y, 100}]},
         {datasets[[All, 1]], Flatten[datasets[[All, 2]], 1]}];

In:= model2[m2_, i_, t_] =
       Piecewise[
         Transpose[{modelfuns, Thread[i == Range[Length[modelfuns]]]}]];

The following constructs the sum of squared errors and finds the optimal
value of m2 by minimizing it with FindMinimum.

In:= ssq = Total[Map[(#[[-1]] - model2[m2, #[[1]], #[[2]]]) &, data]^2];

In:= FindMinimum[ssq, {m2, 5}] // Timing

Out= {12.187 Second, {914.043, {m2 -> 4.87829}}}

Darren Glosemeyer
Wolfram Research

At 06:09 AM 3/31/2006 -0500, dantimatter wrote:
>Hello all,
>
>I have a question about using the nonlinear regression function on a
>large data set.  Perhaps some of you have suggestions, and can point me
>in another direction if this is not the best way to solve this problem.
>
>
>Basically I have ~100 data sets of ~200 points each, and I'd like to
>fit each set to the following function:
>
>G(t) = 1/N * [ y / (1+t/m1) + (1-y) / (1+t/m2) ]
>
>For each data set, the numbers N and y are different, but the numbers
>m1 and m2 are the same for all data sets.  The problem is that I only
>know m1, and not m2.  I am hoping to simultaneously solve all these
>data sets to come up with a value for m2, but I'm not entirely sure how
>to code it.  I can come up with a reasonable m2 to start any
>regression.
>
>Any thoughts?
>
>Thanks!

```
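
A self-contained sketch of the FindMinimum approach, run on synthetic data rather than the poster's measurements. It builds the same error sum of squares, but each point picks up its sub model by indexing a parameter table instead of going through Piecewise. It then shows a variant whose objective is evaluated only at numeric m2, so the large symbolic expression is never built. As in the reply's example, N and y are treated as known for each set. The fractional y values in [0.2, 0.8], the seed, and the names params, m1Known, m2True, idx and ssqNumeric are illustrative assumptions. RandomInteger and RandomReal require Mathematica 6 or later.

```mathematica
(* Model from the question: m1 is known, m2 is shared by all data sets.
   SetDelayed (:=) keeps the pattern names independent of global values. *)
g[nn_, y_, m1_, m2_, t_] := 1/nn (y/(1 + t/m1) + (1 - y)/(1 + t/m2));

(* Hypothetical synthetic data: 100 sets of 200 points, rows {i, t, value}.
   params[[i]] = {N, y} for set i; y is a fraction here (an assumption). *)
SeedRandom[1];
nSets = 100; m1Known = 1.3; m2True = 5.;
params = Table[{RandomInteger[{5, 10}], RandomReal[{0.2, 0.8}]}, {nSets}];
data = Flatten[
   Table[{i, t, g[params[[i, 1]], params[[i, 2]], m1Known, m2True, t]},
    {i, nSets}, {t, 1, 200}], 1];

(* Symbolic sum of squared errors: each row looks up its own N and y by
   its index i, so no Piecewise is needed. m2 stays a symbol. *)
ssq = Total[
   Map[(#[[3]] - g[params[[#[[1]], 1]], params[[#[[1]], 2]], m1Known, m2,
          #[[2]]])^2 &, data]];
FindMinimum[ssq, {m2, 4.5}]

(* Numeric variant: evaluated only for numeric m2, working on whole
   columns at once. Two starting values let FindMinimum search without
   using derivatives. *)
idx = data[[All, 1]];
ssqNumeric[m2_?NumericQ] :=
  Total[(data[[All, 3]] -
      g[params[[idx, 1]], params[[idx, 2]], m1Known, m2, data[[All, 2]]])^2];
FindMinimum[ssqNumeric[m2], {m2, 4.5, 5.5}]
```

Because the synthetic data are noise-free and generated with m2True = 5, both calls should return m2 close to 5. The numeric variant avoids building the large symbolic objective, which may matter at the problem size described in the question. No timings are claimed here.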
