MathGroup Archive 2009

Re: Determining a formula with Mathematica 6

Hi Dylan,

Looks like you're in for a standard data fit, although your phrasing
is not really clear. You ask for a solution that maximizes
correlation. However, you don't specify what should be correlated with
what. I take it that you mean the correlation between the measured and
predicted values.

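Usually, a data fit minimizes an error term which is the sum of the
squares of the differences between the predicted and measured values.
This least-squares criterion is what Mathematica's fitting functions
use. For a linear model that includes a constant term, it also gives
the highest correlation between predicted and measured values that
such a model can reach.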
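
To make the two criteria concrete, here is a small sketch with made-up numbers (the lists measured and predicted are hypothetical, not your data). It computes the least-squares error term and the correlation between measured and predicted values:

```mathematica
(* hypothetical measured outputs and model predictions *)
measured = {4.1, 9.8, 6.2, 7.9};
predicted = {4.0, 10.0, 6.0, 8.0};

(* least-squares error term: sum of the squared differences *)
sse = Total[(measured - predicted)^2]

(* correlation between measured and predicted values *)
Correlation[measured, predicted]
```

A small error term and a correlation close to 1 usually go together, but the fitting functions work on the error term, not on the correlation directly.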

If you want to fit a linear combination of terms you can use Fit. If
you have a non-linear model you can try FindFit. If you want to look
more closely at the fit's statistics (and if you have version 7) you
could try LinearModelFit, GeneralizedLinearModelFit, NonlinearModelFit,
LogitModelFit, or ProbitModelFit.
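
As a rough sketch of how these calls look, here is one made-up data set (one input, one output) fed to three of them. The data, the basis functions, and the model a Exp[b x] are only assumptions for illustration:

```mathematica
(* made-up one-variable data, for illustration only *)
data = {{1, 2.7}, {2, 7.4}, {3, 20.1}, {4, 54.6}};

(* linear combination of the basis functions 1, x and x^2 *)
Fit[data, {1, x, x^2}, x]

(* non-linear model a Exp[b x] with parameters a and b *)
FindFit[data, a Exp[b x], {a, b}, x]

(* version 7 and later: a fitted-model object that can be queried *)
lm = LinearModelFit[data, {x, x^2}, x];
lm["RSquared"]
lm["ParameterTable"]
```

Note that LinearModelFit adds a constant term by default, whereas Fit only uses the functions you list.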

What you are looking for, "to create a formula, with the highest possible
correlation to the output variable, using the input variables",
actually doesn't make sense unless you restrict this formula in some
way. This is because it is always possible to find a function (perhaps
with many, many coefficients and terms) that will fit your data
perfectly, as long as no two observations have the same inputs but
different outputs.
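
A minimal illustration of such a "perfect" but meaningless fit, using five made-up points and InterpolatingPolynomial (my choice of tool for the sketch, not something your problem requires):

```mathematica
(* five made-up points with no particular pattern *)
pts = {{1, 3}, {2, -1}, {3, 4}, {4, 1}, {5, 5}};

(* polynomial of degree 4 that passes through all five points *)
p = InterpolatingPolynomial[pts, x];

(* evaluating it at the x values gives back the original y values *)
(p /. x -> #) & /@ pts[[All, 1]]
```

The polynomial reproduces every point, yet it says nothing useful about what happens between or beyond them. The same holds for any sufficiently flexible formula.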

Therefore, I feel (but I may be mistaken here) that you are looking
for a fit that simply consists of a linear combination of your 50
input values. Let's call them x1, x2, ..., x49 and x50. To lower the
degree of clutter in the following example I'll assume we have 4
inputs. You'll be able to generalize this.

In[35]:= Fit[{{1, 1, 1, 1, 4}, {1, 2, 3, 4, 10}, {2, 1, 2, 1, 6}},
{x1, x2, x3, x4}, {x1, x2, x3, x4}]

Out[35]= 0.909091 x1 + 1.09091 x2 + 1.09091 x3 + 0.909091 x4

Here you see a fit to a set of three measurements of an output
dependent on 4 inputs. The first four numbers in each sublist are the
inputs and the last one in each is the output. The next argument of
Fit states that I want the fitted function to be a linear combination
of simply x1, x2, x3, and x4 (add 1 to that list if you want a constant
term). The last argument names the variables, which in this case are
the same as the fit functions. The fit functions could have been any
set of functions of x1, x2, x3, and x4 you can think of.

Actually, I generated the outputs (4, 10 and 6) by just summing the
inputs. In that case you might expect the fit to be x1+x2+x3+x4, but
it isn't. How come? Well, the number of data points is smaller than
the number of coefficients to be determined. In this case, the problem
is called "underdetermined". There are infinitely many solutions to
the fitting problem. Try filling in the inputs of the example in the
function given by Fit and you will see that the fit is perfect.
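
The check can be sketched as follows. The names inputs, outputs, fit and plainSum are just labels I chose, and the coefficients in fit are the rounded ones printed in Out[35]:

```mathematica
data = {{1, 1, 1, 1, 4}, {1, 2, 3, 4, 10}, {2, 1, 2, 1, 6}};
inputs = data[[All, 1 ;; 4]];
outputs = data[[All, 5]];

(* the fit reported above, and the plain sum one might have expected *)
fit[{x1_, x2_, x3_, x4_}] :=
  0.909091 x1 + 1.09091 x2 + 1.09091 x3 + 0.909091 x4
plainSum[{x1_, x2_, x3_, x4_}] := x1 + x2 + x3 + x4

(* both reproduce the outputs 4, 10, 6; the first only up to the
   rounding of the printed coefficients *)
fit /@ inputs
plainSum /@ inputs

(* coefficient directions that leave every prediction unchanged *)
NullSpace[inputs]
```

NullSpace should return a single vector proportional to {1, -1, -1, 1}. Any multiple of it can be added to the coefficients without changing the predictions for these three observations, and the coefficients in Out[35] differ from 1, 1, 1, 1 by roughly 0.0909 times {-1, 1, 1, -1}, which is exactly such a shift.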

In your case, with 50 inputs, you would need at least 50 observations
with linearly independent inputs in order to avoid this
underdetermination issue. You can try your fit nevertheless, but with
fewer observations you can be sure that the solution is not unique.
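
For the spreadsheet itself, something along these lines might work. The file name "data.xls" is a placeholder, and I am assuming that the first sheet holds one observation per row, with the input columns followed by the output column and no header row:

```mathematica
(* "data.xls" is a placeholder; Import returns a list of sheets *)
rows = Import["data.xls"][[1]];

(* every column except the last one is an input *)
inputs = rows[[All, 1 ;; -2]];
n = Length[First[inputs]];

(* the coefficients are unique only if this gives True *)
MatrixRank[inputs] == n

(* build the variables x1, x2, ..., xn and fit a linear combination *)
vars = Table[Symbol["x" <> ToString[i]], {i, n}];
Fit[rows, vars, vars]
```

If the rank test gives False, Fit may still return a formula, but as in the small example it is only one of many equally good ones.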

Cheers -- Sjoerd

On Apr 19, 10:52 am, Dylan Bradbury <dylanbradb... at> wrote:
> Hello,
> With Mathematica 6, can you help me do the following?:
>
> I have multiple sets of about fifty input variables, each set has its own
> output variable. This is all in an excel spreadsheet.
> I need Mathematica 6 to create a formula, with the highest possible
> correlation to the output variable, using the input variables.
> Thanks,
> Dylan
