Re: Determining a formula with Mathematica 6

*To*: mathgroup at smc.vnet.net
*Subject*: [mg98806] Re: Determining a formula with Mathematica 6
*From*: "Sjoerd C. de Vries" <sjoerd.c.devries at gmail.com>
*Date*: Mon, 20 Apr 2009 01:28:48 -0400 (EDT)
*References*: <gseooq$ec5$1@smc.vnet.net>

Hi Dylan,

Looks like you're in for a standard data fit, although your phrasing is not really clear. You ask for a solution that maximizes correlation. However, you don't specify what should be correlated with what. I take it that you mean the correlation between the measured and predicted values.

Usually, a data fit minimizes an error term which is the sum of the squares of the differences between the predicted and measured values. This is what Mathematica uses.

If you want to fit a linear combination of terms you can use Fit. If you have a non-linear model you can try FindFit. If you want to look more closely at the fit's statistics (and if you have version 7) you could try LinearModelFit, GeneralizedLinearModelFit, NonlinearModelFit, LogitModelFit, and ProbitModelFit.

What you are looking for, "to create a formula, with the highest possible correlation to the output variable, using the input variables", actually doesn't make sense unless you restrict this formula in some way. This is because it is always possible to find a function (perhaps with many, many coefficients and terms) that will fit your data perfectly.

Therefore, I feel (but I may be mistaken here) that you are looking for a fit that simply consists of a linear combination of your 50 input values. Let's call them x1, x2, ..., x49 and x50. To reduce clutter in the following example I'll assume we have 4 inputs. You'll be able to generalize this.

In[35]:= Fit[{{1, 1, 1, 1, 4}, {1, 2, 3, 4, 10}, {2, 1, 2, 1, 6}}, {x1, x2, x3, x4}, {x1, x2, x3, x4}]

Out[35]= 0.909091 x1 + 1.09091 x2 + 1.09091 x3 + 0.909091 x4

Here you see a fit to a set of three measurements of an output that depends on 4 inputs. The first four numbers in each sublist are the inputs and the last one in each is the output. The next argument of Fit states that I want the fitted function to be in terms of simply x1, x2, x3, and x4.
The last argument names the variables, which in this case are the same as the fit functions themselves. The fit functions could have been any set of functions of x1, x2, x3, and x4 you can think of.

Actually, I generated the outputs (4, 10 and 6) by just summing the inputs. In that case you might expect the fit to be x1+x2+x3+x4, but it isn't. How come? Well, the number of data points is less than the number of variables. In this case, the problem is called "underdetermined". There is an infinite number of solutions to the fitting problem. Try filling in the inputs of the example in the function given by Fit and you will see that the fit is perfect.

In your case, with 50 inputs, it would be good if you had at least 50 observations with different inputs in order to avoid this underdetermination issue. You can try your fit nevertheless, but you can be sure that the solution is not unique.

Cheers -- Sjoerd

On Apr 19, 10:52 am, Dylan Bradbury <dylanbradb... at gmail.com> wrote:
> Hello,
>
> With Mathematica 6, can you help me do the following?:
>
> I have multiple sets of about fifty input variables, each set has its own
> output variable. This is all in an Excel spreadsheet.
>
> I need Mathematica 6 to create a formula, with the highest possible
> correlation to the output variable, using the input variables.
>
> Thanks,
> Dylan
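
A minimal sketch of the underdetermination point made in the reply above, reusing its three example measurements. The names data, inputs and outputs are introduced here only for illustration, and the exact fractions 10/11 and 12/11 are an assumption about what lies behind the printed coefficients 0.909091 and 1.09091.

```mathematica
(* Example data from the reply: each row is {x1, x2, x3, x4, output} *)
data = {{1, 1, 1, 1, 4}, {1, 2, 3, 4, 10}, {2, 1, 2, 1, 6}};
inputs = data[[All, 1 ;; 4]];   (* hypothetical helper name *)
outputs = data[[All, 5]];       (* hypothetical helper name *)

(* Three independent rows cannot pin down four coefficients *)
MatrixRank[inputs]

(* A direction that can be added to any solution without changing the predictions *)
NullSpace[inputs]

(* The "expected" coefficients reproduce the outputs exactly ... *)
inputs . {1, 1, 1, 1} == outputs

(* ... and so do the coefficients reported by Fit, assumed here to be 10/11 and 12/11 *)
inputs . {10/11, 12/11, 12/11, 10/11} == outputs
```

Both comparisons should evaluate to True, which is one way to see that the fit is perfect and yet not unique. With 50 inputs, the same rank check on the real data would indicate whether enough independent observations are available.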