Re: Re: Optimal variable selection in multiple linear regression ?
- To: mathgroup at smc.vnet.net
- Subject: [mg74968] Re: [mg74948] Re: Optimal variable selection in multiple linear regression ?
- From: "Richard Palmer" <rhpalmer at gmail.com>
- Date: Fri, 13 Apr 2007 02:00:48 -0400 (EDT)
- References: <firstname.lastname@example.org> <200704120853.EAA25046@smc.vnet.net>
Do you have any theory guiding you here? If you do, the theory will
tell you which variables to include in the model. For example, theory
says that sales should be affected by real price, the prices of
competing products, and income.
If you have no theory, you are data mining. I expect that if you have
more than 100 IVs (independent variables), you don't have enough
observations to estimate any sort of model using all of them anyway
(I believe Mathematica estimates regression models by requiring that
all the data be in memory). Try a variable reduction technique (see a
textbook) to get the number down to something manageable.
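
To make "variable reduction" a little more concrete, here is a minimal sketch of one simple option, a correlation filter. It is only an illustration under stated assumptions, not a Mathematica built-in and not necessarily the technique a textbook would recommend for your data. It is written in plain Python so it stays language-neutral. The function names (pearson, reduce_predictors, enough_observations), the 0.9 cutoff, and the toy data are all hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / math.sqrt(var_a * var_b)

def reduce_predictors(columns, y, max_pairwise=0.9):
    """Greedy correlation filter (hypothetical helper).

    Visit predictors from strongest to weakest |correlation with y| and
    keep one only if it is not nearly collinear with a predictor that
    has already been kept.
    """
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], y)),
                    reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < max_pairwise
               for k in kept):
            kept.append(name)
    return kept

def enough_observations(n_obs, n_predictors):
    """Least squares with an intercept needs more rows than coefficients."""
    return n_obs > n_predictors + 1

# Toy data, invented for illustration: price_cents is just price rescaled,
# so the filter should keep only one of the two.
columns = {
    "price":       [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "price_cents": [100.0, 200.0, 300.0, 400.0, 500.0, 600.0],
    "income":      [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
sales = [10.0, 9.0, 9.5, 7.0, 8.0, 9.0]

kept = reduce_predictors(columns, sales)
print(kept, enough_observations(len(sales), len(kept)))
```

A filter like this only removes redundancy among the IVs; it does not tell you which of the survivors belong in the model. That is still a question for theory, or failing that, for a proper selection procedure.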
On 4/12/07, fkampas <fkampas at verizon.net> wrote:
> Looking at the P values that Regress gives you would be a start to finding
> out which are significant.
> "Alex" <axel.kowald at rub.de> wrote in message
> news:ev4vla$8pn$1 at smc.vnet.net...
> > Hello everybody,
> > I try to use Mathematica to do a multiple linear regression. I'm using
> > Regress and in principle everything works. However, because I have a
> > large number of independent variables (>100), I would like to select
> > only the most significant for the regression model.
> > Is there already an algorithm implemented in Mathematica for selecting
> > the best variables, or how else would I do it?
> > Many thanks,
> > Alex