MathGroup Archive: April 2005 [00036]

Re: Multiple regression best subset

To: mathgroup at smc.vnet.net
Subject: [mg55672] Re: Multiple regression best subset
From: "Ray Koopman" <koopman at sfu.ca>
Date: Sat, 2 Apr 2005 01:28:02 -0500 (EST)
References: <d2dobj$ljj$1@smc.vnet.net><200503310625.BAA15246@smc.vnet.net> <d2j8ov$lc$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

Ian Roberts wrote:
> I check every possible subset. I presume XLMiner does the same as
> it describes the option as Exhaustive Search and does offer other
> options which are faster but not guaranteed to find the "best".
> I get the same results as they do but it's much much slower.

Here's some code that checks all the subsets. For each subset, it
saves 1 - AdjustedR^2 and an integer whose binary representation
identifies the predictors. There are undoubtedly more efficient ways
to do this, but it's unlikely they will be much simpler.

p = (* number of predictors *);
n1 = (* sample size - 1 *);
rxx = (* p x p matrix of correlations among the predictors *);
rxy = (* p-vector of correlations of the predictors with the d.v. *);

u = Sort@Table[
   i = Flatten@Position[IntegerDigits[j, 2, p], 1];  (* predictors in subset j *)
   {(n1/(n1 - Length@i))*                            (* 1 - AdjustedR^2 *)
     (1. - rxy[[i]].LinearSolve[rxx[[i, i]], rxy[[i]]]), j},
   {j, 2^p - 1}];
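
For anyone starting from raw data rather than correlations, here is one way
the four inputs might be set up. This is only a sketch: the data are
simulated, the names n, x, y and r are not from the post above, and it
assumes a Mathematica version in which RandomVariate and Correlation are
built in.

```mathematica
(* Illustrative only: simulated data, not from the original post. *)
SeedRandom[1];
n = 50; p = 4;
x = RandomVariate[NormalDistribution[], {n, p}];   (* n x p predictor matrix *)
y = x[[All, 1]] + 0.5 x[[All, 3]] +
    RandomVariate[NormalDistribution[], n];        (* dependent variable *)

(* Correlation matrix of the predictors with y appended as the last column *)
r = Correlation[Join[x, Transpose[{y}], 2]];

rxx = r[[1 ;; p, 1 ;; p]];   (* p x p correlations among the predictors *)
rxy = r[[1 ;; p, p + 1]];    (* correlations of the predictors with y *)
n1 = n - 1;                  (* sample size - 1 *)
```

With these assigned, the expression for u can be evaluated as is. Since
there are 2^p - 1 subsets, the run time roughly doubles with each added
predictor.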

To see the results for only the subsets with k predictors, look at

v = Select[u, Tr@IntegerDigits[#[[2]],2] == k &];
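
Because u is sorted in ascending order of 1 - AdjustedR^2, the first entry
for each subset size is the best subset of that size. A possible way to
list them with the predictors decoded, assuming u and p are defined as
above (decode and bestBySize are names made up for this sketch):

```mathematica
(* decode is a hypothetical helper, not part of the original post. *)
(* It turns the subset integer back into a list of predictor indices. *)
decode[j_Integer, p_Integer] := Flatten@Position[IntegerDigits[j, 2, p], 1]

(* u is sorted ascending, so the first match for each k is the best of that size *)
bestBySize = Table[
   First@Select[u, Tr@IntegerDigits[#[[2]], 2] == k &],
   {k, p}];

(* rows of {1 - AdjustedR^2, predictor indices} *)
{#[[1]], decode[#[[2]], p]} & /@ bestBySize
```

Note that IntegerDigits[j, 2, p] puts the most significant bit first, so
predictor 1 corresponds to the highest bit of j, not the lowest.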
