Re: Multiple regression best subset
- To: mathgroup at smc.vnet.net
- Subject: [mg55672] Re: Multiple regression best subset
- From: "Ray Koopman" <koopman at sfu.ca>
- Date: Sat, 2 Apr 2005 01:28:02 -0500 (EST)
- References: <d2dobj$ljj$1@smc.vnet.net><200503310625.BAA15246@smc.vnet.net> <d2j8ov$lc$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Ian Roberts wrote:
> I check every possible subset. I presume XLMiner does the same as
> it describes the option as Exhaustive Search and does offer other
> options which are faster but not guaranteed to find the "best".
> I get the same results as they do but it's much much slower.
Here's some code that checks all 2^p - 1 nonempty subsets. For each
subset, it saves 1 - AdjustedR^2 and an integer whose binary
representation identifies the predictors, then sorts the results so
that the best subset comes first. There are undoubtedly more efficient
ways to do this, but it's unlikely they will be much simpler.
p = (* number of predictors *);
n1 = (* sample size - 1 *);
rxx = (* p x p matrix of correlations among the predictors *);
rxy = (* p-vector of correlations of the predictors with the d.v. *);
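u = Sort@Table[
  (* i = indices of the predictors in subset j *)
  i = Flatten@Position[IntegerDigits[j,2,p],1];
  (* {1 - AdjustedR^2, j}, with R^2 = rxy[[i]].Inverse[rxx[[i,i]]].rxy[[i]] *)
  {(n1/(n1-Length@i))*(1. - rxy[[i]].LinearSolve[rxx[[i,i]],rxy[[i]]]), j},
  {j,2^p-1}];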
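For reference, the first element of each pair in u can be written out as follows. The notation is introduced here only for illustration and is not in the code: S is the set of predictor indices in a subset, k is its size, n is the sample size (so n1 = n - 1), R_SS is the corresponding submatrix of rxx, and r_Sy is the corresponding part of rxy.

```latex
1 - \bar{R}^2_S \;=\; \frac{n-1}{\,n-1-k\,}\,\bigl(1 - R^2_S\bigr),
\qquad
R^2_S \;=\; r_{Sy}^{\top}\, R_{SS}^{-1}\, r_{Sy},
\qquad k = |S| .
```

The code evaluates R_SS^{-1} r_Sy with LinearSolve rather than forming the inverse explicitly.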
To see the results for only the subsets with exactly k predictors
(still in sorted order, so the best one is first), look at
v = Select[u, Tr@IntegerDigits[#[[2]],2] == k &];
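As a rough sketch of how the pieces fit together, here is the same code run on a tiny example. The values of p, n1, rxx, rxy and k are made up purely for illustration, and the names best, jbest and decode are hypothetical helpers that are not part of the code above. Note that IntegerDigits lists the most significant bit first, so predictor 1 corresponds to the highest bit of j (for p = 3, j = 6 has digits {1,1,0} and denotes predictors 1 and 2).

```mathematica
(* Made-up inputs for illustration only *)
p = 3;
n1 = 49;                                         (* sample size 50, minus 1 *)
rxx = {{1., .3, .2}, {.3, 1., .4}, {.2, .4, 1.}};
rxy = {.5, .4, .3};

(* Same computation as above: {1 - AdjustedR^2, j} for every nonempty subset *)
u = Sort@Table[i = Flatten@Position[IntegerDigits[j,2,p],1]; {(n1/
(n1-Length@i))*(1. - rxy[[i]].LinearSolve[rxx[[i,i]],rxy[[i]]]), j},
{j,2^p-1}];

(* Hypothetical helper: turn the integer code back into predictor indices *)
decode[j_] := Flatten@Position[IntegerDigits[j,2,p],1]

(* Best subset overall: first entry of the sorted list *)
{best, jbest} = First[u];
decode[jbest]        (* predictors in the best subset *)
1 - best             (* its adjusted R^2 *)

(* Best subset with exactly k predictors *)
k = 2;
v = Select[u, Tr@IntegerDigits[#[[2]],2] == k &];
decode[v[[1,2]]]
```

Because Select keeps the order of u, the first element of v is the best subset of size k.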