Re: Efficient creation of regression design matrix
*To*: mathgroup at smc.vnet.net
*Subject*: [mg82320] Re: Efficient creation of regression design matrix
*From*: Ray Koopman <koopman at sfu.ca>
*Date*: Wed, 17 Oct 2007 04:11:32 -0400 (EDT)
*References*: <ff1ovf$8g5$1@smc.vnet.net>
On Oct 16, 12:24 am, "Coleman, Mark" <Mark.Cole... at LibertyMutual.com>
wrote:
> Hi,
>
> I'm searching for an efficient bit of code to create a design matrix of
> 1's and 0's computed from categorical (non-numeric) variables, suitable
> for use in regression problems. More precisely, imagine one has an n x 1
> vector of k different non-numeric values. For argument's sake, let
> k={Red,Blue,Green,Yellow}. I would like to create an n x k matrix
> consisting of 1's and 0's, where a '1' appears in the row and column
> location corresponding to the presence of an element of k. For example,
> say the original data is
>
> Red
> Blue
> Blue
> Yellow
> Red
> Green
> .
> .
> .
>
> Then the corresponding design matrix would be (assuming we use the same
> ordering of k):
>
> Original   Red   Blue   Green   Yellow
> ========   ===========================
> Red         1     0      0       0
> Blue        0     1      0       0
> Blue        0     1      0       0
> Yellow      0     0      0       1
> Red         1     0      0       0
> Green       0     0      1       0
>
> And so on. I have some code that does this, but as is the norm, I'm sure
> there are some great Mathematica one-liners that do a better job. In applied
> problems that I work with, n can be up to 100,000 and k = 30.
>
> Thanks,
>
> -Mark
If v is the list of observed values, such as {red, blue, blue,
yellow, red, green, ...}, and u is the list of possible values in v,
such as {red, blue, green, yellow}, then probably the simplest way to
get what you asked for is

    x = Boole@Outer[SameQ, v, u]

which compares every element of v with every element of u.
A slightly more complicated, but much faster, way is

    x = v /. Thread[u -> IdentityMatrix@Length@u]

which replaces each value in v with the matching row of a k x k
identity matrix.
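
As a small check, both approaches can be tried on the six sample rows from the question. This is only a sketch: the string values and the names x1 and x2 are assumptions made for illustration (strings are used so that nothing evaluates to a built-in color), and the sample is far too small to say anything about speed.

```mathematica
(* Hypothetical sample data: the six observations from the question, as strings *)
v = {"red", "blue", "blue", "yellow", "red", "green"};

(* Category levels, in the column order wanted for the design matrix *)
u = {"red", "blue", "green", "yellow"};

(* Method 1: compare every element of v with every element of u,
   then convert True/False to 1/0 *)
x1 = Boole@Outer[SameQ, v, u];

(* Method 2: replace each value by the matching row of an identity matrix *)
x2 = v /. Thread[u -> IdentityMatrix@Length@u];

(* Both methods should give the same 6 x 4 matrix *)
x1 === x2
```

Note that the replacement-rule version relies on every entry of v appearing in u; a value that is missing from u would be left unreplaced rather than turned into a row of zeros, whereas the Outer version gives a row of zeros in that case.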