       Re: Efficient creation of regression design matrix

• To: mathgroup at smc.vnet.net
• Subject: [mg82305] Re: [mg82235] Efficient creation of regression design matrix
• From: Darren Glosemeyer <darreng at wolfram.com>
• Date: Wed, 17 Oct 2007 04:03:43 -0400 (EDT)
• References: <200710160720.DAA08572@smc.vnet.net>

```
Coleman, Mark wrote:
> Hi,
>
> I'm searching for an efficient bit of code to create a design matrix of
> 1's and 0's computed from  categorical (non-numeric) variables, suitable
> for use in regression problems. More precisely, imagine one has an n x 1
> vector of k different non-numeric values. For argument's sake, let
> k={Red,Blue,Green,Yellow}. I would like to create an n x k matrix
> consisting of 1's and 0's, where a '1' appears in the row and column
> location corresponding to the presence of an element of k. For example,
> say the original data is
>
> Red
> Blue
> Blue
> Yellow
> Red
> Green
>

If you know the possible categories, you can use the following, which
takes the categories as the second argument.

In:= categoryDesign[xx_, vals_] :=

In:= categoryDesign[{Red,Blue,Blue,Yellow,Red,Green},{Red,Blue,Green,Yellow}]

Out= {{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 1, 0, 0}, {0, 0, 0, 1}, {1, 0, 0, 0},
      {0, 0, 1, 0}}

If the possible categories are not known, the following can be used.

In:= categoryDesign[xx_] :=
Block[{vals = Union[xx]},

Note that the categories using this definition are coded in Sort order
because of the Union.

In:= categoryDesign[{Red,Blue,Blue,Yellow,Red,Green}]

Out= {{0, 0, 1, 0}, {1, 0, 0, 0}, {1, 0, 0, 0}, {0, 0, 0, 1}, {0, 0, 1, 0},
      {0, 1, 0, 0}}

In terms of efficiency, the first definition takes about a third of a
second for a million values on my machine.

In:= vals = RandomChoice[{Red, Blue, Green, Yellow}, 10^6];

In:= categoryDesign[vals,{Red,Blue,Green,Yellow}];//Timing

Out= {0.344, Null}

The second definition will be slower by the amount of time needed by Union.

Darren Glosemeyer
Wolfram Research

```
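The bodies of both categoryDesign definitions are missing from this archived copy of the message. The following is a minimal sketch that reproduces the outputs shown above. It is not necessarily the code the author posted: the use of Outer with SameQ and Boole is an assumption, chosen because it compares every data value against every category in a single call. Timings for this version may differ from the 0.344 seconds quoted in the message.

```mathematica
(* Sketch only: one possible body, not necessarily the original definition. *)
(* Known categories: row i has a 1 in the column whose category equals xx[[i]]. *)
(* The level argument 1 makes Outer treat each top-level element as one value. *)
categoryDesign[xx_List, vals_List] := Boole[Outer[SameQ, xx, vals, 1]]

(* Unknown categories: take them from the data. Union sorts them, *)
(* so the columns come out in Sort order, as the message notes. *)
categoryDesign[xx_List] :=
  Block[{vals = Union[xx]}, categoryDesign[xx, vals]]

(* Usage, with the data from the message. *)
data = {Red, Blue, Blue, Yellow, Red, Green};
categoryDesign[data, {Red, Blue, Green, Yellow}]
categoryDesign[data]
```

Outer builds the full n x k table of True/False comparisons before Boole converts it to 1's and 0's, so memory use grows with the product of the number of rows and the number of categories. For very many categories a SparseArray-based construction may be preferable.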
