MathGroup Archive 2004


Multiple Regression

  • To: mathgroup at smc.vnet.net
  • Subject: [mg50295] Multiple Regression
  • From: "Doug" <umdougmm at hotmail.com>
  • Date: Wed, 25 Aug 2004 03:36:05 -0400 (EDT)
  • Organization: The University of Manitoba
  • Sender: owner-wri-mathgroup at wolfram.com

This is actually meant as a leaf in the thread I started about 3 hours ago
(for some reason my posts don't appear until about 12 hours after I send them).

What I have found is the following:  to quickly re-state my dilemma, I'm
trying to come up with a model of the following sort:
y = (B0) + (B1)*(o01) + (B2)*(o02) + (B3)*(o03) + (B4)*(o04) + (B5)*(u) +
(B6)*(v) + (B7)*(w) + (B8)*(z) + (B9)*(y) + (B10)*(x) + epsilon
and I'm of course trying to minimize epsilon.  The other important point is
that o01 to o04 are binary and mutually exclusive (i.e., if a data row has
o01=1 then o02=..=o04=0, and likewise for o02 to o04).
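
For reference, the model above can be written compactly, together with one consequence of the exclusivity condition. This is only a sketch under an assumption the post does not state: that every row has exactly one of o01..o04 equal to 1 (no all-zero rows). The constant c is introduced purely for illustration.

```latex
% Assumption (not stated in the post): in every row,
% o_{01} + o_{02} + o_{03} + o_{04} = 1.
\begin{align*}
y &= \beta_0 + \sum_{k=1}^{4} \beta_k\, o_{0k}
     + \beta_5 u + \cdots + \beta_{10} x + \varepsilon \\
  &= (\beta_0 + c) + \sum_{k=1}^{4} (\beta_k - c)\, o_{0k}
     + \beta_5 u + \cdots + \beta_{10} x + \varepsilon
     \qquad \text{for any constant } c,
\end{align*}
% since c - c \sum_{k=1}^{4} o_{0k} = 0 under the assumption.
```

Under that assumption, shifting B0 up by c and B1..B4 down by the same c leaves every predicted value unchanged, so the data alone would not pin down those five coefficients.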

The variables which are causing me much headache are the o01..o04, because
if I include all the ~50 000 rows of data and run the Fit[] function as
follows:
Fit[Data,{1,o01,o02,o03,o04,u,v,w,z,y,x},{o01,o02,o03,o04,u,v,w,z,y,x}]

the approximated B1=..=B4 are all equal and VERY large.  So I tried to
reverse engineer this problem so as to figure out what is wrong with it.
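
One way to probe the setup described above is to build the design matrix explicitly and check its rank. The following is a minimal sketch on hypothetical synthetic data, not the original ~50 000 rows; the names n, indicators, continuous, and design are illustrative, and it assumes every row has exactly one of the four indicators set to 1.

```mathematica
(* Hypothetical synthetic data standing in for the real data set. *)
SeedRandom[1];
n = 40;

(* Assumption: each row has exactly one of o01..o04 equal to 1. *)
indicators = Table[If[j == Mod[i, 4] + 1, 1, 0], {i, n}, {j, 4}];

(* Six random columns standing in for the continuous predictors. *)
continuous = RandomReal[{0, 1}, {n, 6}];

(* Design matrix: constant column, then indicators, then continuous columns. *)
design = MapThread[Join[{1}, #1, #2] &, {indicators, continuous}];

Dimensions[design]  (* rows and columns: 40 by 11 *)
MatrixRank[design]  (* a value below 11 means the columns are linearly dependent *)
```

If the rank comes out below the number of columns, the least-squares coefficients are not uniquely determined, which would be consistent with large, equal values for B1..B4. Whether this applies to the real data depends on whether rows with all four indicators equal to 0 occur.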

ANALYSIS 1
I added 10 rows where o01=1, 10 rows where o02=1, ..., 10 rows where o04=1.
Running exactly the same Fit command on these 40 entries returns B1,
B2, B3, B4 that are all different and acceptably small.  However, once I add
more rows (~6000), the problem I describe above re-appears.

ANALYSIS 2
I added all the rows where o01=1, o02=1, or o03=1, and no rows where o04=1.
Now B1=..=B3=an impossibly large number, not within the range of y (results).
B4 is different but also very large.  However, as soon as I add one row where
o04=1, I get B1=..=B4=unacceptably/impossibly large.

Any help is greatly appreciated.  In particular, I'd like to know whether
there are any implicit assumptions that Mathematica makes when running Fit[]
or Regress[] that I'm simply not aware of.  I've done multiple regression
with ~90 000 rows and 16 variables before, but when I added these four
variables, things started to go kaput like this.

Doug


