Multiple Regression
*To*: mathgroup at smc.vnet.net
*Subject*: [mg50295] Multiple Regression
*From*: "Doug" <umdougmm at hotmail.com>
*Date*: Wed, 25 Aug 2004 03:36:05 -0400 (EDT)
*Organization*: The University of Manitoba
*Sender*: owner-wri-mathgroup at wolfram.com
This is actually meant as a follow-up in the thread I started about 3 hours ago (for
some reason my posts don't appear until about 12 hours after I send them).
To quickly restate my problem before describing what I have found: I'm
trying to fit a model of the following form:
y = (B0) + (B1)*(o01) + (B2)*(o02) + (B3)*(o03) + (B4)*(o04) + (B5)*(u) +
(B6)*(v) + (B7)*(w) + (B8)*(z) + (B9)*(y) + (B10)*(x) + epsilon
and of course I'm trying to minimize the error term epsilon (in the
least-squares sense). The other important point is that o01 to o04 are
binary and mutually exclusive: if a data row has o01=1 then o02=o03=o04=0,
and likewise for o02 to o04.
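If every data row has exactly one of o01..o04 equal to 1 (the post says they are mutually exclusive; that each row has one of them set is an assumption here), the four dummy columns add up to the constant column multiplying B0. Writing d_{k,i} for the value of o0k in row i, a short derivation shows what that implies for the coefficients:

```latex
% d_{k,i} denotes the value of o0k in data row i (k = 1, ..., 4); c is any constant
\sum_{k=1}^{4} d_{k,i} = 1 \quad \text{for every row } i
\qquad\Longrightarrow\qquad
(B_0 + c) + \sum_{k=1}^{4} (B_k - c)\, d_{k,i}
  = B_0 + \sum_{k=1}^{4} B_k\, d_{k,i} + c\Bigl(1 - \sum_{k=1}^{4} d_{k,i}\Bigr)
  = B_0 + \sum_{k=1}^{4} B_k\, d_{k,i}
```

The terms in u, v, w, z, y and x are untouched, so every value of c gives identical fitted values and residuals: the least-squares problem then has no unique solution, and moving along this direction shifts B1..B4 by the same amount. That would be consistent with B1=..=B4 coming out equal and very large, although nothing in the post confirms this is the cause.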
The variables causing me the most headaches are o01..o04: if I include
all ~50 000 rows of data and run Fit[] as follows:
Fit[Data,{1,o01,o02,o03,o04,u,v,w,z,y,x},{o01,o02,o03,o04,u,v,w,z,y,x}]
the estimated B1=..=B4 are all equal and VERY large. So I tried to
reverse-engineer the problem to figure out what is going wrong.
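One way to check whether the columns given to Fit[] are linearly dependent is to compute the rank of the design matrix. The sketch below does this exactly (with Python's fractions module, so there is no rounding) on a small hypothetical design: a constant, the four dummies and one continuous column standing in for u. The toy data and the helper names matrix_rank and design are illustrative assumptions, not taken from the post.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank of a matrix (list of rows) by Gaussian elimination over Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column is a combination of the pivot columns already found
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design(rows, drop_o04=False):
    """Build design rows [1, o01, o02, o03, (o04), u] from hypothetical (group, u) pairs."""
    out = []
    for group, u in rows:
        dummies = [1 if group == k else 0 for k in (1, 2, 3, 4)]
        if drop_o04:
            dummies = dummies[:3]  # group 4 becomes the baseline absorbed by the constant
        out.append([1] + dummies + [u])
    return out


# Hypothetical toy data: group k means o0k = 1 and the other three dummies are 0.
toy = [(1, 5), (2, 3), (3, 7), (4, 2), (1, 4), (3, 1)]

full = design(toy)
reduced = design(toy, drop_o04=True)
print(len(full[0]), matrix_rank(full))        # 6 columns, rank 5
print(len(reduced[0]), matrix_rank(reduced))  # 5 columns, rank 5
```

With the constant and all four dummies, six columns have rank 5, so one column is redundant; dropping o04 leaves five columns of full rank. Inside Mathematica, applying MatrixRank to the numeric design matrix should be the analogous check.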
ANALYSIS 1
I took 10 rows where o01=1, 10 rows where o02=1, ..., 10 rows where o04=1.
Running exactly the same Fit command on these 40 rows returns B1, B2, B3,
B4 that are all different and acceptably small. However, once I add more
rows (~6000), the problem described above reappears.
ANALYSIS 2
I included all the rows where o01, o02 or o03 is 1, and no rows where
o04=1. Now B1=B2=B3 come out as the same impossibly large number, far
outside the range of y (the results). B4 is different but also very large.
However, as soon as I add a single row where o04=1, I get B1=..=B4, all
unacceptably/impossibly large.
Any help is greatly appreciated, in particular if there are implicit
assumptions that Mathematica makes when running Fit[] or Regress[] that I'm
simply not aware of. I've done multiple regression with ~90 000 rows and 16
variables before, but things only started to go kaput like this when I
added these four variables.
Doug