Multiple Regression

*To*: mathgroup at smc.vnet.net
*Subject*: [mg50295] Multiple Regression
*From*: "Doug" <umdougmm at hotmail.com>
*Date*: Wed, 25 Aug 2004 03:36:05 -0400 (EDT)
*Organization*: The University of Manitoba
*Sender*: owner-wri-mathgroup at wolfram.com

This is actually meant as a leaf in the thread I started about 3 hours ago (for some reason my posts don't appear until about 12 hours after I send them).

To quickly restate my dilemma, I'm trying to come up with a model of the following sort:

y = (B0) + (B1)*(o01) + (B2)*(o02) + (B3)*(o03) + (B4)*(o04) + (B5)*(u) + (B6)*(v) + (B7)*(w) + (B8)*(z) + (B9)*(y) + (B10)*(x) + epsilon

and I'm of course trying to minimize epsilon. The other important point is that o01 to o04 are binary and mutually exclusive (i.e., if a data row has o01=1 then o02=..=o04=0, and the same goes for o02 to o04).

The variables causing me the most headache are o01..o04. If I include all ~50 000 rows of data and run the Fit[] function as follows:

Fit[Data,{1,o01,o02,o03,o04,u,v,w,z,y,x},{o01,o02,o03,o04,u,v,w,z,y,x}]

the estimated B1=..=B4 are all equal and VERY large. So I tried to reverse engineer the problem to figure out what is wrong. Here is what I have found.

ANALYSIS 1

I used 10 rows where o01=1, 10 rows where o02=1, ..., 10 rows where o04=1. Running exactly the same Fit command on these 40 rows returns B1, B2, B3, B4 that are all different and acceptably small. However, once I add more rows (~6000), the problem I describe above reappears.

ANALYSIS 2

I used all the rows where o01=1, o02=1, or o03=1, and no rows where o04=1. Now B1=..=B3 come out equal to an impossibly large number, far outside the range of y (the results). B4 is different but also very large. However, as soon as I add one row where o04=1, I get B1=..=B4 all equal and impossibly large.

Any help is greatly appreciated, in particular if there are any implicit assumptions that Mathematica makes when running Fit[] or Regress[] that I'm simply not aware of. I've done multiple regression with ~90 000 rows and 16 variables before, but when I added these four variables, things started to go kaput like this.

Doug
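One possible explanation for the symptoms described above, offered as a sketch and not a confirmed diagnosis: if every row has exactly one of o01..o04 set to 1 (an assumption; the post says only that they are mutually exclusive), then o01 + o02 + o03 + o04 equals the constant column 1 in every row. The design matrix then has linearly dependent columns, and the least-squares coefficients for the constant and the four indicators are not uniquely determined. The Python snippet below uses a small made-up data set and a hypothetical helper `design_row` to show the rank deficiency, and shows that dropping one indicator (here o04, chosen arbitrarily as the baseline) restores full column rank. It uses only the standard library and does not model what `Fit[]` does internally.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_row(group, u, include_all_dummies=True):
    """Hypothetical helper: one design-matrix row [1, o01, o02, o03, (o04), u]."""
    dummies = [1 if group == k else 0 for k in range(4)]
    if not include_all_dummies:
        dummies = dummies[:-1]  # drop o04 and treat it as the baseline category
    return [1] + dummies + [u]


# Made-up data: (group index 0..3, value of one continuous predictor u).
samples = [(0, 2), (1, 5), (2, 3), (3, 7), (0, 4), (1, 1), (2, 6), (3, 8)]

full = [design_row(g, u) for g, u in samples]
reduced = [design_row(g, u, include_all_dummies=False) for g, u in samples]

full_rank = matrix_rank(full)
reduced_rank = matrix_rank(reduced)

print(len(full[0]), full_rank)        # 6 columns, rank 5: columns are dependent
print(len(reduced[0]), reduced_rank)  # 5 columns, rank 5: full column rank
```

Under the same assumption, the equivalent remedy in the `Fit[]` call would be to omit either the constant 1 or one of o01..o04 from the list of basis functions; the coefficients on the remaining indicators are then measured relative to the omitted category.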

**Follow-Ups**:
**Re: Multiple Regression**
*From:* "Janos D. Pinter" <jdpinter@hfx.eastlink.ca>