Re: Multiple Regression

- *To*: mathgroup at smc.vnet.net
- *Subject*: [mg50301] Re: [mg50295] Multiple Regression
- *From*: "Janos D. Pinter" <jdpinter at hfx.eastlink.ca>
- *Date*: Thu, 26 Aug 2004 06:50:43 -0400 (EDT)
- *References*: <200408250736.DAA19807@smc.vnet.net>
- *Sender*: owner-wri-mathgroup at wolfram.com

Doug,

your regression model can be formulated as a numerical optimization problem. Place reasonably tight bounds on each decision variable, and express the binary conditions as constraints. Then you can use e.g. NMinimize or the [my] MathOptimizer/Pro packages.

Regards,
Janos

_________________________________________________

Janos D. Pinter, PhD, DSc
President & Research Scientist, PCS Inc.
Adjunct Professor, Dalhousie University
129 Glenforest Drive, Halifax, NS, Canada B3M 1J2
Telephone: +1-(902)-443-5910
Fax: +1-(902)-431-5100; +1-(902)-443-5910
E-mail: jdpinter at hfx.eastlink.ca
Web: www.pinterconsulting.com
www.dal.ca/~jdpinter
Software products: http://www.pinterconsulting.com/Software_Sum_Info.pdf

At 04:36 AM 8/25/2004, you wrote:

> This is actually meant as a leaf in the thread I started circa 3 hours ago (for some reason my posts don't post until circa 12 hrs after sending them?)
>
> What I have found is the following. To quickly restate my dilemma, I'm trying to come up with a model of the following sort:
>
> y = (B0) + (B1)\*(o01) + (B2)\*(o02) + (B3)\*(o03) + (B4)\*(o04) + (B5)\*(u) + (B6)\*(v) + (B7)\*(w) + (B8)\*(z) + (B9)\*(y) + (B10)\*(x) + epsilon
>
> and I'm of course trying to minimize epsilon. The other important point is that o01 to o04 are binary and mutually exclusive (i.e., if a data row has o01=1 then o02=..=o04=0, and the same goes for o02 to o04).
>
> The variables which are causing me much headache are o01..o04, because if I include all the ~50 000 rows of data and run the Fit[] function as follows:
>
> Fit[Data,{1,o01,o02,o03,o04,u,v,w,z,y,x},{o01,o02,o03,o04,u,v,w,z,y,x}]
>
> the approximated B1=..=B4 are all equal and VERY large. So I tried to reverse engineer this problem to figure out what is wrong with it.
>
> ANALYSIS 1
> I added 10 rows where o01=1, 10 rows where o02=1, ..., 10 rows where o04=1. Running exactly the same Fit command on this data of 40 entries returns B1, B2, B3, B4 that are all different and acceptably small. However, once I add more variables (~6000), the problem I describe above reappears.
>
> ANALYSIS 2
> I added all the rows where o01=o02=o03=1 and no rows where o04=1. Now, B1=..=B3=impossibly large number not within the range of y (results). B4 is different but also very large. However, as soon as I add one row where o04=1 I get B1=..=B4=unacceptably/impossibly large.
>
> Any help is greatly appreciated, in particular if there are any implicit assumptions that Mathematica makes, and that I'm simply not aware of, when running Fit[] or Regress[]. I've done multiple regression with ~90 000 rows and 16 variables before, but when I added these four variables, things started to go kaput like this.
>
> Doug
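
The suggestion in the reply, casting the fit as a bounded minimization of the sum of squared residuals, could look roughly like the sketch below. Everything in it is an assumption made for illustration: the toy data set `rows` (not Doug's data), the names `levels`, `vars`, `sse` and `bounds`, the bound range of ±100, and the use of a single continuous regressor `u` for brevity. One further assumption is worth flagging: if every row has exactly one of o01..o04 equal to 1, the four indicators sum to the constant column, so the coefficients B0..B4 are not uniquely determined. The sketch therefore leaves o04 out of the model and treats it as the baseline group, which is one common way to handle this, not something stated in the thread.

```mathematica
(* Hypothetical toy data, NOT the poster's data set.
   Each row is {o01, o02, o03, o04, u, y}; exactly one of o01..o04 is 1. *)
SeedRandom[1];
levels = {2., 3., 5., 7.};  (* assumed group offsets used only to generate y *)
rows = Table[
   With[{k = RandomInteger[{1, 4}], u = RandomReal[{0, 10}]},
    Join[UnitVector[4, k],
     {u, levels[[k]] + 0.5 u + RandomReal[{-0.1, 0.1}]}]],
   {40}];

(* Decision variables: intercept, three indicator coefficients, one slope.
   o04 has no coefficient; it is the baseline group absorbed into b0. *)
vars = {b0, b1, b2, b3, b5};

(* Sum of squared residuals over all rows *)
sse = Total[
   (#[[6]] - (b0 + {b1, b2, b3} . #[[1 ;; 3]] + b5 #[[5]]))^2 & /@ rows];

(* Box bounds on every coefficient; the range is an assumed placeholder *)
bounds = And @@ (-100 <= # <= 100 & /@ vars);

(* Returns {minimum value, {b0 -> ..., b1 -> ..., ...}} *)
NMinimize[{sse, bounds}, vars]
```

With this parameterization, `b1`, `b2` and `b3` are read as offsets of the first three groups relative to the o04 group, and `b0` is the intercept for the o04 group.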

**References**:
- **Multiple Regression**
  - *From:* "Doug" <umdougmm@hotmail.com>