MathGroup Archive: August 2004 [00505]


Re: Multiple Regression

To: mathgroup at smc.vnet.net
Subject: [mg50301] Re: [mg50295] Multiple Regression
From: "Janos D. Pinter" <jdpinter at hfx.eastlink.ca>
Date: Thu, 26 Aug 2004 06:50:43 -0400 (EDT)
References: <200408250736.DAA19807@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

Doug,

your regression model can be formulated as a numerical optimization 
problem. Place reasonably tight bounds on each decision variable, and 
express the binary conditions as constraints. Then you can use, e.g., 
NMinimize or my MathOptimizer/Pro packages.
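A minimal sketch of this kind of formulation with the built-in NMinimize, assuming a least-squares objective (sum of squared residuals). The toy data, the helper names (rows, resid, sse, bounds), the -100..100 bounds and the sum-to-zero constraint on the indicator coefficients are all illustrative assumptions, not part of the suggestion above. In particular, writing "the binary conditions" as that single linear constraint is only one possible reading. Only one continuous regressor (u) is kept, to keep the example short.

```mathematica
(* Hypothetical toy data, one row per observation: {g, u, y}.
   g in {1, 2, 3, 4} records which of o01..o04 equals 1 in that row;
   u stands in for the continuous regressors u, v, w, ... *)
rows = {{1, 0.5, 3.1}, {2, 1.0, 4.2}, {3, 1.5, 5.9}, {4, 2.0, 6.8},
        {1, 2.5, 4.9}, {2, 3.0, 6.1}, {3, 3.5, 8.0}, {4, 4.0, 8.7}};

(* Decision variables: intercept b0, indicator coefficients b1..b4, slope b5 *)
vars = {b0, b1, b2, b3, b4, b5};

(* Residual of one row: y minus (intercept + coefficient of the active
   indicator + slope * u); UnitVector[4, g] is that row's o01..o04 pattern *)
resid[{g_, u_, y_}] := y - (b0 + {b1, b2, b3, b4} . UnitVector[4, g] + b5 u)

(* Least-squares objective: sum of squared residuals over all rows *)
sse = Total[(resid /@ rows)^2];

(* Assumed box bounds on every coefficient *)
bounds = Table[-100 <= v <= 100, {v, vars}];

(* Assumed identifying constraint: indicator coefficients sum to zero *)
{minSSE, sol} =
  NMinimize[{sse, And @@ Append[bounds, b1 + b2 + b3 + b4 == 0]}, vars];

sol  (* rules b0 -> ..., b1 -> ..., ..., b5 -> ... *)
```

The same pattern extends to the full model by adding one coefficient per regressor; with ~50,000 rows the objective becomes a very long expression, so this may be slow and is meant only to show the structure.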

Regards,
Janos
_________________________________________________

Janos D. Pinter, PhD, DSc
President & Research Scientist, PCS Inc.
Adjunct Professor, Dalhousie University

129 Glenforest Drive, Halifax, NS, Canada B3M 1J2
Telephone: +1-(902)-443-5910
Fax: +1-(902)-431-5100; +1-(902)-443-5910
E-mail: jdpinter at hfx.eastlink.ca
Web: www.pinterconsulting.com     www.dal.ca/~jdpinter
Software products: http://www.pinterconsulting.com/Software_Sum_Info.pdf



At 04:36 AM 8/25/2004, you wrote:
>This is actually meant as a reply in the thread I started about 3 hours ago
>(for some reason my posts don't appear until about 12 hours after I send them.)
>
>What I have found is the following. To quickly restate my dilemma, I'm
>trying to fit a model of the following form:
>y = (B0) + (B1)*(o01) + (B2)*(o02) + (B3)*(o03) + (B4)*(o04) + (B5)*(u) +
>(B6)*(v) + (B7)*(w) + (B8)*(z) + (B9)*(y) + (B10)*(x) + epsilon
>and of course I'm trying to minimize epsilon. The other important point is
>that o01 to o04 are binary and mutually exclusive: if a data row has o01=1,
>then o02=...=o04=0, and likewise for each of o02 to o04.
>
>The variables that are causing me the most headache are o01..o04, because
>if I include all ~50,000 rows of data and run Fit[] as follows:
>Fit[Data,{1,o01,o02,o03,o04,u,v,w,z,y,x},{o01,o02,o03,o04,u,v,w,z,y,x}]
>
>the estimated B1=..=B4 are all equal and VERY large. So I tried to
>reverse-engineer the problem to figure out what is wrong.
>
>ANALYSIS 1
>I added 10 rows where o01=1, 10 rows where o02=1, ..., 10 rows where o04=1.
>Running exactly the same Fit command on these 40 rows returns B1, B2, B3,
>B4 that are all different and acceptably small. However, once I add more
>rows (~6000), the problem described above reappears.
>
>ANALYSIS 2
>I added all the rows where one of o01, o02, o03 is 1, and no rows where
>o04=1. Now B1=..=B3 = an impossibly large number, far outside the range of
>y (the results). B4 is different but also very large. However, as soon as I
>add one row where o04=1, I get B1=..=B4 = unacceptably/impossibly large.
>
>Any help is greatly appreciated, in particular regarding any implicit
>assumptions Mathematica makes that I'm simply not aware of when running
>Fit[] or Regress[]. I've done multiple regression with ~90,000 rows and 16
>variables before, but when I added these four variables, things started to
>go kaput like this.
>
>Doug
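A quick diagnostic related to ANALYSIS 1 and 2, sketched on made-up data: check whether the design matrix behind the Fit call has full column rank. It assumes every row has exactly one of o01..o04 equal to 1 (the post says the indicators are mutually exclusive; that they are also exhaustive is an assumption here) and keeps a single continuous column u. The names groups, uCol and design are hypothetical.

```mathematica
(* Hypothetical toy design: each row has exactly one of o01..o04 equal to 1
   (groups says which), plus one continuous regressor uCol *)
groups = {1, 2, 3, 4, 1, 2, 3, 4};
uCol = {0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0};

(* Design matrix with columns {1, o01, o02, o03, o04, u}, as in the Fit call *)
design = MapThread[Join[{1}, UnitVector[4, #1], {#2}] &, {groups, uCol}];

MatrixRank[design]  (* 5, one less than the 6 columns *)

(* Why: in every row the four indicators add up to the constant column *)
Total /@ design[[All, 2 ;; 5]] == design[[All, 1]]  (* True *)

(* Dropping one indicator column (o04 here) gives full column rank *)
MatrixRank[design[[All, {1, 2, 3, 4, 6}]]]  (* 5, equal to the 5 columns *)
```

Under that assumption the constant column is an exact linear combination of the four indicator columns, so B0 through B4 are not uniquely determined, and a floating-point least-squares fit on such data can return large, unstable values for them. Dropping one indicator (or the intercept) is one common remedy; it is not something discussed in this thread.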

References:
- Multiple Regression
  - From: "Doug" <umdougmm@hotmail.com>
