Bivariate Normal Distributions -- can they be estimated in my lifetime?

*To*: mathgroup at smc.vnet.net*Subject*: [mg14705] Bivariate Normal Distributions -- can they be estimated in my lifetime?*From*: "ELLIS, Luci (FS)" <EllisL at rba.gov.au>*Date*: Tue, 10 Nov 1998 01:20:58 -0500*Sender*: owner-wri-mathgroup at wolfram.com

Dear MathGroup, I have given up on several weeks of work trying to do maximum likelihood estimation of a bivariate probit model (ie two binary decisions that are correlated). The likelihood function itself is easy to construct, and in principle should be easy to maximise (using FindMinimum[] on its negative) because the PDF of the bivariate normal distribution is single-peaked. However, it appears impossible to do this on a desktop computer. I am fairly certain my code is correct, but it refuses to produce an answer on even a toy problem (20 observations and 2 independent variables and therefore 7 parameters to be estimated) no matter how long I give it. Since I want to use this procedure on a 53,000 observation dataset with about 20-30 independent variables, it seems unlikely that I will ever get this to work. I have encountered several problems 1. CDF[MultinormalDistribution[{0,0},{{1,rho},{rho,1}}],{x,y}] generates very small imaginary parts when rho is negative. The definition of the likelihood function for this model guarantees at least some of its elements will have negative covariances. Sure, I can wrap the whole thing in Chop[], but frankly I find it peculiar that a density function should do this. 2. The CDF of the multinormal takes too darn long to evaluate. Ok, I am using a PII 266 not a Unix box or a PowerMac G3, but at work you have to take what you're given. Nonetheless, if individual realisations of the CDF take 2-5 seconds to evaluate, the likelihood function must be taking multiple minutes to evaluate even once, and would probably take hours or days with the full data set. I don't have that kind of time (this job finishes next Tuesday. I have been struggling with this problem for about two months). 3. The CDF of the multinormal takes even longer than usual to evaluate if rho is negative. I tried to make up for this by substituting the actual numerical double-integral of the PDF in those cases (which for some reason evaluates faster those cases, but slower than the CDF for positive rho). But this wasn't enough of a speed up and I still don't get an answer. Before anyone responds with suggestions to solve the first-order conditions instead of maximise the likelihood, yes, the first-order conditions are analytic (see WH Greene "Econometric Analysis" page 661) but they take even longer to evaluate because they involve PDFs divided by CDFs and other nonsense. Chris Farr has previously suggested using a discrete approximation of the CDF, but I am concerned that the approximation error would get too large, making my results unreliable. I notice that Wolfram has recently put a "Mathematica and Statistics" part on its web site (the pallette is cool, by the way!). But in my experience, much as I love this package, it's just not ready for prime-time professional econometrics / statistics. Gauss and Stata can both do this -- why can't Mathematica? Perhaps it's time to roll some of the statistical packages into the kernel or otherwise speed them up? I accept that a general package like Mathematica is going to be slower than a specialised package like Stata or Shazam -- but the margin needs to be smaller if Mathematica is to be usable in this context. This post is probably more in the "rant" category than the usual pleas for assistance. But if there is anyone out there with any ideas on how I should proceed from here, I'd be grateful! Thanks, Luci ________________________________________________________ Luci Ellis ph: 61-2-9551-8786 Manager, Research fax:61-2-9551-8024 Payments System Efficiency em: ellisl at rba.gov.au Payments Policy Department GPO Box 3947 Reserve Bank of Australia Sydney 2001 Australia