Bivariate Normal Distributions -- can they be estimated in my lifetime?
- To: mathgroup at smc.vnet.net
- Subject: [mg14705] Bivariate Normal Distributions -- can they be estimated in my lifetime?
- From: "ELLIS, Luci (FS)" <EllisL at rba.gov.au>
- Date: Tue, 10 Nov 1998 01:20:58 -0500
- Sender: owner-wri-mathgroup at wolfram.com
Dear MathGroup,
I have given up on several weeks of work trying to do maximum likelihood
estimation of a bivariate probit model (ie two binary decisions that
are correlated). The likelihood function itself is easy to construct,
and in principle should be easy to maximise (using FindMinimum[] on its
negative) because the PDF of the bivariate normal distribution is
single-peaked.
However, it appears impossible to do this on a desktop computer. I am
fairly certain my code is correct, but it refuses to produce an answer
on even a toy problem (20 observations and 2 independent variables and
therefore 7 parameters to be estimated) no matter how long I give it.
Since I want to use this procedure on a 53,000 observation dataset with
about 20-30 independent variables, it seems unlikely that I will ever
get this to work.
I have encountered several problems
1. CDF[MultinormalDistribution[{0,0},{{1,rho},{rho,1}}],{x,y}]
generates very small imaginary parts when rho is negative. The
definition of the likelihood function for this model guarantees at
least some of its elements will have negative covariances. Sure, I can
wrap the whole thing in Chop[], but frankly I find it peculiar that a
density function should do this.
2. The CDF of the multinormal takes too darn long to evaluate. Ok, I am
using a PII 266 not a Unix box or a PowerMac G3, but at work you have
to take what you're given. Nonetheless, if individual realisations of
the CDF take 2-5 seconds to evaluate, the likelihood function must be
taking multiple minutes to evaluate even once, and would probably take
hours or days with the full data set. I don't have that kind of time
(this job finishes next Tuesday. I have been struggling with this
problem for about two months).
3. The CDF of the multinormal takes even longer than usual to evaluate
if rho is negative. I tried to make up for this by substituting the
actual numerical double-integral of the PDF in those cases (which for
some reason evaluates faster those cases, but slower than the CDF for
positive rho). But this wasn't enough of a speed up and I still don't
get an answer.
Before anyone responds with suggestions to solve the first-order
conditions instead of maximise the likelihood, yes, the first-order
conditions are analytic (see WH Greene "Econometric Analysis" page 661)
but they take even longer to evaluate because they involve PDFs divided
by CDFs and other nonsense.
Chris Farr has previously suggested using a discrete approximation of
the CDF, but I am concerned that the approximation error would get too
large, making my results unreliable.
I notice that Wolfram has recently put a "Mathematica and Statistics"
part on its web site (the pallette is cool, by the way!). But in my
experience, much as I love this package, it's just not ready for
prime-time professional econometrics / statistics. Gauss and Stata can
both do this -- why can't Mathematica? Perhaps it's time to roll some
of the statistical packages into the kernel or otherwise speed them up?
I accept that a general package like Mathematica is going to be slower
than a specialised package like Stata or Shazam -- but the margin needs
to be smaller if Mathematica is to be usable in this context.
This post is probably more in the "rant" category than the usual pleas
for assistance. But if there is anyone out there with any ideas on how
I should proceed from here, I'd be grateful!
Thanks,
Luci
________________________________________________________ Luci Ellis
ph: 61-2-9551-8786
Manager, Research fax:61-2-9551-8024 Payments System
Efficiency em: ellisl at rba.gov.au Payments Policy Department
GPO Box 3947 Reserve Bank of Australia Sydney 2001 Australia