Re: Finding distribution parameters for a mixture of product distributions
- To: mathgroup at smc.vnet.net
- Subject: [mg122344] Re: Finding distribution parameters for a mixture of product distributions
- From: Darren Glosemeyer <darreng at wolfram.com>
- Date: Wed, 26 Oct 2011 17:36:58 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
On 10/22/2011 5:06 AM, Koslicki wrote:
> I am trying to estimate parameters for a mixture of product
> distributions, and Mathematica does not like my inclusion of product
> distributions.
>
> Ignoring the stupidity of the following choice of distributions,
> consider the following:
>
> In[142]:= data=RandomVariate[MixtureDistribution[{.1,.9},
> {BernoulliDistribution[0],BernoulliDistribution[1]}],100];
>
> In[140]:= dist1=MixtureDistribution[{p,1-p},
> {BernoulliDistribution[a1],BernoulliDistribution[a2]}];
>
> In[141]:= FindDistributionParameters[data,dist1]
>
> Out[141]= {p->0.261107,a1->0.919502,a2->0.960777}
>
> So everything works just fine.
>
>
> Yet if I want to look at a *product* of Bernoulli distribution
> instead, I get the following:
>
> In[143]:= data=RandomVariate[MixtureDistribution[{.1,.9},
> {ProductDistribution[BernoulliDistribution[0],BernoulliDistribution[0]],
> ProductDistribution[BernoulliDistribution[1],BernoulliDistribution[1]]}],
> 100];
>
> In[144]:= dist1=MixtureDistribution[{p,1-p},
> {ProductDistribution[BernoulliDistribution[a1],BernoulliDistribution[a2]],
> ProductDistribution[BernoulliDistribution[a3],BernoulliDistribution[a4]]}];
>
> In[145]:= FindDistributionParameters[data,dist1]
>
> During evaluation of In[145]:= FindDistributionParameters::ntsprt: One
> or more data points are not in support of the distribution ...
>
>
> How can it be that one of the data points is not in the support of the
> distribution when I chose the data points from the distribution
> itself?
>
> This problem seems to arise whenever I am trying to find parameters
> for a mixture of product distributions...
>
> Any thoughts?
>
> Thanks,
>
> ~David
>

The estimation code uses internal utility functions to weed out bad
combinations of distributions and data before calling off to
optimization routines. That code needs to be improved to handle this
particular case, and I will look into that for a future release.
It seems to work fine for mixtures of products of betas, so I suspect it
may be specifically mixtures of products of discrete distributions that
are problematic. I would be interested to see what other examples you've
run into trouble with so I can use those as test cases as well.

While I do not recommend doing the following in general, a possible way
around this for now is to add a definition for the internal validation
function so this case will no longer get kicked out as being invalid.

In[1]:= data = RandomVariate[MixtureDistribution[{.1, .9},
          {ProductDistribution[BernoulliDistribution[0],
             BernoulliDistribution[0]],
           ProductDistribution[BernoulliDistribution[1],
             BernoulliDistribution[1]]}], 100];

In[2]:= dist1 = MixtureDistribution[{p, 1 - p},
          {ProductDistribution[BernoulliDistribution[a1],
             BernoulliDistribution[a2]],
           ProductDistribution[BernoulliDistribution[a3],
             BernoulliDistribution[a4]]}];

The case we will add is for mixtures of products of discrete
distributions. The check will be weaker than it should be (we will only
check that the data points are lists of integers of the right length),
so you will need to make sure that all the integers you feed in for
these cases are in range.

In[3]:= Unprotect[MixtureDistribution];

In[4]:= MixtureDistribution /:
          Statistics`Library`AcceptableDistributedDataQ[
            MixtureDistribution[_?VectorQ,
              distlist : {ProductDistribution[
                __?Statistics`Library`DiscreteUnivariateDistributionQ] ..}]?DistributionParameterQ,
            data_?(MatrixQ[#, IntegerQ] &)] :=
          True /; Dimensions[data][[2]] === Length[distlist[[1]]]

In[5]:= Protect[MixtureDistribution];

In[6]:= FindDistributionParameters[data, dist1]

Out[6]= {p -> 0.909997, a1 -> 1., a2 -> 1., a3 -> 5.31528*10^-8,
         a4 -> 5.31528*10^-8}

Darren Glosemeyer
Wolfram Research
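
A note on the caveat above: because the added definition only checks that
the data are integer lists of the right length, it may help to verify the
range yourself before fitting. The following is a minimal sketch for the
Bernoulli case only. The name bernoulliProductDataQ is hypothetical (it is
not a built-in or internal function), and it assumes every component of the
product is a BernoulliDistribution, so the only valid entries are 0 and 1.

```mathematica
(* Hypothetical helper, not part of Mathematica.
   Returns True when data is a matrix with dim columns whose entries
   are all 0 or 1, i.e. in the support of a product of dim Bernoullis. *)
bernoulliProductDataQ[data_, dim_Integer] :=
  MatrixQ[data, MatchQ[#, 0 | 1] &] && Last[Dimensions[data]] === dim

bernoulliProductDataQ[{{0, 0}, {1, 1}, {1, 0}}, 2]  (* True *)
bernoulliProductDataQ[{{0, 2}, {1, 1}}, 2]          (* False: 2 is out of range *)
```

With the data from In[1], one could evaluate bernoulliProductDataQ[data, 2]
before calling FindDistributionParameters. Other discrete distributions
would need their own range test.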