       Re: Finding distribution parameters for a mixture of product distributions

• To: mathgroup at smc.vnet.net
• Subject: [mg122344] Re: Finding distribution parameters for a mixture of product distributions
• From: Darren Glosemeyer <darreng at wolfram.com>
• Date: Wed, 26 Oct 2011 17:36:58 -0400 (EDT)
• Delivered-to: l-mathgroup@mail-archive0.wolfram.com

```
On 10/22/2011 5:06 AM, Koslicki wrote:
> I am trying to estimate parameters for a mixture of product
> distributions, and Mathematica does not like my inclusion of product
> distributions.
>
> Ignoring the stupidity of the following choice of distributions,
> consider the following:
>
> In:= data=RandomVariate[MixtureDistribution[{.1,.9},
> {BernoulliDistribution,BernoulliDistribution}],100];
>
> In:= dist1=MixtureDistribution[{p,1-p},
> {BernoulliDistribution[a1],BernoulliDistribution[a2]}];
>
> In:= FindDistributionParameters[data,dist1]
>
> Out= {p->0.261107,a1->0.919502,a2->0.960777}
>
> So everything works just fine.
>
>
> Yet if I want to look at a *product* of Bernoulli distribution
> instead, I get the following:
>
> In:= data=RandomVariate[MixtureDistribution[{.1,.9},
> {ProductDistribution[BernoulliDistribution,BernoulliDistribution],ProductDistribution[BernoulliDistribution,BernoulliDistribution]}],
> 100];
>
> In:= dist1=MixtureDistribution[{p,1-p},
> {ProductDistribution[BernoulliDistribution[a1],BernoulliDistribution[a2]],ProductDistribution[BernoulliDistribution[a3],BernoulliDistribution[a4]]}];
>
> In:= FindDistributionParameters[data,dist1]
>
> During evaluation of In:= FindDistributionParameters::ntsprt: One
> or more data points are not in support of the distribution ...
>
>
> How can it be that one of the data points is not in the support of the
> distribution when I chose the data points from the distribution
> itself?
>
> This problem seems to arise whenever I am trying to find parameters
> for a mixture of product distributions...
>
> Any thoughts?
>
> Thanks,
>
> ~David
>

The estimation code uses internal utility functions to weed out invalid
combinations of distributions and data before calling the optimization
routines. That check needs to be improved to handle this particular case,
and I will look into that for a future release. It
seems to work fine for mixtures of products of betas, so I suspect it
may be specifically mixtures of products of discrete distributions that
are problematic. I would be interested to see what other examples you've
run into trouble with so I can use those as test cases as well.

While I do not recommend doing the following in general, a possible way
around this for now is to add a definition for the internal validation
function so this case will no longer get kicked out as being invalid.

In:= data = RandomVariate[MixtureDistribution[{.1, .9},
{ProductDistribution[BernoulliDistribution, BernoulliDistribution],
ProductDistribution[BernoulliDistribution,
BernoulliDistribution]}], 100];

In:= dist1 = MixtureDistribution[{p, 1 - p},
{ProductDistribution[BernoulliDistribution[a1], BernoulliDistribution[a2]],
ProductDistribution[BernoulliDistribution[a3],
BernoulliDistribution[a4]]}];

The case we will add is for mixtures of products of discrete
distributions. The check will be weaker than it should be (we will only
check that the data points are lists of integers of the right length),
so you will need to make sure that all the integers you feed in for
these cases are in range.

In:= Unprotect[MixtureDistribution];

In:= MixtureDistribution /:
  Statistics`Library`AcceptableDistributedDataQ[
    MixtureDistribution[_?VectorQ,
      distlist : {ProductDistribution[
        __?Statistics`Library`DiscreteUnivariateDistributionQ] ..}]?DistributionParameterQ,
    data_?(MatrixQ[#, IntegerQ] &)] := True /;
  Dimensions[data][[2]] === Length[distlist[[1]]]

In:= Protect[MixtureDistribution];

In:= FindDistributionParameters[data, dist1]

Out= {p -> 0.909997, a1 -> 1., a2 -> 1., a3 -> 5.31528*10^-8, a4 ->
5.31528*10^-8}

Darren Glosemeyer
Wolfram Research

```
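
Because the added definition only checks that the data are integer lists of the right length, the range check it skips can be run by hand before fitting. The following is a minimal sketch for the Bernoulli case only; `bernoulliDataQ` is a hypothetical helper name (not a built-in function), and `data` and `dist1` are assumed to be the symbols defined in the message above.

```mathematica
(* Hypothetical helper, not part of Mathematica: gives True only when data
   is a matrix whose entries are all exactly the integers 0 or 1, which is
   the support of BernoulliDistribution *)
bernoulliDataQ[data_] := MatrixQ[data, (# === 0 || # === 1) &]

bernoulliDataQ[{{0, 1}, {1, 1}}]   (* True: every entry is 0 or 1 *)
bernoulliDataQ[{{0, 2}, {1, 1}}]   (* False: 2 is outside the support *)

(* Fit only when the data pass the range check; otherwise return $Failed *)
If[bernoulliDataQ[data],
  FindDistributionParameters[data, dist1],
  $Failed]
```

Other discrete distributions would need their own range test, since the support depends on the distribution and its parameters.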
