MathGroup Archive: September 2012 [00021]

Question about statistics, probability or AI: What do you call this

To: mathgroup at smc.vnet.net
Subject: [mg127937] Question about statistics, probability or AI: What do you call this
From: Charles Gillingham <cgillingham1 at mac.com>
Date: Mon, 3 Sep 2012 02:56:50 -0400 (EDT)


Friends:

I am working on several problems related to machine learning and AI, and I would appreciate your expert advice.  All I really need to know is this: what is the proper terminology for problems like this? Where would I find literature and results related to this kind of problem? Or, more generally, what is the name of the sub-field of statistics or AI that studies these problems?

A PROBLEM:

I have a set of random boolean variables E1, E2, ..., En. I also have a set of probabilities for various logical expressions using these variables. I want to choose a complete marginal distribution for these variables: M(e1, e2, ..., en) (where e1 .. en are "true" or "false").

1) I know the probability of each variable on its own:
	P(E1) = pE1, P(E2) = pE2, ..., P(En) = pEn.

2) I also know these probabilities: 
	P( exactly k out of n variables are true, and the others are false ) = pNk
		(In Mathematica this is: P( BooleanCountingFunction[{k}, n][ E1, ..., En ] ) = pNk)

I have one more constraint on the problem, which considerably complicates things:

3) The distribution I am looking for would respect the independence of the variables as much as possible.  If it is possible to interpret the variables as independent, then the marginal distribution I choose should equal the marginal distribution of n independent variables.

For example, if there are three variables, I would also have 8 conditional constraints of the form:

3') If P( exactly two of A, B, C are true ) = P(A)P(B)(1 - P(C)) + P(A)(1 - P(B))P(C) + (1 - P(A))P(B)P(C), then M(True, True, False) = P(A)P(B)(1 - P(C))
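
The fully independent case in 3') can be sketched in a few lines. This is only an illustration, assuming Python rather than Mathematica; the function names and the values chosen for P(A), P(B), P(C) are hypothetical and not part of the problem statement. It builds the product distribution and the "exactly k true" probabilities it implies.

```python
from itertools import product

def independent_joint(p):
    """Joint table for independent booleans with P(Ei = True) = p[i]."""
    joint = {}
    for outcome in product([True, False], repeat=len(p)):
        prob = 1.0
        for p_i, e_i in zip(p, outcome):
            prob *= p_i if e_i else 1.0 - p_i
        joint[outcome] = prob
    return joint

def count_distribution(joint, n):
    """pN[k] = probability that exactly k of the n variables are true."""
    pN = [0.0] * (n + 1)
    for outcome, prob in joint.items():
        pN[sum(outcome)] += prob  # True counts as 1, False as 0
    return pN

p = [0.5, 0.2, 0.7]  # hypothetical values for P(A), P(B), P(C)
M = independent_joint(p)
pN = count_distribution(M, len(p))
```

If the given pNk match the `pN` computed this way, condition 3') asks that M equal this product table.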

Note that the basic definitions of marginal probability, etc., give us all these obvious facts:
	0 <= pEi <= 1 for all i,  0 <= pNk <= 1 for all k
	pEi = Sum[ M( e1 ... e(i-1), True, e(i+1) ... en ), {all boolean possibilities for {e1 ... e(i-1), e(i+1) ... en}} ]
	pNk = Sum[ M( e1 .. en ), {all combinations of {True, False} where there are k True values and (n-k) False values} ]

And the slightly less obvious:
	Sum[ k pNk, { k, 0, n } ] = Sum[ pEi, { i, 1, n } ]
	Sum[ pNk, { k, 0, n} ] = 1
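
Both identities can be checked by brute-force enumeration on any small joint table. The sketch below assumes Python and uses an arbitrary, hypothetical table over three variables (weights 1 through 8, normalized). Exact fractions are used so the comparison is not affected by rounding. The first identity says that the expected number of true variables equals the sum of the individual probabilities pEi.

```python
from fractions import Fraction
from itertools import product

n = 3
outcomes = list(product([True, False], repeat=n))

# Arbitrary (hypothetical) joint table: weights 1..8, normalized to sum to 1.
weights = range(1, len(outcomes) + 1)
total = sum(weights)
M = {o: Fraction(w, total) for o, w in zip(outcomes, weights)}

# pE[i] = P(Ei = True): sum M over every outcome where variable i is true.
pE = [sum(prob for o, prob in M.items() if o[i]) for i in range(n)]

# pN[k] = P(exactly k variables are true).
pN = [sum(prob for o, prob in M.items() if sum(o) == k) for k in range(n + 1)]

expected_count = sum(k * pN[k] for k in range(n + 1))  # Sum[ k pNk ]
sum_of_marginals = sum(pE)                             # Sum[ pEi ]
```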

There are 2n + 2 independent constraints above (I think), and there are 2^n values in the marginal distribution. That leaves 2^n - 2n - 2 degrees of freedom, so there are obviously many possible solutions when the number of variables is large.

ONE MORE NOTE:
The problem is easier to pose in this form, but I would actually prefer to have the solution in the form of a conditional probability:
	P(ei | e(i-1), e(i-2), ..., e1 )
because this would scale better.
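
The relation between the two forms is the chain rule: M(e1, ..., en) is the product of the conditionals P(ei | e(i-1), ..., e1). A minimal sketch of that relation follows, assuming Python and a hypothetical joint table; the helper names are illustrative. It reads the conditionals off a full table, so it only shows the correspondence and does not itself scale to large n.

```python
from fractions import Fraction
from itertools import product

def conditional(joint, prefix):
    """P(e_i = True | e_1 .. e_(i-1) = prefix), read off a full joint table."""
    i = len(prefix)
    denom = sum(p for o, p in joint.items() if o[:i] == prefix)
    numer = sum(p for o, p in joint.items() if o[:i] == prefix and o[i])
    return numer / denom if denom > 0 else None  # undefined for impossible prefixes

def chain_rule_probability(joint, outcome):
    """Rebuild M(outcome) as a product of the conditionals above."""
    prob = Fraction(1)
    for i, e_i in enumerate(outcome):
        c = conditional(joint, outcome[:i])
        if c is None:
            return Fraction(0)  # the prefix has probability zero
        prob *= c if e_i else 1 - c
    return prob

# Arbitrary (hypothetical) joint table over three variables.
outcomes = list(product([True, False], repeat=3))
M = {o: Fraction(w, 36) for o, w in zip(outcomes, range(1, 9))}

rebuilt = {o: chain_rule_probability(M, o) for o in M}
```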

THANKS,

Again, I'm not asking you all to solve this. I actually have to solve a whole family of similar problems, and so I am mostly interested in finding tools & terminology that I can study. Thanks again for any advice you can give me.

Charles Gillingham
