Question about statistics, probability or AI: What do you call this
- To: mathgroup at smc.vnet.net
- Subject: [mg127937] Question about statistics, probability or AI: What do you call this
- From: Charles Gillingham <cgillingham1 at mac.com>
- Date: Mon, 3 Sep 2012 02:56:50 -0400 (EDT)
Friends:
I am working on several problems related to machine learning and AI, and I would appreciate your expert advice. All I really need to know is this: what is the proper terminology for problems like this? Where would I find literature and results related to this kind of problem? Or, more generally, what is the name of the sub-field of statistics or AI that studies these problems?
A PROBLEM:
I have a set of random boolean variables E1, E2, …, En. I also have a set of probabilities for various logical expressions using these variables. I want to choose a complete marginal distribution for these variables: M(e1, e2, …, en) (where each of e1, …, en is "true" or "false").
1) I know the probability of each variable on its own:
P(E1) = pE1, P(E2) = pE2, …, P(En) = pEn.
2) I also know these probabilities:
P( exactly k out of n variables are true, and the others false ) = pNk
(In Mathematica this is: P( BooleanCountingFunction[{k}, n][E1, …, En] ) = pNk)
I have one more constraint on the problem, which considerably complicates things:
3) The distribution I am looking for should respect the independence of the variables as much as possible. If it is possible to interpret the variables as independent, then the marginal distribution I choose should equal the distribution of n independent variables with probabilities pE1, …, pEn.
For example, if there are three variables A, B, C, I would also have 8 conditional constraints of the form:
3') If P( exactly two of A, B, C are true ) = P(A)P(B)(1 - P(C)) + P(A)(1 - P(B))P(C) + (1 - P(A))P(B)P(C), then M(True, True, False) = P(A)P(B)(1 - P(C))
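Condition 3) can be tested mechanically. Under independence, the number of true variables follows the Poisson binomial distribution, which a short dynamic program computes from pE1, …, pEn. The sketch below compares that prediction with the given pNk and, if they agree, returns the product distribution, whose entries are exactly the M values that 3') prescribes. The function names, the dict-of-tuples representation of M, and the tolerance are illustrative assumptions, not anything from Mathematica.

```python
from itertools import product

def independent_count_distribution(pE):
    """pN[k] = P(exactly k variables are True) for independent variables
    with P(Ei) = pE[i]: the Poisson binomial distribution."""
    pN = [1.0]                       # zero variables: P(0 are True) = 1
    for p in pE:                     # add one variable at a time
        nxt = [0.0] * (len(pN) + 1)
        for k, q in enumerate(pN):
            nxt[k] += q * (1 - p)    # new variable is False
            nxt[k + 1] += q * p      # new variable is True
        pN = nxt
    return pN

def product_distribution(pE):
    """Joint distribution of independent variables, keyed by (e1, ..., en)."""
    M = {}
    for e in product([True, False], repeat=len(pE)):
        prob = 1.0
        for ei, p in zip(e, pE):
            prob *= p if ei else 1 - p
        M[e] = prob
    return M

def independent_solution_if_consistent(pE, pN, tol=1e-9):
    """Return the product distribution if the given pN matches what
    independence predicts (up to tol); otherwise return None."""
    expected = independent_count_distribution(pE)
    if len(pN) == len(expected) and all(abs(a - b) <= tol for a, b in zip(expected, pN)):
        return product_distribution(pE)
    return None

pE = [0.5, 0.2, 0.1]                       # P(A), P(B), P(C)
pN = independent_count_distribution(pE)    # about [0.36, 0.49, 0.14, 0.01]
M = independent_solution_if_consistent(pE, pN)
print(M[(True, True, False)])              # about 0.09 = P(A) P(B) (1 - P(C))
```

With these numbers, pN2 = 0.09 + 0.04 + 0.01 = 0.14, which is the three-term sum in 3'). The dynamic program costs O(n^2) instead of enumerating all 2^n outcomes.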
Note that the basic definitions of marginal probability, etc., give us these obvious facts:
0 <= pEi <= 1 for all i, 0 <= pNk <= 1 for all k
pEi = Sum[ M( e1, …, e(i-1), True, e(i+1), …, en ), {all boolean possibilities for {e1, …, e(i-1), e(i+1), …, en}} ]
pNk = Sum[ M( e1 .. en ), {all combinations of {True, False} where there are k True values and (n-k) False values} ]
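The two sums above can be written directly as code. A minimal sketch, assuming the joint distribution M is stored as a Python dict that maps tuples (e1, ..., en) of booleans to probabilities; the function name and the example numbers are made up for illustration:

```python
from itertools import product

def marginals_and_counts(M, n):
    """Return (pE, pN) for a joint distribution M over n boolean variables.

    M maps tuples (e1, ..., en) of booleans to probabilities; missing
    outcomes count as probability 0.  pE[i] is P(E_{i+1}) and pN[k] is
    P(exactly k of the variables are True).
    """
    pE = [0.0] * n
    pN = [0.0] * (n + 1)
    for e in product([True, False], repeat=n):
        p = M.get(e, 0.0)
        for i in range(n):
            if e[i]:
                pE[i] += p      # the pEi sum: outcomes with ei = True
        pN[sum(e)] += p         # the pNk sum: sum(e) = number of True values
    return pE, pN

# A made-up joint distribution over n = 2 variables, for illustration only.
M = {(True, True): 0.2, (True, False): 0.3, (False, True): 0.1, (False, False): 0.4}
pE, pN = marginals_and_counts(M, 2)
print(pE)  # about [0.5, 0.3]
print(pN)  # about [0.4, 0.4, 0.2]
```

Here sum(e) plays the role of BooleanCountingFunction: an outcome contributes to pNk exactly when it has k True entries.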
And the slightly less obvious:
Sum[ k pNk, { k, 0, n } ] = Sum[ pEi, { i, 1, n } ]
Sum[ pNk, { k, 0, n} ] = 1
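Both identities follow from treating the number of true variables, K = 1[E1] + … + 1[En], as a random variable. The events K = k for k = 0, …, n partition the outcomes, which gives the second identity; linearity of expectation gives the first:

```latex
\sum_{k=0}^{n} k\,p_{N_k} \;=\; \mathbb{E}[K]
  \;=\; \mathbb{E}\Big[\sum_{i=1}^{n} \mathbf{1}[E_i]\Big]
  \;=\; \sum_{i=1}^{n} \mathbb{E}\big[\mathbf{1}[E_i]\big]
  \;=\; \sum_{i=1}^{n} p_{E_i}
```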
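There are 2n + 1 linear constraints above (n from the pEi and n + 1 from the pNk), but one of them is redundant because of the identity Sum[ k pNk, { k, 0, n } ] = Sum[ pEi, { i, 1, n } ], so there are 2n independent constraints. There are 2^n values in the marginal distribution, so I have 2^n - 2n degrees of freedom, and obviously there are many possible solutions for a large number of variables.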
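The constraint count can be checked numerically. The sketch below writes the equations from 1) and 2) as a 0/1 matrix (one row per constraint, one column per outcome) and computes its rank exactly over the rationals; the function names are hypothetical. Because Sum[ k pNk ] = Sum[ pEi ] gives one linear dependency among the 2n + 1 rows, the rank should come out as 2n.

```python
from fractions import Fraction
from itertools import product

def constraint_matrix(n):
    """Rows: the n pEi constraints, then the n + 1 pNk constraints.
    Columns: the 2^n outcomes (e1, ..., en)."""
    outcomes = list(product([True, False], repeat=n))
    rows = [[int(e[i]) for e in outcomes] for i in range(n)]            # ei True
    rows += [[int(sum(e) == k) for e in outcomes] for k in range(n + 1)]  # k True
    return rows

def rank(rows):
    """Exact rank by Gaussian elimination with Fraction entries."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):           # clear column c in every other row
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

for n in range(1, 7):
    A = constraint_matrix(n)
    print(n, len(A), rank(A), 2**n - rank(A))  # n, rows, rank, free parameters
```

This counts only the linear equalities. The inequalities M(e1, …, en) >= 0 are not included, so a positive number of free parameters does not by itself guarantee that a valid distribution exists for given pEi and pNk.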
ONE MORE NOTE:
The problem is easier to pose in this form, but I would actually prefer to have the solution in the form of a conditional probability:
P(ei | e(i-1), e(i-2), …, e1)
because this would scale better.
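The two forms are linked by the chain rule, M(e1, …, en) = Product[ P(ei | e(i-1), …, e1), {i, 1, n} ]. A minimal sketch, again assuming M is a Python dict keyed by tuples of booleans (helper names hypothetical), that extracts the conditionals from a joint distribution and rebuilds the joint from them:

```python
from itertools import product

def conditional_true(M, prefix):
    """P(ei = True | e1, ..., e(i-1) = prefix) with i = len(prefix) + 1,
    computed from a joint distribution M keyed by tuples of booleans.
    Returns None when the prefix itself has probability 0."""
    prefix = tuple(prefix)
    i = len(prefix)
    num = den = 0.0
    for e, p in M.items():
        if e[:i] == prefix:
            den += p
            if e[i]:
                num += p
    return num / den if den > 0 else None

def joint_from_conditionals(cond, n):
    """Chain rule: M(e1..en) = product over i of P(ei | e1..e(i-1))."""
    M = {}
    for e in product([True, False], repeat=n):
        prob = 1.0
        for i in range(n):
            q = cond(e[:i])
            if q is None:          # prefix impossible, so is this outcome
                prob = 0.0
                break
            prob *= q if e[i] else 1 - q
        M[e] = prob
    return M

M = {(True, True): 0.2, (True, False): 0.3, (False, True): 0.1, (False, False): 0.4}
print(conditional_true(M, (True,)))   # about 0.4 = 0.2 / 0.5
rebuilt = joint_from_conditionals(lambda pre: conditional_true(M, pre), 2)
```

Stored as plain tables, the conditionals for ei still need 2^(i-1) entries, so any saving comes from how the conditionals are parametrized, not from the chain rule itself.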
THANKS,
Again, I'm not asking you all to solve this. I actually have to solve a whole family of similar problems, and so I am mostly interested in finding tools & terminology that I can study. Thanks again for any advice you can give me.
Charles Gillingham