Question about statistics, probability or AI: What do you call this
- To: mathgroup at smc.vnet.net
- Subject: [mg127937] Question about statistics, probability or AI: What do you call this
- From: Charles Gillingham <cgillingham1 at mac.com>
- Date: Mon, 3 Sep 2012 02:56:50 -0400 (EDT)
Friends:

I am working on several problems related to machine learning and AI, and I would appreciate your expert advice. All I really need to know is this: what is the proper terminology for problems like this? Where would I find literature and results related to this kind of problem? Or, more generally, what is the name of the sub-field of statistics or AI that studies these problems?

A PROBLEM:

I have a set of random boolean variables E1, E2, ..., En. I also have a set of probabilities for various logical expressions using these variables. I want to choose a complete marginal distribution for these variables:

    M( e1, e2, ..., en )    (where e1 .. en are "true" or "false").

1) I know the probability of each variable on its own:

    P(E1) = pE1, P(E2) = pE2, ..., P(En) = pEn.

2) I also know these probabilities:

    P( exactly k out of n variables are true, and the others false ) = pNk

   (In Mathematica this is: P( BooleanCountingFunction[{k}, n][ E1, ..., En ] ) = pNk)

I have one more constraint on the problem, which considerably complicates things:

3) The distribution I am looking for would respect the independence of the variables as much as possible. If it is possible to interpret the variables as independent, then the marginal distribution I choose should equal the marginal distribution of n independent variables.

For example, if there are three variables, I would also have 8 conditional constraints of the form:

3') If P( exactly two of A, B, C are true ) = P(A)P(B)(1 - P(C)) + P(A)(1 - P(B))P(C) + (1 - P(A))P(B)P(C),
    then M(True, True, False) = P(A)P(B)(1 - P(C))

Note that the basic definitions of marginal probability, etc. give us all these obvious facts:

    0 <= pEi <= 1 for all i
    0 <= pNk <= 1 for all k
    pEi = Sum[ M( e1, ..., e(i-1), True, e(i+1), ..., en ), {all boolean possibilities for {e1, ..., e(i-1), e(i+1), ..., en}} ]
    pNk = Sum[ M( e1 .. en ), {all combinations of {True, False} where there are k True values and (n-k) False values} ]

And the slightly less obvious:

    Sum[ k pNk, {k, 0, n} ] = Sum[ pEi, {i, 1, n} ]
    Sum[ pNk, {k, 0, n} ] = 1

There are 2 n + 2 independent constraints above (I think), and there are 2^n values in the marginal distribution, so I have 2^n - 2 n - 2 degrees of freedom. Obviously there are many possible solutions for a large number of variables.

ONE MORE NOTE:

The problem is easier to pose in this form, but I would actually prefer to have the solution in the form of a conditional probability:

    P( ei | e(i-1), e(i-2), ..., e1 )

because this would scale better.

THANKS,

Again, I'm not asking you all to solve this. I actually have to solve a whole family of similar problems, and so I am mostly interested in finding tools & terminology that I can study.

Thanks again for any advice you can give me.

Charles Gillingham
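
The two "slightly less obvious" identities in the post can be checked numerically for the fully independent case. The following is only a minimal sketch in Python (not the Mathematica of the original post); the function names and the probability values in `p` are hypothetical and chosen purely for illustration. It builds the distribution M by brute-force enumeration of all 2^n assignments, so it is only practical for small n.

```python
from itertools import product
from math import isclose, prod


def independent_joint(p):
    """M(e1..en) for independent Booleans with P(Ei) = p[i] (assumed case)."""
    return {
        e: prod(pi if ei else 1.0 - pi for pi, ei in zip(p, e))
        for e in product((True, False), repeat=len(p))
    }


def single_marginals(M, n):
    """pEi: sum of M over all assignments in which e_i is True."""
    return [sum(m for e, m in M.items() if e[i]) for i in range(n)]


def count_probs(M, n):
    """pNk: sum of M over all assignments with exactly k True values."""
    pN = [0.0] * (n + 1)
    for e, m in M.items():
        pN[sum(e)] += m  # sum(e) counts the True entries of the tuple
    return pN


p = [0.2, 0.5, 0.9]  # hypothetical values for pE1, pE2, pE3
n = len(p)
M = independent_joint(p)
pE = single_marginals(M, n)
pN = count_probs(M, n)

# Sum[ pNk, {k, 0, n} ] = 1
assert isclose(sum(pN), 1.0)
# Sum[ k pNk, {k, 0, n} ] = Sum[ pEi, {i, 1, n} ]
assert isclose(sum(k * pN[k] for k in range(n + 1)), sum(pE))
```

The same two helper functions accept any dictionary that maps truth-value tuples to probabilities, so a candidate non-independent M could be checked against given pEi and pNk values in the same way.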