Re: Statistical Analysis & Pattern Matching

*To*: mathgroup at smc.vnet.net
*Subject*: [mg64541] Re: Statistical Analysis & Pattern Matching
*From*: Paul Abbott <paul at physics.uwa.edu.au>
*Date*: Mon, 20 Feb 2006 22:31:26 -0500 (EST)
*Organization*: The University of Western Australia
*References*: <dtc9rl$a5m$1@smc.vnet.net>
*Sender*: owner-wri-mathgroup at wolfram.com

In article <dtc9rl$a5m$1 at smc.vnet.net>, virtualadepts at gmail.com wrote:

> If I have a set of random data, which could be the result of rolling a
> 6 sided die 1,000 times, and the die is favored to roll 2 numbers more
> often than the others, how do I analyze the data to determine which
> numbers it favors without knowing in advance that it favors any of the
> numbers?
>
> Considering I am looking at random data it is impossible to say if the
> dice favors any number for sure, but I can assume that it favors a
> number and check to see which numbers it would favor if it did.
>
> That is an example of the type of problem I want to solve but I can
> think of others. How about an algorithm that generates random numbers
> between 1 and 1,000,000. Let's say I have a database of 10 million
> numbers it has generated, and want to determine what numbers it favors.
>
> This is not the same question as asking if it is random data, because
> for our purposes it is random. This is just asking if it is more
> likely to produce certain numbers.
>
> Let's say for this example that the machine is programmed to never
> produce the same number twice, until it has randomly generated every
> other possible number. Is there a way to predict this is happening by
> looking at the data?

Essentially, you should read up on Maximum Entropy. As an example application, Michael Kelly <http://www.stuart.iit.edu/faculty/kelly>, a keen Mathematica user, uses Maximum Entropy and Linear Inversion to evaluate asset distributions.

I see that you posted this message to several other newsgroups. Martin Brown's posting on sci.math.num-analysis was particularly relevant:

| Look for Wolf's dice data and Ed Jaynes Bayesian analysis of the
| biases in them. Interesting stuff considering the dice Wolf used were
| the best quality manufacture of their day and he made ISTR 100,000
| throws.

A better link to Rau's paper is http://arxiv.org/pdf/physics/9805024.
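As a crude first look at the original question, before any Maximum Entropy or Bayesian treatment, one can simply tabulate the counts and compare them with what a fair die would give. The following is only a minimal sketch, not Jaynes' analysis: it assumes a recent Mathematica version (RandomChoice, Counts, Lookup and SurvivalFunction are newer built-ins), and the weights, the seed and the names rolls, counts, residuals and ranking are chosen purely for illustration.

```mathematica
(* Simulated data: 1000 rolls of a die loaded toward 3 and 4.
   The weights {1, 1, 3, 2, 1, 1} are illustrative, not measured. *)
SeedRandom[1];
rolls = RandomChoice[{1, 1, 3, 2, 1, 1} -> Range[6], 1000];

(* Observed count for each face, in face order (0 if a face never occurs) *)
counts = Lookup[Counts[rolls], Range[6], 0];

(* Expected count per face if the die were fair *)
expected = Length[rolls]/6;

(* Standardized residuals: large positive values flag over-represented faces *)
residuals = N[(counts - expected)/Sqrt[expected]];

(* Pearson chi-square statistic against the fair-die hypothesis, 5 degrees of freedom *)
chi2 = Total[residuals^2];
pValue = SurvivalFunction[ChiSquareDistribution[5], chi2];

(* Faces ordered from most to least over-represented *)
ranking = Reverse[Ordering[residuals]]
```

A small pValue only says the counts are unlikely for a fair die; the leading entries of ranking then suggest which faces are the candidates for being favoured. With a million possible outcomes and only ten million samples the expected count per outcome is about 10, so the residuals for individual numbers are noisy and this simple ranking becomes much less informative.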
To prescribe the relative frequencies of a set of numbers, use the cumulative frequencies,

    cumfreq[x_List] := FoldList[Plus, 0, x]/Tr[x]

to produce an inverse cumulative distribution interpolating function.

    icdf[x_List] := icdf[x] = Interpolation[Transpose[
       {cumfreq[x], Range[0, Length[x]]}], InterpolationOrder -> 0]

For example,

    Plot[icdf[{1, 1, 3, 2, 1, 1}][x], {x, 0, 1}]

After loading the statistics stub

    << Statistics`

here are the sample frequencies of a loaded die with 3 and 4 favoured.

    Table[icdf[{1, 1, 3, 2, 1, 1}][Random[]], {1000}];
    Frequencies[%]

Cheers,
Paul

_______________________________________________________________________
Paul Abbott                                      Phone:  61 8 6488 2734
School of Physics, M013                            Fax: +61 8 6488 1014
The University of Western Australia         (CRICOS Provider No 00126G)
AUSTRALIA                               http://physics.uwa.edu.au/~paul