Re: Statistical Analysis & Pattern Matching

• To: mathgroup at smc.vnet.net
• Subject: [mg64541] Re: Statistical Analysis & Pattern Matching
• From: Paul Abbott <paul at physics.uwa.edu.au>
• Date: Mon, 20 Feb 2006 22:31:26 -0500 (EST)
• Organization: The University of Western Australia
• References: <dtc9rl$a5m$1@smc.vnet.net>
• Sender: owner-wri-mathgroup at wolfram.com

```
In article <dtc9rl$a5m$1 at smc.vnet.net>, virtualadepts at gmail.com wrote:

> If I have a set of random data, which could be the result of rolling a
> 6 sided die 1,000 times, and the die is favored to rolls 2 numbers more
> often than the others, how do I analyze the data to determine which
> numbers it favors without knowing in advance that it favors any of the
> numbers?
>
> Considering I am looking at random data it is impossible to say if the
> dice favors any number for sure, but I can assume that it favors a
> number and check to see which numbers it would favor if it did.
>
> That is an example of the type of problem I want to solve but I can
> think of others.  How about an algorithm that generates random numbers
> between 1 and 1,000,000.  Lets say I have a database of 10 million
> numbers it has generated, and want to determine what numbers it favors.
>
> This is not the same question as asking if it is random data, because
> for our purposes it is random. This is just asking if it is more
> likely to produce certain number.
> Lets say for this example that the machine is programmed to never
> produce the same number twice, until it has randomly generated every
> other possible number. Is there a way to predict this is happening by
> looking at the data?

Essentially, you should read up on Maximum Entropy. As an example
application, Michael Kelly <http://www.stuart.iit.edu/faculty/kelly>, a
keen Mathematica user, uses Maximum Entropy and Linear Inversion to
evaluate asset distributions.

I see that you posted this message to several other newsgroups. Martin
Brown's posting on sci.math.num-analysis was particularly relevant:

| Look for Wolf's dice data and Ed Jaynes Bayesian analysis of the
| biases in them. Interesting stuff considering the dies Wolf used were
| the best quality manufacture of their day and he made ISTR 100,000
| throws.

A better link to Rau's paper is http://arxiv.org/pdf/physics/9805024.

To generate numbers with prescribed relative frequencies, use the
cumulative frequencies,

cumfreq[x_List] := FoldList[Plus, 0, x]/Tr[x]

to produce an inverse cumulative distribution interpolating function.

icdf[x_List] := icdf[x] = Interpolation[Transpose[
{cumfreq[x], Range[0, Length[x]]}], InterpolationOrder -> 0]

For example,

Plot[icdf[{1, 1, 3, 2, 1, 1}][x], {x, 0, 1}]

After loading the Statistics packages, which provide Frequencies,

<< Statistics`

here are the sample frequencies of a loaded die with 3 and 4 favoured:

Table[icdf[{1, 1, 3, 2, 1, 1}][Random[]], {1000}];

Frequencies[%]

Cheers,
Paul

_______________________________________________________________________
Paul Abbott                                      Phone:  61 8 6488 2734
School of Physics, M013                            Fax: +61 8 6488 1014
The University of Western Australia         (CRICOS Provider No 00126G)
AUSTRALIA                               http://physics.uwa.edu.au/~paul

```
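
For readers on a current Mathematica release, the sampling step in the post can be sketched without the legacy Statistics packages. This is a minimal sketch, assuming version 10 or later, where RandomChoice, Counts, KeySort and Lookup are built in. The names weights, faces, rolls, observed and expected are illustrative and do not come from the post. The last line compares each face's count with the fair-die expectation using Pearson residuals. That is one simple way to see which faces stand out; it is not the Maximum Entropy analysis recommended in the post.

```mathematica
(* Sketch only: assumes Mathematica 10 or later *)
weights = {1, 1, 3, 2, 1, 1};        (* relative frequencies used in the post *)
faces = Range[Length[weights]];      (* die faces 1..6 *)

(* RandomChoice[w -> e, n] draws n elements of e with probabilities proportional to w *)
rolls = RandomChoice[weights -> faces, 1000];

(* observed count for each face, sorted by face *)
observed = KeySort[Counts[rolls]]

(* expected count per face if the die were fair *)
expected = Length[rolls]/Length[faces];

(* Pearson residuals (observed - expected)/Sqrt[expected]; a face that never appears counts as 0 *)
N[(Lookup[observed, faces, 0] - expected)/Sqrt[expected]]
```

Faces whose residual is well above zero (say, beyond about 2 or 3) are candidates for being favoured. With these weights, faces 3 and 4 should usually stand out.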
