MathGroup Archive: February 2006 [00452]

Re: Statistical Analysis & Pattern Matching

To: mathgroup at smc.vnet.net
Subject: [mg64541] Re: Statistical Analysis & Pattern Matching
From: Paul Abbott <paul at physics.uwa.edu.au>
Date: Mon, 20 Feb 2006 22:31:26 -0500 (EST)
Organization: The University of Western Australia
References: <dtc9rl$a5m$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

In article <dtc9rl$a5m$1 at smc.vnet.net>, virtualadepts at gmail.com wrote:

> If I have a set of random data, which could be the result of rolling a
> 6-sided die 1,000 times, and the die is biased to roll 2 numbers more
> often than the others, how do I analyze the data to determine which
> numbers it favors without knowing in advance that it favors any of the
> numbers?
> 
> Considering I am looking at random data it is impossible to say if the
> die favors any number for sure, but I can assume that it favors a
> number and check to see which numbers it would favor if it did.
>
> That is an example of the type of problem I want to solve but I can
> think of others.  How about an algorithm that generates random numbers
> between 1 and 1,000,000?  Let's say I have a database of 10 million
> numbers it has generated, and want to determine which numbers it favors.
>
> This is not the same question as asking if it is random data, because 
> for our purposes it is random. This is just asking if it is more 
> likely to produce certain numbers.
> Let's say for this example that the machine is programmed to never
> produce the same number twice, until it has randomly generated every 
> other possible number. Is there a way to predict this is happening by 
> looking at the data? 

Essentially, you should read up on Maximum Entropy. As an example 
application, Michael Kelly <http://www.stuart.iit.edu/faculty/kelly>, a 
keen Mathematica user, uses Maximum Entropy and Linear Inversion to 
evaluate asset distributions.
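
As a rough illustration of the maximum-entropy idea applied to a die (a 
sketch only, not taken from Kelly's work): if all that is known about a 
die is its long-run mean, the maximum-entropy distribution over the faces 
1..6 has the form p[k] proportional to Exp[lam k], with lam fixed by the 
mean constraint. The name maxEntDie and the target mean 4.5 below are 
hypothetical choices made for this example.

```mathematica
(* Hypothetical helper: maximum-entropy face probabilities for a
   six-sided die, given only a prescribed mean (between 1 and 6). *)
maxEntDie[mean_?NumericQ] := Module[{lam, sol, w},
  (* Solve the mean constraint for the Lagrange multiplier lam *)
  sol = FindRoot[
    Sum[k Exp[lam k], {k, 6}]/Sum[Exp[lam k], {k, 6}] == mean,
    {lam, 0}];
  (* Unnormalised weights Exp[lam k], then normalise to sum to 1 *)
  w = Exp[(lam /. sol) Range[6]];
  w/Total[w]]

(* Example: a die whose observed mean is 4.5 instead of the fair 3.5 *)
p = maxEntDie[4.5]
```

With a mean above 3.5 the probabilities increase with the face value; 
with a mean of exactly 3.5 the result is the uniform distribution.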

I see that you posted this message to several other newsgroups. Martin 
Brown's posting on sci.math.num-analysis was particularly relevant:

| Look for Wolf's dice data and Ed Jaynes' Bayesian analysis of the 
| biases in them. Interesting stuff considering the dice Wolf used were 
| the best quality manufacture of their day and he made ISTR 100,000 
| throws.

A better link to Rau's paper is http://arxiv.org/pdf/physics/9805024.

To generate data with prescribed relative frequencies, first compute the 
cumulative frequencies of the weights,

  cumfreq[x_List] := FoldList[Plus, 0, x]/Tr[x]

and then use them to build a piecewise-constant inverse cumulative 
distribution function as a zero-order interpolating function.

  icdf[x_List] := icdf[x] = Interpolation[Transpose[
    {cumfreq[x], Range[0, Length[x]]}], InterpolationOrder -> 0]

For example,

  Plot[icdf[{1, 1, 3, 2, 1, 1}][x], {x, 0, 1}]
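
To see where the steps in that plot fall, it may help to look at the 
cumulative frequencies for these weights directly. The snippet below 
repeats the definition of cumfreq so that it runs on its own; the names 
c and widths are introduced only for this illustration.

```mathematica
(* Same definition as above, repeated so this snippet is self-contained *)
cumfreq[x_List] := FoldList[Plus, 0, x]/Tr[x]

(* Breakpoints on the unit interval for the weights {1, 1, 3, 2, 1, 1} *)
c = cumfreq[{1, 1, 3, 2, 1, 1}]
(* {0, 1/9, 2/9, 5/9, 7/9, 8/9, 1} *)

(* Width of each sub-interval = probability of the corresponding face *)
widths = Rest[c] - Most[c]
(* {1/9, 1/9, 1/3, 2/9, 1/9, 1/9} *)
```

A uniform random number in [0, 1] lands in the third and fourth 
sub-intervals most often, which is why faces 3 and 4 are favoured.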

After loading the statistics stub (which provides Frequencies)

  << Statistics`

the following simulates 1000 rolls of a loaded die that favours 3 and 4 
and tabulates the sample frequencies.

  Table[icdf[{1, 1, 3, 2, 1, 1}][Random[]], {1000}]; 

  Frequencies[%]
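
For readers on later versions, here is a rough equivalent that does not 
need the Statistics package. It is a sketch under the assumption of 
Mathematica 6 or newer, where RandomChoice accepts weights; the variable 
names (weights, rolls, observed, expected) are chosen for this example. 
It also lists the expected counts next to the observed ones, which makes 
the favoured faces easier to spot.

```mathematica
(* Assumes Mathematica 6 or later (RandomChoice with weights) *)
weights = {1, 1, 3, 2, 1, 1};   (* relative frequencies of faces 1..6 *)
n = 1000;                        (* number of simulated rolls *)

(* Draw n faces with probabilities proportional to the weights *)
rolls = RandomChoice[weights -> Range[6], n];

(* Observed count of each face, including faces that never occur *)
observed = Table[Count[rolls, k], {k, 6}];

(* Counts expected from the prescribed weights *)
expected = N[n weights/Total[weights]];

(* Columns: face, observed count, expected count *)
TableForm[Transpose[{Range[6], observed, expected}],
  TableHeadings -> {None, {"face", "observed", "expected"}}]
```

The observed counts fluctuate from run to run, so only faces whose counts 
sit well above n/6 in repeated runs should be read as favoured.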

Cheers,
Paul

_______________________________________________________________________
Paul Abbott                                      Phone:  61 8 6488 2734
School of Physics, M013                            Fax: +61 8 6488 1014
The University of Western Australia         (CRICOS Provider No 00126G)    
AUSTRALIA                               http://physics.uwa.edu.au/~paul
