Re: R: I: Re: Kolmogorov Smirnov in two or more dimensions is in Mathematica 8.0.4

• To: mathgroup at smc.vnet.net
• Subject: [mg124993] Re: R: I: Re: Kolmogorov Smirnov in two or more dimensions is in Mathematica 8.0.4
• From: Andy Ross <andyr at wolfram.com>
• Date: Thu, 16 Feb 2012 03:29:31 -0500 (EST)
• Delivered-to: l-mathgroup@mail-archive0.wolfram.com

```I don't believe it is fixed in 8.0.4.  I discovered the issue too late
for the change to make it in.

I can't stress enough that all of the tests based on the empirical
distribution function are marginal tests. A marginal-based test can only
be used to reject the null hypothesis of goodness of fit.  It does not
provide adequate evidence for a good fit to the joint distribution
because dependency structure is missed. This is true whether you are
comparing data to a distribution or two data sets.

Here is a function that will compute marginal-based test statistics and
p-values more quickly than the Monte-Carlo based ones that are in
8.0.4.  I suggest you use this if such tests are adequate for your purposes.

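marginalMVTest[data1_, data2_, test_] :=
 Block[{t, p, dim = Length[data1[[1]]]},
  (* per-dimension {statistic, p-value} pairs from the two-sample test *)
  {t, p} =
   Transpose[
    Table[DistributionFitTest[data1[[All, i]], data2[[All, i]],
      {"TestData", test}], {i, dim}]];
  (* average the statistics; combine the p-values through the
     distribution of a sum of dim uniform variates *)
  {"Test" -> test, "T" -> Mean[t],
   "P-Value" -> CDF[UniformSumDistribution[dim], Total@p]}]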

As a first example take some data drawn from two very different
distributions.

data1 = RandomVariate[BinormalDistribution[.9], 100];
data2 = RandomVariate[BinormalDistribution[-.9], 100];

In[47]:= marginalMVTest[data1, data2, "KolmogorovSmirnov"]

Out[47]= {"Test" -> "KolmogorovSmirnov", "T" -> 0.0421055,  "P-Value" ->
0.917708}

Notice that the test fails to reject the null hypothesis that data1 and
data2 were drawn from the same distribution, which is obviously the
wrong conclusion.  This happens because, marginally, both distributions
are NormalDistribution[0,1].  However, if distributions differ
marginally they also differ jointly, so the following does what we
want.

data1 = RandomVariate[ProductDistribution[NormalDistribution[],
NormalDistribution[2, 1]], 100];
data2 = RandomVariate[BinormalDistribution[0], 100];

In[79]:= marginalMVTest[data1, data2, "KolmogorovSmirnov"]

Out[79]= {"Test" -> "KolmogorovSmirnov", "T" -> 0.973367, "P-Value" ->
0.00918411}

As I said in the previous response, the Szekely-Energy test in
DistributionFitTest will be a better comparison. You will just need to
wait a while for it to compute.

If you have two matrices data1 and data2 this can be accomplished via

DistributionFitTest[data1, data2,
 {"TestDataTable", "SzekelyEnergy"},
 Method -> {"MonteCarlo", "MonteCarloSamples" -> 1000}]

I've shown this with sub-option "MonteCarloSamples".  You can increase
this number to improve the estimate of the p-value.  I warn you though,

Hopefully some of this is useful to you. Sorry for any inconvenience
this might have caused you.

-Andy

On 2/15/2012 1:13 PM, maria giovanna dainotti wrote:
> Dear Andy Ross,
> I have just got information of a new release Mathematica 8.0.4.
> Do you know if the bug of Kolmogorov Smirnov in two dimensions is fixed
> in that version?
> Unfortunately, for my purpose it is better to use a statistical tool
> that makes comparisons directly with the data, not assuming any statistic.
>
> I would be very grateful if you could let me know
> Best regards,
> Maria
>
>
> --- *Wed 8/2/12, maria giovanna dainotti
> /<mariagiovannadainotti at yahoo.it>/* wrote:
>
>
>     From: maria giovanna dainotti <mariagiovannadainotti at yahoo.it>
>     Subject: I: Re: Kolmogorov Smirnov in two or more
>     dimensions
>     To:
>     Date: Wednesday, 8 February 2012, 20:58
>
>
>
>     --- *Wed 8/2/12, Andy Ross /<andyr at wolfram.com>/* wrote:
>
>
>         From: Andy Ross <andyr at wolfram.com>
>         Subject: Re: Kolmogorov Smirnov in two or more
>         dimensions
>         To: "maria giovanna dainotti" <mariagiovannadainotti at yahoo.it>
>         Cc: mathgroup at smc.vnet.net
>         Date: Wednesday, 8 February 2012, 17:09
>
>         This is a bug in KolmogorovSmirnovTest that will be fixed in
>         later versions of Mathematica. For the time being you can use
>         CramerVonMisesTest or DistributionFitTest.
>
>         Note that DistributionFitTest uses a little known test
>         referred to as Szekely-Energy. A description can be found in...
>
>         Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal
>         Distributions in High Dimension, InterStat,
>         November (5)
>
>         P-values for this test are computed via Monte-Carlo simulation
>         at the moment so it tends to be slow.
>
>         things marginally.  All of the other tests currently available
>         perform tests on the marginal data and then aggregate the
>         statistics. Thus any differences in covariance structure will
>         be missed.
>
>         The take home message is that if things don't fit marginally
>         then they don't fit jointly but if they do seem to fit well
>         marginally you still need to dig deeper to determine whether
>         the joint distributions are equivalent.
>
>         Andy Ross
>         Wolfram Research
>
>         On 2/8/2012 4:32 AM, maria giovanna dainotti wrote:
>         > Dear Mathematica group,
>         > I am running a Kolmogorov-Smirnov analysis on two data sets
>         > of 2 dimensions each.
>         > When I apply KolmogorovSmirnovTest[Data0,Data0] I should
>         > get 1. I did just a trial and I got 0.
>         > I am copying the datafile example for clarity.
>         > {{0.97304, 14.1829}, {0.98663, 14.1295}, {0.98284, 14.3172},
>         {0.81423,
>         >     14.3466}, {0.97303, 14.5966}, {0.87122, 14.8435}, {0.90252,
>         >    14.9036}, {0.81887, 15.1177}, {1.07722, 14.6849}, {0.86684,
>         >    15.6456}, {0.86664, 14.7034}, {0.78728, 15.0898}, {1.10336,
>         >    14.4085}, {1.12014, 14.7281}, {0.95923, 14.4988}, {0.89942,
>         >    15.3173}, {0.83841, 14.8422}, {0.99105, 14.8813}, {1.111,
>         >    14.5964}, {0.93255, 15.5019}, {1.03142, 14.8009}, {1.00661,
>         >    15.0827}, {0.93255, 15.3064}, {1.10023, 14.7189}, {0.8797,
>         >    15.1038}, {1.0013, 14.6755}, {0.87673, 15.0952}, {0.84131,
>         >    15.7345}, {1.06392, 15.3528}, {1.00138, 14.9835}, {0.77803,
>         >    15.4637}, {0.76795, 15.3611}, {0.98328, 15.1047}, {0.89193,
>         >    14.8321}, {0.72882, 15.909}, {0.77123, 15.7902}, {0.86218,
>         >    15.6637}, {0.84381, 15.536}, {0.99263, 15.8903}, {1.0805,
>         >    15.1453}, {0.85316, 15.7793}, {0.85186, 15.9119}, {1.10898,
>         >    15.7583}, {1.03365, 15.7393}, {0.84783, 16.1911}, {1.10979,
>         >    16.0031}, {1.05238, 16.015}, {0.90259, 16.4864}, {0.84963,
>         >    15.9818}, {1.09221, 16.0088}, {1.0443, 15.8326}, {0.8945,
>         >    16.1927}, {0.83015, 16.2776}, {0.7551, 16.5538}, {1.05947,
>         >    15.9138}, {1.06189, 15.6061}, {1.05889, 16.0743}, {0.85216,
>         >    16.1568}, {0.72597, 16.7657}, {0.93638, 16.3583}, {0.81968,
>         >    16.2711}, {0.95022, 16.3549}, {1.04536, 16.18}, {1.0786,
>         >    16.0967}, {0.9385, 15.8504}, {0.95024, 15.9338}, {0.76753,
>         >    17.1461}, {1.18224, 16.0437}, {0.96447, 16.4908}, {0.98235,
>         >    16.0892}, {1.06151, 16.699}, {0.79052, 16.5207}, {1.15863,
>         >    16.3223}, {1.00795, 16.2444}, {1.07284, 16.6536}, {1.04796,
>         >    16.865}, {0.84226, 16.7247}, {1.04712, 16.2673}}
>         >
>         > Or maybe should I use some other conditions?
>         > Or if it doesn't work, is there an already built package that
>         > I can use?
>         >
>         > Thanks a lot for your help
>         >
>         > Best regards,
>         > Maria
>

```
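
A minimal sketch of how the Szekely-Energy call from the message could be applied to its first example (identical N(0,1) marginals, opposite correlation), which the marginal test could not separate. The seed is an illustrative assumption and not part of the original post; the DistributionFitTest call is copied in form from the message. No output is shown because the p-value is a Monte-Carlo estimate that depends on the random sample.

```mathematica
(* assumption: fixed seed, used only so the sketch is repeatable *)
SeedRandom[1234];

(* identical N(0,1) marginals, opposite dependence structure *)
data1 = RandomVariate[BinormalDistribution[.9], 100];
data2 = RandomVariate[BinormalDistribution[-.9], 100];

(* joint two-sample comparison, same call form as in the message *)
DistributionFitTest[data1, data2, {"TestDataTable", "SzekelyEnergy"},
 Method -> {"MonteCarlo", "MonteCarloSamples" -> 1000}]
```

Since this test works on the joint samples rather than on each column separately, it is the kind of check the message recommends when the marginals agree but the dependence may not. Expect it to take noticeably longer than the marginal test.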
