Re: Kolmogorov Smirnov in two or more dimensions is in Mathematica 8.0.4
- To: mathgroup at smc.vnet.net
- Subject: [mg125111] Re: Kolmogorov Smirnov in two or more dimensions is in Mathematica 8.0.4
- From: maria giovanna dainotti <mariagiovannadainotti at yahoo.it>
- Date: Wed, 22 Feb 2012 05:29:46 -0500 (EST)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
Dear Andy Ross,
thanks for your suggestion. I applied the method
DistributionFitTest[data1, data2,
{"TestDataTable","SzekelyEnergy"},Method->{"MonteCarlo","MonteCarloSamples"->1000}]
and it works, but when I try to minimize the function I have a problem.
For[j = 1, j < n + 1,
fn = Function[{s, m},
x = Table[{Data1[[j, 1]] + s, Data1[[j, 2]] + m}, {j, 1, size}];
-2*Log[
DistributionFitTest[x, Data0, {"TestDataTable", "SzekelyEnergy"},
Method -> {"MonteCarlo", "MonteCarloSamples" -> 1000}][[1, 1,
2, 3]]]
]]
With Data1
{{0.894606, 14.0865}, {0.811846, 13.7631}, {1.00287,
14.2577}, {0.798426, 13.8605}, {0.765556, 14.0297}, {1.06602,
14.144}, {0.965456, 14.2174}, {0.879326, 14.6311}, {0.831876,
14.006}, {0.695166, 14.2763}, {1.05657, 14.2558}, {1.05519,
13.7674}, {0.811626, 14.1137}, {0.910116, 13.9954}, {0.933376,
13.9232}, {0.870476, 14.2965}, {0.713326, 14.7321}, {1.13929,
13.9139}, {1.1878, 14.0376}, {0.998686, 13.9364}, {1.10588,
14.0096}, {1.06213, 14.2319}, {0.797846, 14.9199}, {0.768916,
14.5909}, {0.944476, 14.8957}, {1.09493, 14.6853}, {1.03171,
14.5485}, {0.838416, 15.1119}, {1.03875, 14.3372}, {0.695036,
15.0422}, {0.860336, 14.9023}, {0.903736, 15.1527}, {0.798196,
14.7923}, {0.695146, 15.5481}, {1.08506, 15.2012}, {1.07127,
15.2615}, {0.919576, 15.5636}, {1.01526, 15.4415}, {1.15206,
15.0784}, {0.844826, 15.6197}, {0.897866, 15.887}, {0.762816,
15.7318}, {0.867886, 15.743}, {0.991576, 15.7403}, {0.898346,
15.6946}, {0.996826, 15.6467}, {1.0194, 15.6681}, {0.830686,
15.5817}, {0.982806, 15.6863}, {0.883306, 16.3492}, {0.928786,
15.9975}, {0.932786, 15.9091}, {1.08748, 15.8808}, {0.963906,
16.4309}, {0.896236, 16.7413}, {0.750366, 16.7619}, {0.945936,
16.6388}, {1.064, 16.7059}, {0.947046, 17.0593}, {1.0258,
16.3197}, {0.973976, 16.9003}, {0.829206, 17.183}, {0.917956,
17.0182}, {0.954016, 16.8625}, {0.842906, 17.2803}, {1.10261,
17.0396}, {1.02155, 17.3819}, {0.671206, 17.5726}, {1.04596,
17.076}, {1.08985, 17.3047}, {1.05809, 17.2071}, {0.916836,
17.6237}, {0.815676, 17.5559}, {0.943866, 18.1943}, {1.02348,
16.8333}, {1.00346, 16.9493}, {1.15636, 17.0909}, {1.11935,
17.5003}}
and Data0
{{0.97304, 14.1829}, {0.98663, 14.1295}, {0.98284, 14.3172}, {0.81423,
14.3466}, {0.97303, 14.5966}, {0.87122, 14.8435}, {0.90252,
14.9036}, {0.81887, 15.1177}, {1.07722, 14.6849}, {0.86684,
15.6456}, {0.86664, 14.7034}, {0.78728, 15.0898}, {1.10336,
14.4085}, {1.12014, 14.7281}, {0.95923, 14.4988}, {0.89942,
15.3173}, {0.83841, 14.8422}, {0.99105, 14.8813}, {1.111,
14.5964}, {0.93255, 15.5019}, {1.03142, 14.8009}, {1.00661,
15.0827}, {0.93255, 15.3064}, {1.10023, 14.7189}, {0.8797,
15.1038}, {1.0013, 14.6755}, {0.87673, 15.0952}, {0.84131,
15.7345}, {1.06392, 15.3528}, {1.00138, 14.9835}, {0.77803,
15.4637}, {0.76795, 15.3611}, {0.98328, 15.1047}, {0.89193,
14.8321}, {0.72882, 15.909}, {0.77123, 15.7902}, {0.86218,
15.6637}, {0.84381, 15.536}, {0.99263, 15.8903}, {1.0805,
15.1453}, {0.85316, 15.7793}, {0.85186, 15.9119}, {1.10898,
15.7583}, {1.03365, 15.7393}, {0.84783, 16.1911}, {1.10979,
16.0031}, {1.05238, 16.015}, {0.90259, 16.4864}, {0.84963,
15.9818}, {1.09221, 16.0088}, {1.0443, 15.8326}, {0.8945,
16.1927}, {0.83015, 16.2776}, {0.7551, 16.5538}, {1.05947,
15.9138}, {1.06189, 15.6061}, {1.05889, 16.0743}, {0.85216,
16.1568}, {0.72597, 16.7657}, {0.93638, 16.3583}, {0.81968,
16.2711}, {0.95022, 16.3549}, {1.04536, 16.18}, {1.0786,
16.0967}, {0.9385, 15.8504}, {0.95024, 15.9338}, {0.76753,
17.1461}, {1.18224, 16.0437}, {0.96447, 16.4908}, {0.98235,
16.0892}, {1.06151, 16.699}, {0.79052, 16.5207}, {1.15863,
16.3223}, {1.00795, 16.2444}, {1.07284, 16.6536}, {1.04796,
16.865}, {0.84226, 16.7247}, {1.04712, 16.2673}}
Namely it fails when I ask
bestsolNM = FindMinimum[{Function[{s, m},
x = Table[{Data1[[j, 1]] + s, Data1[[j, 2]] + m}, {j, 1, size}];
-2*Log[
DistributionFitTest[x,
Data0, {"TestDataTable", "SzekelyEnergy"},
Method -> {"MonteCarlo", "MonteCarloSamples" -> 1000}][[1, 1,
2, 3]]]
], -0.2 <= s <= 0.04, -0.2 <= m <= 0.11}, {s, 0.04}, {m, 0.02},
MaxIterations -> 250, Method -> Automatic]
I did also this test in one dimension and it seems less stable than the KS.
Is there a way to make this test more stable?
Regarding the other test you suggested marginalMVTest should I use the
ProductDistribution for my data and a normal distribution?
Did I understand properly?
Thanks a lot,
Regards,
Maria
--- Mer 15/2/12, Andy Ross <andyr at wolfram.com> ha scritto:
Da: Andy Ross <andyr at wolfram.com>
Oggetto: Re: R: I: Re: Kolmogorov Smirnov in two or more dimensions is in Mathematica 8.0.4
A: "maria giovanna dainotti" <mariagiovannadainotti at yahoo.it>
Cc: "mathematica group" <mathgroup at smc.vnet.net>
Data: Mercoled=EC 15 febbraio 2012, 21:08
I don't believe it is fixed in 8.0.4. I discovered the issue too
late for the change to make it in.
I can't stress enough that all of the tests based on the empirical
distribution function are marginal tests. A marginal-based test can
only be used to reject the null hypothesis of goodness of fit. It
does not provide adequate evidence for a good fit to the joint
distribution because dependency structure is missed. This is true
whether you are comparing data to a distribution or two data sets.
Here is a function that will compute marginal-based test statistics
and p-values more quickly than the Monte-Carlo based ones that are
in 8.0.4. I suggest you use this if such tests are adequate for
your purposes.
marginalMVTest[data1_, data2_, test_] :=
Block[{t, p, dim = Length[data1[[1]]]},
{t, p} =
Transpose[
Table[DistributionFitTest[data1[[All, i]], data2[[All, i]],
"TestData"], {i, dim}]];
{"Test" -> test, "T" -> Mean[t], "P-Value" ->
CDF[UniformSumDistribution[dim], Total@p]
]
As a first example take some data drawn from two very different
distributions.
data1 = data2 = RandomVariate[BinormalDistribution[.9], 100];
data2 = RandomVariate[BinormalDistribution[-.9], 100];
In[47]:= marginalMVTest[data1, data2, "KolmogorovSmirnov"]
Out[47]= {"Test" -> "KolmogorovSmirnov", "T" -> 0.0421055,
"P-Value" -> 0.917708}
Notice we would claim that we cannot reject the null hypothesis that
data1 and data2 were drawn from the same distribution. However, this
is obviously an error. The reason this happens is because
marginally, both distributions are NormalDistribution[0,1]. However,
if distributions differ marginally they also differ jointly. Thus
the following does what we want.
data1 = RandomVariate[ProductDistribution[NormalDistribution[],
NormalDistribution[2, 1]], 100];
data2 = RandomVariate[BinormalDistribution[0], 100];
In[79]:= marginalMVTest[data1, data2, "KolmogorovSmirnov"]
Out[79]= {"Test" -> "KolmogorovSmirnov", "T" -> 0.973367,
"P-Value" -> 0.00918411}
As I said in the previous response, the Szekely-Energy test in
DistributionFitTest will be a better comparison. You will just need
to wait a while for it to compute.
If you have two matrices data1 and data2 this can be accomplished
via
DistributionFitTest[data1, data2,
{"TestDataTable","SzekelyEnergy"},Method->{"MonteCarlo","MonteCarloSamples"->1000}]
I've shown this with sub-option "MonteCarloSamples". You can
increase this number to improve the estimate of the p-value. I warn
you though, 10^3 is already pretty slow.
Hopefully some of this is useful to you. Sorry for any inconvenience
this might have caused you.
-Andy
On 2/15/2012 1:13 PM, maria giovanna dainotti wrote:
Dear Andy Ross,
I have just got information of a new release Mathematica
8.0.4.
Do you know if the bug of Kolmogorv Smirnov in two
dimension is fixed in that version?
Unfortunately, for my purpose it is better to use a
statistical tools that make comparison directly with the
data not assuming any stastistic.
I would be very grateful if you could let me know
Best regards,
Maria
--- Mer 8/2/12, maria giovanna dainotti <mariagiovannadainotti at yahoo.it>
ha scritto:
Da: maria giovanna dainotti
<mariagiovannadainotti at yahoo.it>
Oggetto: I: Re: Kolmogorov Smirnov in two or
more dimensions
A:
Data: Mercoled=EC 8 febbraio 2012, 20:58
--- Mer 8/2/12, Andy Ross <andyr at wolfram.com>
ha scritto:
Da: Andy Ross <andyr at wolfram.com>
Oggetto: Re: Kolmogorov Smirnov
in two or more dimensions
A: "maria giovanna dainotti"
<mariagiovannadainotti at yahoo.it>
Cc: mathgroup at smc.vnet.net
Data: Mercoled=EC 8 febbraio 2012, 17:09
This is
a bug in KolmogorovSmirnovTest that will
be fixed in later versions of Mathematica.
For the time being you can use
CramerVonMisesTest or DistributionFitTest.
Note that DistributionFitTest uses a
little known test referred to as
Szekely-Energy. A description can be found
in...
Szekely, G. J. and Rizzo, M. L. (2004)
Testing for Equal Distributions in High
Dimension, InterStat,
November (5)
P-values for this test are computed via
Monte-Carlo simulation at the moment so it
tends to be slow.
The good thing about this test is that it
doesn't just look at things marginally.
All of the other tests currently available
perform tests on the marginal data and
then aggregate the statistics. Thus any
differences in covariance structure will
be missed.
The take home message is that if things
don't fit marginally then they don't fit
jointly but if they do seem to fit well
marginally you still need to dig deeper to
determine whether the joint distributions
are equivalent.
Andy Ross
Wolfram Research
On 2/8/2012 4:32 AM, maria giovanna
dainotti wrote:
> Dear Mathematica group,
> I am doing an analysis of the
Kolmogorov with two data sets of 2
dimensions each.
> When I apply the
KolmogorovSmirnovTest[Data0,Data0] I
should get 1. I did just a trial and I got
0.
> I am copying the datafile example for
clarity.
> {{0.97304, 14.1829}, {0.98663,
14.1295}, {0.98284, 14.3172}, {0.81423,
> 14.3466}, {0.97303, 14.5966},
{0.87122, 14.8435}, {0.90252,
> 14.9036}, {0.81887, 15.1177},
{1.07722, 14.6849}, {0.86684,
> 15.6456}, {0.86664, 14.7034},
{0.78728, 15.0898}, {1.10336,
> 14.4085}, {1.12014, 14.7281},
{0.95923, 14.4988}, {0.89942,
> 15.3173}, {0.83841, 14.8422},
{0.99105, 14.8813}, {1.111,
> 14.5964}, {0.93255, 15.5019},
{1.03142, 14.8009}, {1.00661,
> 15.0827}, {0.93255, 15.3064},
{1.10023, 14.7189}, {0.8797,
> 15.1038}, {1.0013, 14.6755},
{0.87673, 15.0952}, {0.84131,
> 15.7345}, {1.06392, 15.3528},
{1.00138, 14.9835}, {0.77803,
> 15.4637}, {0.76795, 15.3611},
{0.98328, 15.1047}, {0.89193,
> 14.8321}, {0.72882, 15.909},
{0.77123, 15.7902}, {0.86218,
> 15.6637}, {0.84381, 15.536},
{0.99263, 15.8903}, {1.0805,
> 15.1453}, {0.85316, 15.7793},
{0.85186, 15.9119}, {1.10898,
> 15.7583}, {1.03365, 15.7393},
{0.84783, 16.1911}, {1.10979,
> 16.0031}, {1.05238, 16.015},
{0.90259, 16.4864}, {0.84963,
> 15.9818}, {1.09221, 16.0088},
{1.0443, 15.8326}, {0.8945,
> 16.1927}, {0.83015, 16.2776},
{0.7551, 16.5538}, {1.05947,
> 15.9138}, {1.06189, 15.6061},
{1.05889, 16.0743}, {0.85216,
> 16.1568}, {0.72597, 16.7657},
{0.93638, 16.3583}, {0.81968,
> 16.2711}, {0.95022, 16.3549},
{1.04536, 16.18}, {1.0786,
> 16.0967}, {0.9385, 15.8504},
{0.95024, 15.9338}, {0.76753,
> 17.1461}, {1.18224, 16.0437},
{0.96447, 16.4908}, {0.98235,
> 16.0892}, {1.06151, 16.699},
{0.79052, 16.5207}, {1.15863,
> 16.3223}, {1.00795, 16.2444},
{1.07284, 16.6536}, {1.04796,
> 16.865}, {0.84226, 16.7247},
{1.04712, 16.2673}}
>
> Or maybe should I use some other
conditions?
> Or if it doesn't work is there an
already built package that I can use?
>
> Thanks a lot for your help
>
> Best regards,
> Maria