MathGroup Archive 2012

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Kolmogorov Smirnov in two or more dimensions is in Mathematica 8.0.4

  • To: mathgroup at smc.vnet.net
  • Subject: [mg125111] Re: Kolmogorov Smirnov in two or more dimensions is in Mathematica 8.0.4
  • From: maria giovanna dainotti <mariagiovannadainotti at yahoo.it>
  • Date: Wed, 22 Feb 2012 05:29:46 -0500 (EST)
  • Delivered-to: l-mathgroup@mail-archive0.wolfram.com

Dear Andy Ross,
thanks for your suggestion. I applied the method
DistributionFitTest[data1, data2,
{"TestDataTable","SzekelyEnergy"},Method->{"MonteCarlo","MonteCarloSamples"->1000}]
and it works, but when I try to minimize the function I have a problem.


For[j = 1, j < n + 1,
 fn = Function[{s, m},
   x = Table[{Data1[[j, 1]] + s, Data1[[j, 2]] + m}, {j, 1, size}];
   -2*Log[
     DistributionFitTest[x, Data0, {"TestDataTable", "SzekelyEnergy"},
        Method -> {"MonteCarlo", "MonteCarloSamples" -> 1000}][[1, 1,
      2, 3]]]
   ]]

With Data1
{{0.894606, 14.0865}, {0.811846, 13.7631}, {1.00287,
  14.2577}, {0.798426, 13.8605}, {0.765556, 14.0297}, {1.06602,
  14.144}, {0.965456, 14.2174}, {0.879326, 14.6311}, {0.831876,
  14.006}, {0.695166, 14.2763}, {1.05657, 14.2558}, {1.05519,
  13.7674}, {0.811626, 14.1137}, {0.910116, 13.9954}, {0.933376,
  13.9232}, {0.870476, 14.2965}, {0.713326, 14.7321}, {1.13929,
  13.9139}, {1.1878, 14.0376}, {0.998686, 13.9364}, {1.10588,
  14.0096}, {1.06213, 14.2319}, {0.797846, 14.9199}, {0.768916,
  14.5909}, {0.944476, 14.8957}, {1.09493, 14.6853}, {1.03171,
  14.5485}, {0.838416, 15.1119}, {1.03875, 14.3372}, {0.695036,
  15.0422}, {0.860336, 14.9023}, {0.903736, 15.1527}, {0.798196,
  14.7923}, {0.695146, 15.5481}, {1.08506, 15.2012}, {1.07127,
  15.2615}, {0.919576, 15.5636}, {1.01526, 15.4415}, {1.15206,
  15.0784}, {0.844826, 15.6197}, {0.897866, 15.887}, {0.762816,
  15.7318}, {0.867886, 15.743}, {0.991576, 15.7403}, {0.898346,
  15.6946}, {0.996826, 15.6467}, {1.0194, 15.6681}, {0.830686,
  15.5817}, {0.982806, 15.6863}, {0.883306, 16.3492}, {0.928786,
  15.9975}, {0.932786, 15.9091}, {1.08748, 15.8808}, {0.963906,
  16.4309}, {0.896236, 16.7413}, {0.750366, 16.7619}, {0.945936,
  16.6388}, {1.064, 16.7059}, {0.947046, 17.0593}, {1.0258,
  16.3197}, {0.973976, 16.9003}, {0.829206, 17.183}, {0.917956,
  17.0182}, {0.954016, 16.8625}, {0.842906, 17.2803}, {1.10261,
  17.0396}, {1.02155, 17.3819}, {0.671206, 17.5726}, {1.04596,
  17.076}, {1.08985, 17.3047}, {1.05809, 17.2071}, {0.916836,
  17.6237}, {0.815676, 17.5559}, {0.943866, 18.1943}, {1.02348,
  16.8333}, {1.00346, 16.9493}, {1.15636, 17.0909}, {1.11935,
  17.5003}}

and Data0

{{0.97304, 14.1829}, {0.98663, 14.1295}, {0.98284, 14.3172}, {0.81423,
   14.3466}, {0.97303, 14.5966}, {0.87122, 14.8435}, {0.90252,
  14.9036}, {0.81887, 15.1177}, {1.07722, 14.6849}, {0.86684,
  15.6456}, {0.86664, 14.7034}, {0.78728, 15.0898}, {1.10336,
  14.4085}, {1.12014, 14.7281}, {0.95923, 14.4988}, {0.89942,
  15.3173}, {0.83841, 14.8422}, {0.99105, 14.8813}, {1.111,
  14.5964}, {0.93255, 15.5019}, {1.03142, 14.8009}, {1.00661,
  15.0827}, {0.93255, 15.3064}, {1.10023, 14.7189}, {0.8797,
  15.1038}, {1.0013, 14.6755}, {0.87673, 15.0952}, {0.84131,
  15.7345}, {1.06392, 15.3528}, {1.00138, 14.9835}, {0.77803,
  15.4637}, {0.76795, 15.3611}, {0.98328, 15.1047}, {0.89193,
  14.8321}, {0.72882, 15.909}, {0.77123, 15.7902}, {0.86218,
  15.6637}, {0.84381, 15.536}, {0.99263, 15.8903}, {1.0805,
  15.1453}, {0.85316, 15.7793}, {0.85186, 15.9119}, {1.10898,
  15.7583}, {1.03365, 15.7393}, {0.84783, 16.1911}, {1.10979,
  16.0031}, {1.05238, 16.015}, {0.90259, 16.4864}, {0.84963,
  15.9818}, {1.09221, 16.0088}, {1.0443, 15.8326}, {0.8945,
  16.1927}, {0.83015, 16.2776}, {0.7551, 16.5538}, {1.05947,
  15.9138}, {1.06189, 15.6061}, {1.05889, 16.0743}, {0.85216,
  16.1568}, {0.72597, 16.7657}, {0.93638, 16.3583}, {0.81968,
  16.2711}, {0.95022, 16.3549}, {1.04536, 16.18}, {1.0786,
  16.0967}, {0.9385, 15.8504}, {0.95024, 15.9338}, {0.76753,
  17.1461}, {1.18224, 16.0437}, {0.96447, 16.4908}, {0.98235,
  16.0892}, {1.06151, 16.699}, {0.79052, 16.5207}, {1.15863,
  16.3223}, {1.00795, 16.2444}, {1.07284, 16.6536}, {1.04796,
  16.865}, {0.84226, 16.7247}, {1.04712, 16.2673}}


Namely it fails when I ask
bestsolNM = FindMinimum[{Function[{s, m},
    x = Table[{Data1[[j, 1]] + s, Data1[[j, 2]] + m}, {j, 1, size}];
    -2*Log[
      DistributionFitTest[x,
        Data0, {"TestDataTable", "SzekelyEnergy"},
        Method -> {"MonteCarlo", "MonteCarloSamples" -> 1000}][[1, 1,
       2, 3]]]
    ], -0.2 <= s <= 0.04, -0.2 <= m <= 0.11}, {s, 0.04}, {m, 0.02},
  MaxIterations -> 250, Method -> Automatic]

I did also this test in one dimension and it seems less stable than the KS.
Is there a way to make this test more stable?

Regarding the other test you suggested marginalMVTest should I use the
ProductDistribution for my data and a normal distribution?
Did I understand properly?

Thanks a lot,
Regards,
Maria

--- Mer 15/2/12, Andy Ross <andyr at wolfram.com> ha scritto:

Da: Andy Ross <andyr at wolfram.com>
Oggetto: Re: R: I: Re: Kolmogorov Smirnov in two or more dimensions is in Mathematica 8.0.4
A: "maria giovanna dainotti" <mariagiovannadainotti at yahoo.it>
Cc: "mathematica group" <mathgroup at smc.vnet.net>
Data: Mercoled=EC 15 febbraio 2012, 21:08


 

   
 
 
    I don't believe it is fixed in 8.0.4.  I discovered the issue too
    late for the change to make it in.

   

    I can't stress enough that all of the tests based on the empirical
    distribution function are marginal tests. A marginal-based test can
    only be used to reject the null hypothesis of goodness of fit.  It
    does not provide adequate evidence for a good fit to the joint
    distribution because dependency structure is missed. This is true
    whether you are comparing data to a distribution or two data sets.

   

    Here is a function that will compute marginal-based test statistics
    and p-values more quickly than the Monte-Carlo based ones that are
    in 8.0.4.  I suggest you use this if such tests are adequate for
    your purposes.

   

    marginalMVTest[data1_, data2_, test_] :=

     Block[{t, p, dim = Length[data1[[1]]]},

      {t, p} =

       Transpose[

        Table[DistributionFitTest[data1[[All, i]], data2[[All, i]],
    "TestData"], {i, dim}]];

      {"Test" -> test, "T" -> Mean[t], "P-Value" ->
    CDF[UniformSumDistribution[dim], Total@p]

      ]

   

    As a first example take some data drawn from two very different
    distributions.

   

    data1 = data2 = RandomVariate[BinormalDistribution[.9], 100];

    data2 = RandomVariate[BinormalDistribution[-.9], 100];

   

    In[47]:= marginalMVTest[data1, data2, "KolmogorovSmirnov"]

   

    Out[47]= {"Test" -> "KolmogorovSmirnov", "T" -> 0.0421055, 
    "P-Value" -> 0.917708}

   

    Notice we would claim that we cannot reject the null hypothesis that
    data1 and data2 were drawn from the same distribution. However, this
    is obviously an error.  The reason this happens is because
    marginally, both distributions are NormalDistribution[0,1]. However,
    if distributions differ marginally they also differ jointly.  Thus
    the following does what we want.

   

    data1 = RandomVariate[ProductDistribution[NormalDistribution[],
    NormalDistribution[2, 1]], 100];

     data2 = RandomVariate[BinormalDistribution[0], 100];

   

    In[79]:= marginalMVTest[data1, data2, "KolmogorovSmirnov"]

   

    Out[79]= {"Test" -> "KolmogorovSmirnov", "T" -> 0.973367,
    "P-Value" -> 0.00918411}

   

    As I said in the previous response, the Szekely-Energy test in
    DistributionFitTest will be a better comparison. You will just need
    to wait a while for it to compute. 

   

    If you have two matrices data1 and data2 this can be accomplished
    via

   

    DistributionFitTest[data1, data2,
{"TestDataTable","SzekelyEnergy"},Method->{"MonteCarlo","MonteCarloSamples"->1000}]

   

    I've shown this with sub-option "MonteCarloSamples".  You can
    increase this number to improve the estimate of the p-value.  I warn
    you though, 10^3 is already pretty slow.

   

    Hopefully some of this is useful to you. Sorry for any inconvenience
    this might have caused you.

   

    -Andy

   

    On 2/15/2012 1:13 PM, maria giovanna dainotti wrote:
   
     
       
         
            Dear Andy Ross,

              I have just got information of a new release Mathematica
              8.0.4.

              Do you know if the bug of Kolmogorv Smirnov in two
              dimension is fixed in that version?

              Unfortunately, for my purpose it is better to use a
              statistical tools that make comparison directly with the
              data not assuming any stastistic.

             

              I would be very grateful if you could let me know

              Best regards,

              Maria

             

             

              --- Mer 8/2/12, maria giovanna dainotti <mariagiovannadainotti at yahoo.it>
              ha scritto:

             

                Da: maria giovanna dainotti
                <mariagiovannadainotti at yahoo.it>

                Oggetto: I: Re: Kolmogorov Smirnov in two or
                more dimensions

                A:

                Data: Mercoled=EC 8 febbraio 2012, 20:58

               

               
                 
                   
                     
                       

                         

                          --- Mer 8/2/12, Andy Ross <andyr at wolfram.com>
                          ha scritto:

                         

                            Da: Andy Ross <andyr at wolfram.com>

                            Oggetto: Re: Kolmogorov Smirnov
                            in two or more dimensions

                            A: "maria giovanna dainotti"
                            <mariagiovannadainotti at yahoo.it>

                            Cc: mathgroup at smc.vnet.net

                            Data: Mercoled=EC 8 febbraio 2012, 17:09

                           

                            This is
                              a bug in KolmogorovSmirnovTest that will
                              be fixed in later versions of Mathematica.
                              For the time being you can use
                              CramerVonMisesTest or DistributionFitTest.

                             

                              Note that DistributionFitTest uses a
                              little known test referred to as
                              Szekely-Energy. A description can be found
                              in...

                             

                              Szekely, G. J. and Rizzo, M. L. (2004)
                              Testing for Equal Distributions in High
                              Dimension, InterStat,

                              November (5)

                             

                              P-values for this test are computed via
                              Monte-Carlo simulation at the moment so it
                              tends to be slow.

                             

                              The good thing about this test is that it
                              doesn't just look at things marginally. 
                              All of the other tests currently available
                              perform tests on the marginal data and
                              then aggregate the statistics. Thus any
                              differences in covariance structure will
                              be missed.

                             

                              The take home message is that if things
                              don't fit marginally then they don't fit
                              jointly but if they do seem to fit well
                              marginally you still need to dig deeper to
                              determine whether the joint distributions
                              are equivalent.

                             

                              Andy Ross

                              Wolfram Research

                             

                              On 2/8/2012 4:32 AM, maria giovanna
                              dainotti wrote:

                              > Dear Mathematica group,

                              > I am doing an analysis of the
                              Kolmogorov with two data sets of 2
                              dimensions each.

                              > When I apply the
                              KolmogorovSmirnovTest[Data0,Data0] I
                              should get 1. I did just a trial and I got
                              0.

                              > I am copying the datafile example for
                              clarity.

                              > {{0.97304, 14.1829}, {0.98663,
                              14.1295}, {0.98284, 14.3172}, {0.81423,

                              >     14.3466}, {0.97303, 14.5966},
                              {0.87122, 14.8435}, {0.90252,

                              >    14.9036}, {0.81887, 15.1177},
                              {1.07722, 14.6849}, {0.86684,

                              >    15.6456}, {0.86664, 14.7034},
                              {0.78728, 15.0898}, {1.10336,

                              >    14.4085}, {1.12014, 14.7281},
                              {0.95923, 14.4988}, {0.89942,

                              >    15.3173}, {0.83841, 14.8422},
                              {0.99105, 14.8813}, {1.111,

                              >    14.5964}, {0.93255, 15.5019},
                              {1.03142, 14.8009}, {1.00661,

                              >    15.0827}, {0.93255, 15.3064},
                              {1.10023, 14.7189}, {0.8797,

                              >    15.1038}, {1.0013, 14.6755},
                              {0.87673, 15.0952}, {0.84131,

                              >    15.7345}, {1.06392, 15.3528},
                              {1.00138, 14.9835}, {0.77803,

                              >    15.4637}, {0.76795, 15.3611},
                              {0.98328, 15.1047}, {0.89193,

                              >    14.8321}, {0.72882, 15.909},
                              {0.77123, 15.7902}, {0.86218,

                              >    15.6637}, {0.84381, 15.536},
                              {0.99263, 15.8903}, {1.0805,

                              >    15.1453}, {0.85316, 15.7793},
                              {0.85186, 15.9119}, {1.10898,

                              >    15.7583}, {1.03365, 15.7393},
                              {0.84783, 16.1911}, {1.10979,

                              >    16.0031}, {1.05238, 16.015},
                              {0.90259, 16.4864}, {0.84963,

                              >    15.9818}, {1.09221, 16.0088},
                              {1.0443, 15.8326}, {0.8945,

                              >    16.1927}, {0.83015, 16.2776},
                              {0.7551, 16.5538}, {1.05947,

                              >    15.9138}, {1.06189, 15.6061},
                              {1.05889, 16.0743}, {0.85216,

                              >    16.1568}, {0.72597, 16.7657},
                              {0.93638, 16.3583}, {0.81968,

                              >    16.2711}, {0.95022, 16.3549},
                              {1.04536, 16.18}, {1.0786,

                              >    16.0967}, {0.9385, 15.8504},
                              {0.95024, 15.9338}, {0.76753,

                              >    17.1461}, {1.18224, 16.0437},
                              {0.96447, 16.4908}, {0.98235,

                              >    16.0892}, {1.06151, 16.699},
                              {0.79052, 16.5207}, {1.15863,

                              >    16.3223}, {1.00795, 16.2444},
                              {1.07284, 16.6536}, {1.04796,

                              >    16.865}, {0.84226, 16.7247},
                              {1.04712, 16.2673}}

                              >

                              > Or maybe should I use some other
                              conditions?

                              > Or if it doesn't work is there an
                              already built package that I can use?

                              >

                              > Thanks a lot for your help

                              >

                              > Best regards,

                              > Maria


  • Prev by Date: Re: How best to implement a hash table in Mathematica
  • Next by Date: Re: good list
  • Previous by thread: Re: How I can I optimize the following code in order to get very
  • Next by thread: Re: Kolmogorov Smirnov in two or more dimensions is in Mathematica 8.0.4