Re: Empirical CDF and InterpolatingFunction

- *To*: mathgroup at smc.vnet.net
- *Subject*: [mg36579] Re: [mg36555] Empirical CDF and InterpolatingFunction
- *From*: Tomas Garza <tgarza01 at prodigy.net.mx>
- *Date*: Fri, 13 Sep 2002 01:13:42 -0400 (EDT)
- *References*: <200209111727.NAA07812@smc.vnet.net>
- *Sender*: owner-wri-mathgroup at wolfram.com

Well, I don't know how fast it is, but it is fairly simple anyway. Suppose you have a series of values s for which you wish to obtain the edf.

    In[1]:= s = Table[Random[Integer, {0, 3}], {10}]
    Out[1]= {2, 2, 1, 2, 0, 0, 1, 1, 1, 3}

If no specification is made about their position on the x-axis, we assume that they correspond to the integers from 1 to 10. What we have then is the collection of pairs

    In[2]:= porig = Transpose[{Range[10], s}]
    Out[2]= {{1, 2}, {2, 2}, {3, 1}, {4, 2}, {5, 0}, {6, 0}, {7, 1}, {8, 1}, {9, 1}, {10, 3}}

The edf gives, for each x, the proportion of points that are less than or equal to x. Treating each value in s as the weight at its position, we obtain these proportions through the cumulative sums

    In[3]:= N[CumulativeSums[s]/Plus @@ s]
    Out[3]= {0.153846, 0.307692, 0.384615, 0.538462, 0.538462, 0.538462, 0.615385, 0.692308, 0.769231, 1.}

so that for each of the pairs (x, y) below, y gives the cumulative proportion of the total of s at positions less than or equal to x:

    In[4]:= cumporig = Transpose[{Range[10], N[CumulativeSums[s]/Plus @@ s]}]
    Out[4]= {{1, 0.15384615384615385}, {2, 0.3076923076923077}, {3, 0.38461538461538464}, {4, 0.5384615384615384}, {5, 0.5384615384615384}, {6, 0.5384615384615384}, {7, 0.6153846153846154}, {8, 0.6923076923076923}, {9, 0.7692307692307693}, {10, 1.}}

Now shift the x values one unit to the left, by dropping the last value and prepending 0 to them:

    In[5]:= ps = Transpose[{Prepend[Drop[Range[1, 10], -1], 0], CumulativeSums[s]/Plus @@ s}]
    Out[5]= {{0, 2/13}, {1, 4/13}, {2, 5/13}, {3, 7/13}, {4, 7/13}, {5, 7/13}, {6, 8/13}, {7, 9/13}, {8, 10/13}, {9, 1}}

Then use Interpolation on this shifted set of points:

    In[6]:= ips = Interpolation[ps, InterpolationOrder -> 0]
    Out[6]= InterpolatingFunction[{{0, 9}}, <>]

ips[x-1] is the edf you are looking for, as you may check by plotting it and displaying it in the same graph as the ListPlot of cumporig above.
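For convenience, the steps In[1] to In[6] can be collected into one short sketch. This is only an illustration under a few assumptions: the sample is fixed to the values shown in Out[1] so that the run is reproducible, the names n and cs are introduced here, and FoldList is swapped in for CumulativeSums so that the sketch does not depend on where CumulativeSums is defined.

```mathematica
(* Fixed sample taken from Out[1], instead of a fresh random draw *)
s = {2, 2, 1, 2, 0, 0, 1, 1, 1, 3};
n = Length[s];   (* n and cs are names introduced for this sketch *)

(* Cumulative sums via the built-in FoldList, standing in for CumulativeSums *)
cs = Rest[FoldList[Plus, 0, s]];   (* {2, 4, 5, 7, 7, 7, 8, 9, 10, 13} *)

(* Cumulative proportions at positions 1..n, as in In[4] but kept exact *)
cumporig = Transpose[{Range[n], cs/Last[cs]}];

(* The same proportions with x shifted one unit to the left, as in In[5] *)
ps = Transpose[{Range[0, n - 1], cs/Last[cs]}];

(* Piecewise-constant interpolant, as in In[6] *)
ips = Interpolation[ps, InterpolationOrder -> 0];

(* Overlay the step function and the original points for a visual check *)
Show[
  Plot[ips[x - 1], {x, 1, n}],
  ListPlot[cumporig, PlotStyle -> PointSize[0.02]],
  PlotRange -> All
]
```

The plot is the check suggested above: where the steps fall relative to the points shows how the order-0 interpolant treats the node values.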
Tomas Garza
Mexico City

----- Original Message -----
From: "Mark Fisher" <mark at markfisher.net>
To: mathgroup at smc.vnet.net
Subject: [mg36579] [mg36555] Empirical CDF and InterpolatingFunction

> I'm trying to write a fast empirical cumulative distribution function
> (CDF). Empirical CDFs are step functions that can be expressed in
> terms of a Which statement. For example, given the list of
> observations {1, 2, 3},
>
> f = Which[# < 1, 0, # < 2, 1/3, # < 3, 2/3, True, 1]&
>
> is the empirical CDF. Note that f /@ {1, 2, 3} returns {1/3, 2/3, 1}
> and f is continuous from the right.
>
> When the number of observations is large, the Which statement
> evaluates fairly slowly (even if it has been Compiled). Since
> InterpolatingFunction evaluates so much faster in general, I've tried
> to use Interpolation with InterpolationOrder -> 0. The problem is that
> the resulting InterpolatingFunction doesn't behave the way (I think)
> it ought to. For example, let
>
> g = Interpolation[{{1, 1/3}, {2, 2/3}, {3, 1}}, InterpolationOrder -> 0]
>
> Then, g /@ {1, 2, 3} returns {2/3, 2/3, 1} instead of {1/3, 2/3, 1}.
> In addition, g is continuous from the left rather than from the right.
>
> Obviously I am not aware of the considerations that went into
> determining the behavior of InterpolatingFunction when
> InterpolationOrder -> 0.
>
> So I have two questions:
>
> (1) Does anyone have any opinions about how InterpolatingFunction
> ought to behave with InterpolationOrder -> 0?
>
> (2) Does anyone have a faster way to evaluate an empirical CDF than a
> compiled Which function?
>
> By the way, here's my current version:
>
> CompileEmpiricalCDF[list_?(VectorQ[#, NumericQ] &)] :=
>   Block[{x}, Compile[{{x, _Real}}, Evaluate[
>     Which @@ Flatten[
>       Append[
>         Transpose[{
>           Thread[x < Sort[list]],
>           Range[0, 1 - 1/#, 1/#] & @ Length[list]
>         }],
>         {True, 1}]]
>   ]]]
>
> --Mark
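As a reference point for the values quoted in the original message, the empirical CDF can also be written straight from its definition: the fraction of observations less than or equal to x. The sketch below is meant only for checking results on small lists, not as a fast implementation, and ecdfDirect is a hypothetical helper name that does not appear in the thread.

```mathematica
(* ecdfDirect is a hypothetical name: the fraction of observations <= x *)
ecdfDirect[data_?(VectorQ[#, NumericQ] &)][x_?NumericQ] :=
  Count[data, y_ /; y <= x]/Length[data]

(* The Which-based step function from the quoted message *)
f = Which[# < 1, 0, # < 2, 1/3, # < 3, 2/3, True, 1] &;

(* Both give {1/3, 2/3, 1} at the observations themselves *)
f /@ {1, 2, 3}
ecdfDirect[{1, 2, 3}] /@ {1, 2, 3}

(* Between and beyond the observations: {0, 1/3, 2/3, 1} *)
ecdfDirect[{1, 2, 3}] /@ {0.5, 1.5, 2.5, 3.5}
```

Because it counts over the whole list on every call, this version costs time proportional to the number of observations per evaluation, so it is useful mainly for testing a faster candidate against known values.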

**References**: **Empirical CDF and InterpolatingFunction**, *From:* mark@markfisher.net (Mark Fisher)