       Re: Empirical CDF and InterpolatingFunction

• To: mathgroup at smc.vnet.net
• Subject: [mg36579] Re: [mg36555] Empirical CDF and InterpolatingFunction
• From: Tomas Garza <tgarza01 at prodigy.net.mx>
• Date: Fri, 13 Sep 2002 01:13:42 -0400 (EDT)
• References: <200209111727.NAA07812@smc.vnet.net>
• Sender: owner-wri-mathgroup at wolfram.com

```
Well, I don't know how fast it is, but it is fairly simple, anyway. Suppose you
have a series of values s for which you wish to obtain the edf (empirical
distribution function).

In:=
s = Table[Random[Integer, {0, 3}], {10}]
Out=
{2,2,1,2,0,0,1,1,1,3}

If no specification is made about their position on the x-axis, we assume
that they correspond to the integers from 1 to 10. What we have then is the
collection of pairs

In:=
porig = Transpose[{Range[10], s}]
Out=
{{1, 2}, {2, 2}, {3, 1}, {4, 2}, {5, 0}, {6, 0}, {7, 1},
{8, 1}, {9, 1}, {10, 3}}

Regarding each value s[[k]] as a count of observations located at x = k, the
edf gives, for each x, the proportion of all those observations that lie at
positions less than or equal to x. We obtain these proportions through the
cumulative sums

In:=
N[CumulativeSums[s]/Plus @@ s]
Out=
{0.153846,0.307692,0.384615,0.538462,0.538462,0.538462,0.615385,0.692308,0.769231,1.}

so that for each of the pairs (x, y) below, y gives the proportion of the
observations that lie at positions less than or equal to x:

In:=
cumporig = Transpose[{Range[10],
N[CumulativeSums[s]/Plus @@ s]}]
Out=
{{1, 0.15384615384615385}, {2, 0.3076923076923077},
{3, 0.38461538461538464}, {4, 0.5384615384615384},
{5, 0.5384615384615384}, {6, 0.5384615384615384},
{7, 0.6153846153846154}, {8, 0.6923076923076923},
{9, 0.7692307692307693}, {10, 1.}}
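
(* A sketch for later versions, assuming Mathematica 6 or newer. CumulativeSums
   is not part of the kernel (it is assumed here to come from the
   Statistics`DataManipulation` standard package), whereas the built-in
   Accumulate and Total give the same proportions. s is the sample shown above;
   cum and cumorigModern are hypothetical names. *)
s = {2, 2, 1, 2, 0, 0, 1, 1, 1, 3};
cum = Accumulate[s]/Total[s]   (* exact fractions: 2/13, 4/13, 5/13, ..., 1 *)
cumorigModern = Transpose[{Range[Length[s]], N[cum]}]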

Now shift the x values one unit to the left, by dropping the last of them and
prepending 0, so that they run from 0 to 9 (the heights are now kept exact):

In:=
ps = Transpose[{Prepend[Drop[Range[1, 10], -1], 0],
CumulativeSums[s]/Plus @@ s}]
Out=
{{0, 2/13}, {1, 4/13}, {2, 5/13}, {3, 7/13}, {4, 7/13},
{5, 7/13}, {6, 8/13}, {7, 9/13}, {8, 10/13}, {9, 1}}

Then use Interpolation on this shifted set of points:

In:=
ips = Interpolation[ps, InterpolationOrder -> 0]
Out=
InterpolatingFunction[{{0,9}},<>]

ips[x-1] is the edf you are looking for, as you may check by plotting it
together with the ListPlot of cumporig above in the same graph.
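
(* ps is cumporig with every abscissa lowered by 1 (and with exact heights), so
   evaluating ips at x - 1 should reproduce the order-0 interpolation of the
   unshifted pairs at x. A sketch of that check and of the plot suggested above,
   assuming Mathematica 6 or newer: Accumulate stands in for CumulativeSums, and
   direct is a hypothetical name. *)
s = {2, 2, 1, 2, 0, 0, 1, 1, 1, 3};
ps = Transpose[{Range[0, 9], Accumulate[s]/Total[s]}];   (* same pairs as ps *)
ips = Interpolation[ps, InterpolationOrder -> 0];
direct = Interpolation[{#[[1]] + 1, #[[2]]} & /@ ps, InterpolationOrder -> 0];
Table[ips[x - 1] == direct[x], {x, 1, 10, 1/4}]
(* The values at the integers show which side of each jump the step function
   takes; compare them with the second column of cumporig. *)
cumporig = Transpose[{Range[10], N[Accumulate[s]/Total[s]]}];
Table[ips[x - 1], {x, 1, 10}]
Show[Plot[ips[x - 1], {x, 1, 10}],
  ListPlot[cumporig, PlotStyle -> PointSize[0.02]]]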

Tomas Garza
Mexico City

----- Original Message -----
From: "Mark Fisher" <mark at markfisher.net>
To: mathgroup at smc.vnet.net
Subject: [mg36579] [mg36555] Empirical CDF and InterpolatingFunction

> I'm trying to write a fast empirical cumulative distribution function
> (CDF). Empirical CDFs are step functions that can be expressed in
> terms of a Which statement. For example, given the list of
> observations {1, 2, 3},
>
> f = Which[# < 1, 0, # < 2, 1/3, # < 3, 2/3, True, 1]&
>
> is the empirical CDF. Note that f /@ {1, 2, 3} returns {1/3, 2/3, 1}
> and f is continuous from the right.
>
> When the number of observations is large, the Which statement
> evaluates fairly slowly (even if it has been Compiled). Since
> InterpolatingFunction evaluates so much faster in general, I've tried
> to use Interpolation with InterpolationOrder -> 0. The problem is that
> the resulting InterpolatingFunction doesn't behave the way (I think)
> it ought to. For example, let
>
> g = Interpolation[{{1, 1/3}, {2, 2/3}, {3, 1}}, InterpolationOrder -> 0]
>
> Then, g /@ {1, 2, 3} returns {2/3, 2/3, 1} instead of {1/3, 2/3, 1}.
> In addition, g is continuous from the left rather than from the right.
>
> Obviously I am not aware of the considerations that went into
> determining the behavior of InterpolatingFunction when
> InterpolationOrder -> 0.
>
> So I have two questions:
>
> (1) Does anyone have any opinions about how InterpolatingFunction
> ought to behave with InterpolationOrder -> 0?
>
> (2) Does anyone have a faster way to evaluate an empirical CDF than a
> compiled Which function?
>
> By the way, here's my current version:
>
> CompileEmpiricalCDF[list_?(VectorQ[#, NumericQ] &)] :=
>   Block[{x}, Compile[{{x, _Real}}, Evaluate[
>     Which @@ Flatten[
>       Append[
>           Transpose[{
>             Thread[x < Sort[list]],
>             Range[0, 1 - 1/#, 1/#] & @ Length[list]
>               }],
>         {True, 1}]]
>   ]]]
>
> --Mark
>
>
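
(* Regarding question (2): a sketch of an alternative, not the method of either
   post. Sort the data once and, for each x, count the observations <= x by
   binary search, which needs O(log n) comparisons instead of the O(n) tests a
   long Which may need. CompileBinaryCDF is a hypothetical name. *)
CompileBinaryCDF[list_?(VectorQ[#, NumericQ] && Length[#] > 0 &)] :=
  With[{sorted = N[Sort[list]], n = Length[list], invn = 1./Length[list]},
    Compile[{{x, _Real}},
      Module[{lo = 0, hi = n, mid = 0},
        (* invariant: sorted[[1]], ..., sorted[[lo]] are <= x and
           sorted[[hi + 1]], ..., sorted[[n]] are > x *)
        While[lo < hi,
          mid = Quotient[lo + hi + 1, 2];
          If[sorted[[mid]] <= x, lo = mid, hi = mid - 1]];
        (* proportion of observations <= x, so the result is right-continuous *)
        lo*invn]]]

F = CompileBinaryCDF[{1, 2, 3}];
F /@ {0.5, 1, 1.5, 2, 3}   (* should give {0., 0.333333, 0.333333, 0.666667, 1.} *)

(* In Mathematica 8 and later the same step function is also available as
   CDF[EmpiricalDistribution[{1, 2, 3}], x]. *)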

```
