MathGroup Archive: September 2002 [00296]

Re: Empirical CDF and InterpolatingFunction

To: mathgroup at smc.vnet.net
Subject: [mg36613] Re: Empirical CDF and InterpolatingFunction
From: mark at markfisher.net (Mark Fisher)
Date: Fri, 13 Sep 2002 23:33:13 -0400 (EDT)
References: <alnv9e$7op$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

Daniel Lichtblau made two suggestions that allow one to use
Interpolation the way I wanted. First, to make the resulting function
right-continuous, change the sign twice: negate the abscissas when
building the InterpolatingFunction and negate the argument when
evaluating it. Second, to make the end point return the correct value,
add an extra (phantom) observation with an extra (irrelevant) value.
(The phantom observation is made at the high end because of the sign
reversals.) Here's the code I cooked up based on his suggestions:

MakeEmpiricalCDF::usage = "MakeEmpiricalCDF[list] returns a function
that evaluates the empirical CDF given the observations in the list.
The function is defined on the entire real line."

MakeEmpiricalCDF[list_?(VectorQ[#, NumericQ]&)] :=
  Module[{n, s, a, r, idata},
  n = Length[list];
  s = Sort[list];
  a = Append[s, s[[-1]] + 1]; (* phantom obs. *)
  r = Range[1/n, 1 + 1/n, 1/n]; (* phantom value 1 + 1/n *)
  idata = Last /@ Split[Transpose[{-a, r}], #1[[1]] == #2[[1]]&];
              (* "-a" is the first sign change *)
  Block[{x},
    Function @@ {x, Which @@ {
      x < s[[ 1]], 0., 
      x > s[[-1]], 1., 
      True, Interpolation[idata, InterpolationOrder -> 0][-x]
              (* "-x" is the second sign change *)
    }}]
  ]

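The construction "Last /@ Split[ ... ]" accounts for duplicate values:
for each run of equal observations it keeps only the last pair, which
carries the largest CDF value for that observation.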
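
To see what that construction does, here is a minimal sketch on a toy
list with one duplicate. The names a0 and r0 are made up for the
illustration, and the sign change and phantom observation are left out
to keep it short:

a0 = {1, 2, 2, 3};          (* sorted observations, 2 appears twice *)
r0 = Range[1/4, 1, 1/4];    (* CDF values 1/4, 1/2, 3/4, 1 *)
Last /@ Split[Transpose[{a0, r0}], #1[[1]] == #2[[1]]&]
(* {{1, 1/4}, {2, 3/4}, {3, 1}} *)

The pair {2, 1/2} is dropped, so the step at 2 jumps straight to 3/4.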

Here are two examples. 

Needs["Statistics`ContinuousDistributions`"]
list1 = RandomArray[NormalDistribution[0, 1], 100];
f1 = MakeEmpiricalCDF[list1];
Plot[f1[x], {x, -4, 4}]

list2 = Table[Random[Integer, {1, 10}], {10}];
f2 = MakeEmpiricalCDF[list2];
Plot[f2[x], {x, 0, 11}]
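
A quick sanity check, assuming the definition above has been
evaluated: compare the result with the Which form from my original
message on the observations {1, 2, 3}. The names f0, f3, and pts are
made up for this check.

f0 = Which[# < 1, 0, # < 2, 1/3, # < 3, 2/3, True, 1]&;
f3 = MakeEmpiricalCDF[{1, 2, 3}];
pts = {0, 1, 3/2, 2, 3, 4};   (* below, at, between, and above the data *)
Transpose[{pts, f0 /@ pts, f3 /@ pts}]   (* columns: x, f0[x], f3[x] *)

If the construction behaves as described, the last two columns should
agree numerically at every point, including the observations
themselves. Note that f3 returns the machine numbers 0. and 1. outside
the range of the data, where f0 returns the exact 0 and 1.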

--Mark

mark at markfisher.net (Mark Fisher) wrote in message news:<alnv9e$7op$1 at smc.vnet.net>...
> I'm trying to write a fast empirical cumulative distribution function
> (CDF). Empirical CDFs are step functions that can be expressed in
> terms of a Which statement. For example, given the list of
> observations {1, 2, 3},
> 
> f = Which[# < 1, 0, # < 2, 1/3, # < 3, 2/3, True, 1]&
> 
> is the empirical CDF. Note that f /@ {1, 2, 3} returns {1/3, 2/3, 1}
> and f is continuous from the right.
> 
> When the number of observations is large, the Which statement
> evaluates fairly slowly (even if it has been Compiled). Since
> InterpolatingFunction evaluates so much faster in general, I've tried
> to use Interpolation with InterpolationOrder -> 0. The problem is that
> the resulting InterpolatingFunction doesn't behave the way (I think)
> it ought to. For example, let
> 
> g = Interpolation[{{1, 1/3}, {2, 2/3}, {3, 1}}, InterpolationOrder -> 0]
> 
> Then, g /@ {1, 2, 3} returns {2/3, 2/3, 1} instead of {1/3, 2/3, 1}.
> In addition, g is continuous from the left rather than from the right.
> 
> Obviously I am not aware of the considerations that went into
> determining the behavior of InterpolatingFunction when
> InterpolationOrder -> 0.
> 
> So I have two questions: 
> 
> (1) Does anyone have any opinions about how InterpolatingFunction
> ought to behave with InterpolationOrder -> 0?
> 
> (2) Does anyone have a faster way to evaluate an empirical CDF than a
> compiled Which function?
> 
> By the way, here's my current version:
> 
> CompileEmpiricalCDF[list_?(VectorQ[#, NumericQ] &)] :=
>   Block[{x}, Compile[{{x, _Real}}, Evaluate[
>     Which @@ Flatten[
>       Append[
>           Transpose[{
>             Thread[x < Sort[list]],
>             Range[0, 1 - 1/#, 1/#] & @ Length[list]
>               }],
>         {True, 1}]]
>   ]]]
> 
> --Mark
