Re: Empirical CDF and InterpolatingFunction
- To: mathgroup at smc.vnet.net
- Subject: [mg36585] Re: [mg36555] Empirical CDF and InterpolatingFunction
- From: "Johannes Ludsteck" <johannes.ludsteck at wiwi.uni-regensburg.de>
- Date: Fri, 13 Sep 2002 01:13:52 -0400 (EDT)
- Organization: Universitaet Regensburg
- Sender: owner-wri-mathgroup at wolfram.com
Dear Mark,
I suggest trying the following code before you put
further energy into speeding up your functions. My
code is very short and seems to be fast in my first
test with a random list of 100000 integers.
cdf[li_List] :=
  FoldList[#1 + Length[#2] &, 0.0,
    Split[Sort[li]]]/Length[li]
t=Table[Random[Integer,{1,1000}],{100000}];
Timing[cdf[t];]
{0.44 Second,Null}
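To make the return value concrete, here is a small check on a four-element list. It is only a sketch: the names data and pairs are illustrative and not part of the code above. cdf returns the step heights of the empirical CDF with a leading 0, one entry per distinct value, so they can be paired with the distinct sorted values from Union.

```mathematica
(* cdf as defined above, repeated so the example is self-contained *)
cdf[li_List] :=
  FoldList[#1 + Length[#2] &, 0.0, Split[Sort[li]]]/Length[li]

data = {3, 1, 2, 2};   (* hypothetical sample with one repeated value *)

cdf[data]
(* {0., 0.25, 0.75, 1.} *)

(* Pair each distinct value with the CDF height reached at that value.
   Union sorts and removes duplicates, so its length matches Rest[cdf[data]]. *)
pairs = Transpose[{Union[data], Rest[cdf[data]]}]
(* {{1, 0.25}, {2, 0.75}, {3, 1.}} *)
```

The leading 0. is the CDF value to the left of the smallest observation, so it is dropped before pairing.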
The speed could probably be increased further by writing
a compiled version, but this would require additional
programming effort: you would have to write a function
that counts the number of equal data values 'by hand'
while scanning through the data list.
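One way such a compiled version might look is sketched below. This is an assumption about the approach, not tested for speed: the name cdfCompiled and its internals are hypothetical, and it expects the list to be sorted and converted to reals beforehand (for example with Sort[N[t]]). It walks the sorted list once and records the cumulative fraction at the end of each run of equal values.

```mathematica
(* Hypothetical compiled variant; the input must already be sorted. *)
cdfCompiled = Compile[{{sorted, _Real, 1}},
   Module[{n = Length[sorted], out, k = 1},
    (* out[[1]] stays 0.; at most n further step heights are needed *)
    out = Table[0., {n + 1}];
    Do[
     (* a run of equal values ends at position i *)
     If[i == n || sorted[[i]] != sorted[[i + 1]],
      k++; out[[k]] = N[i]/n],
     {i, n}];
    (* keep only the entries that were actually filled *)
    Take[out, k]]];

cdfCompiled[Sort[N[{3, 1, 2, 2}]]]
```

For this small input the result should agree with cdf[{3, 1, 2, 2}]. Whether it actually beats the Split-based version for large lists would have to be measured with Timing.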
Best regards,
Johannes
On 11 Sep 2002, at 13:27, Mark Fisher wrote:
> I'm trying to write a fast empirical cumulative distribution function
> (CDF). Empirical CDFs are step functions that can be expressed in
> terms of a Which statement. For example, given the list of
> observations {1, 2, 3},
>
> f = Which[# < 1, 0, # < 2, 1/3, # < 3, 2/3, True, 1]&
>
> is the empirical CDF. Note that f /@ {1, 2, 3} returns {1/3, 2/3, 1}
> and f is continuous from the right.
>
> When the number of observations is large, the Which statement
> evaluates fairly slowly (even if it has been Compiled). Since
> InterpolatingFunction evaluates so much faster in general, I've tried
> to use Interpolation with InterpolationOrder -> 0. The problem is that
> the resulting InterpolatingFunction doesn't behave the way (I think)
> it ought to. For example, let
>
> g = Interpolation[{{1, 1/3}, {2, 2/3}, {3, 1}}, InterpolationOrder -> 0]
>
> Then, g /@ {1, 2, 3} returns {2/3, 2/3, 1} instead of {1/3, 2/3, 1}.
> In addition, g is continuous from the left rather than from the right.
>
> Obviously I am not aware of the considerations that went into
> determining the behavior of InterpolatingFunction when
> InterpolationOrder -> 0.
>
> So I have two questions:
>
> (1) Does anyone have any opinions about how InterpolatingFunction
> ought to behave with InterpolationOrder -> 0?
>
> (2) Does anyone have a faster way to evaluate an empirical CDF than a
> compiled Which function?
>
> By the way, here's my current version:
>
> CompileEmpiricalCDF[list_?(VectorQ[#, NumericQ] &)] :=
>   Block[{x}, Compile[{{x, _Real}}, Evaluate[
>     Which @@ Flatten[
>       Append[
>         Transpose[{
>           Thread[x < Sort[list]],
>           Range[0, 1 - 1/#, 1/#] & @ Length[list]
>         }],
>         {True, 1}]]
>   ]]]
>
> --Mark
>
<><><><><><><><><><><><>
Johannes Ludsteck
Economics Department
University of Regensburg
Universitaetsstrasse 31
93053 Regensburg
Phone +49/0941/943-2741