Re: Empirical CDF and InterpolatingFunction
- To: mathgroup at smc.vnet.net
- Subject: [mg36585] Re: [mg36555] Empirical CDF and InterpolatingFunction
- From: "Johannes Ludsteck" <johannes.ludsteck at wiwi.uni-regensburg.de>
- Date: Fri, 13 Sep 2002 01:13:52 -0400 (EDT)
- Organization: Universitaet Regensburg
- Sender: owner-wri-mathgroup at wolfram.com
Dear Mark,

I suggest trying the following code before you put further energy into
speeding up your functions. It is very short and was fast in my first
test with a random list of 100000 integers.

    cdf[li_List] := FoldList[#1 + Length[#2] &, 0.0, Split[Sort[li]]]/Length[li]

    t = Table[Random[Integer, {1, 1000}], {100000}];
    Timing[cdf[t];]

    {0.44 Second, Null}

The result is the list of cumulative relative frequencies at the
distinct sorted values, with a leading 0.

The speed could probably be increased further with a compiled version,
but that would require additional programming effort: you would have to
write a function that counts the number of equal data values 'by hand'
while scanning through the data list.

Best regards,
Johannes

On 11 Sep 2002, at 13:27, Mark Fisher wrote:

> I'm trying to write a fast empirical cumulative distribution function
> (CDF). Empirical CDFs are step functions that can be expressed in
> terms of a Which statement. For example, given the list of
> observations {1, 2, 3},
>
>   f = Which[# < 1, 0, # < 2, 1/3, # < 3, 2/3, True, 1]&
>
> is the empirical CDF. Note that f /@ {1, 2, 3} returns {1/3, 2/3, 1}
> and f is continuous from the right.
>
> When the number of observations is large, the Which statement
> evaluates fairly slowly (even if it has been Compiled). Since
> InterpolatingFunction evaluates so much faster in general, I've tried
> to use Interpolation with InterpolationOrder -> 0. The problem is that
> the resulting InterpolatingFunction doesn't behave the way (I think)
> it ought to. For example, let
>
>   g = Interpolation[{{1, 1/3}, {2, 2/3}, {3, 1}}, InterpolationOrder -> 0]
>
> Then, g /@ {1, 2, 3} returns {2/3, 2/3, 1} instead of {1/3, 2/3, 1}.
> In addition, g is continuous from the left rather than from the right.
>
> Obviously I am not aware of the considerations that went into
> determining the behavior of InterpolatingFunction when
> InterpolationOrder -> 0.
>
> So I have two questions:
>
> (1) Does anyone have any opinions about how InterpolatingFunction
> ought to behave with InterpolationOrder -> 0?
>
> (2) Does anyone have a faster way to evaluate an empirical CDF than a
> compiled Which function?
>
> By the way, here's my current version:
>
>   CompileEmpiricalCDF[list_?(VectorQ[#, NumericQ] &)] :=
>     Block[{x}, Compile[{{x, _Real}}, Evaluate[
>       Which @@ Flatten[
>         Append[
>           Transpose[{
>             Thread[x < Sort[list]],
>             Range[0, 1 - 1/#, 1/#] & @ Length[list]
>           }],
>           {True, 1}]]
>     ]]]
>
> --Mark
>

<><><><><><><><><><><><>
Johannes Ludsteck
Economics Department
University of Regensburg
Universitaetsstrasse 31
93053 Regensburg
Phone +49/0941/943-2741
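
Regarding question (2), one possible way to evaluate the empirical CDF at a single point without a long Which is a compiled binary search over the sorted sample. The following is only a minimal sketch under stated assumptions: the name makeECDF is hypothetical, the binary search is a swapped-in technique that appears in neither message, and the input is assumed to be a non-empty numeric vector. It has not been timed against the versions above.

```mathematica
(* Hypothetical helper, not from the original posts.
   Returns a compiled function that gives the fraction of sample
   values <= x, i.e. a right-continuous empirical CDF.
   Assumes li is a non-empty numeric vector. *)
makeECDF[li_?(VectorQ[#, NumericQ] &)] :=
  With[{s = N[Sort[li]], n = Length[li]},
    Compile[{{x, _Real}},
      Module[{lo = 0, hi = n, mid = 0},
        (* Invariant: the first lo sorted values are <= x,
           and every value after position hi is > x. *)
        While[lo < hi,
          mid = Quotient[lo + hi + 1, 2];
          If[s[[mid]] <= x, lo = mid, hi = mid - 1]];
        (* lo is now the count of sample values <= x. *)
        N[lo]/n]]]

ecdf = makeECDF[{1, 2, 3}];
ecdf /@ {1, 2, 3}   (* should agree numerically with {1/3, 2/3, 1} *)
```

Each evaluation takes on the order of Log[2, n] comparisons instead of a linear pass through the Which conditions, at the cost of sorting the sample once when the function is built.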