RE: Empirical CDF and InterpolatingFunction

*To*: mathgroup at smc.vnet.net
*Subject*: [mg36577] RE: [mg36555] Empirical CDF and InterpolatingFunction
*From*: "DrBob" <drbob at bigfoot.com>
*Date*: Fri, 13 Sep 2002 01:13:39 -0400 (EDT)
*Reply-to*: <drbob at bigfoot.com>
*Sender*: owner-wri-mathgroup at wolfram.com

I have no opinion on the behavior associated with InterpolationOrder->0, except that it should be DOCUMENTED, but isn't.

Meanwhile, try this:

    lst = {{1, 1/3}, {2, 2/3}, {3, 1}};
    ClearAll[empiricalCDF]
    empiricalCDF[{x_List, y_List}] :=
      Compile[{{z, _Real}},
        Evaluate[Which @@ Flatten[Transpose[{(z < #1 &) /@ x, y}]]]]
    empiricalCDF[v : {{_, _} ..}] :=
      empiricalCDF[v] =
        empiricalCDF[
          ({Join[#1[[1]], {Infinity}], Join[{0}, #1[[2]]]} &)[Transpose[v]]]
    empiricalCDF[lst]
    Plot[empiricalCDF[lst][x], {x, 1, 3}];

I split the work into two definitions for readability. If you evaluate:

    ?empiricalCDF

after doing the plot above, you'll see that empiricalCDF[{{1, 1/3}, {2, 2/3}, {3, 1}}] has been compiled and saved for later use, and that it takes precedence over the SetDelayed rules listed after it. Hence, the compilation only occurs once for each list.

As written, your CompileEmpiricalCDF would be compiled all over again every time it's used -- for each and every point you plot -- so of course it's slow.

There are other problems, too. For instance, the pattern list_?(VectorQ[#, NumericQ] &) makes no sense at all. Maybe you meant list_?And[VectorQ[#],NumericQ[#]]&

Bobby Treat

-----Original Message-----
From: Mark Fisher [mailto:mark at markfisher.net]
To: mathgroup at smc.vnet.net
Subject: [mg36577] [mg36555] Empirical CDF and InterpolatingFunction

I'm trying to write a fast empirical cumulative distribution function (CDF). Empirical CDFs are step functions that can be expressed in terms of a Which statement. For example, given the list of observations {1, 2, 3},

    f = Which[# < 1, 0, # < 2, 1/3, # < 3, 2/3, True, 1]&

is the empirical CDF. Note that f /@ {1, 2, 3} returns {1/3, 2/3, 1} and f is continuous from the right.

When the number of observations is large, the Which statement evaluates fairly slowly (even if it has been Compiled). Since InterpolatingFunction evaluates so much faster in general, I've tried to use Interpolation with InterpolationOrder -> 0.
The problem is that the resulting InterpolatingFunction doesn't behave the way (I think) it ought to. For example, let

    g = Interpolation[{{1, 1/3}, {2, 2/3}, {3, 1}}, InterpolationOrder -> 0]

Then, g /@ {1, 2, 3} returns {2/3, 2/3, 1} instead of {1/3, 2/3, 1}. In addition, g is continuous from the left rather than from the right.

Obviously I am not aware of the considerations that went into determining the behavior of InterpolatingFunction when InterpolationOrder -> 0. So I have two questions:

(1) Does anyone have any opinions about how InterpolatingFunction ought to behave with InterpolationOrder -> 0?

(2) Does anyone have a faster way to evaluate an empirical CDF than a compiled Which function?

By the way, here's my current version:

    CompileEmpiricalCDF[list_?(VectorQ[#, NumericQ] &)] :=
      Block[{x},
        Compile[{{x, _Real}},
          Evaluate[
            Which @@ Flatten[
              Append[
                Transpose[{
                  Thread[x < Sort[list]],
                  Range[0, 1 - 1/#, 1/#] & @ Length[list]}],
                {True, 1}]]]]]

--Mark
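
On question (2), one alternative to a long compiled Which is a binary search over the sorted observations. A Which with n tests may try up to n comparisons per evaluation, while a binary search needs roughly Log[2, n]. What follows is only a minimal sketch of that idea, not something proposed in the thread. The names countLE and fastEmpiricalCDF are hypothetical, the data is assumed to be a non-empty vector of real numeric values, and no timings were measured for this sketch.

```mathematica
(* Hypothetical helper, not from the thread: number of entries of the
   sorted vector s that are <= x, found by binary search. *)
countLE = Compile[{{s, _Real, 1}, {x, _Real}},
   Module[{lo = 0, hi = Length[s], mid = 0},
    (* invariant: the first lo entries are <= x,
       and every entry after position hi is > x *)
    While[lo < hi,
     mid = Quotient[lo + hi + 1, 2];
     If[s[[mid]] <= x, lo = mid, hi = mid - 1]];
    lo]];

(* Hypothetical constructor: sort once, then reuse the sorted vector
   for every evaluation. Assumes data is non-empty. *)
fastEmpiricalCDF[data_?(VectorQ[#, NumericQ] &)] :=
  With[{s = Sort[N[data]], n = Length[data]},
   Function[x, countLE[s, x]/n]]

f = fastEmpiricalCDF[{1, 2, 3}];
f /@ {1, 2, 3}
```

For the observations {1, 2, 3} this should give {1/3, 2/3, 1} at the data points, the right-continuous values described in the original message, because the count uses <= rather than <. The sketch keeps the argument test from CompileEmpiricalCDF: the two-argument form VectorQ[expr, test] checks that test yields True for every element, so the pattern accepts any vector of numeric quantities.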