MathGroup Archive 2002

FW: Re: empirical CDF

  • To: mathgroup at
  • Subject: [mg36664] FW: [mg36643] Re: [mg36619] empirical CDF
  • From: Blimbaum Jerry DLPC <BlimbaumJE at>
  • Date: Wed, 18 Sep 2002 02:09:46 -0400 (EDT)
  • Sender: owner-wri-mathgroup at

There is a very nice Java applet at   , in which you can
replace the data in the applet with your own. It gives you a real-time
histogram plot and lets you alter the bin width and see the effect. It
wasn't until I saw this that I understood the significance of choosing
the bin width.

Jerry Blimbaum

-----Original Message-----
From: Bill Rowe [mailto:listuser at]
To: mathgroup at
Subject: [mg36664] [mg36643] Re: [mg36619] empirical CDF

On 9/13/02 at 11:33 PM, swidrygiello at (Swidrygiello) wrote:

>Does anybody know how to calculate in Mathematica: 
>a)empirical CDF,
>b)empirical PDF, 
>c)normal QQ-plot; 
>d)QQ-plot two different random samples?!

Yes, but there are a number of issues particularly with an empirical PDF. A
very nice package that does all of the above and more is mathStatica. See for details.

Obviously, it is less expensive to write your own functions.

Just recently, in message [mg36613], Mark Fisher posted code that addresses
the empirical CDF. However, depending on your application, you may want to
replace 1/n in this code with 1/(n+1) or, for the j-th sorted value,
(j-0.5)/n. Note that these changes have no significant effect for large
data sets.
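
To make the arithmetic concrete, here is a minimal sketch of an empirical CDF with the three conventions mentioned above. It is written in plain Python rather than Mathematica and is not Mark Fisher's code; the function names (empirical_cdf, plotting_positions), the rule labels and the sample data are all made up for illustration.

```python
from bisect import bisect_right


def empirical_cdf(data):
    """Return F, where F(x) = (number of observations <= x) / n."""
    xs = sorted(data)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(xs, x) / n

    return F


def plotting_positions(data, rule="j/n"):
    """Pair the j-th sorted value (j = 1..n) with its cumulative probability.

    The rule names are hypothetical labels for the three conventions:
    j/n, j/(n+1) and (j-0.5)/n.
    """
    xs = sorted(data)
    n = len(xs)
    rules = {
        "j/n": lambda j: j / n,
        "j/(n+1)": lambda j: j / (n + 1),
        "(j-0.5)/n": lambda j: (j - 0.5) / n,
    }
    p = rules[rule]
    return [(x, p(j)) for j, x in enumerate(xs, start=1)]


# Illustrative data only
sample = [2.1, 0.4, 3.3, 1.7, 2.8]
F = empirical_cdf(sample)
print(F(2.0))
for rule in ("j/n", "j/(n+1)", "(j-0.5)/n"):
    print(rule, plotting_positions(sample, rule))
```

With j/n the largest observation is assigned probability 1, which is awkward if the values are later fed to an inverse normal CDF (as in a QQ-plot); the other two rules keep every probability strictly between 0 and 1.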

The key issue with an empirical PDF is deciding the bin width. A simple
approach would be to use the functions in Statistics`DataManipulation` and
Graphics`Graphics`. Look at the functions Histogram, Frequencies and
BinListCounts. More sophisticated approaches involve kernel methods. These
methods will generate smoother estimates for the PDF. Again, the key is
bandwidth. There is no a priori choice for bin width or bandwidth. Bad choices
will obscure significant features in the data set.
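
As a rough illustration of the two approaches, the sketch below builds a histogram-based density estimate and a Gaussian kernel estimate in plain Python. It does not use the Mathematica packages named above; the function names (histogram_density, gaussian_kde), the choice of a Gaussian kernel, the sample data and the particular widths are all assumptions made for the example.

```python
import math


def histogram_density(data, bin_width):
    """Histogram PDF estimate: (left bin edge, count / (n * bin_width))."""
    n = len(data)
    lo = min(data)
    # Bins start at the smallest observation; the last bin holds the largest
    n_bins = int((max(data) - lo) // bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - lo) // bin_width)] += 1
    return [(lo + k * bin_width, c / (n * bin_width))
            for k, c in enumerate(counts)]


def gaussian_kde(data, bandwidth):
    """Kernel PDF estimate: the average of Gaussian bumps, one per observation."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def f(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in data)

    return f


# Illustrative data only
sample = [1.0, 1.2, 1.9, 2.1, 2.2, 3.8, 4.0, 4.1]

# Same data, two bin widths: the shape of the estimate changes with the width
for width in (0.5, 1.0):
    print(width, histogram_density(sample, width))

# Same data, two bandwidths, evaluated at one point
for h in (0.2, 1.0):
    print(h, gaussian_kde(sample, h)(3.0))
```

Running it with several widths on your own data is the quickest way to see the trade-off: a narrow width follows every gap between observations, while a wide one smooths real features away.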
