MathGroup Archive 2003



Re: WeibullDistribution

  • To: mathgroup at smc.vnet.net
  • Subject: [mg42635] Re: WeibullDistribution
  • From: Bill Rowe <listuser at earthlink.net>
  • Date: Fri, 18 Jul 2003 05:25:36 -0400 (EDT)
  • Sender: owner-wri-mathgroup at wolfram.com

On 7/17/03 at 11:11 AM, drbob at bigfoot.com (Dr Bob) wrote:

> I would think the CDF -- anchored as it is at its end-points -- would 
> obscure things.

I don't understand what you mean by "anchored". It is true that the theoretical CDF you are trying to fit is monotonic and its values range over the fixed interval 0 to 1. But you are fitting the empirical CDF, which ranges over a smaller interval strictly determined by the number of points in the data set. So I don't see choosing to fit the CDF as "anchoring" the data.
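To make the point about the empirical CDF concrete, here is a minimal sketch. Python and the function names are illustrative assumptions; the thread itself concerns Mathematica. With the step-function definition F_n(x_(i)) = i/n, the heights can only take the n values 1/n, 2/n, ..., 1, so the grid is fixed by the sample size. When fitting, a plotting-position convention such as Hazen's (i - 0.5)/n is often used instead (one common choice among several, assumed here), which keeps every height strictly inside (0, 1).

```python
def empirical_cdf(data):
    """Return (sorted values, step heights i/n) for the empirical CDF.

    Ties are not merged; each observation gets its own step.
    """
    xs = sorted(data)
    n = len(xs)
    heights = [(i + 1) / n for i in range(n)]
    return xs, heights


def hazen_positions(n):
    """Hazen plotting positions (i - 0.5)/n for i = 1..n.

    The smallest value is 1/(2n) and the largest is 1 - 1/(2n),
    so the range shrinks toward (0, 1) only as n grows.
    """
    return [(i + 0.5) / n for i in range(n)]
```

For four points, the step heights are 0.25, 0.5, 0.75, 1.0, while the Hazen positions are 0.125, 0.375, 0.625, 0.875.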

But if your point is that there is a greater visual difference between the PDFs of different distributions than between their CDFs, I agree. Also, I suspect most people are more familiar with the PDF than with the CDF. So, for presentation, I would most likely use the PDF. But for fitting, I much prefer the CDF.

First, there is the issue of deciding what bin width or bandwidth to use for the PDF. While there are various rules of thumb, there is no a priori correct choice. Different choices here lead to significantly different appearances of the plot and often to significantly different assessments of the data. And since there is no a priori data-driven choice, it is difficult to ensure my expectations are not biasing the choice too heavily. Using the empirical CDF avoids this problem, since it requires no binning or smoothing.
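As an illustration of how much the common rules of thumb can disagree, the sketch below implements three well-known ones: Sturges, Scott, and Freedman-Diaconis. These rules are not named in the post; they are chosen here only as examples, and the function names are hypothetical.

```python
import math
import statistics


def sturges_bins(n):
    # Sturges' rule: number of bins k = ceil(log2(n)) + 1
    return math.ceil(math.log2(n)) + 1


def scott_width(data):
    # Scott's rule: bin width h = 3.49 * s * n^(-1/3), s = sample std. deviation
    n = len(data)
    return 3.49 * statistics.stdev(data) * n ** (-1 / 3)


def freedman_diaconis_width(data):
    # Freedman-Diaconis rule: bin width h = 2 * IQR * n^(-1/3)
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # Python 3.8+
    return 2 * (q3 - q1) * n ** (-1 / 3)
```

For the 100 values 1, 2, ..., 100, Sturges' rule asks for 8 bins (a width of about 12.4 over the observed range of 99), while Scott's and the Freedman-Diaconis rules both give widths near 21.8, i.e. roughly 5 bins. Same data, noticeably different histograms.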

Next, the PDF is the derivative of the CDF. So computing an empirical PDF is essentially a differencing operation, which tends to accentuate the uncertainty in the data points. Of course, this can be mitigated by the choice of the bin width or bandwidth parameter. OTOH, the CDF is essentially a summing operation, which tends to suppress random uncertainty in the data points.
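One way to quantify the differencing argument, assuming a simple histogram estimate rather than a kernel estimate: the density in a bin [a, a + h) is (F_n(a + h) - F_n(a))/h, a scaled difference of the empirical CDF. Since n*F_n(x) is binomial, both standard errors have closed forms. The sketch below computes them; the function names are hypothetical.

```python
import math


def ecdf_standard_error(F, n):
    # n * F_n(x) ~ Binomial(n, F(x)), so Var[F_n(x)] = F(1 - F)/n
    return math.sqrt(F * (1 - F) / n)


def histogram_density_standard_error(p, h, n):
    # Histogram density in one bin: f_hat = count / (n * h),
    # with count ~ Binomial(n, p), p = probability mass in the bin.
    # Dividing by the (small) width h is what inflates the error.
    return math.sqrt(p * (1 - p) / n) / h
```

With n = 1000 and a local density of 0.5, the ECDF at F = 0.5 has a standard error of about 0.016 (roughly 3% relative). A bin of width h = 0.1 (p ≈ 0.05) gives a density standard error of about 0.069 (roughly 14% of 0.5), and narrowing the bin to h = 0.01 (p ≈ 0.005) raises it to about 0.22 (roughly 45%).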

Note that in making the comments above, I am assuming data from a continuous distribution rather than a discrete one. For discrete distributions there is a well-defined natural bin width.

Finally, when the data come from a continuous distribution, the question you usually want answered is the probability that a value is less than or greater than some value of interest. That question is answered by the CDF, not the PDF.
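Since the thread is about WeibullDistribution, here is a small sketch of answering that kind of question directly from the CDF. It assumes the two-parameter form F(x) = 1 - exp(-(x/scale)^shape) for x >= 0, which matches the parameterization of Mathematica's two-parameter WeibullDistribution[alpha, beta] with alpha as the shape and beta as the scale; the Python function names are hypothetical.

```python
import math


def weibull_cdf(x, shape, scale):
    """P(X <= x) for a two-parameter Weibull(shape, scale)."""
    if x <= 0:
        return 0.0
    # -expm1(-t) equals 1 - exp(-t) but stays accurate when t is tiny
    return -math.expm1(-((x / scale) ** shape))


def weibull_survival(x, shape, scale):
    """P(X > x), computed directly to avoid cancellation in 1 - F(x)."""
    if x <= 0:
        return 1.0
    return math.exp(-((x / scale) ** shape))
```

For example, with shape 2 and scale 10, the probability of a value at or below 10 is 1 - e^(-1) ≈ 0.632, and the probability of exceeding it is e^(-1) ≈ 0.368.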

> Fitting it will make a Chi-square goodness of fit look good, but perhaps at 
> the cost of really fitting the shape of the PDF.

If you have adequate data and are fitting the correct distribution, you should get no significant difference in the estimated parameters whether you fit the PDF or the CDF. If you are getting significant differences, that is most likely an indication of a problem somewhere: possibly a bad choice of bin width/bandwidth, a bad choice of theoretical distribution, not enough data, etc.
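A minimal sketch of fitting a Weibull to the empirical CDF, using the classical Weibull-plot linearization ln(-ln(1 - F)) = k ln(x) - k ln(lam) and ordinary least squares on Hazen plotting positions. This is one common way to fit a CDF, assumed here for illustration; it is not necessarily the approach used in this thread, and the function name is hypothetical. It requires strictly positive data.

```python
import math


def fit_weibull_cdf(data):
    """Least-squares Weibull fit to the empirical CDF; returns (shape, scale).

    Linearization: v = ln(-ln(1 - F)) = k*u - k*ln(lam), with u = ln(x),
    and F taken at Hazen plotting positions (i - 0.5)/n.
    """
    xs = sorted(data)
    n = len(xs)
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(1 - (i + 0.5) / n)) for i in range(n)]
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    sxx = sum((a - u_mean) ** 2 for a in u)
    sxy = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    shape = sxy / sxx                       # slope = k
    intercept = v_mean - shape * u_mean     # intercept = -k * ln(lam)
    scale = math.exp(-intercept / shape)
    return shape, scale
```

A quick sanity check, in the spirit of the paragraph above: feed it the exact Weibull quantiles at the plotting positions and it recovers the generating shape and scale up to rounding. On real data, a large disagreement between this estimate and one obtained another way (a PDF fit, or maximum likelihood) is a hint that something is off.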

