 
 
 
 
 
 
Re: WeibullDistribution
- To: mathgroup at smc.vnet.net
- Subject: [mg42525] Re: WeibullDistribution
- From: Bill Rowe <listuser at earthlink.net>
- Date: Sat, 12 Jul 2003 05:19:33 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
On 7/11/03 at 2:57 AM, robert.nowak at ims.co.at (Robert Nowak) wrote:
> "Usually" you don't have randomly noised PDF(x) function values at
> positions x. "Usually" you only have random values which are expected
> to obey a distribution with a specific PDF.
Exactly right.
> In the "usual" case you therefore can't fit your data against the PDF.
Not exactly true, but there are problems with fitting against the PDF. That is why I suggested fitting the empirical cumulative hazard function for the data to the expected cumulative hazard function for the Weibull distribution.
For any distribution, the cumulative hazard function is -Log[1-CDF]. For a set of random data points, a reasonable estimator of the empirical CDF at data point x_j is (j - 0.5)/n, where j is the rank of x_j in the sorted data and n is the number of data samples. Coded into Mathematica, the empirical cumulative hazard is H = Transpose[{Sort@x, -Log[1 - (Range[Length@x] - .5)/Length@x]}] where x is the vector of data values.
Now for a Weibull distribution the cumulative hazard function is H = (x/b)^a. Taking the logarithm of both sides yields
Log[H] = a Log[x] - a Log[b]
where a and b are the desired parameters of the Weibull distribution (a is the shape and b the scale, matching WeibullDistribution[a, b]). So, setting H equal to the empirical cumulative hazard function and doing a linear regression of Log[H] vs Log[x] will give you the desired parameters, since the slope is a and the intercept is -a Log[b], i.e.,
f = Fit[Log@H, {1, t}, t]; (* f has the form c0 + c1 t *)
a = f[[2, 1]]
b = Exp[-f[[1]]/f[[2, 1]]]
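The regression recipe can be exercised end to end on simulated data. This is a minimal sketch, assuming true parameters a = 2 and b = 3 and a sample of 500 points (all arbitrary choices); empiricalCumHazard is a hypothetical helper name, not a built-in. It pairs each sorted data point with -Log[1 - (j - 0.5)/n], i.e. -Log[1 - CDF], and reads the fitted coefficients with Coefficient rather than by Part, so it does not depend on how Fit orders the terms.

```mathematica
(* Hypothetical helper: pairs {x_(j), -Log[1 - (j - 0.5)/n]} for the sorted data *)
empiricalCumHazard[x_List] := Module[{n = Length[x]},
  Transpose[{Sort[x], -Log[1 - (Range[n] - 0.5)/n]}]
]

(* Small check: sorted {1., 2., 3.} with plotting positions 1/6, 1/2, 5/6 *)
empiricalCumHazard[{3., 1., 2.}]
(* {{1., 0.182322}, {2., 0.693147}, {3., 1.79176}} *)

(* Simulated data; shape 2 and scale 3 are arbitrary choices *)
SeedRandom[1];
data = RandomVariate[WeibullDistribution[2, 3], 500];

h = empiricalCumHazard[data];
f = Fit[Log@h, {1, t}, t];        (* regress Log[H] on Log[x]: c0 + c1 t *)
{c0, c1} = {f /. t -> 0, Coefficient[f, t]};

aHat = c1                         (* slope = a *)
bHat = Exp[-c0/c1]                (* intercept = -a Log[b] *)
```

With a sample this size, aHat and bHat should land near 2 and 3; how near depends on the particular sample.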
However, since the PDF is the derivative of the CDF, it is possible to estimate the PDF from the data set and fit it to the theoretical PDF. The key problem with this approach is that differentiating accentuates errors in the data, and generally the interval between successive data points is too large to get an accurate estimate of the derivative. Since the empirical CDF is effectively a sum, the uncertainty in each data point tends to be suppressed.
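To illustrate the point about differentiating, here is a rough sketch that estimates the PDF by finite differences of the empirical CDF and compares both the differenced PDF and the empirical CDF with the known Weibull curves. The parameters, seed, and sample size are arbitrary assumptions, and the spacing estimator (1/n)/(x_(j+1) - x_(j)) is simply the most direct way to differentiate the empirical CDF, not a recommended density estimator.

```mathematica
(* Parameters (2, 3), seed, and sample size 500 are arbitrary choices *)
dist = WeibullDistribution[2, 3];
SeedRandom[2];
xs = Sort@RandomVariate[dist, 500];
n = Length[xs];

(* Between consecutive order statistics the empirical CDF rises by 1/n,
   so the finite-difference density estimate is (1/n)/(x_(j+1) - x_(j)) *)
mid = MovingAverage[xs, 2];
pdfEst = (1./n)/Differences[xs];
pdfTrue = PDF[dist, #] & /@ mid;
relErr = pdfEst/pdfTrue - 1;      (* relative error of the differenced PDF *)

(* The empirical CDF itself, evaluated at the same data *)
cdfTrue = CDF[dist, #] & /@ xs;
cdfErr = Max[Abs[(Range[n] - 0.5)/n - cdfTrue]];

{Median[Abs[relErr]], cdfErr}
```

On such runs the differenced PDF is frequently off by a large fraction of its true value, while the empirical CDF stays within a few hundredths of the true CDF everywhere.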
> I think you have to do some of Bob Hanlon's or similar calculations.
What Bob Hanlon suggested was computing the desired parameters from the first two moments of the data set. This is a point estimate of the parameters and is a valid approach. However, it isn't the most robust approach. A point estimate based on two chosen quantiles is more robust and, for the Weibull distribution, has a closed-form solution.
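A sketch of both point estimates, for comparison. The quantile version follows from the Weibull quantile function q_p = b (-Log[1 - p])^(1/a): taking logs at two probabilities p1 < p2 and subtracting gives a = (Log[-Log[1 - p2]] - Log[-Log[1 - p1]])/(Log[q2] - Log[q1]), and then b = q1/(-Log[1 - p1])^(1/a). The choice p1 = 0.25, p2 = 0.75, the helper names, and the outlier experiment are illustrative assumptions; the moment version is a generic moment-matching sketch (Mean = b Gamma[1 + 1/a], Variance = b^2 (Gamma[1 + 2/a] - Gamma[1 + 1/a]^2)), not a reproduction of Bob Hanlon's code.

```mathematica
(* Two-quantile estimate; weibullFromQuantiles is a hypothetical helper *)
weibullFromQuantiles[x_List, {p1_, p2_}] := Module[{q1, q2, a},
  {q1, q2} = Quantile[x, {p1, p2}];   (* sample quantiles *)
  (* from Log[q_p] = Log[b] + (1/a) Log[-Log[1 - p]] at p1 and p2 *)
  a = (Log[-Log[1 - p2]] - Log[-Log[1 - p1]])/(Log[q2] - Log[q1]);
  {a, q1/(-Log[1 - p1])^(1/a)}
]

(* Moment matching: 1 + Var/Mean^2 == Gamma[1 + 2/a]/Gamma[1 + 1/a]^2,
   solved for r = 1/a; weibullFromMoments is a hypothetical helper *)
weibullFromMoments[x_List] := Module[{m = Mean[x], v = Variance[x], r, a},
  a = 1/(r /. FindRoot[
        LogGamma[1 + 2 r] - 2 LogGamma[1 + r] == Log[1 + v/m^2], {r, 0.5}]);
  {a, m/Gamma[1 + 1/a]}
]

(* Arbitrary test case: shape 2, scale 3, 500 points *)
SeedRandom[3];
data = RandomVariate[WeibullDistribution[2, 3], 500];
weibullFromQuantiles[data, {0.25, 0.75}]
weibullFromMoments[data]

(* Add one gross outlier and compare how far each estimate moves *)
dirty = Append[data, 1000.];
weibullFromQuantiles[dirty, {0.25, 0.75}]
weibullFromMoments[dirty]
```

Adding the outlier shifts each sample quartile by at most one order statistic, so the quantile estimate barely moves, whereas the sample variance, and with it the moment-based shape, is dominated by the single extreme point.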

