MathGroup Archive 2003


Re: WeibullDistribution

  • To: mathgroup at smc.vnet.net
  • Subject: [mg42532] Re: WeibullDistribution
  • From: bobhanlon at aol.com (Bob Hanlon)
  • Date: Sat, 12 Jul 2003 20:53:12 -0400 (EDT)
  • References: <beolj4$t25$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

I did not suggest computing the desired parameters from the first two moments
of the data set.  I used maximum likelihood estimation.  Since the likelihood
function or log-likelihood function is a long expression for large data sets,
it is desirable to start the numerical search as close to the final answer as
possible.  The first two moments were used only to obtain the initial estimate.
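
For concreteness, a minimal sketch of that procedure against simulated data
(the sample, the FindRoot starting bracket, and the variable names are my own;
this uses the version-5-era Statistics` packages):

Needs["Statistics`ContinuousDistributions`"]
Needs["Statistics`DescriptiveStatistics`"]

(* simulated sample with known shape a = 1.5 and scale b = 2 *)
x = RandomArray[WeibullDistribution[1.5, 2.], 200];
n = Length[x];

(* initial estimates by matching the first two moments:
   Mean == b Gamma[1 + 1/a],
   Variance == b^2 (Gamma[1 + 2/a] - Gamma[1 + 1/a]^2) *)
a0 = a /. FindRoot[
      Variance[x]/Mean[x]^2 == Gamma[1 + 2/a]/Gamma[1 + 1/a]^2 - 1,
      {a, 1, 3}];
b0 = Mean[x]/Gamma[1 + 1/a0];

(* Weibull log likelihood; minimize its negative, seeded with the
   moment-based estimates, using a derivative-free search *)
logL[a_?NumericQ, b_?NumericQ] :=
  n (Log[a] - a Log[b]) + (a - 1) (Plus @@ Log[x]) - Plus @@ ((x/b)^a)

FindMinimum[-logL[a, b], {a, a0, 1.1 a0}, {b, b0, 1.1 b0}]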


Bob Hanlon

In article <beolj4$t25$1 at smc.vnet.net>, Bill Rowe <listuser at earthlink.net>
wrote:

<< 
Subject:	Re: WeibullDistribution
From:		Bill Rowe <listuser at earthlink.net>
To: mathgroup at smc.vnet.net
Date:		Sat, 12 Jul 2003 09:49:24 +0000 (UTC)

On 7/11/03 at 2:57 AM, robert.nowak at ims.co.at (Robert Nowak) wrote:

> "usualy" you dont have randomly noised PDF(x)-funktionsvalues at
> positions x. "usualy" you only have random values which are expected
> to obey a distribution with a specific PDF.

Exactly right.

> in the "usual" case you therefore can't fit your data against the PDF.

Not exactly true, but there are problems with fitting against the PDF. That is
why I suggested fitting the empirical cumulative hazard function for the data
to the expected cumulative hazard function for the Weibull distribution.

For any distribution, the cumulative hazard function is -Log[1 - CDF]. For a
set of random data points, a reasonable estimator of the empirical CDF at data
point x_j is given by (j - 0.5)/n, where j is the rank of the sorted data and
n is the number of data samples. Coded in Mathematica, the empirical
cumulative hazard is

H = Transpose[{Sort@x, -Log[1 - (Range[Length@x] - .5)/Length@x]}]

where x is the vector of data values.

Now for a Weibull distribution the cumulative hazard function is H = (x/b)^a.
Taking the logarithm of both sides yields

Log[H] = a Log[x] - a Log[b]

where a and b are the desired parameters of the Weibull distribution. So,
setting H equal to the known empirical cumulative hazard function and doing a
linear regression of Log[H] vs Log[x] will give you the desired parameters,
i.e.,

f = Fit[Log@H, {1, t}, t];
a = f[[2, 1]]
b = Exp[-f[[1]]/f[[2, 1]]]
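
As a quick check, here is that regression approach end to end on simulated
data (the sample and variable names are mine; RandomArray comes from the
version-5-era Statistics` packages):

Needs["Statistics`ContinuousDistributions`"]

(* simulated sample with known shape a = 1.5 and scale b = 2 *)
x = RandomArray[WeibullDistribution[1.5, 2.], 500];
n = Length[x];

(* empirical cumulative hazard at the sorted data points *)
H = Transpose[{Sort[x], -Log[1 - (Range[n] - .5)/n]}];

(* regress Log[H] on Log[x]; a and b should come out near 1.5 and 2 *)
f = Fit[Log[H], {1, t}, t];
{f[[2, 1]], Exp[-f[[1]]/f[[2, 1]]]}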

However, since the PDF is the derivative of the CDF, it is possible to
estimate the PDF from the data set and fit it to the theoretical PDF. The key
problem with this approach is that it accentuates errors in the data, and
generally the interval between successive data points is too large to get an
accurate estimate of the derivative. Since the empirical CDF is effectively a
cumulative sum, uncertainty in each data point tends to be suppressed.
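
The noise amplification is easy to see with a crude finite-difference
estimate of the PDF from the empirical CDF (continuing with the simulated x
above; the differencing scheme is my own choice):

xs = Sort[x];
(* each step of the empirical CDF is 1/n, so dCDF/dx between successive
   order statistics is 1/(n dx); near-ties in the data make dx tiny and
   the estimate blow up *)
pdfEst = Transpose[{Drop[xs, -1],
      1/(Length[xs] (Rest[xs] - Drop[xs, -1]))}];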

> I think you have to do some of Bob Hanlon's or similar calculations.

What Bob Hanlon suggested was computing the desired parameters from the first
two moments of the data set. This is a point estimate of the parameters and is
a valid approach. However, it isn't the most robust approach. A point estimate
based on two chosen quantiles is more robust and, for the Weibull
distribution, has a closed-form solution.
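
For reference, a minimal sketch of that quantile-based estimate (the .25/.75
probability levels and the names are my own choices; Quantile is from the
Statistics`DescriptiveStatistics` package): inverting the Weibull CDF
1 - Exp[-(q/b)^a] == p at two sample quantiles gives a and b in closed form.

Needs["Statistics`DescriptiveStatistics`"]

{p1, p2} = {.25, .75};
{q1, q2} = Quantile[x, #] & /@ {p1, p2};

(* from 1 - Exp[-(q/b)^a] == p it follows that
   a Log[q/b] == Log[-Log[1 - p]]; subtract the two equations *)
a = (Log[-Log[1 - p2]] - Log[-Log[1 - p1]])/(Log[q2] - Log[q1]);
b = q1/(-Log[1 - p1])^(1/a);
{a, b}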

>>

