Re: question: fitting a distribution from quantiles
- To: mathgroup at smc.vnet.net
- Subject: [mg126471] Re: question: fitting a distribution from quantiles
- From: Darren Glosemeyer <darreng at wolfram.com>
- Date: Sat, 12 May 2012 04:57:38 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <201205110414.AAA23695@smc.vnet.net>
On 5/10/2012 11:14 PM, László Sándor wrote:
> Hi all,
>
> I have a project (with Mathematica 8) where the first step would be to get the distribution describing my "data", which actually only have quantiles (or worse: frequencies for arbitrary bins). EstimatedDistribution[] looks promising, but I don't know how to feed in this kind of data. Please let me know if you know a fast way.
>
> Thanks!
>

There isn't enough information in your data for the types of estimation done by EstimatedDistribution, which works from the individual observations. The kind of information you do have (cutoff values paired with cumulative probabilities, or bin counts that can be accumulated into them) lends itself well to a least-squares fit to the CDF of the distribution.

As an example, let's take this data:

In[1]:= data = BlockRandom[SeedRandom[1234]; RandomVariate[GammaDistribution[5, 8], 100]];

We can use Min and Max to see the range of values and then bin within that range to construct cutoff and frequency data.

In[2]:= {Min[data], Max[data]}

Out[2]= {13.7834, 112.429}

Here, xvals are the cutoffs and counts are the bin frequencies.

In[3]:= {xvals, counts} = HistogramList[data, {{0, 15, 20, 50, 100, 120}}]

Out[3]= {{0, 15, 20, 50, 100, 120}, {1, 6, 55, 37, 1}}

We can get the accumulated probabilities as follows.

In[4]:= probs = Accumulate[counts]/Length[data]

Out[4]= {1/100, 7/100, 31/50, 99/100, 1}

The analogue of your quantile values would be the right endpoints, Rest[xvals].

In[5]:= quantiles = Rest[xvals]

Out[5]= {15, 20, 50, 100, 120}

Now we can use the quantiles as the x values and the CDF values as the y values in a least-squares fit to the CDF. In general the parameters may need starting values, but the defaults worked fine in this case:

In[6]:= FindFit[Transpose[{quantiles, probs}], CDF[GammaDistribution[a, b], x], {a, b}, x]

Out[6]= {a -> 5.24009, b -> 8.88512}

Since we know the data don't extend to the right limit of a gamma distribution's support (a gamma variate can take any positive value), we may want to adjust the CDF values a bit. As things stand, the last point asks the fitted CDF to equal exactly 1 at x = 120, which no gamma CDF does at any finite x.
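
For reference, here is a sketch of the criterion behind the FindFit call in In[6], written out under two stated assumptions: FindFit is using its default sum-of-squared-residuals criterion, and F denotes the gamma CDF with shape a and scale b (the parametrization GammaDistribution[a, b] uses, with gamma the lower incomplete gamma function). The c_j are the bin counts from Out[3], the p_i are the accumulated probabilities from Out[4], and the q_i are the right endpoints from Out[5].

```latex
\begin{aligned}
p_i &= \frac{1}{n}\sum_{j=1}^{i} c_j, \quad n = 100, \quad
  (c_1,\dots,c_5) = (1, 6, 55, 37, 1), \\
(p_1,\dots,p_5) &= \left(\tfrac{1}{100}, \tfrac{7}{100}, \tfrac{31}{50}, \tfrac{99}{100}, 1\right), \qquad
(q_1,\dots,q_5) = (15, 20, 50, 100, 120), \\
(\hat a, \hat b) &= \operatorname*{arg\,min}_{a,\,b>0} \sum_{i=1}^{5} \bigl(F(q_i; a, b) - p_i\bigr)^2,
  \qquad F(x; a, b) = \frac{\gamma(a, x/b)}{\Gamma(a)}.
\end{aligned}
```

Because F(x; a, b) < 1 for every finite x, the i = 5 term equals (F(120; a, b) - 1)^2 > 0 for all parameter values, so that point can never be matched exactly.
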
The following shifts all the CDF values down by 1/(2*numberOfDataPoints), which is 1/200 in this particular case:

In[7]:= FindFit[Transpose[{quantiles, probs - 1/(2 Length[data])}], CDF[GammaDistribution[a, b], x], {a, b}, x]

Out[7]= {a -> 5.3696, b -> 8.73319}

Darren Glosemeyer
Wolfram Research
- References:
- question: fitting a distribution from quantiles
- From: László Sándor <sandorl@gmail.com>