Re: Superimposing Normal on a Histogram of data
- To: mathgroup at smc.vnet.net
- Subject: [mg91580] Re: Superimposing Normal on a Histogram of data
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Thu, 28 Aug 2008 03:16:57 -0400 (EDT)
On 8/27/08 at 6:43 AM, desmier.pe at forces.gc.ca (ouadad) wrote:

>Can someone point me to an algorithm that allows me to plot a normal
>curve over a histogram of residuals? I just want to show how close
>my residual distribution approximates a normal distribution.

Here is a way to do what you asked:

    << Histograms`
    data = RandomReal[NormalDistribution[0, 1], {1000}];
    Show[Histogram[data],
     Plot[210 PDF[NormalDistribution[0, 1], x], {x, -3, 3}]]

There are a couple of issues with this method. First, I found the scaling factor (210) needed to make the vertical height of the pdf about the same as the histogram by trial and error. Not too difficult, but not attractive if this is to be used in some automated routine. Second, the appearance of any histogram depends on the choice of bin width. By using a reasonably large sample, I avoided having to fiddle with this parameter to get a match. For smaller data sets, this will be much more of a problem.

But both of these problems are easily avoided by comparing the cumulative distribution function to the experimentally observed distribution. For the data above this can be done graphically as:

    Show[ListPlot[Transpose@{Sort@data, (Range@1000 - .5)/1000},
      Joined -> True],
     Plot[CDF[NormalDistribution[0, 1], x], {x, Min@data, Max@data},
      PlotStyle -> Red]]

Since the empirical distribution is simply the fraction of observations at or below a given value, there is no arbitrary bin size to pick. And both the cumulative distribution and the empirical distribution are guaranteed to be monotonically increasing from 0 to 1 over the range of the data. Hence, there is no scaling factor. Finally, there are a number of simple statistics, such as the maximum difference between the empirical and test distributions, available to quantify how good the match is.
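As a minimal sketch of one such statistic, the maximum difference between the empirical and test distributions (usually called the Kolmogorov-Smirnov distance) could be computed along these lines. The names n, sorted, f, dPlus, dMinus and ks are illustrative and not part of the code above, and the standard normal is assumed as the test distribution, as in the simulated example.

```mathematica
(* Assumption: data is a list of real values; simulated here as in the example above *)
data = RandomReal[NormalDistribution[0, 1], {1000}];
n = Length[data];
sorted = Sort[data];

(* CDF of the assumed test distribution at each sorted observation *)
f = CDF[NormalDistribution[0, 1], #] & /@ sorted;

(* The empirical distribution is a step function that jumps from (i - 1)/n
   to i/n at the i-th sorted value, so both sides of each jump are checked *)
dPlus = Max[Range[n]/n - f];
dMinus = Max[f - (Range[n] - 1)/n];

(* Largest vertical gap between the empirical and test distributions *)
ks = Max[dPlus, dMinus]
```

Smaller values of ks indicate a closer match. As a rough guide, when the test distribution is fully specified in advance, values above about 1.36/Sqrt[n] are unusual at the 5% level for large n. If the mean and standard deviation are instead estimated from the residuals themselves, that threshold no longer applies as stated.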
While I understand that histograms are more widely used, plotting the empirical distribution of the data against the cumulative distribution function of the test distribution is a superior way to compare data to a test distribution.