MathGroup Archive: August 2008 [00625]

Re: Superimposing Normal on a Histogram of data

To: mathgroup at smc.vnet.net
Subject: [mg91580] Re: Superimposing Normal on a Histogram of data
From: Bill Rowe <readnews at sbcglobal.net>
Date: Thu, 28 Aug 2008 03:16:57 -0400 (EDT)

On 8/27/08 at 6:43 AM, desmier.pe at forces.gc.ca (ouadad) wrote:

>Can someone point me to an algorithm that allows me to plot a normal
>curve over a histogram of residuals?  I just want to show how close
>my residual distribution approximates a normal distribution.

Here is a way to do what you asked:

<< Histograms`

data = RandomReal[NormalDistribution[0, 1], {1000}];

Show[Histogram[data],
  Plot[210 PDF[NormalDistribution[0, 1], x], {x, -3, 3}]]

There are a couple of issues with this method. First, I found
the scaling factor needed to make the vertical height of the pdf
about the same as the histogram by trial and error. Not too
difficult, but not attractive if this is to be used in some
automated routine. Second, the appearance of any histogram
depends on the choice for the bin width. By using a reasonably
large sample, I avoided having to fiddle with this parameter to
get a match. For smaller data sets, this will be much more of a
problem. But both of these problems are easily avoided by
comparing the cumulative distribution function to the
experimentally observed distribution.
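
As an aside on the scaling factor: for a histogram of counts,
the expected count in a bin is roughly the sample size times the
bin width times the density, so that product is the natural
factor. The sketch below instead rescales the histogram so the
bars integrate to 1. It assumes a Mathematica version in which
Histogram is built in and accepts "PDF" as a height
specification in its third argument (the Histograms` package
loaded above is then not needed). The name dist and the choice
to estimate the mean and standard deviation from the sample are
my own additions for illustration.

(* Sketch: assumes built-in Histogram with the "PDF" height spec *)
data = RandomReal[NormalDistribution[0, 1], 1000];

(* Estimate the normal parameters from the sample itself *)
dist = NormalDistribution[Mean[data], StandardDeviation[data]];

(* Bars are normalized to unit area, so the density needs no factor *)
Show[Histogram[data, Automatic, "PDF"],
  Plot[PDF[dist, x], {x, Min[data], Max[data]},
   PlotStyle -> Red]]

This removes the trial-and-error factor but not the dependence
on bin width.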

For the data above, the cumulative comparison can be done graphically as:

Show[ListPlot[Transpose@{Sort@data, (Range@1000 - .5)/1000},
   Joined -> True],
  Plot[CDF[NormalDistribution[0, 1], x], {x, Min@data, Max@data},
   PlotStyle -> Red]]

Since the empirical distribution is simply the fraction of
observations at or below a given value, there is no arbitrary
bin size to pick. And both the cumulative distribution and the
empirical distribution are non-decreasing and lie between 0 and
1 over the range of the data, so they share a vertical scale.
Hence, there is no scaling factor.
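
The definition in the paragraph above can also be written
directly as a function. This is only a sketch: ecdf is a name I
made up for illustration, and counting over the whole list at
every plot point is simple rather than fast.

(* Same kind of sample as above *)
data = RandomReal[NormalDistribution[0, 1], 1000];

(* Fraction of observations at or below x *)
ecdf[x_?NumericQ] := Count[data, y_ /; y <= x]/Length[data]

(* Step-shaped empirical curve against the test CDF *)
Plot[{ecdf[x], CDF[NormalDistribution[0, 1], x]}, {x, -3, 3},
  PlotStyle -> {Blue, Red}]

No bin width or scale factor appears anywhere in this.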

Finally, there are a number of simple statistics, such as the
maximum difference between the empirical and test distributions
(the Kolmogorov-Smirnov statistic), available to quantify how
good a match exists.
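
As a rough sketch, the maximum difference can be computed by
evaluating the test CDF at the sorted data and comparing it with
the empirical step function just after and just before each
jump. The names n, theory, dPlus, dMinus and d are mine, and the
test distribution is assumed to be fully specified in advance
(here NormalDistribution[0, 1]) rather than fitted to the data.

(* Same kind of sample as above *)
data = RandomReal[NormalDistribution[0, 1], 1000];
n = Length[data];

(* Test CDF evaluated at each sorted observation *)
theory = Map[CDF[NormalDistribution[0, 1], #] &, Sort[data]];

(* Largest gap with the empirical curve above, then below *)
dPlus = Max[Range[n]/n - theory];
dMinus = Max[theory - (Range[n] - 1)/n];

(* Maximum absolute difference between the two distributions *)
d = Max[dPlus, dMinus]

A small d means a close match. If your version has the built-in
KolmogorovSmirnovTest, it should give a p-value directly.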

While I understand that histograms are more widely used,
comparing the empirical distribution of the data to the
cumulative distribution function of the test distribution is
the better way to check data against a test distribution.
