Re: Re: Histogram normalization

  To: mathgroup at
  Subject: [mg40052] Re: [mg40012] Re: [mg40005] Histogram normalization
  From: Dr Bob <drbob at>
  Date: Mon, 17 Mar 2003 03:35:28 -0500 (EST)
  • References: <>
  • Reply-to: drbob at
  • Sender: owner-wri-mathgroup at

OK, but that nonparametric density gives a poor fit to the data, even with 
1000 samples.

Using MathStatica, the following usually does a good job:

<< mathStatica.m
<< Graphics`Graphics`
f = 1/(E^(x^2/2)*Sqrt[2*Pi]); domain[f] = {x, -Infinity, Infinity};

Attributes[empiricalAnalysis] = {HoldFirst};
Options[empiricalAnalysis] = {bandwidthFactor -> 1, binFactor -> 0.15,
    minBins -> 25, maxBins -> 50};
empiricalAnalysis[f_, data_, options___Rule] :=
  Module[{c, p, empiricalData, empiricalPDF, factor, categories},
    (* bin count = binFactor*Length@data, clamped to [minBins, maxBins] *)
    {factor, categories} = {bandwidthFactor,
        Round@Max[minBins, Min[maxBins, binFactor*Length@data]]} /.
        {options} /. Options@empiricalAnalysis;
    (* Sheather-Jones bandwidth for kernel f, scaled by bandwidthFactor *)
    c = Bandwidth[data, f, Method -> SheatherJones];
    p = Block[{$DisplayFunction = Identity}, NPKDEPlot[data, f, c*factor]];
    (* pull the plotted points out of the graphic and interpolate them *)
    empiricalData = First@Cases[p, Line[a_] -> a, Infinity];
    empiricalPDF = Interpolation[empiricalData];
    DisplayTogether[Histogram[data, HistogramScale -> 1,
        HistogramCategories -> categories], p,
      Plot[empiricalPDF[x], {x, Min@data, Max@data}]];
    (* return the empirical PDF *)
    empiricalPDF]

<< "Statistics`ContinuousDistributions`"
Y = RandomArray[StudentTDistribution[4], 500];
empiricalAnalysis[f, Y];

That plots the empirical PDF along with the scaled histogram on the same 
plot, and returns the empirical PDF as an InterpolatingFunction.  
bandwidthFactor can be decreased to capture more "features" of the data 
in the density estimate; binFactor (together with minBins and maxBins) 
controls how many histogram bins are displayed.
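
For example, a minimal sketch of overriding the defaults.  The option 
values are illustrative rather than tuned, and pdf is just a name chosen 
here for the returned InterpolatingFunction:

```mathematica
(* Assumes mathStatica is loaded and empiricalAnalysis, f and Y are
   defined as above. *)
(* Half the Sheather-Jones bandwidth (less smoothing) and up to 75 bins. *)
pdf = empiricalAnalysis[f, Y, bandwidthFactor -> 0.5, maxBins -> 75];

(* The result can be evaluated like any function inside the data range. *)
pdf[0.]
```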

Sometimes there's a numerical error in the Bandwidth function (which causes 
some MathStatica function or another to be evaluated in uncompiled form).  
Hence the Off and On statements.
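
A minimal sketch of that Off/On pattern around the bandwidth call.  The 
message name CompiledFunction::cfn is an assumption here (it is the usual 
"proceeding with uncompiled evaluation" message); substitute whatever 
message name is actually printed on your system:

```mathematica
(* Assumes mathStatica is loaded and f and Y are defined as above. *)
(* Assumption: CompiledFunction::cfn is the message being issued. *)
Off[CompiledFunction::cfn];                     (* silence the message *)
c = Bandwidth[Y, f, Method -> SheatherJones];   (* result is unaffected *)
On[CompiledFunction::cfn];                      (* restore the message *)
```

Turning the message back on straight away keeps it visible for any other 
compiled code in the session.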

If anyone wants to see example plots without buying MathStatica, contact me 
and I'll send you a notebook.


On Sun, 16 Mar 2003 02:20:18 -0500 (EST), Kyriakos Chourdakis 
<tuxedomoon at> wrote:

> The ``smoothed histogram'' you are looking for is
> replicated by the nonparametric density. I think you
> might find the NonParametricDensity function below
> useful. It is a very simplified version  without
> control over the kernels etc. but it should do the
> trick.
> The code defines the nonparametric density, creates a
> 500 point sample from a t_4 distribution, and plots
> the ``smooth histogram'' of the sample together with
> the t_4.
> (***************************************************)
> (* Copy into .nb                                   *)
> Quit[]; << "Statistics`ContinuousDistributions`"
> NonParametricDensity[x_] := Module[{sx, g, gg, T, h}, sx = 
> StandardDeviation[x]; T = Length[x]; h = (sx*1.06)/T^0.2; g = 
> Function[{u}, (1*Plus @@ (Exp[-((u - #1)^2/(2*h^2))] & ) /@
> x)/
> (T*h*Sqrt[2*Pi])]; FunctionInterpolation[g[u],
> {u, Min[x] - 4*h, Max[x] + 4*h}]];
> dist = StudentTDistribution[4]; Y = RandomArray[dist, 500]; NPf = 
> NonParametricDensity[Y];
> Plot[{PDF[dist, x], NPf[x]}, {x, -5, 5}, Frame ->
> True, Axes -> False, PlotStyle -> {Thickness[0.], Thickness[0.01]}]; 
> (***************************************************)
> Kyriakos
> Kyriakos Chourdakis
> __________________________________________________
majort at
Bobby R. Treat

