MathGroup Archive: February 2010 [00500]


Re: Re: Normality test

To: mathgroup at smc.vnet.net
Subject: [mg107505] Re: [mg107204] Re: Normality test
From: Ray Koopman <koopman at sfu.ca>
Date: Mon, 15 Feb 2010 05:46:38 -0500 (EST)

----- michael partensky <partensky at gmail.com> wrote:
> Hi, Ray.
> I have applied both the original and the modified functions (see
> below) to the data set
> dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
> 
> The plots are somewhat different. Could you please comment on these
> differences? In particular, why is changing the AspectRatio important?

I think what you're noticing is mostly the effects of scaling and the
way the line is drawn. (The changes to the normal scores are small
and were a self-indulgence: I wanted to see if I could find a better
approximation of the expected order statistics without complicating
things unduly.)
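To show how small the change is, here is a sketch that compares the two sets of normal scores for n = 10 (the names oldScores and newScores are only illustrative). The reference value used below, about 1.5388 for the expected largest of 10 independent standard normal variables, comes from standard tables of normal order statistics and is not computed by either function.

```mathematica
(* Sketch: normal scores from the old and new qqnorm for n = 10.
   oldScores/newScores are illustrative names, not part of either function. *)
n = 10;
oldScores = N[Sqrt[2]*InverseErf[Range[1 - n, n - 1, 2]/n]];
newScores = N[Sqrt[2]*InverseErf[Range[1 - n, n - 1, 2]/(n + .33 (n - 1.25)^-.1)]];

(* largest score from each formula; the tabulated expected maximum
   of 10 standard normals is about 1.5388 *)
{Last[oldScores], Last[newScores]}

(* largest absolute difference between the two sets of scores *)
Max[Abs[oldScores - newScores]]
```

For this n the new formula pulls the extreme scores in toward the tabulated value, while the interior scores barely move.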

The old version made the plot square and drew the line through the
joint mean with a slope (numeric, not visual) equal to the standard
deviation of the observed data. Because the normal scores have a
standard deviation of approximately 1, that slope is approximately equal
to the ratio of the two standard deviations. The line is close to the
best-fit line that would be produced by orthogonal regression, where
the errors are measured perpendicular to the line; and this will be
apparent in the plot to the extent that the ratio of the ranges of
the two variables is close to the ratio of their standard deviations.
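A quick numeric check of that step, as a sketch: it assumes Michael's data set dt and recomputes the old version's normal scores; sdRatio is an illustrative name, not something qqnorm defines.

```mathematica
(* Sketch: why the old line's slope is close to the ratio of the two SDs.
   Assumes Michael's data set dt; sdRatio is an illustrative name. *)
dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
n = Length[dt];

(* normal scores as in the old qqnorm *)
x = N[Sqrt[2]*InverseErf[Range[1 - n, n - 1, 2]/n]];

s = StandardDeviation[dt];  (* numeric slope used by the old qqnorm *)
sdRatio = StandardDeviation[dt]/StandardDeviation[x];

(* the scores' SD is close to 1, so s and sdRatio nearly agree *)
{StandardDeviation[x], s, sdRatio}
```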

The new version takes the same basic approach but applies it to the
middle half of each data set, using the first and third quartiles to
both draw the line and equate the visual plot units. Then the aspect
ratio of the whole plot becomes a function of the data and conveys
information about the lengths of the distributions' tails, even if
the line is not drawn.
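Here is a sketch of that scaling, written out step by step for Michael's dt with the same variable names as qqnorm2 (visualSlope is an added, illustrative name). The aspect ratio is the range/IQR ratio of the data divided by the range/IQR ratio of the normal scores, so data with relatively long tails give a taller plot. Ignoring plot-range padding, the displayed slope of the quartile line then comes out to 1 by construction.

```mathematica
(* Sketch of the quartile-based scaling in qqnorm2, applied to dt.
   Variable names follow qqnorm2; visualSlope is illustrative. *)
dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
n = Length[dt]; y = Sort[dt];
x = N[Sqrt[2]*InverseErf[Range[1 - n, n - 1, 2]/(n + .33 (n - 1.25)^-.1)]];
{y1, y2, y3} = Quartiles[y];
{x1, x2, x3} = Quartiles[x];

(* numeric slope of the line through the joint quartile points *)
b = (y3 - y1)/(x3 - x1);

(* aspect ratio = (range/IQR of the data) / (range/IQR of the scores) *)
aspect = ((Last[y] - First[y])/(y3 - y1))/((Last[x] - First[x])/(x3 - x1));

(* displayed slope: numeric slope rescaled by height/width over the data ranges *)
visualSlope = b*aspect*(Last[x] - First[x])/(Last[y] - First[y]);
{b, aspect, visualSlope}
```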

Drawing the line through the first and third quartiles seems to be
the de facto standard, but it will not always be the best choice.
In your case it uses points 3 and 8, when it is clear from the plot
that points 4...9 collectively would be more appropriate. In this
case the line misleads the eye.
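The choice of points 3 and 8 follows from how Quartiles is defined: it is documented as equivalent to Quantile[list, {1/4, 1/2, 3/4}, {{1/2, 0}, {0, 1}}], which for n = 10 puts the first and third quartiles exactly on the 3rd and 8th sorted values. The sketch below checks that for dt and, for comparison, fits a least-squares line to points 4 through 9 only. That fit is a hypothetical alternative shown for illustration; it is not what qqnorm2 does.

```mathematica
(* Sketch: which points the quartile line uses for dt, plus a hypothetical
   least-squares line through points 4..9 for comparison. *)
dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
n = Length[dt]; y = Sort[dt];
x = N[Sqrt[2]*InverseErf[Range[1 - n, n - 1, 2]/(n + .33 (n - 1.25)^-.1)]];

{q1, q2, q3} = Quartiles[y];
(* for n = 10 the quartiles fall exactly on the 3rd and 8th sorted values *)
{q1 == y[[3]], q3 == y[[8]]}

(* slope of the line through the joint quartile points, as in qqnorm2 *)
quartileSlope = (y[[8]] - y[[3]])/(x[[8]] - x[[3]]);

(* hypothetical alternative: least-squares line through points 4..9 only *)
middleFit = Fit[Transpose[{x, y}][[4 ;; 9]], {1, t}, t];
middleSlope = Coefficient[middleFit, t];
{quartileSlope, middleSlope}
```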

Here's another approach that you might find interesting: plot the
observed data against multiple samples from a standard normal
distribution.

(* data must be sorted so that each line is a Q-Q plot *)
Show[Graphics@Table[Line@Transpose@{Sort@RandomReal[
NormalDistribution[0,1], Length@data], Sort@data}, {50}],
PlotRange->All, Frame->True]

Then compare that plot to what you get when you replace the data by
a single fixed sample of the same size from a normal distribution.
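One way to set up that comparison, as a sketch: envelopePlot is a hypothetical helper that wraps the expression above, and the seed is arbitrary, used only to make the fixed reference sample reproducible.

```mathematica
(* Hypothetical helper (not from the thread): overlays k lines, each the
   sorted data against an independent sorted standard normal sample. *)
envelopePlot[data_List, k_Integer : 50] :=
  Show[Graphics@Table[
     Line@Transpose@{Sort@RandomReal[NormalDistribution[0, 1], Length@data],
        Sort@data}, {k}],
   PlotRange -> All, Frame -> True]

dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
SeedRandom[2010];  (* arbitrary seed, only for reproducibility *)

(* one fixed normal sample of the same size as dt *)
ref = RandomReal[NormalDistribution[0, 1], Length@dt];

(* observed data on the left, fixed normal sample on the right *)
GraphicsRow[{envelopePlot[dt], envelopePlot[ref]}]
```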

> 
> Which *quantitative* measure of normality do you prefer?

I've never had need of such a measure, so I haven't thought about it.
My top-of-the-head response is that there probably is no measure that
will be best for all purposes, that it will depend on the particular
aspect of non-normality that is most important in the situation at
hand.

> 
> Thanks.
> Michael.
> 
> On Fri, Feb 5, 2010 at 3:24 AM, Ray Koopman <koopman at sfu.ca> wrote:
> 
>> Here, prompted by off-line conversations, is an improved version of
>> qqnorm:
>> 
>> qqnorm2[data_] := Block[{n, y,y1,y2,y3, x,x1,x2,x3, b,a},
>> n = Length@data; y = Sort@data; {y1,y2,y3} = Quartiles@y;
>> x = InverseErf[Range[1-n,n-1,2]/(n+.33(n-1.25)^-.1)]*Sqrt[2.];
>> {x1,x2,x3} = Quartiles@x; b = (y3-y1)/(x3-x1); a = y1 - b*x1;
>> ListPlot[Transpose@{x,y}, PlotRange->All, Frame->True, Axes->None,
>> AspectRatio->((Last@y-First@y)/(y3-y1))/((Last@x-First@x)/(x3-x1)),
>> Prolog->Line[{#,#*b+a}&/@{First@x,Last@x}],
>> FrameLabel->{"Standard Normal","Observed Data"}]]
>> 
>> The most notable changes are that the reference line is now drawn so
>> that it passes through the joint first and third quartile points, and
>> the aspect ratio now varies so that the visual slope of the reference
>> line is always approximately 1. Also, the normal scores are now a
>> better approximation of the expected order statistics.
>>
>> On Feb 2, 3:48 am, Ray Koopman <koop... at sfu.ca> wrote:
>>> On Feb 2, 12:28 am, michael partensky <parten... at gmail.com> wrote:
>>>> Hi.
>>>> I wonder if anybody knows a function similar to qqnorm(data) from
>>>> *R*, producing a normal scores plot, or some related tools in
>>>> Mathematica for testing the normality of data?
>>>>
>>>> Thanks
>>>> Michael Partenskii
>>>
>>> qqnorm[y_] := Block[
>>>  {n = Length@y, m = Mean@y, s = StandardDeviation@y, x},
>>>  x = InverseErf[Range[1-n,n-1,2]/n]*Sqrt[2.];
>>>  ListPlot[Transpose@{x,Sort@y},
>>>  PlotRange->All, Frame->True, Axes->None, AspectRatio->1,
>>>  Prolog->Line[{{x[[1]],x[[1]]*s+m},{x[[-1]],x[[-1]]*s+m}}],
>>>  FrameLabel->{"Theoretical Standard Normal Quantiles",
>>>               "Observed Quantiles"}]]
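A note on the InverseErf expression in the original qqnorm: the standard normal quantile satisfies InverseCDF[NormalDistribution[0,1], p] = Sqrt[2] InverseErf[2p - 1], and Range[1-n, n-1, 2]/n lists the values 2p - 1 for p = (i - 1/2)/n. So those scores are the normal quantiles at the plotting positions (i - 1/2)/n. The sketch below checks this numerically; n = 10 is an arbitrary choice.

```mathematica
(* Sketch: the old qqnorm scores equal standard normal quantiles at
   p = (i - 1/2)/n, i = 1..n.  n = 10 is arbitrary. *)
n = 10;
viaErf = N[Sqrt[2]*InverseErf[Range[1 - n, n - 1, 2]/n]];
viaCDF = N[InverseCDF[NormalDistribution[0, 1], #] & /@ ((Range[n] - 1/2)/n)];

(* zero up to rounding error *)
Max[Abs[viaErf - viaCDF]]
```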
