Re: quartiles
- To: mathgroup at smc.vnet.net
- Subject: [mg18264] Re: [mg18214] quartiles
- From: BobHanlon at aol.com
- Date: Thu, 24 Jun 1999 14:24:49 -0400
- Sender: owner-wri-mathgroup at wolfram.com
Tom,
Needs["Statistics`DescriptiveStatistics`"];
salaries = Sort[
  {250000, 100000, 60000, 60000, 40000, 40000, 40000, 40000,
   25000, 20000, 20000, 20000, 18000, 16000, 16000}]
{16000, 16000, 18000, 20000, 20000, 20000, 25000, 40000, 40000, 40000, 40000,
60000, 60000, 100000, 250000}
Quartiles[salaries]
{20000, 40000, 55000}
To see where these values come from, it is useful to plot the CDF for salaries.
salaryDist /: CDF[salaryDist, x_] :=
Length[Select[salaries, # <= x &]]/Length[salaries];
Looking at the detail around the first quartile, the CDF jumps from 3/15 = 0.2
to 6/15 = 0.4 at 20000, so it passes through 0.25 there.
Plot[CDF[salaryDist, x], {x, 15000, 25000}];
Likewise, for the second quartile (median), the CDF jumps from 7/15 to 11/15 at
40000, so it passes through 0.5 there.
Plot[CDF[salaryDist, x], {x, 35000, 45000}];
For the third quartile, the CDF stays at 11/15 (about 0.733, just under 0.75)
for 40000 <= x < 60000 and only jumps to 13/15 at 60000.
Plot[CDF[salaryDist, x], {x, 25000, 100000}];
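The three plots can be cross-checked by evaluating the same empirical CDF just below and at each jump. This is only a sketch: cdf is a hypothetical helper name, not part of the package, and it simply counts the entries at or below x.

```mathematica
salaries = Sort[
  {250000, 100000, 60000, 60000, 40000, 40000, 40000, 40000,
   25000, 20000, 20000, 20000, 18000, 16000, 16000}];

(* hypothetical helper: fraction of entries at or below x *)
cdf[x_] := Count[salaries, s_ /; s <= x]/Length[salaries];

(* evaluate just below and at 20000, 40000 and 60000 *)
cdf /@ {19999, 20000, 39999, 40000, 59999, 60000}
(* {1/5, 2/5, 7/15, 11/15, 11/15, 13/15} *)
```

So 0.25 and 0.5 fall inside the jumps at 20000 and 40000, while 0.75 lies just above the flat stretch at 11/15.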
?Quartiles
"Quartiles[list] gives a list of the interpolated .25, .50, and .75 quantiles
of the entries in list."
Consequently, the third quartile must be interpolated:
0.75*Length[salaries]
11.25
There are 11 values at or below 40000, so we need the equivalent of 1/4 of the
next value (60000):
40000 + 60000/4
55000
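As a cross-check, here is a minimal sketch of one linear-interpolation rule that reproduces all three reported values. It is an assumption, not necessarily what the package does internally: the position is taken as x = 1/2 + n q in the sorted list, and the result is interpolated between the entries on either side of x. The names interpQuantile and textbookQuartiles are hypothetical. The second function is the "median of each half" rule described in the question, shown for comparison.

```mathematica
salaries = Sort[
  {250000, 100000, 60000, 60000, 40000, 40000, 40000, 40000,
   25000, 20000, 20000, 20000, 18000, 16000, 16000}];

(* hypothetical helper: interpolate at position x = 1/2 + n q (assumed rule) *)
interpQuantile[list_, q_] := Module[{s = Sort[list], n, x, lo, hi},
  n = Length[s];
  x = 1/2 + n q;
  lo = Clip[Floor[x], {1, n}];     (* index of the entry at or below x *)
  hi = Clip[Ceiling[x], {1, n}];   (* index of the entry at or above x *)
  s[[lo]] + (s[[hi]] - s[[lo]]) FractionalPart[x]]

interpQuantile[salaries, #] & /@ {1/4, 1/2, 3/4}
(* {20000, 40000, 55000} *)

(* hypothetical helper: textbook rule, median of each half with the
   middle entry left out when the length is odd *)
textbookQuartiles[list_] := Module[{s = Sort[list], h},
  h = Floor[Length[s]/2];
  {Median[Take[s, h]], Median[s], Median[Take[s, -h]]}]

textbookQuartiles[salaries]
(* {20000, 40000, 60000} *)
```

For q = 3/4 the position is 11.75, which gives 40000 + 0.75 (60000 - 40000) = 55000, the same number as above. The textbook rule gives 60000 because it picks an actual data value rather than interpolating, so the two answers reflect different definitions of a quartile and no data has been discarded.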
Bob Hanlon
In a message dated 6/22/99 10:29:51 PM, tdevries at shop.westworld.ca writes:
>I am working with some statistics problems and have a question about finding
>quartiles for a set of data
>
>Load in the package
>
>
>Needs["Statistics`DescriptiveStatistics`"]
>
>
>create a set of data
>
>salaries =
>{250000,100000,60000,60000,40000,40000,40000,40000,25000,20000,20000,20000,
> 18000,16000,16000}
>
>ask for the quartiles
>
>Quartiles[salaries]
>
>and this is the response
>{20000,40000,55000}
>
>At this point I am probably revealing my ignorance of statistics....
>
>
>
>250000,100000,60000,60000,40000,40000, 40000
>40000, Median
>25000,20000,20000,20000,18000,16000,16000
>
>The lower quartile is the median of the values below the median, which I get
>with Mathematica
>25000,20000,20000,
>20000,
>18000,16000,16000
>
>The upper quartile should be the median of the numbers above the median, so
>why is it 55000?
>250000,100000,60000,
>60000,
>40000,40000, 40000
>
>Does Mathematica use some algorithm to get rid of outliers before finding
>quartiles, or does it eliminate the median from the data set before finding
>the quartiles, .....?
>
>The set of data I used as an example was taken from the math text I am using
>and the answer the text supplies, and the answer I think I should get, is
>different from the one Mathematica gets. I would appreciate any advice on this!
>