MathGroup Archive 1999

Re: quartiles

  • To: mathgroup at smc.vnet.net
  • Subject: [mg18264] Re: [mg18214] quartiles
  • From: BobHanlon at aol.com
  • Date: Thu, 24 Jun 1999 14:24:49 -0400
  • Sender: owner-wri-mathgroup at wolfram.com

Tom,

Needs["Statistics`DescriptiveStatistics`"];

salaries = Sort[
    {250000, 100000, 60000, 60000, 40000, 40000, 40000, 40000,
      25000, 20000, 20000, 20000, 18000, 16000, 16000}]

{16000, 16000, 18000, 20000, 20000, 20000, 25000, 40000, 40000, 40000, 40000, 
60000, 60000, 100000, 250000}

Quartiles[salaries]

{20000, 40000, 55000}

To see where these values come from it is useful to plot the CDF for salaries.

salaryDist /: CDF[salaryDist, x_] := 
    Length[Select[salaries, # <= x &]]/Length[salaries];

Looking at the detail around the first quartile, the CDF jumps from 3/15 = 0.2 
to 6/15 = 0.4 at 20000, so it crosses 0.25 there

Plot[CDF[salaryDist, x], {x, 15000, 25000}];

Likewise, for the second quartile (median), the CDF jumps from 7/15 to 11/15 
at 40000, so it crosses 0.5 there

Plot[CDF[salaryDist, x], {x, 35000, 45000}];

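For the third quartile, the CDF has the value 11/15 (about 0.733) over the 
range 40000 <= x < 60000 and jumps to 13/15 (about 0.867) at 60000, so it 
never takes the value 0.75 exactly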

Plot[CDF[salaryDist, x], {x, 25000, 100000}];
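The jump heights can also be read off numerically instead of from the plots. The following is a minimal sketch using only built-in functions; the name cdfTable is introduced here for illustration and is not part of the package. It lists each distinct salary together with the fraction of values at or below it, which shows that the levels 0.25, 0.5 and 0.75 fall inside the jumps at 20000, 40000 and 60000 respectively.

```mathematica
salaries = Sort[
    {250000, 100000, 60000, 60000, 40000, 40000, 40000, 40000,
      25000, 20000, 20000, 20000, 18000, 16000, 16000}];

(* For each distinct value v, pair it with the fraction of entries <= v *)
cdfTable = {#, Count[salaries, s_ /; s <= #]/Length[salaries]} & /@
    Union[salaries]

(* {{16000, 2/15}, {18000, 1/5}, {20000, 2/5}, {25000, 7/15},
    {40000, 11/15}, {60000, 13/15}, {100000, 14/15}, {250000, 1}} *)
```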

?Quartiles

"Quartiles[list] gives a list of the interpolated .25, .50, and .75 quantiles 
of the entries in list."

Consequently, the third quartile must be interpolated

0.75*Length[salaries]

11.25

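There are 11 values at or below 40000, so the third quartile lies between the 
11th sorted value (40000) and the 12th (60000). Taking 1/4 of the 11th value 
and 3/4 of the 12th reproduces the result (these weights correspond to 
interpolating at position 11.25 + 0.5 = 11.75, which appears to be the rule 
the package uses)

40000/4 + 3*60000/4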

55000
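The interpolation can be written as a small general function. This is only a sketch under an assumption: that the sorted values are interpolated linearly at position n q + 1/2. That rule reproduces all three package results for this data, but it is inferred from the output and not taken from the package source. The name interpolatedQuantile is hypothetical.

```mathematica
salaries = Sort[
    {250000, 100000, 60000, 60000, 40000, 40000, 40000, 40000,
      25000, 20000, 20000, 20000, 18000, 16000, 16000}];

(* Assumed rule: interpolate linearly between sorted values at position n q + 1/2 *)
interpolatedQuantile[list_List, q_] :=
  Module[{s = Sort[list], n = Length[list], pos, k, f},
    pos = Min[Max[n q + 1/2, 1], n];  (* clamp the position to 1..n *)
    k = Floor[pos];                   (* index of the lower neighbor *)
    f = pos - k;                      (* fractional distance to the upper neighbor *)
    If[k == n, s[[n]], s[[k]] + f (s[[k + 1]] - s[[k]])]
  ];

interpolatedQuantile[salaries, #] & /@ {1/4, 1/2, 3/4}

(* {20000, 40000, 55000} *)
```

For q = 3/4 the position is 47/4 = 11.75, so the result is 40000 + (3/4)(60000 - 40000) = 55000.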


Bob Hanlon

In a message dated 6/22/99 10:29:51 PM, tdevries at shop.westworld.ca writes:

>I am working with some statistics problems and have a question about finding
>quartiles for a set of data
>
>Load in the package
>
>
>Needs["Statistics`DescriptiveStatistics`"]
>
>
>create a set of data
>
>salaries =
>{250000,100000,60000,60000,40000,40000,40000,40000,25000,20000,20000,20000,
>  18000,16000,16000}
>
>ask for the quartiles
>
>Quartiles[salaries] 
>
>and this is the response
>{20000,40000,55000}
>
>At this point I am probably revealing my ignorance of statistics....  
>
>
>
>250000,100000,60000,60000,40000,40000, 40000
>40000,  Median
>25000,20000,20000,20000,18000,16000,16000
>
>The lower quartile is the median of the values below the median, which I get
>with Mathematica 
>25000,20000,20000,
>20000,
>18000,16000,16000
>
>The upper quartile should be the median of the numbers above the median, so
>why is it 55000?
>250000,100000,60000,
>60000,
>40000,40000, 40000
>
>Does Mathematica use some algorithm to get rid of outliers before finding
>quartiles,  or does it eliminate the median from the data set before finding
>the quartiles, .....?
>
>The set of data I used as an example was taken from the math text I am using
>and the answer the text supplies, and the answer I think I should get, is
>different from the one Mathematica gets.  I would appreciate any advice
>on this!
>
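
For comparison, the textbook method described in the quoted message (median of the values below the median, and median of the values above it) can be sketched as follows. The names textbookQuartiles and med are hypothetical helpers written for this illustration, and the sketch assumes the median itself is left out of both halves when the list length is odd.

```mathematica
salaries = {250000, 100000, 60000, 60000, 40000, 40000, 40000, 40000,
    25000, 20000, 20000, 20000, 18000, 16000, 16000};

(* Quartiles as medians of the lower and upper halves of the sorted data *)
textbookQuartiles[list_List] :=
  Module[{s = Sort[list], n = Length[list], h, med},
    (* median of an already sorted list *)
    med[v_] := With[{m = Length[v]},
      If[OddQ[m], v[[(m + 1)/2]], (v[[m/2]] + v[[m/2 + 1]])/2]];
    h = Floor[n/2];  (* size of each half; the middle value is excluded when n is odd *)
    {med[Take[s, h]], med[s], med[Take[s, -h]]}
  ];

textbookQuartiles[salaries]

(* {20000, 40000, 60000} *)
```

This gives 60000 for the upper quartile, the value the quoted message expects, whereas the interpolated definition gives 55000. The two answers come from different definitions of a quartile, not from any removal of outliers.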

