Re: quartiles
- To: mathgroup at smc.vnet.net
- Subject: [mg18264] Re: [mg18214] quartiles
- From: BobHanlon at aol.com
- Date: Thu, 24 Jun 1999 14:24:49 -0400
- Sender: owner-wri-mathgroup at wolfram.com
Tom,

    Needs["Statistics`DescriptiveStatistics`"];

    salaries = Sort[
      {250000, 100000, 60000, 60000, 40000, 40000, 40000, 40000,
       25000, 20000, 20000, 20000, 18000, 16000, 16000}]

    {16000, 16000, 18000, 20000, 20000, 20000, 25000, 40000, 40000, 40000, 40000,
     60000, 60000, 100000, 250000}

    Quartiles[salaries]

    {20000, 40000, 55000}

To see where these values come from, it is useful to plot the CDF for salaries.

    salaryDist /: CDF[salaryDist, x_] :=
      Length[Select[salaries, # <= x &]]/Length[salaries];

Looking at the detail around the first quartile, it is clear that the CDF passes through 0.25 at 20000 (it jumps from 3/15 to 6/15 there).

    Plot[CDF[salaryDist, x], {x, 15000, 25000}];

Likewise, for the second quartile (median), the CDF passes through 0.5 at 40000 (it jumps from 7/15 to 11/15).

    Plot[CDF[salaryDist, x], {x, 35000, 45000}];

For the third quartile, the CDF is flat at 11/15 (about 0.73), just below 0.75, over the range {40000, 60000}.

    Plot[CDF[salaryDist, x], {x, 25000, 100000}];

    ?Quartiles

    "Quartiles[list] gives a list of the interpolated .25, .50, and .75 quantiles
    of the entries in list."

Consequently, the third quartile must be interpolated.

    0.75*Length[salaries]

    11.25

There are 11 values at or below 40000, so we need the equivalent of 1/4 of the next value:

    40000 + 60000/4

    55000

Bob Hanlon

In a message dated 6/22/99 10:29:51 PM, tdevries at shop.westworld.ca writes:

>I am working with some statistics problems and have a question about finding
>quartiles for a set of data
>
>Load in the package
>
>Needs["Statistics`DescriptiveStatistics`"]
>
>create a set of data
>
>salaries =
>{250000,100000,60000,60000,40000,40000,40000,40000,25000,20000,20000,20000,
> 18000,16000,16000}
>
>ask for the quartiles
>
>Quartiles[salaries]
>
>and this is the response
>{20000,40000,55000}
>
>At this point I am probably revealing my ignorance of statistics....
>
>250000,100000,60000,60000,40000,40000,40000
>40000,     Median
>25000,20000,20000,20000,18000,16000,16000
>
>The lower quartile is the median of the values below the median, which I get
>with Mathematica
>25000,20000,20000,
>20000,
>18000,16000,16000
>
>The upper quartile should be the median of the numbers above the median, so
>why is it 55000?
>250000,100000,60000,
>60000,
>40000,40000,40000
>
>Does Mathematica use some algorithm to get rid of outliers before finding
>quartiles, or does it eliminate the median from the data set before finding
>the quartiles, .....?
>
>The set of data I used as an example was taken from the math text I am using
>and the answer the text supplies, and the answer I think I should get, is
>different from the one Mathematica gets. I would appreciate any advice
>on this!
>
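
The two answers discussed in this thread (55000 from Quartiles, and the textbook value from taking the median of the upper half) can be compared side by side. The following is only a sketch: interpQuantile, middle, and textbookQuartiles are hypothetical helper names, not package functions. The position rule n q + 1/2 is an assumption chosen because it reproduces the output quoted above; it is not taken from the package documentation.

```mathematica
(* Data from the post *)
salaries = {250000, 100000, 60000, 60000, 40000, 40000, 40000, 40000,
   25000, 20000, 20000, 20000, 18000, 16000, 16000};

(* ASSUMPTION: the q-th quantile sits at fractional position n q + 1/2 in the
   sorted list, with linear interpolation between the two neighbouring entries.
   This rule matches the Quartiles output quoted in the thread; it is a guess,
   not the documented algorithm of the package. *)
interpQuantile[list_List, q_] :=
 Module[{s = Sort[list], n = Length[list], pos, k},
  pos = Min[Max[n q + 1/2, 1], n];  (* clamp to a valid position *)
  k = Floor[pos];
  If[k == n, s[[n]], s[[k]] + (pos - k) (s[[k + 1]] - s[[k]])]]

(* Middle value of an already sorted list *)
middle[s_List] := With[{n = Length[s]},
  If[OddQ[n], s[[(n + 1)/2]], (s[[n/2]] + s[[n/2 + 1]])/2]]

(* Textbook rule from the question: quartiles are the medians of the lower and
   upper halves; for odd n the overall median is left out of both halves. *)
textbookQuartiles[list_List] :=
 Module[{s = Sort[list], h = Floor[Length[list]/2]},
  {middle[Take[s, h]], middle[s], middle[Take[s, -h]]}]

interpQuantile[salaries, #] & /@ {1/4, 1/2, 3/4}
(* {20000, 40000, 55000} *)

textbookQuartiles[salaries]
(* {20000, 40000, 60000} *)
```

Under these assumptions the two rules agree on the first quartile and the median and differ only in the third quartile, where the 0.75 level falls in the gap between 40000 and 60000. Neither result involves removing outliers; the difference comes only from the quartile convention used.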