Re: Select, from table data
- To: mathgroup at smc.vnet.net
- Subject: [mg112992] Re: Select, from table data
- From: Ray Koopman <koopman at sfu.ca>
- Date: Sun, 10 Oct 2010 06:41:09 -0400 (EDT)
- References: <i8pgih$fjn$1@smc.vnet.net>
On Oct 9, 3:36 am, Kurtis <djr... at gmail.com> wrote:
> Easy fix here.
>
> I'm trying to drop outliers from 2D data originally collected in the
> time domain. My method is to analyze the 2nd derivative of smoothed
> data and drop those deviating more than, say 3x the standard deviation
> from the mean. I need to make a new table of the smoothed data with
> these outliers dropped. I'm attempting the Select function in the
> last line, i'm just not familiar enough with the syntax for analyzing
> the criteria in the 2nd column (y values) from the dataset. Thanks!
>
> SetDirectory[NotebookDirectory[]];
> set = Import ["testMELT2.xls", {"Data", 1}];
> << Smooth` (* savitzsky-golay - like filter *)
>
> t = Table[set[[n]][[1]], {n, 1, Length[set]}];
> a260 = Table[set[[n]][[2]], {n, 1, Length[set]}];
> Dataset1 = Table[{t[[n]], a260[[n]]}, {n, 1, Length[set]}];
>
> SmoothDatasetDERIV = Smooth[Dataset1, 13, 3, 2]; (* external package
> filtering over 13 points, polynomial order 3, f'')
> meanSECderiv = Mean[SmoothDatasetDERIV];
> meanY = meanSECderiv[[2]] (* pulling out Y values *)
> sd = StandardDeviation[SmoothDatasetDERIV];
> sdY = sd[[2]] (* pulling out Y values *)
>
> revision =
> Select[{SmoothDatasetDERIV}, Function[Abs[meanY - # <= (3*sdY) &]]]
>
> Any help would be great, thanks!

It's usually not a good idea to use the mean and s.d. to identify
outliers, because both the mean and (especially) the s.d. are
themselves distorted by outliers. It's usually better to define
outliers in terms of the quartiles. See

http://en.wikipedia.org/wiki/Box_plot
and
http://reference.wolfram.com/mathematica/StatisticalPlots/tutorial/StatisticalPlots.html

The selection would go something like this:

  {q1, q2, q3} = Quartiles @ SmoothDatasetDERIV[[All,2]];
  m = << some value between 1.5 and 3 >>;
  {lower, upper} = {q1-#, q3+#}&[m*(q3-q1)];
  revision = Select[SmoothDatasetDERIV, lower <= #[[2]] <= upper &]
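For a quick check of the quartile approach without the spreadsheet or the Smooth` package, here is a minimal sketch on made-up data. The list "data" and the choice m = 1.5 are assumptions for illustration only; "data" merely stands in for SmoothDatasetDERIV as a list of {t, y} pairs with one wild y value.

```mathematica
(* Hypothetical {t, y} pairs standing in for SmoothDatasetDERIV; *)
(* the y value 25.0 plays the role of the outlier.               *)
data = {{1, 0.1}, {2, -0.2}, {3, 0.0}, {4, 0.3},
        {5, -0.1}, {6, 0.2}, {7, 25.0}, {8, -0.3}};

(* Quartiles of the y column only *)
ys = data[[All, 2]];
{q1, q2, q3} = Quartiles[ys];

(* Fence multiplier: an assumed choice from the 1.5 to 3 range *)
m = 1.5;
{lower, upper} = {q1 - #, q3 + #} &[m*(q3 - q1)];

(* Keep only the rows whose y value lies inside the fences *)
byQuartiles = Select[data, lower <= #[[2]] <= upper &]

(* For comparison: the mean +/- 3 s.d. rule on the same y values *)
meanY = Mean[ys];
sdY = StandardDeviation[ys];
bySD = Select[data, Abs[#[[2]] - meanY] <= 3*sdY &]
```

With these numbers the quartile fences should drop the {7, 25.0} row, while the 3 s.d. rule keeps it, because that single point inflates the s.d. enough to hide itself. Note that the pure function in Select receives a whole {t, y} row, so the test has to pick out #[[2]].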