       Re: Select, from table data

• To: mathgroup at smc.vnet.net
• Subject: [mg112992] Re: Select, from table data
• From: Ray Koopman <koopman at sfu.ca>
• Date: Sun, 10 Oct 2010 06:41:09 -0400 (EDT)
• References: <i8pgih$fjn$1@smc.vnet.net>

On Oct 9, 3:36 am, Kurtis <djr... at gmail.com> wrote:
> Easy fix here.
>
> I'm trying to drop outliers from 2D data originally collected in the
> time domain.  My method is to analyze the 2nd derivative of smoothed
> data and drop those deviating more than, say, 3x the standard deviation
> from the mean. I need to make a new table of the smoothed data with
> these outliers dropped.  I'm attempting the Select function in the
> last line; I'm just not familiar enough with the syntax for analyzing
> the criteria in the 2nd column (y values) from the dataset.  Thanks!
>
> SetDirectory[NotebookDirectory[]];
> set = Import ["testMELT2.xls", {"Data", 1}];
> << Smooth` (* Savitzky-Golay-like filter *)
>
> t = Table[set[[n]][[1]], {n, 1, Length[set]}];
> a260 = Table[set[[n]][[2]], {n, 1, Length[set]}];
> Dataset1 = Table[{t[[n]], a260[[n]]}, {n, 1, Length[set]}];
>
> SmoothDatasetDERIV = Smooth[Dataset1, 13, 3, 2]; (* external package
> filtering over 13 points, polynomial order 3, f'')
> meanSECderiv = Mean[SmoothDatasetDERIV];
> meanY = meanSECderiv[[2]]  (* pulling out Y values *)
> sd = StandardDeviation[SmoothDatasetDERIV];
> sdY = sd[[2]]  (* pulling out Y values *)
>
> revision =
>  Select[{SmoothDatasetDERIV}, Function[Abs[meanY - # <= (3*sdY) &]]]
>
> Any help would be great, thanks!

It's usually not a good idea to use the mean and s.d. to
identify outliers, because both the mean and (especially)
the s.d. are themselves distorted by outliers. It's usually
better to define outliers in terms of the quartiles. See
http://en.wikipedia.org/wiki/Box_plot  and
http://reference.wolfram.com/mathematica/StatisticalPlots/tutorial/StatisticalPlots.html
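
To illustrate the masking effect, here is a small sketch on made-up data (the list below is hypothetical, not your melt data). It also shows your mean/s.d. criterion with working Select syntax: the test must be a single pure function of one row, and it must compare the row's 2nd element, #[[2]], rather than the whole pair.

```mathematica
(* Hypothetical test data: y values 1..5 plus one gross outlier, paired with t = 1..6 *)
data = Transpose[{Range[6], {1., 2., 3., 4., 5., 100.}}];
ys = data[[All, 2]];

(* Mean/s.d. criterion with corrected syntax: keep rows with |y - mean| <= 3 sd *)
{meanY, sdY} = {Mean[ys], StandardDeviation[ys]};
bySD = Select[data, Abs[#[[2]] - meanY] <= 3 sdY &];

(* Quartile (box-plot) fences with the conventional multiplier 1.5 *)
{q1, q2, q3} = Quartiles[ys];
{lower, upper} = {q1 - 1.5 (q3 - q1), q3 + 1.5 (q3 - q1)};
byIQR = Select[data, lower <= #[[2]] <= upper &];

{Length[bySD], Length[byIQR]}  (* {6, 5} *)
```

The single value 100 inflates the s.d. to about 39.6, so the 3-s.d. band around the mean (about 19.2) reaches past 100 and nothing is dropped, while the quartile fences (-2.5 to 9.5) exclude it.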

The selection would go something like this:

{q1, q2, q3} = Quartiles @ SmoothDatasetDERIV[[All,2]];
m = 1.5; (* any value between 1.5 and 3 *)
{lower, upper} = {q1-#, q3+#}&[m*(q3-q1)];
revision =
Select[SmoothDatasetDERIV, lower <= #[[2]] <= upper &]
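
If you do this often, the same selection can be wrapped in a function so the column and the multiplier are explicit. This is only a sketch: the name dropOutliersIQR is hypothetical (not part of any package), and it assumes the data is a rectangular list of rows, as SmoothDatasetDERIV appears to be.

```mathematica
(* Hypothetical helper: keep only rows whose value in column col lies inside
   the box-plot fences {q1 - m*IQR, q3 + m*IQR}, where IQR = q3 - q1 *)
dropOutliersIQR[data_?MatrixQ, col_Integer, m_?NumericQ] :=
  Module[{q1, q3, lower, upper},
    {q1, q3} = Quartiles[data[[All, col]]][[{1, 3}]];
    {lower, upper} = {q1 - m (q3 - q1), q3 + m (q3 - q1)};
    Select[data, lower <= #[[col]] <= upper &]
  ]

(* e.g.  revision = dropOutliersIQR[SmoothDatasetDERIV, 2, 1.5] *)
```

With m = 1.5 these are Tukey's usual inner fences; m = 3 gives the outer fences, which drop only the more extreme points.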
