Re: Removing Outliers from List

• To: mathgroup at smc.vnet.net
• Subject: [mg131302] Re: Removing Outliers from List
• From: Bob Hanlon <hanlonr357 at gmail.com>
• Date: Tue, 25 Jun 2013 21:14:48 -0400 (EDT)
• References: <20130625065756.7341A6A46@smc.vnet.net>

```
Part 1

Highlight the _: and press F1, then read the documentation for Optional.

Clear[f]

f[x_: 2.09] := x

f[]

2.09

f[1]

1

f[x]

x

Clear[f]

f[x : _?NumericQ : 2.09] := x

f[]

2.09

f[1]

1

f[x]

f[x]
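
(* Illustrative addition, not part of the original reply: x_: 2.09 is
   shorthand for Optional[x_, 2.09], so the first definition can also be
   written with the explicit head. The name g is hypothetical. *)

Clear[g]

g[Optional[x_, 2.09]] := x

g[]

2.09

g[1]

1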

Part 2

With[{
   m = RandomReal[{-5, 5}],
   s = RandomReal[{.5, 1.5}]},
  Print[{m, s}];
  data = RandomVariate[NormalDistribution[m, s], 1000]];

{2.69346, 1.19773}

Through[{Mean, StandardDeviation}[data]]

{2.70281, 1.21353}

You did not define removeNormalOutliers, but I am guessing it is something
like:

removeNormalOutliers[data_, devFromMean_] := Module[
  {m = Mean[data], s = StandardDeviation[data]},
  Select[data, Abs[# - m]/s <= devFromMean &]]

data2 = removeNormalOutliers[data, 1];

Length[data2]

681

dist = NormalDistribution[];

CDF[dist, 1.] - CDF[dist, -1.]

0.682689

data3 = removeNormalOutliers[data, 2];

Length[data3]

951

CDF[dist, 2.] - CDF[dist, -2.]

0.9545
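
(* Illustrative check, not part of the original reply: the fraction of
   points kept can be compared with the normal probability of lying within
   n standard deviations of the mean, which equals Erf[n/Sqrt[2]].
   The first result depends on the random sample; for the run shown above
   the lengths are 681 and 951 out of 1000. *)

N[{Length[data2], Length[data3]}/Length[data]]

N[Erf[{1, 2}/Sqrt[2]]]

{0.682689, 0.9545}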

Clear[dist, m, s];

dist[n_?Positive, m_: 0, s_: 1] :=
  TruncatedDistribution[{m - n*s, m + n*s},
   NormalDistribution[m, s]];

Mean[dist[1, m, s]]

m

Simplify[StandardDeviation[dist[1, m, s]], s > 0]

s*Sqrt[1 - Sqrt[2/(E*Pi)]/Erf[1/Sqrt[2]]]

% // N

0.53956 s

PDF[dist[1, m, s], x]

Piecewise[{{1/(E^((-m + x)^2/(2*s^2))*
(Sqrt[2*Pi]*s*((1/2)*Erfc[-(1/Sqrt[2])] -
(1/2)*Erfc[1/Sqrt[2]]))), Inequality[m - s,
Less, x, LessEqual, m + s]}}, 0]

CDF[dist[1, m, s], x]

Piecewise[{{0, x <= m - s},
{((-(1/2))*Erfc[1/Sqrt[2]] +
(1/2)*Erfc[(m - x)/(Sqrt[2]*s)])/
((1/2)*Erfc[-(1/Sqrt[2])] -
(1/2)*Erfc[1/Sqrt[2]]), Inequality[m - s,
Less, x, LessEqual, m + s]}}, 1]

Mean[dist[2, m, s]]

m

Simplify[StandardDeviation[dist[2, m, s]], s > 0]

(s*Sqrt[E^2 - (2*Sqrt[2/Pi])/Erf[Sqrt[2]]])/E

% // N

0.879626 s

PDF[dist[2, m, s], x]

Piecewise[{{1/(E^((-m + x)^2/(2*s^2))*
(Sqrt[2*Pi]*s*((1/2)*Erfc[-Sqrt[2]] -
Erfc[Sqrt[2]]/2))), Inequality[m - 2*s,
Less, x, LessEqual, m + 2*s]}}, 0]

CDF[dist[2, m, s], x]

Piecewise[{{0, x <= m - 2*s},
{((-(1/2))*Erfc[Sqrt[2]] +
(1/2)*Erfc[(m - x)/(Sqrt[2]*s)])/
((1/2)*Erfc[-Sqrt[2]] - Erfc[Sqrt[2]]/2),
Inequality[m - 2*s, Less, x, LessEqual,
m + 2*s]}}, 1]
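
(* Illustrative check, not part of the original reply: with the default
   arguments m = 0 and s = 1, dist[n] is the truncated standard normal, so
   the numeric factors above can be recovered directly. *)

N[StandardDeviation[dist[1]]]

0.53956

N[StandardDeviation[dist[2]]]

0.879626

(* The trimmed samples should roughly reproduce these factors. The match is
   only approximate, because removeNormalOutliers uses the sample mean and
   standard deviation and the data are random, so no output is shown. *)

StandardDeviation[data2]/StandardDeviation[data]

StandardDeviation[data3]/StandardDeviation[data]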

Bob Hanlon

On Tue, Jun 25, 2013 at 2:57 AM, Mariano <m.pierantozzi at univpm.it> wrote:

> This is very interesting to me, and I want to ask two questions that are
> trivial for you but not for me:
> 1. What does \[Alpha]_:2.09 mean if I then set Alpha=1?
> 2. If I run removeNormalOutliers[l, 1], am I removing the outliers more than 1
> standard deviation from the mean? What is the probability? And what happens
> if I put 2?
> Thanks in advance.
>
>

```
