MathGroup Archive 2013

Re: Removing Outliers from List

  • To: mathgroup at smc.vnet.net
  • Subject: [mg131302] Re: Removing Outliers from List
  • From: Bob Hanlon <hanlonr357 at gmail.com>
  • Date: Tue, 25 Jun 2013 21:14:48 -0400 (EDT)
  • References: <20130625065756.7341A6A46@smc.vnet.net>

Part 1


Highlight the _: and press F1, then read the documentation for Optional.
In short, x_:v gives x the default value v when that argument is omitted:


Clear[f]


f[x_: 2.09] := x


f[]


2.09


f[1]


1


f[x]


x


Clear[f]


f[x : _?NumericQ : 2.09] := x


f[]


2.09


f[1]


1


f[x]


f[x]
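
The same behavior holds when the optional argument follows a required one. A minimal sketch, assuming a hypothetical function g whose name and arguments are illustrative only and do not come from the original code:

```mathematica
Clear[g]

(* g is a hypothetical example; alpha defaults to 2.09 when it is omitted *)
g[data_List, alpha_: 2.09] := {Length[data], alpha}

g[{1, 2, 3}]      (* second argument omitted, so alpha is 2.09: {3, 2.09} *)

g[{1, 2, 3}, 1]   (* second argument supplied, so alpha is 1: {3, 1} *)
```

So passing 1 in that position simply overrides the default of 2.09 for that call.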



Part 2


With[{
   m = RandomReal[{-5, 5}],
   s = RandomReal[{.5, 1.5}]},
  Print[{m, s}];
  data = RandomVariate[NormalDistribution[m, s], 1000]];


{2.69346, 1.19773}


Through[{Mean, StandardDeviation}[data]]


{2.70281, 1.21353}


You did not define removeNormalOutliers but I am guessing it is something
like:


(* keep only the points within devFromMean sample standard deviations
   of the sample mean *)
removeNormalOutliers[data_, devFromMean_] := Module[
  {m = Mean[data], s = StandardDeviation[data]},
  Select[data, Abs[# - m]/s <= devFromMean &]]


data2 = removeNormalOutliers[data, 1];


Length[data2]


681


dist = NormalDistribution[];


CDF[dist, 1.] - CDF[dist, -1.]


0.682689


data3 = removeNormalOutliers[data, 2];


Length[data3]


951


CDF[dist, 2.] - CDF[dist, -2.]


0.9545
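
The two CDF differences above are instances of one closed form: for a normal distribution, the probability of falling within n standard deviations of the mean is Erf[n/Sqrt[2]]. A minimal sketch (the name keptFraction is only illustrative, not something from the original code):

```mathematica
(* keptFraction is a hypothetical helper name; it gives the probability
   that a normal variate lies within n standard deviations of its mean *)
keptFraction[n_?Positive] := Erf[n/Sqrt[2]]

N[keptFraction /@ {1, 2, 3}]
(* {0.682689, 0.9545, 0.9973} *)
```

So a cutoff of 1 keeps roughly 68% of normally distributed data and a cutoff of 2 keeps roughly 95%, in line with the 681 and 951 points retained out of 1000 above.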


Clear[dist, m, s];


dist[n_?Positive, m_: 0, s_: 1] :=
  TruncatedDistribution[{m - n*s, m + n*s},
   NormalDistribution[m, s]];



Mean[dist[1, m, s]]


m


Simplify[StandardDeviation[dist[1, m, s]], s > 0]


s*Sqrt[1 - Sqrt[2/(E*Pi)]/Erf[1/Sqrt[2]]]


% // N


0.53956 s



PDF[dist[1, m, s], x]


Piecewise[{{1/(E^((-m + x)^2/(2*s^2))*
            (Sqrt[2*Pi]*s*((1/2)*Erfc[-(1/Sqrt[2])] -
                  (1/2)*Erfc[1/Sqrt[2]]))), Inequality[m - s,
         Less, x, LessEqual, m + s]}}, 0]



CDF[dist[1, m, s], x]


Piecewise[{{0, x <= m - s},
     {((-(1/2))*Erfc[1/Sqrt[2]] +
            (1/2)*Erfc[(m - x)/(Sqrt[2]*s)])/
         ((1/2)*Erfc[-(1/Sqrt[2])] -
            (1/2)*Erfc[1/Sqrt[2]]), Inequality[m - s,
         Less, x, LessEqual, m + s]}}, 1]



Mean[dist[2, m, s]]


m



Simplify[StandardDeviation[dist[2, m, s]], s > 0]


(s*Sqrt[E^2 - (2*Sqrt[2/Pi])/Erf[Sqrt[2]]])/E


% // N


0.879626 s
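
As a rough check, these factors can be compared with the filtered samples from earlier. This sketch assumes data, data2 and data3 are still defined as above. The ratios should come out close to 0.54 and 0.88, but only approximately, since the sample is random and removeNormalOutliers cuts at the sample mean and standard deviation rather than at the true m and s:

```mathematica
(* ratio of filtered to unfiltered sample standard deviation;
   the values depend on the random sample, so no output is shown *)
StandardDeviation[data2]/StandardDeviation[data]

StandardDeviation[data3]/StandardDeviation[data]
```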



PDF[dist[2, m, s], x]


Piecewise[{{1/(E^((-m + x)^2/(2*s^2))*
            (Sqrt[2*Pi]*s*((1/2)*Erfc[-Sqrt[2]] -
                  Erfc[Sqrt[2]]/2))), Inequality[m - 2*s,
         Less, x, LessEqual, m + 2*s]}}, 0]



CDF[dist[2, m, s], x]


Piecewise[{{0, x <= m - 2*s},
     {((-(1/2))*Erfc[Sqrt[2]] +
            (1/2)*Erfc[(m - x)/(Sqrt[2]*s)])/
         ((1/2)*Erfc[-Sqrt[2]] - Erfc[Sqrt[2]]/2),
       Inequality[m - 2*s, Less, x, LessEqual,
         m + 2*s]}}, 1]



Bob Hanlon




On Tue, Jun 25, 2013 at 2:57 AM, Mariano <m.pierantozzi at univpm.it> wrote:

> It's very very interesting for me and I want to ask you two trivial
> questions for you , but not for me :
> 1. what does it means \[Alpha]_:2.09, if then I set Alpha=1
> 2. If I run removeNormalOutliers[l, 1], I'm removing the outliers out 1
> standard deviation from the mean?  What is the probability? And if I put 2
> what happen?
> Thanks in advance.
>
>



