Re: Removing Outliers from List
- To: mathgroup at smc.vnet.net
- Subject: [mg131302] Re: Removing Outliers from List
- From: Bob Hanlon <hanlonr357 at gmail.com>
- Date: Tue, 25 Jun 2013 21:14:48 -0400 (EDT)
Part 1

Highlight the _: and press F1, then read the documentation for Optional. x_: 2.09 is short for Optional[x_, 2.09]: if the argument is omitted, x takes the default value 2.09; if an argument is supplied, that value is used instead.

Clear[f]

f[x_: 2.09] := x

f[]

2.09

f[1]

1

f[x]

x

Adding a pattern test restricts the argument to numeric values, so a symbolic argument is left unevaluated:

Clear[f]

f[x : _?NumericQ : 2.09] := x

f[]

2.09

f[1]

1

f[x]

f[x]


Part 2

With[{
  m = RandomReal[{-5, 5}],
  s = RandomReal[{.5, 1.5}]},
 Print[{m, s}];
 data = RandomVariate[NormalDistribution[m, s], 1000]];

{2.69346, 1.19773}

Through[{Mean, StandardDeviation}[data]]

{2.70281, 1.21353}

You did not define removeNormalOutliers, but I am guessing it is something like:

removeNormalOutliers[data_, devFromMean_] := Module[
  {m = Mean[data], s = StandardDeviation[data]},
  Select[data, Abs[# - m]/s <= devFromMean &]]

data2 = removeNormalOutliers[data, 1];

Length[data2]

681

dist = NormalDistribution[];

CDF[dist, 1.] - CDF[dist, -1.]

0.682689

data3 = removeNormalOutliers[data, 2];

Length[data3]

951

CDF[dist, 2.] - CDF[dist, -2.]

0.9545

So with 1 standard deviation roughly 68% of normally distributed data is kept (about 32% removed), and with 2 roughly 95% is kept. The data that remain follow a normal distribution truncated at n standard deviations from the mean:

Clear[dist, m, s];

dist[n_?Positive, m_: 0, s_: 1] :=
 TruncatedDistribution[{m - n*s, m + n*s},
  NormalDistribution[m, s]];

Mean[dist[1, m, s]]

m

Simplify[StandardDeviation[dist[1, m, s]], s > 0]

s*Sqrt[1 - Sqrt[2/(E*Pi)]/Erf[1/Sqrt[2]]]

% // N

0.53956 s

PDF[dist[1, m, s], x]

Piecewise[{{1/(E^((-m + x)^2/(2*s^2))*
      (Sqrt[2*Pi]*s*((1/2)*Erfc[-(1/Sqrt[2])] -
         (1/2)*Erfc[1/Sqrt[2]]))),
    Inequality[m - s, Less, x, LessEqual, m + s]}}, 0]

CDF[dist[1, m, s], x]

Piecewise[{{0, x <= m - s},
   {((-(1/2))*Erfc[1/Sqrt[2]] + (1/2)*Erfc[(m - x)/(Sqrt[2]*s)])/
     ((1/2)*Erfc[-(1/Sqrt[2])] - (1/2)*Erfc[1/Sqrt[2]]),
    Inequality[m - s, Less, x, LessEqual, m + s]}}, 1]

Mean[dist[2, m, s]]

m

Simplify[StandardDeviation[dist[2, m, s]], s > 0]

(s*Sqrt[E^2 - (2*Sqrt[2/Pi])/Erf[Sqrt[2]]])/E

% // N

0.879626 s

PDF[dist[2, m, s], x]

Piecewise[{{1/(E^((-m + x)^2/(2*s^2))*
      (Sqrt[2*Pi]*s*((1/2)*Erfc[-Sqrt[2]] - Erfc[Sqrt[2]]/2))),
    Inequality[m - 2*s, Less, x, LessEqual, m + 2*s]}}, 0]

CDF[dist[2, m, s], x]

Piecewise[{{0, x <= m - 2*s},
   {((-(1/2))*Erfc[Sqrt[2]] + (1/2)*Erfc[(m - x)/(Sqrt[2]*s)])/
     ((1/2)*Erfc[-Sqrt[2]] - Erfc[Sqrt[2]]/2),
    Inequality[m - 2*s, Less, x, LessEqual, m + 2*s]}}, 1]


Bob Hanlon


On Tue, Jun 25, 2013 at 2:57 AM, Mariano <m.pierantozzi at univpm.it> wrote:

> It's very very interesting for me and I want to ask you two trivial
> questions for you , but not for me :
> 1. what does it means \[Alpha]_:2.09, if then I set Alpha=1
> 2. If I run removeNormalOutliers[l, 1], I'm removing the outliers out 1
> standard deviation from the mean? What is the probability? And if I put 2
> what happen?
> Thanks in advance.
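A short sketch for checking the probabilities in Part 2 without a simulation, using only built-in functions: the fraction of a normal distribution lying within n standard deviations of its mean is Erf[n/Sqrt[2]], which equals the CDF difference used in the reply. The helper names fractionKept and fractionKeptCDF are hypothetical, introduced only for illustration. The guessed removeNormalOutliers definition is repeated so the block runs on its own.

```mathematica
(* Hypothetical helper: fraction of a normal distribution within n SDs of the mean *)
fractionKept[n_?Positive] := Erf[n/Sqrt[2]]

(* Hypothetical helper: the same quantity via the standard normal CDF, as in the reply *)
fractionKeptCDF[n_?Positive] :=
 CDF[NormalDistribution[], n] - CDF[NormalDistribution[], -n]

(* Theoretical kept fractions for n = 1, 2, 3 *)
Table[{n, N[fractionKept[n]]}, {n, 1, 3}]
(* returns {{1, 0.682689}, {2, 0.9545}, {3, 0.9973}} *)

(* Guessed definition from the reply, repeated so this block is self-contained *)
removeNormalOutliers[data_, devFromMean_] := Module[
  {m = Mean[data], s = StandardDeviation[data]},
  Select[data, Abs[# - m]/s <= devFromMean &]]

(* Empirical kept fractions on simulated data; these vary from run to run *)
sample = RandomVariate[NormalDistribution[0, 1], 10000];
Table[{n, N[Length[removeNormalOutliers[sample, n]]/Length[sample]]}, {n, 1, 3}]
```

For n = 1, 2, 3 about 31.7%, 4.6% and 0.3% of normally distributed data would be discarded. The simulated fractions should land close to the theoretical ones but not exactly on them, since removeNormalOutliers uses the sample mean and standard deviation rather than the true parameters.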