Histograms do not concur; why?

• To: mathgroup at smc.vnet.net
• Subject: [mg50535] Histograms do not concur; why?
• From: gilmar.rodriguez at nwfwmd.state.fl.us (Gilmar Rodríguez Pierluissi)
• Date: Thu, 9 Sep 2004 05:18:29 -0400 (EDT)
• Sender: owner-wri-mathgroup at wolfram.com

```Dear Mathematica User Group:

I'm attempting to visualize the statistical distribution of
a particular data set, but I'm getting views that do not agree.

The data set in question is generated as follows:

In[1]: << Graphics`Graphics`;
In[2]: <<DiscreteMath`Combinatorica`;
In[3]: Off[General::spell1]

The following program gives the Minimal Goldbach Prime
Partition Point (p,q) corresponding to an even value n.
The program rotates the point (p,q) clockwise by an angle
of Pi/4 radians about the origin, and returns the value
"rotated q".  ROTMGPPP is an abbreviation for "Rotated
Minimal Goldbach Prime Partition Point".
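
(* A quick check of the rotation step, offered as a sketch: rotating
   (p, q) clockwise by Pi/4 gives the second coordinate (q - p)/Sqrt[2].
   With q = n - p this is (n - 2p)/Sqrt[2], i.e. about 0.707107 (n - 2p),
   which is the factor used in the program below.
   The names rot, pp and nn are illustrative only. *)
rot = {{Cos[Pi/4], Sin[Pi/4]}, {-Sin[Pi/4], Cos[Pi/4]}};
Simplify[(rot . {pp, nn - pp})[[2]]]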

In[4]: ROTMGPPP = Compile[{{n, _Integer}}, Block[{rotq},
{Do[If[PrimeQ[n - (p = Prime[i])], Return[p]],
{i, PrimePi[n/2], PrimePi[Ceiling[Sqrt[n]]],-1}],
{rotq = 0.707107(n - 2p)}};
Return[rotq]], {{ i, _Integer},
{Prime[_], _Integer},{PrimePi[_], _Integer},
{PrimeQ[_], True | False}}];

Next, we produce the rotated q's for the even values of n from 4 to
one million (499,999 values). Mathematica v 5.0 takes slightly less
than an hour to calculate these values on my PC; the computing time
might be different on your computer:

In[5]: A = Table[ROTMGPPP[n],{n, 4, 10^6, 2}];

Plot the set A:

In[6]: Plt1 = ListPlot[A, PlotStyle -> Hue[0.4],
PlotRange -> All, ImageSize -> 500]

Many of the rotated q's are zero in value, so we proceed to
isolate the non-zero rotated q's as follows:

In[7]: B = Select[A, # != 0 &];

We suspect that the non-zero rotated q's might have a Log-Normal
Distribution. This means that if we take the (natural) logarithm
of the non-zero rotated q's and look at their statistical
distribution, this distribution might be Normal, i.e.,
Gaussian, or "bell shaped":

In[8]: data = N[Log[B]];

We plot this data set first, before producing the corresponding
histogram:

In[9]: Plt2 = ListPlot[data, PlotStyle -> Hue[0.6],
PlotRange -> {{0,500000}, {0,8}},
ImageSize -> 500]

Here is the Bin and Frequency table corresponding to our data set.
(Please, inspect the table values to get a feel of what the
histogram might look like.) :

In[10]: MapIndexed[{Sequence @@ #2, Length[#1]} &, Split[Sort[data]]]
//TableForm
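
(* A small illustration of what this expression returns, on a made-up
   list (toy is hypothetical, not part of the data set): each row is
   {position of the distinct value in sorted order, its count};
   the value itself does not appear in the row. *)
toy = {7., 0.5, 7., 2., 0.5, 7.};
MapIndexed[{Sequence @@ #2, Length[#1]} &, Split[Sort[toy]]]
(* {{1, 2}, {2, 1}, {3, 3}} *)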

Next, we build our first histogram of the data set:

In[11]: Plt3 = Show[Histogram[data, HistogramCategories -> Range[8],
Ticks -> {Transpose[{Range[7] + .5, Range[7]}],
Automatic},DisplayFunction -> Identity],
PlotRange -> {{1, 8}, All},
AxesOrigin -> {1, 0}, Frame -> True,
DisplayFunction -> $DisplayFunction]

Indeed, the histogram suggests that our data have a log-normal distribution.
Looking cautiously for a second opinion though, we do the following:

In[12]: <<Statistics`NormalDistribution`;

In[13]: Plt4 = Histogram[data]

Compare Plt3 and Plt4.  They are very different (to say the least).
(Number Theorists in the house are welcome to explain the nature of the
distribution depicted in Plt4).
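
(* One way to probe the difference, offered as a sketch and not as an
   explanation: draw the same data with explicit cutoffs of several
   widths and watch how the shape changes. The widths 1, 0.25 and 0.05
   are arbitrary choices; this assumes HistogramCategories accepts a
   list of cutoffs, as it is used in In[11]. *)
Scan[Histogram[data,
   HistogramCategories -> Range[0, Ceiling[Max[data]], #]] &,
 {1, 0.25, 0.05}]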

The following plots are also fascinating.  It seems that our data belong
to two different populations (or partitions), but what are they?

In[14]: freq = MapIndexed[{Sequence @@ #2, Length[#1]} &, Split[Sort[data]]];

In[15]: <<Graphics`Graphics`

In[16]: Plt5 = LogLinearListPlot[freq, PlotRange -> All,ImageSize->500]

In[17]: Plt6 = LogLogListPlot[freq, PlotRange->All,ImageSize->500]

Comments about the (technical) differences between Plt3 and Plt4, as well as
the Number Theory or statistical nature of our data set, are welcome!

You can also download a notebook containing the above input lines, by
double-clicking the following shortcut:

I also built an Excel spreadsheet plot using:

In[18]: SetDirectory["C:\\Temporary"]

In[19]: Export["C:\\Temporary\\frequency.txt", freq, "Table"]

Open the file frequency.txt in Excel, and build a vertical bar chart of
column B.