Tally timing puzzle
- To: mathgroup at smc.vnet.net
- Subject: [mg80522] Tally timing puzzle
- From: Mark Fisher <particlefilter at gmail.com>
- Date: Fri, 24 Aug 2007 01:59:57 -0400 (EDT)
Hi all,
I don't understand why the difference in timings using Tally is so
large:
data1 = RandomInteger[{1, 100}, 2*10^6];
data2 = Partition[data1, 1];
Timing[Tally[data1];][[1]] --> 0.031
Timing[Tally[data2];][[1]] --> 9.75
That's a factor of 300 slow down. Sorting the data takes the same
amount of time in both cases:
Timing[data1sorted = Sort[data1];][[1]] --> 0.75
Timing[data2sorted = Sort[data2];][[1]] --> 0.75
But the effect of sorting on the timings is different:
Timing[Tally[data1sorted];][[1]] --> 0.031
Timing[Tally[data2sorted];][[1]] --> 3.297
The total time of sorting and tallying data2 is much less than just
tallying data2.
For comparison, consider
Timing[{First[#], Length[#]} & /@ Split[data1sorted];][[1]] --> 0.25
Timing[{First[#], Length[#]} & /@ Split[data2sorted];][[1]] --> 0.719
Any ideas as to why Tally is so slow on data2?
--Mark