Tally timing puzzle
- To: mathgroup at smc.vnet.net
- Subject: [mg80522] Tally timing puzzle
- From: Mark Fisher <particlefilter at gmail.com>
- Date: Fri, 24 Aug 2007 01:59:57 -0400 (EDT)
Hi all, I don't understand why the difference in timings using Tally is so large: data1 = RandomInteger[{1, 100}, 2*10^6]; data2 = Partition[data1, 1]; Timing[Tally[data1];][[1]] --> 0.031 Timing[Tally[data2];][[1]] --> 9.75 That's a factor of 300 slow down. Sorting the data takes the same amount of time in both cases: Timing[data1sorted = Sort[data1];][[1]] --> 0.75 Timing[data2sorted = Sort[data2];][[1]] --> 0.75 But the effect of sorting on the timings is different: Timing[Tally[data1sorted];][[1]] --> 0.031 Timing[Tally[data2sorted];][[1]] --> 3.297 The total time of sorting and tallying data2 is much less than just tallying data2. For comparison, consider Timing[{First[#], Length[#]} & /@ Split[data1sorted];][[1]] --> 0.25 Timing[{First[#], Length[#]} & /@ Split[data2sorted];][[1]] --> 0.719 Any ideas as to why Tally is so slow on data2? --Mark