Speed challenge integer frequencies: and the winner is...
- To: mathgroup at smc.vnet.net
- Subject: [mg67802] Speed challenge integer frequencies: and the winner is...
- From: Gareth Russell <russell at njit.edu>
- Date: Sat, 8 Jul 2006 04:56:11 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Thanks for everyone who responded to the speed challenge. You can see the submitted solutions in the responses to my original post. My code (for finding a set of integer frequencies from a list of non-negative integers, including zero frequency counts for integers not in the list up to maxInt) was as follows: myFrequencies=Compile[{{myList,_Integer,1},{ maxInt,_Integer}},Table[Count[myList,i],{i,0,maxInt}]] Other solutions were interesting, including one that made use of SparseArray, and two that used Split[Sort[]]. All were at lesat twice as fast as my code, when maxInt was set to 500. But the fastest set were those that followed my suggestion of scanning the list once and incrementing values in a frequency array initialized with zeros. I had been unable to program this efficiently, and I got at least three similar versions that were efficent, clocking in a six times faster than myFrequencies. The winner by a hair, submitted by Ray Koopman, was freq8 = Compile[{{myList, _Integer, 1}, {maxInt, _Integer}}, Module[{bins = Table[0, {maxInt + 1}]}, Scan[bins[[#]]++ &, myList + 1]; bins]] But note that Ray admitted making a small modification to a submission by Carl Woll! The three variations on this theme varied in using Scan, Map and Table, which seem to account for the small differences in speed, with Scan being fractionally more efficient. A confession: in the application I wanted this for, maxInt = 5. In that case, as a number of people pointed out, myFrequencies is almost an order of magnitude faster than any other code. I determined that equality between myFrequencies and freq8 was achieved somewhere around maxInt = 73. This is strange to me, as it would seem that (compiled) Count is scanning the list 74 separate times, and yet doing so in the same amount of time as the single compiled scan of freq8. Count must be super-efficient! Gareth -- Gareth Russell NJIT