Speed challenge integer frequencies: and the winner is...
- To: mathgroup at smc.vnet.net
- Subject: [mg67802] Speed challenge integer frequencies: and the winner is...
- From: Gareth Russell <russell at njit.edu>
- Date: Sat, 8 Jul 2006 04:56:11 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Thanks for everyone who responded to the speed challenge. You can see
the submitted solutions in the responses to my original post. My code
(for finding a set of integer frequencies from a list of non-negative
integers, including zero frequency counts for integers not in the list
up to maxInt) was as follows:
myFrequencies=Compile[{{myList,_Integer,1},{
maxInt,_Integer}},Table[Count[myList,i],{i,0,maxInt}]]
Other solutions were interesting, including one that made use of
SparseArray, and two that used Split[Sort[]]. All were at lesat twice
as fast as my code, when maxInt was set to 500. But the fastest set
were those that followed my suggestion of scanning the list once and
incrementing values in a frequency array initialized with zeros. I had
been unable to program this efficiently, and I got at least three
similar versions that were efficent, clocking in a six times faster
than myFrequencies. The winner by a hair, submitted by Ray Koopman, was
freq8 = Compile[{{myList, _Integer, 1}, {maxInt, _Integer}},
Module[{bins = Table[0, {maxInt + 1}]},
Scan[bins[[#]]++ &, myList + 1]; bins]]
But note that Ray admitted making a small modification to a submission
by Carl Woll! The three variations on this theme varied in using Scan,
Map and Table, which seem to account for the small differences in
speed, with Scan being fractionally more efficient.
A confession: in the application I wanted this for, maxInt = 5. In that
case, as a number of people pointed out, myFrequencies is almost an
order of magnitude faster than any other code. I determined that
equality between myFrequencies and freq8 was achieved somewhere around
maxInt = 73. This is strange to me, as it would seem that (compiled)
Count is scanning the list 74 separate times, and yet doing so in the
same amount of time as the single compiled scan of freq8. Count must be
super-efficient!
Gareth
--
Gareth Russell
NJIT