Re: Efficient Histogram Algorithm?
- To: mathgroup at smc.vnet.net
- Subject: [mg113039] Re: Efficient Histogram Algorithm?
- From: Bob Hanlon <hanlonr at cox.net>
- Date: Tue, 12 Oct 2010 04:23:50 -0400 (EDT)
Tally is much faster but was introduced in Version 6. jhistogram[list_] := Table[Count[list, n], {n, 0, 255}] jhistogram3[list_] := Sort[Tally[list]][[All, 2]] randomList = RandomInteger[{0, 255}, 10^6]; Timing[hist1 = jhistogram[randomList];][[1]] 0.360395 For BinCounts you need to use 256 as the upper bound Timing[hist2 = BinCounts[randomList, {0, 256}];][[1]] 0.278348 Timing[hist3 = jhistogram3[randomList];][[1]] 0.003623 hist1 == hist2 == hist3 True Total[hist1] == Total[hist2] == Total[hist3] == 10^6 True Bob Hanlon ---- Julian Francis <julian.w.francis at gmail.com> wrote: ============= Dear all, Does anyone know of a way to implement a fast histogram algorithm in Mathematica? My first attempt (In44) was based around a table and using the count function. A bit of thought indicates that this might not be very efficient as it would seem to involve examining the list 255 times, when only once is necessary. The second attempt involved using the built in library function BinCounts, but surprisingly this was only about twice as fast as my initial naive version, which makes me wonder if it is as efficient as it could be. I can assume for my purposes that the list will always have integer values between 0 and 255, and that there are 256 bins. I wasn't able to think of a way to implement the histogram function in a functional programming style that was also efficient, although it is fairly easy to write in an imperative language. This may reflect the vast majority of my programming experience has been in imperative languages. I am using an old version of Mathematica (v4.1) (but on a modern PC); but if anyone has dramatically different times for the BinCounts function, then maybe it is time for me to upgrade. My motivation is a pattern recognition algorithm I am implementing, but it requires a large number of histograms to be created, hence the running time of an individual histogram is actually quite important. Thanks for any help, Julian. In[44]:= jhistogram[list_] := Table[Count[list, n], {n, 0, 255}] In[51]:= \!\(Timing[\(randomList = Table[Random[Integer, 255], {10\^6}];\)]\) Out[51]= {1.373 Second, Null} In[54]:= Timing[hist = jhistogram[randomList];] Out[54]= {2.34 Second, Null} In[56]:= << Statistics` In[57]:= Timing[hist = BinCounts[randomList, {0, 255}];] Out[57]= {0.936 Second, Null}