Re: Efficient Histogram Algorithm?
- To: mathgroup at smc.vnet.net
- Subject: [mg113053] Re: Efficient Histogram Algorithm?
- From: Ray Koopman <koopman at sfu.ca>
- Date: Tue, 12 Oct 2010 04:26:32 -0400 (EDT)
- References: <i8uknn$of3$1@smc.vnet.net>
On Oct 11, 2:17 am, Julian Francis <julian.w.fran... at gmail.com> wrote: > Dear all, > > Does anyone know of a way to implement a fast histogram algorithm in > Mathematica? > > My first attempt (In44) was based around a table and using the count > function. A bit of thought indicates that this might not be very > efficient as it would seem to involve examining the list 255 times, > when only once is necessary. The second attempt involved using the > built in library function BinCounts, but surprisingly this was only > about twice as fast as my initial naive version, which makes me wonder > if it is as efficient as it could be. I can assume for my purposes > that the list will always have integer values between 0 and 255, and > that there are 256 bins. > > I wasn't able to think of a way to implement the histogram function in > a functional programming style that was also efficient, although it is > fairly easy to write in an imperative language. This may reflect the > vast majority of my programming experience has been in imperative > languages. > > I am using an old version of Mathematica (v4.1) (but on a modern PC); > but if anyone has dramatically different times for the BinCounts > function, then maybe it is time for me to upgrade. > > My motivation is a pattern recognition algorithm I am implementing, > but it requires a large number of histograms to be created, hence the > running time of an individual histogram is actually quite important. > > Thanks for any help, > Julian. > > In[44]:= > jhistogram[list_] := Table[Count[list, n], {n, 0, 255}] > > In[51]:= > \!\(Timing[\(randomList = Table[Random[Integer, 255], {10\^6}];\)]\) > > Out[51]= > {1.373 Second, Null} > > In[54]:= > Timing[hist = jhistogram[randomList];] > > Out[54]= > {2.34 Second, Null} > > In[56]:= > << Statistics` > > In[57]:= > Timing[hist = BinCounts[randomList, {0, 255}];] > > Out[57]= > {0.936 Second, Null} Here's a compiled routine that's twice as fast as BinCounts. Also, note that you need to give BinCounts a non-intuitive set of bins. In[1]:= << Statistics` In[2]:= jhistogram[list_] := Table[Count[list,n],{n,0,255}] In[3]:= freqs = Compile[{{data,_Integer,1}}, Module[{bins = Table[0,{256}]}, Scan[bins[[#]]++&,data+1]; bins]]; In[4]:= Timing@Length[randomList = Table[Random[Integer,255],{10^6}]] Timing@Length[hist1 = jhistogram[randomList]] Timing@Length[hist2 = BinCounts[randomList,{-1,255}]] Timing@Length[hist3 = freqs[randomList]] SameQ[hist1,hist2,hist3] Out[4]= {1.26 Second,1000000} Out[5]= {2.5 Second,256} Out[6]= {0.9 Second,256} Out[7]= {0.45 Second,256} Out[8]= True