[Date Index]
[Thread Index]
[Author Index]
Re: Efficient Histogram Algorithm?
*To*: mathgroup at smc.vnet.net
*Subject*: [mg113083] Re: Efficient Histogram Algorithm?
*From*: Darren Glosemeyer <darreng at wolfram.com>
*Date*: Tue, 12 Oct 2010 13:49:42 -0400 (EDT)
On 10/12/2010 3:26 AM, Ray Koopman wrote:
> On Oct 11, 2:17 am, Julian Francis<julian.w.fran... at gmail.com> wrote:
>> Dear all,
>>
>> Does anyone know of a way to implement a fast histogram algorithm in
>> Mathematica?
>>
>> My first attempt (In44) was based around a table and using the count
>> function. A bit of thought indicates that this might not be very
>> efficient as it would seem to involve examining the list 255 times,
>> when only once is necessary. The second attempt involved using the
>> built in library function BinCounts, but surprisingly this was only
>> about twice as fast as my initial naive version, which makes me wonder
>> if it is as efficient as it could be. I can assume for my purposes
>> that the list will always have integer values between 0 and 255, and
>> that there are 256 bins.
>>
>> I wasn't able to think of a way to implement the histogram function in
>> a functional programming style that was also efficient, although it is
>> fairly easy to write in an imperative language. This may reflect the
>> vast majority of my programming experience has been in imperative
>> languages.
>>
>> I am using an old version of Mathematica (v4.1) (but on a modern PC);
>> but if anyone has dramatically different times for the BinCounts
>> function, then maybe it is time for me to upgrade.
>>
>> My motivation is a pattern recognition algorithm I am implementing,
>> but it requires a large number of histograms to be created, hence the
>> running time of an individual histogram is actually quite important.
>>
>> Thanks for any help,
>> Julian.
>>
>> In[44]:=
>> jhistogram[list_] := Table[Count[list, n], {n, 0, 255}]
>>
>> In[51]:=
>> \!\(Timing[\(randomList = Table[Random[Integer, 255], {10\^6}];\)]\)
>>
>> Out[51]=
>> {1.373 Second, Null}
>>
>> In[54]:=
>> Timing[hist = jhistogram[randomList];]
>>
>> Out[54]=
>> {2.34 Second, Null}
>>
>> In[56]:=
>> << Statistics`
>>
>> In[57]:=
>> Timing[hist = BinCounts[randomList, {0, 255}];]
>>
>> Out[57]=
>> {0.936 Second, Null}
> Here's a compiled routine that's twice as fast as BinCounts. Also,
> note that you need to give BinCounts a non-intuitive set of bins.
>
> In[1]:=<< Statistics`
>
> In[2]:= jhistogram[list_] := Table[Count[list,n],{n,0,255}]
>
> In[3]:= freqs = Compile[{{data,_Integer,1}},
> Module[{bins = Table[0,{256}]},
> Scan[bins[[#]]++&,data+1]; bins]];
>
> In[4]:= Timing@Length[randomList = Table[Random[Integer,255],{10^6}]]
> Timing@Length[hist1 = jhistogram[randomList]]
> Timing@Length[hist2 = BinCounts[randomList,{-1,255}]]
> Timing@Length[hist3 = freqs[randomList]]
> SameQ[hist1,hist2,hist3]
>
> Out[4]= {1.26 Second,1000000}
> Out[5]= {2.5 Second,256}
> Out[6]= {0.9 Second,256}
> Out[7]= {0.45 Second,256}
> Out[8]= True
>
BinCounts is really a function for continuous values. Given that we know
the values of interest are integers from 0 to 255, this could be made
even faster by using Tally (added in version 6).
In[1]:= jhistogram[list_] := Table[Count[list, n], {n, 0, 255}]
In[2]:= freqs = Compile[{{data, _Integer, 1}},
Module[{bins = Table[0, {256}]}, Scan[bins[[#]]++ &, data + 1];
bins]];
In[3]:= Timing@Length[randomList = Table[Random[Integer, 255], {10^6}]]
Out[3]= {0.391, 1000000}
In[4]:= Timing@Length[hist1 = jhistogram[randomList]]
Out[4]= {0.531, 256}
In[5]:= tallybased = With[{tallies = Tally[randomList]},
Split[
Sort[Join[tallies,
Transpose[{Range[0, 255],
ConstantArray[0,
256]}]]], #[[1]] == #2[[1]] &][[All, -1, -1]]]; //
Timing
-17
Out[5]= {1.40946 10 , Null}
In[6]:= SameQ[hist1, tallybased]
Out[6]= True
Tally gives a list of {value, count} pairs for the elements in the list.
The code above works by getting the tallies for the values in list,
adding elements of the form {i, 0} (this is done so we can make sure a 0
count is present if the value is not in the list) for i from 0 to 255,
sorting so like i values are next to each other and the 0 count will
come first, splitting based on the i value, and picking out the count
from the last {i,count} pair in the group for each i so we only get the
actual counts for the list and not any additional padding 0s.
Darren Glosemeyer
Wolfram Research
Prev by Date:
**Re: Generation of polynomials**
Next by Date:
**Version Advisory -- Can I automate?**
Previous by thread:
**Re: Efficient Histogram Algorithm?**
Next by thread:
**Re: Efficient Histogram Algorithm?**
| |