MathGroup Archive 2006

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Speed challenge: Improve on integer frequencies from Count?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg67780] Re: Speed challenge: Improve on integer frequencies from Count?
  • From: "James Gilmore" <james.gilmore at yale.edu>
  • Date: Fri, 7 Jul 2006 07:13:04 -0400 (EDT)
  • Organization: Yale University
  • References: <e8ir3t$s8o$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Hi Gareth,

Yes this is an interesting problem. I have made an attempt and have a 
one-liner for you that may be useful:
mynewFreq[myList_,maxInt_] := (Length[#1] &) /@
Split[Sort[Join[myList, Range[0, maxInt]]]] - Table[1, {maxInt + 1}]

The idea here is that you first join the original list with a list of freq. 
from 0 to fmax. Then you already have all the required freq. in a list. A 
sort then orders the data from 0 to maxInt, then Split into indentical runs 
of elements and finally take the Length of the indentical runs. One then 
subtracts 1 from all elements from the list, to acount for the extra count 
from Range[0, maxInt]. I could not compile this function because Split 
changes the tensorial character of the list, and the compiled code gives an 
error.

If the setting of maxInt is less than ~100, then your code which uses the 
Mathematica internal function Count will be slightly faster than mine.
I assume that Count is optimized for the small integer case, since this
is what most users would require of Count from a practical point of view.

However if you want to perform the same operation with large lists, i.e. 
Length[]>~2000 and maxInt>~1000 then my function outperforms the Mathematica
Count[] implementation, by a least a factor of ~6. My timing tests show
that both implementations are however approximately O(n).

Sort and Split appear to be the functions which slow my implementation. 
Comments and better versions are welcome!

Code below. Changing: maxInt = 1000, in the timing loop should give you an 
idea of the performance characteristics for smaller and larger integer 
ranges, my implementation is in the red.

Cheers
James Gilmore
-------------------------------------
In[6]:=
myFrequencies =
Compile[{{myList, _Integer, 1}, {maxInt, _Integer}},
Table[Count[myList, i], {i, 0, maxInt}]];
In[7]:=
testList = Table[Random[Integer, {0, 500}], {1000}];
In[8]:=
Timing[Do[myFrequencies[testList, 500], {1000}]]
Out[8]=
{1.781 Second, Null}
In[9]:=
mynewFreq[myList_,
maxInt_] := (Length[#1] &) /@
Split[Sort[Join[myList, Range[0, maxInt]]]] - Table[1, {maxInt + 1}]
In[10]:=
Timing[Do[mynewFreq[testList, 500], {1000}]]
Out[10]=
{0.906 Second, Null}
maxInt = 1000;
timeloop = 100;
timingdata = Table[
testList = Table[Random[Integer, {0, maxInt}], {fpts}];
{fpts,
Timing[Do[myFrequencies[testList, maxInt], {timeloop}]][[1]]/Second
, Timing[Do[mynewFreq[testList, maxInt], {timeloop}]][[1]]/Second}
, {fpts, 1000, 20000, 1000}];
myFrequencies[testList, maxInt] == mynewFreq[testList, maxInt]
Out[14]=
True
In[15]:=
timingdata
Out[15]=
{{1000, 0.344, 0.14}, {2000, 0.703, 0.219}, {3000, 1.047, 0.203}, {4000,
1.437, 0.25}, {5000, 1.5, 0.328}, {6000, 2.125, 0.344}, {7000, 2.39,
0.375}, {8000, 2.688, 0.39}, {9000, 3.094, 0.437}, {10000, 3.594,
0.484}, {11000, 3.781, 0.532}, {12000, 4.172, 0.5}, {13000, 4.516,
0.609}, {14000, 4.609, 0.563}, {15000, 4.937, 0.719}, {16000, 5.359,
0.625}, {17000, 5.766, 0.672}, {18000, 6., 0.782}, {19000, 6.563,
0.672}, {20000, 6.656, 0.782}}


In[16]:=
ListPlot[({#1[[1]], #1[[2]]} & ) /@ timingdata, PlotJoined -> True, 
PlotStyle -> RGBColor[0, 0, 1], DisplayFunction -> Identity];
ListPlot[({#1[[1]], #1[[3]]} & ) /@ timingdata, PlotJoined -> True, 
PlotStyle -> RGBColor[1, 0, 0], DisplayFunction -> Identity];
Show[%, %%, DisplayFunction -> $DisplayFunction];
In[19]:=
$Version
Out[19]=
5.2 for Microsoft Windows (June 20, 2005)

-----------------------------------------------------
"Gareth Russell" <russell at njit.edu> wrote in message 
news:e8ir3t$s8o$1 at smc.vnet.net...
> Hi,
>
> A challenge for the efficiency gurus:
>
> Given a list of integers whose possible values lie in the range 0 to
> maxInt, with any given integer represented 0, 1 or multiple times,
> e.g., what you would get from
>
> Table[Random[Integer, {0, maxInt}], {1000}]
>
> create a list of frequencies of values including ALL possible values
> (i.e., not the basic output of Frequencies[], because that only
> includes values present in the list.
>
> The fastest I have been able to come up with is the obvious use of 
> Count[]...
>
> In[256]:=
> myFrequencies=Compile[{{myList,_Integer,1},{
>    maxInt,_Integer}},Table[Count[myList,i],{i,0,maxInt}]]
>
> Out[256]=
> CompiledFunction[{myList,maxInt},Table[Count[myList,i],{i,0,maxInt}],-\
> CompiledCode-]
>
> In[278]:=
> testList=Table[Random[Integer,{0,500}],{1000}];
>
> In[281]:=
> Timing[Do[myFrequencies[testList,500];,{1000}]]
>
> Out[281]=
> {2.3103 Second,Null}
>
> However, it seems to me that this should be able to be improved upon,
> because it scans the whole list maxInt times. One should be able to
> scan the list once, and increment a frequency list in the right places,
> perhaps with a For[] loop. But try as I might, I can't come up with a
> version that is faster than the compiled Count[] version above.
>
> Can anyone do it?
>
> Gareth
>
> -- 
> Gareth Russell
> NJIT
> 



  • Prev by Date: Hamiltonian circuits
  • Next by Date: Re: Subvalues and Parameters in Differentiation and Usage Messages
  • Previous by thread: Re: Speed challenge: Improve on integer frequencies from Count?
  • Next by thread: Re: Speed challenge: Improve on integer frequencies from Count?