Re: Speed challenge: Improve on integer frequencies from Count?
- To: mathgroup at smc.vnet.net
- Subject: [mg67780] Re: Speed challenge: Improve on integer frequencies from Count?
- From: "James Gilmore" <james.gilmore at yale.edu>
- Date: Fri, 7 Jul 2006 07:13:04 -0400 (EDT)
- Organization: Yale University
- References: <e8ir3t$s8o$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Hi Gareth, Yes this is an interesting problem. I have made an attempt and have a one-liner for you that may be useful: mynewFreq[myList_,maxInt_] := (Length[#1] &) /@ Split[Sort[Join[myList, Range[0, maxInt]]]] - Table[1, {maxInt + 1}] The idea here is that you first join the original list with a list of freq. from 0 to fmax. Then you already have all the required freq. in a list. A sort then orders the data from 0 to maxInt, then Split into indentical runs of elements and finally take the Length of the indentical runs. One then subtracts 1 from all elements from the list, to acount for the extra count from Range[0, maxInt]. I could not compile this function because Split changes the tensorial character of the list, and the compiled code gives an error. If the setting of maxInt is less than ~100, then your code which uses the Mathematica internal function Count will be slightly faster than mine. I assume that Count is optimized for the small integer case, since this is what most users would require of Count from a practical point of view. However if you want to perform the same operation with large lists, i.e. Length[]>~2000 and maxInt>~1000 then my function outperforms the Mathematica Count[] implementation, by a least a factor of ~6. My timing tests show that both implementations are however approximately O(n). Sort and Split appear to be the functions which slow my implementation. Comments and better versions are welcome! Code below. Changing: maxInt = 1000, in the timing loop should give you an idea of the performance characteristics for smaller and larger integer ranges, my implementation is in the red. Cheers James Gilmore ------------------------------------- In[6]:= myFrequencies = Compile[{{myList, _Integer, 1}, {maxInt, _Integer}}, Table[Count[myList, i], {i, 0, maxInt}]]; In[7]:= testList = Table[Random[Integer, {0, 500}], {1000}]; In[8]:= Timing[Do[myFrequencies[testList, 500], {1000}]] Out[8]= {1.781 Second, Null} In[9]:= mynewFreq[myList_, maxInt_] := (Length[#1] &) /@ Split[Sort[Join[myList, Range[0, maxInt]]]] - Table[1, {maxInt + 1}] In[10]:= Timing[Do[mynewFreq[testList, 500], {1000}]] Out[10]= {0.906 Second, Null} maxInt = 1000; timeloop = 100; timingdata = Table[ testList = Table[Random[Integer, {0, maxInt}], {fpts}]; {fpts, Timing[Do[myFrequencies[testList, maxInt], {timeloop}]][[1]]/Second , Timing[Do[mynewFreq[testList, maxInt], {timeloop}]][[1]]/Second} , {fpts, 1000, 20000, 1000}]; myFrequencies[testList, maxInt] == mynewFreq[testList, maxInt] Out[14]= True In[15]:= timingdata Out[15]= {{1000, 0.344, 0.14}, {2000, 0.703, 0.219}, {3000, 1.047, 0.203}, {4000, 1.437, 0.25}, {5000, 1.5, 0.328}, {6000, 2.125, 0.344}, {7000, 2.39, 0.375}, {8000, 2.688, 0.39}, {9000, 3.094, 0.437}, {10000, 3.594, 0.484}, {11000, 3.781, 0.532}, {12000, 4.172, 0.5}, {13000, 4.516, 0.609}, {14000, 4.609, 0.563}, {15000, 4.937, 0.719}, {16000, 5.359, 0.625}, {17000, 5.766, 0.672}, {18000, 6., 0.782}, {19000, 6.563, 0.672}, {20000, 6.656, 0.782}} In[16]:= ListPlot[({#1[[1]], #1[[2]]} & ) /@ timingdata, PlotJoined -> True, PlotStyle -> RGBColor[0, 0, 1], DisplayFunction -> Identity]; ListPlot[({#1[[1]], #1[[3]]} & ) /@ timingdata, PlotJoined -> True, PlotStyle -> RGBColor[1, 0, 0], DisplayFunction -> Identity]; Show[%, %%, DisplayFunction -> $DisplayFunction]; In[19]:= $Version Out[19]= 5.2 for Microsoft Windows (June 20, 2005) ----------------------------------------------------- "Gareth Russell" <russell at njit.edu> wrote in message news:e8ir3t$s8o$1 at smc.vnet.net... > Hi, > > A challenge for the efficiency gurus: > > Given a list of integers whose possible values lie in the range 0 to > maxInt, with any given integer represented 0, 1 or multiple times, > e.g., what you would get from > > Table[Random[Integer, {0, maxInt}], {1000}] > > create a list of frequencies of values including ALL possible values > (i.e., not the basic output of Frequencies[], because that only > includes values present in the list. > > The fastest I have been able to come up with is the obvious use of > Count[]... > > In[256]:= > myFrequencies=Compile[{{myList,_Integer,1},{ > maxInt,_Integer}},Table[Count[myList,i],{i,0,maxInt}]] > > Out[256]= > CompiledFunction[{myList,maxInt},Table[Count[myList,i],{i,0,maxInt}],-\ > CompiledCode-] > > In[278]:= > testList=Table[Random[Integer,{0,500}],{1000}]; > > In[281]:= > Timing[Do[myFrequencies[testList,500];,{1000}]] > > Out[281]= > {2.3103 Second,Null} > > However, it seems to me that this should be able to be improved upon, > because it scans the whole list maxInt times. One should be able to > scan the list once, and increment a frequency list in the right places, > perhaps with a For[] loop. But try as I might, I can't come up with a > version that is faster than the compiled Count[] version above. > > Can anyone do it? > > Gareth > > -- > Gareth Russell > NJIT >