Re: Speed Up of Calculations on Large Lists
- To: mathgroup at smc.vnet.net
- Subject: [mg108939] Re: Speed Up of Calculations on Large Lists
- From: sheaven <karg.stefan at googlemail.com>
- Date: Wed, 7 Apr 2010 07:26:47 -0400 (EDT)
- References: <hpcjh0$mva$1@smc.vnet.net>
Thanks everyone for the input! Ray is absolutely correct (Compile does not speed up the calculation based on my function movAverageOwn2FC). I used Compile with a function similar to variation#2 from Zach with Partition where Compile resulted in a speed up of c. 50%. Therefore, I thought that Compile is always faster. Obviously, this is a mistake :-) I did some testing and here is the outcome: 1. Moving average functions 1.1 Built-in Mathematica function In[1]:= maMathematica = Compile[{{inputData, _Real, 1}, {start, _Integer}, {end, _Integer}, {incr, _Integer}}, Module[{data, size, i}, size = Length[inputData]; Transpose[ Join[{inputData}, PadRight[MovingAverage[inputData, #], size] & /@ Table[x, {x, start, end, incr}]]] ] ] 1.2 Function based on Span In[2]:= maSpanF[dataInput_, days_, length_] := N[Mean[dataInput[[1 + # ;; days + #]]]] & /@ Range[0, length - days, 1] In[3]:= maSpan[dataInput_, start_, end_, incr_] := Module[{length}, length = Length[dataInput]; Transpose[ Join[{dataInput}, PadRight[maSpanF[dataInput, #, length], length] & /@ Range[start, end, incr]]] ] 1.3 Function based on Convolve In[4]:= maConvolveF[data_, windowLen_] := Module[{ker = Table[1, {windowLen}]/windowLen}, ListConvolve[ker, data]] In[5]:= maConvolve = Compile[{{dataInput, _Real, 1}, {start, _Integer}, {end, _Integer}, {incr, _Integer}}, Module[{length}, length = Length[dataInput]; Transpose[ Join[{dataInput}, PadRight[maConvolveF[dataInput, #], length] & /@ Range[start, end, incr]]] ] ] 1.4 Function based on Drop In[6]:= maDropF = Function[{vData, days}, With[{vAcc = Prepend[Accumulate@vData, 0.]}, Developer`ToPackedArray[(Drop[vAcc, days] - Drop[vAcc, -days])/ days, Real]]]; In[7]:= maDrop[dataInput_, start_, end_, incr_] := Module[{length}, length = Length[dataInput]; Transpose[ Join[{dataInput}, PadRight[maDropF[dataInput, #], length] & /@ Range[start, end, incr]]] ] 2. Dataset data = 100 + Accumulate[RandomReal[{-1, 1}, {10000}]]; 3. Test for equality In[15]:= maMathematica[data, 30, 250, 10] == maConvolve[data, 30, 250, 10] == maSpan[data, 30, 250, 10] == maDrop[data, 30, 250, 10] Out[15]= False This is very strange to me. Equality is only given based on Precision of 8: In[16]:= SetPrecision[maMathematica[data, 30, 250, 10], 8] == SetPrecision[maConvolve[data, 30, 250, 10], 8] == SetPrecision[maSpan[data, 30, 250, 10], 8] == SetPrecision[maDrop[data, 30, 250, 10], 8] Out[16]= True Any ideas why this is happening? Numbers of the different functions start to differ at the 10th decimal. My understanding was that Mathematica was only testing equality up to the precision it knows to be correct!? 4. Performance In[17]:= AbsoluteTiming[ Table[maMathematica[data, 30, 250, 10], {n, 1, 10, 1}];] Out[17]= {0.8230823, Null} In[18]:= AbsoluteTiming[ Table[maSpan[data, 30, 250, 10], {n, 1, 10, 1}];] Out[18]= {6.0676067, Null} In[21]:= AbsoluteTiming[ Table[maDrop[data, 30, 250, 10], {n, 1, 10, 1}];] Out[21]= {0.4480448, Null} In[22]:= AbsoluteTiming[ Table[maConvolve[data, 30, 250, 10], {n, 1, 10, 1}];] Out[22]= {0.7780778, Null} In essence: Span is by far the slowest (12 times slower than Drop) Built-in function is slower than functions based on Convolve and Drop (2 times slower than Drop) Convolve is slightly faster than the built-in function but is slower than Drop (2 times slower than Drop) Drop is by far the fastest function >From a performance standpoint, it seems necessary to think a lot about different algorithms for doing the same thing=85 Is there any good reference you can recommend giving some best practices for dealing with large lists? For those being interest I also prepared functions for the coefficient of variation (CV) based on the variance of the sample (many thanks to Raffy for his proposal). The impact of the algorithm on the performance of the function is more or less the same compared to the moving average. cvDropF = Function[{vData, days}, With[{v1 = Prepend[Accumulate[vData], 0.], v2 = Prepend[Accumulate[vData^2], 0.]}, Developer`ToPackedArray[ Sqrt[(Drop[v2, days] - Drop[v2, -days])/(days - 1) - ((Drop[v1, days] - Drop[v1, -days])/ days)^2*(days/(days - 1))]/(Sqrt[days]* maDropF[vData, days]), Real]]]; cvDrop[dataInput_, start_, end_, incr_] := Module[{length}, length = Length[dataInput]; Transpose[ Join[{dataInput}, PadRight[cvDropF[dataInput, #], length] & /@ Range[start, end, incr]]] ] cvSpanF[dataInput_, days_] := Module[{}, Sqrt[Variance[dataInput[[1 + # ;; days + #]]] & /@ Range[0, Length[dataInput] - days, 1]]/(Sqrt[days]* maSpanF[dataInput, days, Length[dataInput]]) ] cvSpan[dataInput_, start_, end_, incr_] := Module[{length}, length = Length[dataInput]; Transpose[ Join[{dataInput}, PadRight[cvSpanF[dataInput, #], length] & /@ Range[start, end, incr]]] ] cvConvolveF[data_, days_] := Module[{ker1 = Table[1, {days}]/(days - 1), ker2 = Table[1, {days}]/(days), conv1, conv2}, conv1 = ListConvolve[ker1, data^2]; conv2 = (ListConvolve[ker2, data]^2)*(days/(days - 1)); Sqrt[conv1 - conv2]/(Sqrt[days]*maConvolveF[data, days]) ] cvConvolve = Compile[{{dataInput, _Real, 1}, {start, _Integer}, {end, _Integer}, {incr, _Integer}}, Module[{length}, length = Length[dataInput]; Transpose[ Join[{dataInput}, PadRight[cvConvolveF[dataInput, #], length] & /@ Range[start, end, incr]]] ] ] @Raffy: I think there is an error in your moving average function. It should read like this (only for a single period): mv = Function[{vData, days}, With[ {v1 = Prepend[Accumulate[vData], 0.], v2 = Prepend[Accumulate[vData^2], 0.]}, Developer`ToPackedArray[ Sqrt[(Drop[v2, days] - Drop[v2, -days])/(days - 1) - ((Drop[v1, days] - Drop[v1, -days])/ days)^2*(days/(days - 1))] , Real]]];