 
 
 
 
 
 
Re: Speed Up of Calculations on Large Lists
- To: mathgroup at smc.vnet.net
- Subject: [mg108872] Re: Speed Up of Calculations on Large Lists
- From: Raffy <adraffy at gmail.com>
- Date: Mon, 5 Apr 2010 08:00:57 -0400 (EDT)
- References: <hp1ua5$65k$1@smc.vnet.net> <hp9u4i$mcp$1@smc.vnet.net>
On Apr 4, 4:45 am, Ray Koopman <koop... at sfu.ca> wrote:
> Your compiled movAverageC takes 25% more time than the uncompiled
>
> movAv[data_, start_, end_, incr_] := Transpose@PadRight@Join[{data},
>       Table[MovingAverage[data, r], {r, start, end, incr}]]
>
> under your test conditions.
>
> On Apr 1, 3:59 am, sheaven <shea... at gmx.de> wrote:
>
>
>
> > Hello everyone!
>
> > I am new to Mathematica and am trying to get an understanding of its power. I
> > plan to use Mathematica mainly for financial data analysis (large
> > lists...).
>
> > Currently, I am trying to optimize calculation time for calculations
> > based on some sample data. I started with a moving average of
> > share prices, because Mathematica already has a built in moving
> > average function for benchmarking.
>
> > I know that the built-in functions are always more efficient than any
> > user built function. Unfortunately, I have to create functions not
> > built in (e.g. something like "moving variance") in the future.
>
> > I have tried numerous ways to calc the moving average as efficiently
> > as possible. So far, I found that a function based on Span (or
> > List[[x;;y]]) is most efficient. Below are my test results.
> > Unfortunately, my UDF is still more than 5x slower than the built in
> > function.
>
> > Do you have any ideas to further speed up the function? I am already
> > using Compile and Parallelize.
>
> > This is what I got so far:
>
> > 1. Functions for moving average:
>
> > 1.1. Moving average based on built in function:
>
> > (*Function calcs moving average based on built in function for
> > specified number of days, e.g. 30 days to 250 days in steps of 10*)
> > movAverageC = Compile[{{inputData, _Real, 1}, {start, _Integer},
> > {end, _Integer}, {incr, _Integer}}, Module[{data, size, i},
> >    size = Length[inputData];
> >    Transpose[Join[{inputData}, PadRight[MovingAverage[inputData, #],
> > size] & /@ Table[x, {x, start, end, incr}]]]
> >    ]
> >   ]
>
> > 1.2. User defined function based on Span:
> > (*UDF for moving average based on Span*)
> > movAverageOwn2FC = Compile[{{dataInput, _Real, 1}, {days, _Integer},
> > {length, _Integer}},
> >   N[Mean[dataInput[[1 + # ;; days + #]]]] & /@ Range[0, length - days,
> > 1]
> > ]
>
> > (*Function calcs moving average based on UDF "movAverageOwn2FC" for
> > specified number of days, e.g. 30 days to 250 days in steps of 10*)
> > movAverageOwn2C = Compile[{{dataInput, _Real, 1}, {start, _Integer},
> > {end, _Integer}, {incr, _Integer}}, Module[{length},
> >    length = Length[dataInput];
> >    Transpose[Join[{dataInput}, PadRight[movAverageOwn2FC[dataInput, #,
> > length], length] & /@ Range[start, end, incr]]]
> >    ]
> >   ]
>
> > 2. Create sample data:
> > data = 100 + # & /@ Accumulate[RandomReal[{-1, 1}, {10000}]];
>
> > 3. Test if functions yield same results:
> > Test1 = movAverageC[data, 30, 250, 10]; (*Moving average for 30 days
> > to 250 days in steps of 10*)
>
> > Test2 = movAverageOwn2C[data, 30, 250, 10]; (*Moving average for 30
> > days to 250 days in steps of 10*)
>
> > Test1 == Test2
> > Out = True
>
> > 4. Performance testing (Single Core):
> > AbsoluteTiming[Table[movAverageC[data, 30, 250, 10], {n, 1, 20, 1}];]
> > (*Repeat function 20x for testing purposes*)
> > Out = {1.3030000, Null}
>
> > AbsoluteTiming[Table[movAverageOwn2C[data, 30, 250, 10], {n, 1, 20,
> > 1}];] (*Repeat function 20x for testing purposes*)
> > Out = {11.4260000, Null}
>
> > => Result UDF 9x slower
>
> > 5. Performance testing (multi core):
> > LaunchKernels[]
>
> > Out = {KernelObject[1, "local"], KernelObject[2, "local"]}
>
> > DistributeDefinitions[data, movAverageOwn2C, movAverageOwn2FC,
> > movAverageC]
>
> > AbsoluteTiming[Parallelize[Table[movAverageC[data, 30, 250, 10], {n,
> > 1, 20, 1}]];]
> > Out = {1.3200000, Null}
>
> > AbsoluteTiming[Parallelize[Table[movAverageOwn2C[data, 30, 250, 10],
> > {n, 1, 20, 1}]];]
> > Out = {6.7170000, Null}
>
> > => Result UDF 5x slower
> > Very strange that the built in function does not get faster with
> > Parallelize
>
> > I would very much appreciate any input on how to decrease calculation
> > time based on the user defined function.
>
> > Many thanks
> > Stefan
ma = Function[{vData, vRange}, With[
    {vAcc =
      Prepend[Accumulate@Developer`ToPackedArray[vData, Real], 0.]},
    Transpose@
     Developer`ToPackedArray[
      Prepend[Table[
        PadRight[(Drop[vAcc, n] - Drop[vAcc, -n])/n, Length[vData],
         0.], {n, vRange}], vData], Real]
    ]];
ma[data, Range[30, 250, 10]]
This is a 4-5x speedup over movAverageC: each window sum is the
difference of two cumulative sums, so no window is ever re-summed.
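For reference, the identity that ma relies on can be written out as
follows. The symbols x_i (the data), N (its length), S_k (the cumulative
sums, i.e. vAcc) and m_k (the window-n average ending at position k) are
notation introduced here for illustration, not part of the original code:

```latex
S_0 = 0, \qquad S_k = \sum_{i=1}^{k} x_i \quad (1 \le k \le N),
\qquad
m_k^{(n)} = \frac{1}{n}\sum_{i=k-n+1}^{k} x_i = \frac{S_k - S_{k-n}}{n}
\quad (n \le k \le N).
```

Drop[vAcc, n] holds S_n, ..., S_N and Drop[vAcc, -n] holds S_0, ...,
S_{N-n}, so their elementwise difference divided by n gives all
N - n + 1 averages for that window at once.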
mv = Function[{vData, vRange}, With[
    {v1 = Prepend[Accumulate[vData], 0.],
     v2 = Prepend[Accumulate[vData^2], 0.]},
    Transpose@
     Developer`ToPackedArray[
      Prepend[Table[
        PadRight[(Drop[v2, n] - Drop[v2, -n])/
           n - ((Drop[v1, n] - Drop[v1, -n])/n)^2, Length[vData],
         0.], {n, vRange}], vData], Real]
    ]];
This would be a fast moving variance (the population form: mean of
squares minus squared mean, dividing by n).
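A quick sanity check of the variance formula, as a minimal sketch rather
than a benchmark. It rebuilds one window size by hand instead of calling
mv, and the names n, popVar and ref are hypothetical ones chosen here.
The built-in Variance divides by n - 1, so the cumulative-sum result is
rescaled by n/(n - 1) before comparing:

```mathematica
(* Sample data in the spirit of the original post, but shorter *)
data = 100 + Accumulate[RandomReal[{-1, 1}, 2000]];
n = 30;  (* window length, chosen arbitrarily for this check *)

(* Cumulative sums of x and x^2, with a leading 0. as in mv *)
v1 = Prepend[Accumulate[data], 0.];
v2 = Prepend[Accumulate[data^2], 0.];

(* Population variance per window: mean of squares minus squared mean *)
popVar = (Drop[v2, n] - Drop[v2, -n])/n - ((Drop[v1, n] - Drop[v1, -n])/n)^2;

(* Reference: built-in sample variance over every length-n window *)
ref = Variance /@ Partition[data, n, 1];

(* Largest absolute deviation after rescaling to the n - 1 convention *)
Max[Abs[popVar n/(n - 1) - ref]]
```

The deviation should be small but not exactly zero: subtracting two
large cumulative sums of squares loses some precision, so this approach
may be less accurate than a direct per-window computation when the data
values are large relative to their spread.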

