Re: Speed Up of Calculations on Large Lists
- To: mathgroup at smc.vnet.net
- Subject: [mg108828] Re: Speed Up of Calculations on Large Lists
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Fri, 2 Apr 2010 05:20:31 -0500 (EST)
On 4/1/10 at 5:59 AM, sheaven at gmx.de (sheaven) wrote:

>I am new to Mathematica and am trying to get an understanding of
>its power. I plan to use Mathematica mainly for financial data
>analysis (large lists...).

>Currently, I am trying to optimize calculation time for
>calculations based on some sample data. I started with a moving
>average of share prices, because Mathematica already has a built-in
>moving average function for benchmarking.

>I know that the built-in functions are always more efficient than
>any user-built function. Unfortunately, I will have to create
>functions that are not built in (e.g. something like "moving
>variance") in the future.

>I have tried numerous ways to calculate the moving average as
>efficiently as possible. So far, I found that a function based on
>Span (or List[[x;;y]]) is most efficient. Below are my test
>results. Unfortunately, my UDF is still more than 5x slower than
>the built-in function.

>Do you have any ideas to further speed up the function? I am
>already using Compile and Parallelize.

>This is what I got so far:

>1. Functions for moving average:

<function code snipped>

>2. Create sample data:

>data = 100 + # & /@ Accumulate[RandomReal[{-1, 1}, {10000}]];

A side point here: the Plus function works on lists. That is,

data = 100 + Accumulate[RandomReal[{-1, 1}, 10000]];

will produce the same result as your code but be a bit faster. The
difference in speed here is quite small and clearly not the thrust
of your message, but I point it out since such small differences can
add up to something significant in more complex code.

>3. Test if functions yield same results:

>Test1 = movAverageC[data, 30, 250, 10]; (*Moving average for 30
>days to 250 days in steps of 10*)

OK. Here are the timing results I get for your compiled code based
on Span:

In[1]:= movAverageOwn2FC =
  Compile[{{dataInput, _Real, 1}, {days, _Integer}, {length, _Integer}},
   N[Mean[dataInput[[1 + # ;; days + #]]]] & /@
    Range[0, length - days, 1]];

In[2]:= data = 100 + Accumulate[RandomReal[{-1, 1}, {10000}]];

In[3]:= Timing[Table[movAverageOwn2FC[data, 20, Length@data], {100}];]

Out[3]= {1.45855,Null}

Now here is a definition using ListConvolve:

In[4]:= newMoveAverage[data_, windowLen_] :=
  Module[{ker = Table[1, {windowLen}]/windowLen},
   ListConvolve[ker, data]]

In[5]:= Timing[Table[newMoveAverage[data, 20], {100}];]

Out[5]= {0.103379,Null}

So, on my machine, using a single core without Compile, ListConvolve
improves the speed by more than 10x. Parallel processing across both
cores should improve this result further for very large data arrays.
Note that ListConvolve is so fast that the overhead of setting up
parallel processes will probably degrade times for small data
arrays; I have not tested this to verify my guess.

Compile might also improve things somewhat, but probably not
significantly. Compile can offer significant improvement in some
code, particularly when procedural programming is used. But Compile
seldom offers improvement in code with one or two function calls and
no procedural structures such as For. In fact, there are times when
using Compile will actually degrade execution speed.

Finally, to demonstrate that the code with ListConvolve does the
same as your code:

In[6]:= movAverageOwn2FC[data, 20, Length@data] ==
  newMoveAverage[data, 20]

Out[6]= True
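As an aside, since you mentioned needing something like a "moving
variance": the same ListConvolve idea extends to it. Here is a
sketch (the name movVariance is mine, not a built-in), using the
usual sum-of-squares identity with the n-1 sample normalization that
Variance uses:

movVariance[data_, windowLen_] :=
  Module[{ones = Table[1, {windowLen}], m, s2},
   m = ListConvolve[ones, data]/windowLen;  (* windowed means *)
   s2 = ListConvolve[ones, data^2];         (* windowed sums of squares *)
   (s2 - windowLen*m^2)/(windowLen - 1)]

(* should agree with Variance applied to each window, up to rounding:
   Max[Abs[movVariance[data, 20] - Variance /@ Partition[data, 20, 1]]] *)

One caveat: the subtraction in this formula can lose precision when
the mean is large relative to the spread of the data. Your sample
data hovers around 100 with unit-sized steps, so it should be fine
here, but for ill-conditioned data a two-pass computation per window
would be safer.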
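And to illustrate the earlier point about Compile and procedural
code, here is a sketch (my own, not from your post; the name
movAverageProc is hypothetical) of a running-sum moving average
written as an explicit loop, which is the kind of code where Compile
tends to pay off:

movAverageProc = Compile[{{d, _Real, 1}, {w, _Integer}},
  Module[{s = Total[Take[d, w]], out = Table[0., {Length[d] - w + 1}]},
   out[[1]] = s/w;                (* mean of the first window *)
   Do[
    s += d[[i + w]] - d[[i]];     (* slide the window by one point *)
    out[[i + 1]] = s/w,
    {i, 1, Length[d] - w}];
   out]];

Each window mean is updated from the previous one in constant time,
so the loop does O(n) work instead of O(n*w). Even so, on this
particular problem I would expect ListConvolve to remain the simpler
and likely faster choice; timings will vary by machine.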