MathGroup Archive 2010

Speed Up of Calculations on Large Lists

  • To: mathgroup at smc.vnet.net
  • Subject: [mg108799] Speed Up of Calculations on Large Lists
  • From: sheaven <sheaven at gmx.de>
  • Date: Thu, 1 Apr 2010 05:59:22 -0500 (EST)

Hello everyone!

I am new to Mathematica and am trying to get an understanding of its
power. I plan to use Mathematica mainly for financial data analysis
(large lists...).

Currently, I am trying to reduce calculation time for computations
on some sample data. I started with a moving average of share
prices, because Mathematica already has a built-in moving average
function (MovingAverage) that I can benchmark against.

I know that the built-in functions are usually more efficient than
any user-built function. Unfortunately, I will have to create
functions that are not built in (e.g. something like a "moving
variance") in the future.
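
To make "moving variance" concrete, here is a minimal, unoptimized
sketch. The name movingVariance is hypothetical, and the sketch
assumes the sample variance of each window of consecutive values is
what is wanted. It is meant as a baseline definition, not as a fast
implementation.

```mathematica
(*Hypothetical helper: sample variance of every window of `days`
consecutive values. Partition[list, days, 1] builds all overlapping
windows with offset 1.*)
movingVariance[list_?VectorQ, days_Integer] /; 2 <= days <= Length[list] :=
 Variance /@ Partition[list, days, 1]

(*Toy example: the windows are {1, 2}, {2, 4}, {4, 7}*)
movingVariance[{1., 2., 4., 7.}, 2]
(*gives {0.5, 2., 4.5}*)
```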

I have tried numerous ways to calculate the moving average as
efficiently as possible. So far, I have found that a function based
on Span (i.e. list[[x ;; y]]) is the most efficient. My test results
are below. Unfortunately, my user-defined function (UDF) is still
more than 5x slower than the built-in function.

Do you have any ideas for speeding up the function further? I am
already using Compile and Parallelize.


This is what I have so far:


1. Functions for moving average:

1.1. Moving average based on the built-in function:

(*Calculates moving averages with the built-in MovingAverage for the
specified window lengths, e.g. 30 days to 250 days in steps of 10*)
movAverageC = Compile[{{inputData, _Real, 1}, {start, _Integer},
   {end, _Integer}, {incr, _Integer}},
  Module[{size},
   size = Length[inputData];
   (*Column 1 is the raw data; each further column is one moving
   average, zero-padded on the right to the full length*)
   Transpose[Join[{inputData},
     PadRight[MovingAverage[inputData, #], size] & /@
      Range[start, end, incr]]]
   ]
  ]

1.2. User-defined function based on Span:

(*UDF for moving average based on Span*)
movAverageOwn2FC = Compile[{{dataInput, _Real, 1}, {days, _Integer},
   {length, _Integer}},
  N[Mean[dataInput[[1 + # ;; days + #]]]] & /@
   Range[0, length - days, 1]
  ]

(*Function calcs moving average based on UDF "movAverageOwn2FC" for
specified number of days, e.g. 30 days to 250 days in steps of 10*)
movAverageOwn2C = Compile[{{dataInput, _Real, 1}, {start, _Integer},
   {end, _Integer}, {incr, _Integer}},
  Module[{length},
   length = Length[dataInput];
   Transpose[Join[{dataInput},
     PadRight[movAverageOwn2FC[dataInput, #, length], length] & /@
      Range[start, end, incr]]]
   ]
  ]
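
The Span-based UDF above recomputes Mean over every window from
scratch, so its work grows with both the list length and the window
length. One alternative that might be worth trying, sketched here
without any timing claims, takes each window sum as the difference
of two cumulative sums. The name movAverageCumSum is hypothetical.
Because the summation order differs, the results may deviate from
MovingAverage in the last few digits on long lists, so an exact ==
comparison is not guaranteed to hold there.

```mathematica
(*Hypothetical helper: moving average via cumulative sums.
acc[[k + 1]] holds the sum of the first k values, so the sum of a
window is the difference of two entries that are `days` apart.*)
movAverageCumSum[list_?VectorQ, days_Integer?Positive] /; days <= Length[list] :=
 Module[{acc = Prepend[Accumulate[N[list]], 0.]},
  (Drop[acc, days] - Drop[acc, -days])/days
  ]

(*Toy check against the built-in function*)
movAverageCumSum[{1., 2., 3., 4., 5.}, 2]
(*gives {1.5, 2.5, 3.5, 4.5}*)
MovingAverage[{1., 2., 3., 4., 5.}, 2]
(*gives {1.5, 2.5, 3.5, 4.5}*)
```

The same trick would extend to a moving variance by also
accumulating the squared values, at the price of more rounding
error.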


2. Create sample data:
data = 100 + # & /@ Accumulate[RandomReal[{-1, 1}, {10000}]];


3. Test if functions yield same results:
Test1 = movAverageC[data, 30, 250, 10]; (*Moving average for 30 days
to 250 days in steps of 10*)

Test2 = movAverageOwn2C[data, 30, 250, 10]; (*Moving average for 30
days to 250 days in steps of 10*)

Test1 == Test2
Out = True


4. Performance testing (single core):
AbsoluteTiming[Table[movAverageC[data, 30, 250, 10], {n, 1, 20, 1}];]
(*Repeat function 20x for testing purposes*)
Out = {1.3030000, Null}

AbsoluteTiming[Table[movAverageOwn2C[data, 30, 250, 10], {n, 1, 20,
1}];] (*Repeat function 20x for testing purposes*)
Out = {11.4260000, Null}

=> Result: the UDF is about 9x slower


5. Performance testing (multi core):
LaunchKernels[]

Out = {KernelObject[1, "local"], KernelObject[2, "local"]}

DistributeDefinitions[data, movAverageOwn2C, movAverageOwn2FC,
movAverageC]

AbsoluteTiming[Parallelize[Table[movAverageC[data, 30, 250, 10], {n,
1, 20, 1}]];]
Out = {1.3200000, Null}

AbsoluteTiming[Parallelize[Table[movAverageOwn2C[data, 30, 250, 10],
{n, 1, 20, 1}]];]
Out = {6.7170000, Null}

=> Result: the UDF is about 5x slower
It is very strange that the built-in function does not get faster
with Parallelize.
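
For comparison, a sketch of a different way to split the work:
instead of repeating the whole call 20 times, distribute the window
lengths themselves across the kernels with ParallelMap. Whether this
is any faster is an open question here, since moving the results
back from the subkernels has its own cost. The names windows, serial
and parallel are only illustrative.

```mathematica
(*Self-contained sample data, same construction as in section 2*)
data = 100 + Accumulate[RandomReal[{-1, 1}, 10000]];
windows = Range[30, 250, 10];

(*Make the data known to the subkernels*)
DistributeDefinitions[data];

(*One moving average per window length, serial and parallel*)
AbsoluteTiming[serial = Map[MovingAverage[data, #] &, windows];]
AbsoluteTiming[parallel = ParallelMap[MovingAverage[data, #] &, windows];]

(*Check that both variants agree*)
serial == parallel
```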


I would very much appreciate any input on how to decrease calculation
time based on the user defined function.

Many thanks
Stefan

