Speed Up of Calculations on Large Lists
- To: mathgroup at smc.vnet.net
- Subject: [mg108799] Speed Up of Calculations on Large Lists
- From: sheaven <sheaven at gmx.de>
- Date: Thu, 1 Apr 2010 05:59:22 -0500 (EST)
Hello everyone!

I am new to Mathematica and am trying to get an understanding of its power. I plan to use Mathematica mainly for financial data analysis (large lists...). Currently, I am trying to optimize calculation time for calculations based on some sample data. I started with a moving average of share prices, because Mathematica already has a built-in moving average function for benchmarking.

I know that built-in functions are generally more efficient than user-defined ones. Unfortunately, I will have to create functions that are not built in (e.g. something like a "moving variance") in the future.

I have tried numerous ways to calculate the moving average as efficiently as possible. So far, I have found that a function based on Span (or List[[x;;y]]) is the most efficient. Below are my test results. Unfortunately, my UDF is still more than 5x slower than the built-in function.

Do you have any ideas on how to speed up the function further? I am already using Compile and Parallelize.

This is what I have so far:

1. Functions for moving average:

1.1. Moving average based on built-in function:

    (*Function calcs moving average based on built in function for specified number of days, e.g. 30 days to 250 days in steps of 10*)
    movAverageC = Compile[{{inputData, _Real, 1}, {start, _Integer},
        {end, _Integer}, {incr, _Integer}},
      Module[{data, size, i},
        size = Length[inputData];
        Transpose[Join[{inputData},
          PadRight[MovingAverage[inputData, #], size] & /@
            Table[x, {x, start, end, incr}]]]
      ]
    ]

1.2. User-defined function based on Span:

    (*UDF for moving average based on Span*)
    movAverageOwn2FC = Compile[{{dataInput, _Real, 1}, {days, _Integer},
        {length, _Integer}},
      N[Mean[dataInput[[1 + # ;; days + #]]]] & /@ Range[0, length - days, 1]
    ]

    (*Function calcs moving average based on UDF "movAverageOwn2FC" for specified number of days, e.g. 30 days to 250 days in steps of 10*)
    movAverageOwn2C = Compile[{{dataInput, _Real, 1}, {start, _Integer},
        {end, _Integer}, {incr, _Integer}},
      Module[{length},
        length = Length[dataInput];
        Transpose[Join[{dataInput},
          PadRight[movAverageOwn2FC[dataInput, #, length], length] & /@
            Range[start, end, incr]]]
      ]
    ]

2. Create sample data:

    data = 100 + # & /@ Accumulate[RandomReal[{-1, 1}, {10000}]];

3. Test if functions yield same results:

    Test1 = movAverageC[data, 30, 250, 10]; (*Moving average for 30 days to 250 days in steps of 10*)
    Test2 = movAverageOwn2C[data, 30, 250, 10]; (*Moving average for 30 days to 250 days in steps of 10*)
    Test1 == Test2

    Out = True

4. Performance testing (single core):

    AbsoluteTiming[Table[movAverageC[data, 30, 250, 10], {n, 1, 20, 1}];] (*Repeat function 20x for testing purposes*)

    Out = {1.3030000, Null}

    AbsoluteTiming[Table[movAverageOwn2C[data, 30, 250, 10], {n, 1, 20, 1}];] (*Repeat function 20x for testing purposes*)

    Out = {11.4260000, Null}

=> Result: UDF 9x slower

5. Performance testing (multi core):

    LaunchKernels[]

    Out = {KernelObject[1, "local"], KernelObject[2, "local"]}

    DistributeDefinitions[data, movAverageOwn2C, movAverageOwn2FC, movAverageC]

    AbsoluteTiming[Parallelize[Table[movAverageC[data, 30, 250, 10], {n, 1, 20, 1}]];]

    Out = {1.3200000, Null}

    AbsoluteTiming[Parallelize[Table[movAverageOwn2C[data, 30, 250, 10], {n, 1, 20, 1}]];]

    Out = {6.7170000, Null}

=> Result: UDF 5x slower

It is very strange that the built-in function does not get faster with Parallelize.

I would very much appreciate any input on how to decrease calculation time for the user-defined function.

Many thanks
Stefan
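
One possible alternative to averaging each Span separately is to take differences of a running total, so that each window costs one subtraction instead of a full Mean. The following is only a minimal sketch under stated assumptions: the name movAverageCumSum is hypothetical (it does not appear in the post above), no timings have been measured for it, and floating-point round-off may make its results differ from MovingAverage in the last digits.

```mathematica
(* Hypothetical helper, not part of the original post: moving average of
   window length `days` computed from running totals. *)
movAverageCumSum[list_?VectorQ, days_Integer?Positive] /; days <= Length[list] :=
  Module[{sums = Prepend[Accumulate[list], 0]},
    (* sums[[k + 1]] holds the total of the first k elements, so subtracting
       two entries that are `days` apart gives the sum of one window *)
    (Drop[sums, days] - Drop[sums, -days])/days
  ]

(* Usage sketch with sample data shaped like the data in the post *)
sample = 100 + Accumulate[RandomReal[{-1, 1}, 10000]];

(* Largest absolute deviation from the built-in function; any difference
   here would come from round-off only *)
Max[Abs[movAverageCumSum[sample, 30] - MovingAverage[sample, 30]]]
```

The result has Length[list] - days + 1 elements, the same as MovingAverage, so it could be padded with PadRight in the same way as above. The same running-total idea can in principle be applied to sums of squares for a "moving variance", although that form is more sensitive to round-off.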