Re: Mathematica 20x slower than Java at arithmetic/special functions, is

*To*: mathgroup at smc.vnet.net
*Subject*: [mg115910] Re: Mathematica 20x slower than Java at arithmetic/special functions, is
*From*: Oliver Ruebenkoenig <ruebenko at wolfram.com>
*Date*: Tue, 25 Jan 2011 06:30:28 -0500 (EST)

On Tue, 25 Jan 2011, Sjoerd C. de Vries wrote:

> I found a speed-up factor of about 52 using Oliver's method. This is
> on my quad-core laptop.
> So quite an improvement, but not a factor of 100.
>

Ouch, I tested against a non-N[] version of the problem. Sorry for that.
I see the same speedup as everyone else.

Oliver

> I used Evaluate[] too. I assume it works because with it the Bessel
> functions are evaluated at compile time (their arguments do not depend
> on x and y); otherwise it's done at runtime.
>
> Cheers -- Sjoerd
>
> On Jan 24, 11:57 am, Leo Alekseyev <dnqu... at gmail.com> wrote:
>> Vivek, Oliver -- thanks for your input! My knowledge of using
>> Compile[] is somewhat lacking (mostly due to the fact that I was
>> never able to get it to work well for me). In particular, I tried
>> using Compile[] in much the same way that Vivek has suggested, but I
>> neglected to use Evaluate[], which leads to a compiled function taking
>> substantially longer. Is there a quick explanation for why Evaluate[]
>> (or, in Oliver's example, a construct like
>> With[{code=code},Compile[{...},code]]) is necessary?
>>
>> On my (very modest) hardware, I indeed get the ~25x speedup that Vivek
>> mentions. Oliver's code for me performs about the same (~25x
>> improvement) without parallelism, and 2x faster on a dual-core
>> machine; this actually seems reasonable since the two methods are
>> fairly similar.
>>
>> I should note that it seems that these optimizations are very
>> dependent on Mathematica 8: in particular, cfunc2 (compilation of a
>> compiled function evaluating over some data) in Vivek's example gives
>> no additional gain under Mathematica 7 (makes me curious what changed
>> in version 8), and the RuntimeAttributes -> Listable, Parallelization ->
>> True options that Oliver uses are new to version 8.
>>
>> --Leo
>>
>> On Mon, Jan 24, 2011 at 4:14 AM, Vivek J. Joshi <viv...
>> at wolfram.com> wrote:
>>
>>> Without going into too much detail, a simple compilation of the function gives approx 6x to 25x speed up,
>>
>>> ClearAll[grid1dc];
>>> grid1dc[x_,y_]=(With[{d=0.1,NN=50},
>>> Sum[Re[N[d BesselJ[1,2 Pi d Sqrt[m^2+n^2]]/Sqrt[m^2+n^2+10^-7]] Exp[I 2.0Pi (m x+n y)]],{m,-NN,NN,1},{n,-NN,NN,1}]])//N;
>>
>>> gridres1da=With[{delta=0.5,xlim=2.5,ylim=2.5},
>>> Table[{x,y,grid1dc[x,y]},{x,-xlim,xlim,delta},{y,-ylim,ylim,delta}]];//AbsoluteTiming
>>> {7.371354,Null}
>>
>>> Clear[cfunc];
>>> cfunc = Compile[{{x,_Real},{y,_Real}},Evaluate[grid1dc[x,y]]];
>>
>>> gridres1da2=With[{delta=0.5,xlim=2.5,ylim=2.5},
>>> Table[{x,y,cfunc[x,y]},{x,-xlim,xlim,delta},{y,-ylim,ylim,delta}]];//AbsoluteTiming
>>> {1.237029,Null}
>>
>>> Norm[gridres1da[[All,All,3]]-gridres1da2[[All,All,3]]]//Chop
>>> 0
>>
>>> Following gives about 25x speedup,
>>
>>> Clear[cfunc2];
>>> cfunc2= Compile[{{xlim,_Real},{ylim,_Real},{delta,_Real}},
>>> Block[{x,y},
>>> Table[{x,y,cfunc[x,y]},{x,-xlim,xlim,delta},{y,-ylim,ylim,delta}]]];
>>
>>> gridres1da3=cfunc2[2.5,2.5,0.5];//AbsoluteTiming
>>> {0.292562,Null}
>>
>>> Norm[gridres1da[[All,All,3]]-gridres1da3[[All,All,3]]]//Chop
>>> 0
>>
>>> Vivek J. Joshi
>>> Kernel Developer
>>> Wolfram Research Inc.
>>
>>> On Jan 24, 2011, at 4:03 AM, Leo Alekseyev wrote:
>>
>>>> I was playing around with JLink the other day, and noticed that Java
>>>> seems to outperform Mathematica by ~20x in an area where I'd expect
>>>> Mathematica to be rather well optimized -- arithmetic involving special
>>>> functions. In my particular example, I am simply evaluating a sum of
>>>> Bessel functions. I understand that much depends on the underlying
>>>> implementation, but I just want to run this by Mathgroup to see if
>>>> this is to be expected, or maybe if I'm doing something suboptimal in
>>>> Mathematica.
>>>> Here's the code that I'm running:
>>
>>>> grid1dc[x_,
>>>> y_] = (With[{d = 0.1, NN = 50},
>>>> Sum[Re[N[
>>>> d BesselJ[1, 2 Pi d Sqrt[m^2 + n^2]]/
>>>> Sqrt[m^2 + n^2 + 10^-7]] Exp[
>>>> I 2.0 Pi (m x + n y)]], {m, -NN, NN, 1}, {n, -NN, NN, 1}]]) //
>>>> N
>>
>>>> and
>>
>>>> gridres1da =
>>>> With[{delta = 0.5, xlim = 2.5, ylim = 2.5},
>>>> Table[{x, y, grid1dc[x, y]}, {x, -xlim, xlim, delta}, {y, -ylim,
>>>> ylim, delta}]]
>>
>>>> The Java implementation uses the Colt and Apache Commons Math libraries
>>>> for the Bessels and complex numbers, uses a double for loop, and
>>>> consistently runs 15-20 times faster.
>>
>>>> --Leo
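
On the question raised in the thread about why Evaluate[] (or the With[{code=...}, Compile[...]] construct) matters: Compile has the HoldAll attribute, so its body is compiled as written, without first being evaluated. The following is a minimal sketch of that effect, not code from any of the posters. The function f is a hypothetical, much smaller stand-in for grid1dc, and the names cHeld, cInlined and cWith are made up for illustration. It assumes Mathematica 8 or later and that x and y have no global values.

```mathematica
(* Hypothetical stand-in for grid1dc: Set (=) expands the finite sum once,
   at definition time, leaving an explicit expression in x and y *)
ClearAll[f, x, y];
f[x_, y_] = Sum[Cos[2.0 Pi (m x + n y)]/(1 + m^2 + n^2), {m, -5, 5}, {n, -5, 5}];

(* The attribute list of Compile includes HoldAll, so the body is held *)
Attributes[Compile]

(* Held body: the compiler only sees the literal expression f[x, y],
   so the compiled code has to call back into the main evaluator *)
cHeld = Compile[{{x, _Real}, {y, _Real}}, f[x, y]];

(* Evaluate expands f[x, y] into its explicit sum before compilation *)
cInlined = Compile[{{x, _Real}, {y, _Real}}, Evaluate[f[x, y]]];

(* Same idea via With: the expanded expression is injected into the held body *)
cWith = With[{code = f[x, y]}, Compile[{{x, _Real}, {y, _Real}}, code]];

(* Inspect the compiled instructions; the held version is expected to
   contain a MainEvaluate call that the inlined versions avoid *)
Needs["CompiledFunctionTools`"];
CompilePrint[cHeld]
CompilePrint[cInlined]
```

All three compiled functions should return the same values for the same arguments; the difference is only in where the work happens. Comparing AbsoluteTiming for cHeld and cInlined over a table of points is one way to check how much the callback costs on a given machine.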