MathGroup Archive 2008

Re: Mathematica 7 is now available

  • To: mathgroup at smc.vnet.net
  • Subject: [mg94381] Re: Mathematica 7 is now available
  • From: Daniel Lichtblau <danl at wolfram.com>
  • Date: Thu, 11 Dec 2008 03:48:09 -0500 (EST)
  • References: <gg62p3$56g$1@smc.vnet.net> <gh8hkr$r1d$1@smc.vnet.net> <ghavsp$ock$1@smc.vnet.net> <200812081121.GAA15802@smc.vnet.net> <ghlmbp$jof$1@smc.vnet.net> <200812100949.EAA00307@smc.vnet.net>

[My comments are interspersed with the questions. --dl]

Jon Harrop wrote:
> Daniel Lichtblau wrote:
>> Jon Harrop wrote:
>>> This is not specific to OCaml. Here are equivalent implementations of
>>> that program in C++, Java, OCaml, SML, Scheme, Lisp and Haskell and all
>>> well over 100,000x faster than the Mathematica I gave (and many are more
>>> concise as well!):
>>>
>>>   http://www.ffconsultancy.com/languages/ray_tracer/
>> I do not understand this claim that the speed difference is 5 orders of
>> magnitude. Some experiments indicate 3, perhaps.
> 
> The code I gave is over 5 orders of magnitude slower by my measurements.

Offhand I do not know what would account for this. I ran it in 
Mathematica 7 (on a 64-bit Linux machine) on

Main[3, 52, 4]

and observed it running around 7-10 times slower than the variant I 
coded below. I did not try it on larger scenes, so possibly there is 
something going wrong in the recursion. Certainly any of these codes 
will scale linearly with the number of pixels (hence quadratically with 
n), so my using a dimension of 52 instead of 512 here should not matter 
for an assessment of relative speeds.
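
If one wanted to verify the scaling claim directly, a quick experiment 
along these lines would do (illustrative only, using the Main defined 
below):

(* Pixel count grows as n^2, so doubling the image dimension
   should roughly quadruple the run time. *)
t1 = First[Timing[Main[3, 52, 4]]];
t2 = First[Timing[Main[3, 104, 4]]];
t2/t1  (* expect a ratio of roughly 4 *)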


>> Some modest improvements to the original Mathematica code, though, bring
>> it to maybe 2 orders of magnitude of the OCaml speed.
> 
> That is a huge improvement. Thank you.
> 
>> In the variant 
>> below, I make sure everyone is a machine real number or machine integer,
>> whichever is appropriate. I use Compile on RaySphere. There are a few
>> other tweaks for general efficiency (some might apply to the OCaml
>> implementation, I'm not sure).
> 
> I have a few questions about this. :-)
> 
>> delta = Sqrt[$MachineEpsilon];
>> inf = 100000.;
> 
> This is nasty. Why is infinity not represented internally by the floating
> point number infinity in Mathematica?

Mathematica does not use machine double infinities, or bignum infinities 
for that matter. Generally, when one would arise in a computation, e.g. 
in dedicated machine arithmetic, an exception is thrown and Mathematica 
works around it, e.g. by falling back to software higher precision.

This is not to say there is no reason to have a machine double infinity 
in Mathematica. I can say, however, that putting one in would involve 
considerable work, and I think it would give only a small improvement in 
speed or functionality.
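
A quick illustration of the fallback behavior (my sketch; this is 
standard behavior as I understand it, not anything specific to the ray 
tracer):

(* Machine overflow is caught and the result is promoted to an
   arbitrary-precision number, not a machine double infinity. *)
x = 2.*$MaxMachineNumber;
MachineNumberQ[x]  (* False *)

(* Division by zero emits a Power::infy message and returns the
   symbolic ComplexInfinity, again not a machine infinity. *)
1./0.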


>> neglight = Normalize[{1., 3., -2.}];
>> nohit = {inf, {0., 0., 0.}};
>>
>> Clear[RaySphere, Intersect, Create, RayTrace, Main];
>>
>> RaySphere = With[{inf = inf}, Compile[
> 
> What is the purpose of "With[{inf = inf}, ..]" here?

This is to inject the evaluated value of inf into the Compile. Otherwise 
it is a nuisance to get that value in, because Compile has the HoldAll 
attribute and so would see only the unevaluated symbol.
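
For example, a minimal sketch of the idiom (not part of the ray tracer 
itself):

inf = 100000.;
f = With[{inf = inf},
   Compile[{{x, _Real}},
    (* inf here is the injected literal 100000., not a global symbol *)
    If[x < 0., inf, Sqrt[x]]]];
f[-4.]  (* gives 100000. *)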


>>      {{c, _Real, 1}, {o, _Real, 1}, {d, _Real, 1}, {r, _Real}},
>>      Module[{v = c - o, b, disc, t},
>>       b = v.d;
>>       disc = b^2 - v.v + r;
>>       If[disc < 0., inf,
>>        disc = Sqrt[disc];
>>        If[b > disc, b - disc,
>>         t = b + disc;
>>         If[t < 0., inf, t]]]]]];
>>
>> Intersect[o_, d_, ln : {lambda_, n_}, List[c_, r_]] := Block[
>>    {lambda2 = RaySphere[c, o, d, r]},
>>    If[lambda2 >= lambda, ln, {lambda2, c}]]
> 
> Defer calculation of the normal. Fair enough but it is only done once per
> pixel.

It makes a small but measurable difference to speed. It helps a bit more 
than you indicate, because the original code uses Normalize inside 
Intersect, where it might be invoked as many times as lambda keeps 
decreasing over a given originating call (to Intersect, with the entire 
scene).


>> Intersect[o_, d_, ln : {lambda_, n_}, List[c_, r_, s_]] := Block[
>>    {lambda2 = RaySphere[c, o, d, r]},
>>    If[lambda2 >= lambda, ln,
>>     Fold[Intersect[o, d, #1, #2] &, ln, s]
>>     ]
>>    ]
> 
> No currying and a named subpattern, ok.
> 
>> RayTrace[o_, d_, scene_] := Block[
>>    {lambda, n, g, p},
>>    {lambda, n} = Intersect[o, d, nohit, scene];
>>    If[lambda >= inf, 0.,
>>     n = Normalize[o + d*lambda - n];
>>     g = n.neglight;
>>     If[g <= 0., 0.,
>>      {lambda, n} =
>>       Intersect[o + lambda d + delta n, neglight, nohit, scene];
>>      If[lambda < inf, 0., g]]]]
>>
>> Create[level_, c_, r_] :=
>>   Block[{obj = List[c, r^2]},
>>    If[level == 1, obj,
>>     Block[{a = 3*r/Sqrt[12], Aux},
>>      Aux[x1_, z1_] := Create[level - 1, c + {x1, a, z1}, 0.5 r];
>>      List[c,
>>       9*r^2, {obj, Aux[-a, -a], Aux[a, -a], Aux[-a, a], Aux[a, a]}]]]]
>>
>> prepareTable = Compile[{{n, _Integer}, {s, _Integer}},
>>     With[{nss = N[s]},
>>      Flatten[
>>       Table[Normalize[{(x + s1/nss)/n - .5, (y + s2/nss)/n - .5,
>>          1.}], {s1, 0, s - 1}, {s2, 0, s - 1}, {y, 0, n - 1}, {x, 0,
>>         n - 1}], 1]
>>      ]];
> 
> So you precompute a matrix of all the ray directions using a compiled
> function.
> 
>> Main[level_, n_, ss_] :=
>>   Block[{scene = Create[level, {0., -1., 4.}, 1.], nss = N[ss]},
>>    scene = MapAll[Developer`ToPackedArray, scene];
>>    Total[Map[RayTrace[{0., 0., 0.}, #, scene] &,
>>       prepareTable[n, ss], {3}], 1]/nss^2]
>>
>> In[22]:= Timing[ff512 = Main[9, 512, 4];]
>> Out[22]= {1542.27, Null}
> 
> That is orders of magnitude faster than my version but I am not sure why. I
> suspect the main performance improvement comes from the use of packed
> arrays but I haven't been able to reproduce it in my own code yet.

Yes, packed arrays, which among other things avoid ALL exact arithmetic 
in places where you do not want it. Also, compiling RaySphere gives, by 
itself, a factor of two or more in overall speed.
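
One can check whether an expression is packed with 
Developer`PackedArrayQ. For instance (illustrative only):

Developer`PackedArrayQ[Developer`ToPackedArray[{1., 2., 3.}]]  (* True *)
Developer`PackedArrayQ[{1, 2.5}]   (* False: mixed exact/approximate *)
Developer`PackedArrayQ[Range[10]]  (* True: Range returns packed output *)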


>> At this point, it might make sense to use ParallelMap instead of Map, in
>> Main.
> 
> Replacing Map with ParallelMap slows it down, even with
> Method -> "CoarsestGrained". How can I get a speedup from my 8 cores?! :-)
> 
> Many thanks,

I only made the suggestion, without trying it, because I have no actual 
familiarity with ParallelMap. But one possibility: did you use 
DistributeDefinitions on all the functions underneath that Map? If not, 
that could certainly make it slower rather than faster.
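
Something along these lines would be the usual pattern (an untested 
sketch on my part):

(* Distribute the definitions the subkernels will need, and inject
   the scene data into the mapped function with With, since a Block
   variable would otherwise arrive on the subkernels undefined. *)
DistributeDefinitions[RaySphere, Intersect, RayTrace, Create,
  inf, delta, neglight, nohit];

Main[level_, n_, ss_] :=
  Block[{scene = Create[level, {0., -1., 4.}, 1.], nss = N[ss]},
   scene = MapAll[Developer`ToPackedArray, scene];
   With[{sc = scene},
     Total[ParallelMap[RayTrace[{0., 0., 0.}, #, sc] &,
        prepareTable[n, ss], {3}, Method -> "CoarsestGrained"],
       1]]/nss^2]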


Daniel Lichtblau
Wolfram Research

