Re: Re: Re: Floating-Point Computing
- To: mathgroup at smc.vnet.net
- Subject: [mg94008] Re: [mg93998] Re: [mg93927] Re: Floating-Point Computing
- From: Daniel Lichtblau <danl at wolfram.com>
- Date: Tue, 2 Dec 2008 00:39:44 -0500 (EST)
- References: <ggofnk$rsh$1@smc.vnet.net> <200812011202.HAA05932@smc.vnet.net>
danl at wolfram.com wrote: >> Antonio wrote: >>> Dear List, >>> >>> I came across this Sun article on floating point: >>> http://developers.sun.com/solaris/articles/fp_errors.html >>> >>> which calculates the following code: >>> >>> main() >>> { >>> register i; >>> double f=0.0; >>> double sqrt(); >>> >>> for(i=0;i<=1000000000;i++) >>> { >>> f+=sqrt( (double) i); >>> } >>> printf("Finished %20.14f\n",f); >>> } >>> >>> whose correct answer using Euler-Maclaurin formula is: >>> f=21081851083600.37596259382529338 >>> >>> Tried to implement in mathematica, just for fun, so that later I could >>> play with precision, but the kernel has no more memory and shuts down. >>> >>> N[Total[Sqrt[Range[1000000000]]]] >>> >> If all those numbers are stored as 32-bit integers in a packed array, >> they would take up about 4*10^9/1024^3 ~= 3.7 GB space. That is already >> too much for a 32-bit machine, and more than most present-day personal >> computers can handle. Mapping Sqrt to the list unpacks it, eating up >> even more memory. Creating a symbolic sum with Total and only >> numericising at the end doesn't help either. >> >> The only reasonable approach here is to write code that is equivalent to >> the C version, i.e. doesn't pre-compute the list of numbers to be >> summed. But that will run too slowly in Mathematica. So Mathematica is >> not really the right tool for summing 10^9 numbers. >> >> Two possible versions: >> >> fun1[n_Integer] := Module[{f = 0.}, >> Do[f += N@Sqrt[i], {i, n}]; >> f >> ] >> >> fun2 = Compile[{{n, _Integer}}, Module[{f = 0.}, >> Do[f += N@Sqrt[i], {i, n}]; >> f >> ] >> ] >> >> fun2[10^9] can finish in a reasonable time, but it only works with >> machine precision numbers (because of Compile). >> >> If you sum fewer numbers, use Total@Sqrt@N@Range[n] instead of >> N@Total@Sqrt@Range[n]. > > These two approaches can be hybridized to give quite good speed. The one > caveat is that, when totalling a list, the semantics are not defined as to > order of summing. Though in actual practice I believe Total sums > first-to-last, which is in accord with the semantics of the original > question. > > Anyway, here is code that sums square roots to n, using blocks of size m. > > cfun = Compile[{{n, _Integer}, {m, _Integer}}, > Module[{tot = 0., flr = Floor[n/m]}, > Do[ > tot += Total[Sqrt[j*m + Range[m]]]; > , {j, 0, flr - 1}]; > tot += Total[Sqrt[Range[flr*m + 1, n]]]; > tot]]; > > It's about four times faster than what I had for summing one at a time > (essentially the same as fun2 above). > > In[129]:= Timing[cfun[10^7, 10^5]] > Out[129]= {0.841, 2.10819\[Times]10^10} > > In[131]:= Timing[cfun[10^8, 10^7]] > Out[131]= {10.005, 6.66667\[Times]10^11} > > In[130]:= Timing[cfun[10^9, 10^7]] > Out[130]= {104.44, 2.10819\[Times]10^13} > > My general experience with Compile suggest that this speed is perhaps > within a factor of 2-4 of the C code above. The test below also suggest > this, since it spends most of its time in C code (though it may be a > factor of 2 slower than need be, since it makes two passes over the list). > > In[135]:= Timing[Total[Sqrt[Range[1., 10^7]]]] > > Out[135]= {0.431, 2.10819\[Times]10^10} > > Daniel Lichtblau > Wolfram Research After giving this some thought and looking at cfun[[4]], I decided this CompiledFunction was maybe not so efficient. So I rewrote without Compile and it got faster. I think the reason is that Compile is doing a bit of extra work in the implicit loops (Range or coercing it's result to be reals, maybe). Anyway, here is faster code. fun[n_Integer, m_Integer] := Module[{tot = 0., flr = Floor[n/m], k = 0.}, Do[ tot += Total[Sqrt[k + Range[m]]]; k += m; , {j, 0, flr - 1}]; tot += Total[Sqrt[Range[k + 1., n]]]; tot] In[12]:= InputForm[Timing[fun[10^9, 10^7]]] Out[12]//InputForm= {26.009045999999998, 2.1081851083600375*^13} This is on my machine at work, which is substantially faster than the laptop I had used for the timings above. It is nearly twice as fast as cfun on my work machine (this seems to be heavily system dependent; it is only 25% or so faster on my laptop). Daniel Lichtblau Wolfram Research
- References:
- Re: Re: Floating-Point Computing
- From: danl@wolfram.com
- Re: Re: Floating-Point Computing