MathGroup Archive 2010

Re: disappointing CUDA speed

  • To: mathgroup at
  • Subject: [mg114209] Re: disappointing CUDA speed
  • From: truxton spangler <truxtonspangler29 at>
  • Date: Sat, 27 Nov 2010 03:36:32 -0500 (EST)
  • References: <iclfde$l9u$> <ico24m$nhu$>

On Nov 26, 9:28 pm, Crni Gorac <cgo... at> wrote:
> On Nov 25, 11:56 am, Gianluca Gorni <gianluca.go... at>
> wrote:
> > Hi,
> > I have a 1 year old Apple MacBookPro. I installed
> > the cudadriver_3.1.17_macos and then tried the first
> > examples in the documentation:
> > Needs["CUDALink`"]
> > CUDAQ[]
> >   True
> > randM = RandomReal[1, {3000, 3000}];
> > AbsoluteTiming[randM.randM;]
> >   {2.688389,Null}
> > AbsoluteTiming[CUDADot[randM, randM];]
> >   {7.328353,Null}
> > Quite a letdown.
> > Did I do something wrong?
> You may wish to re-run the CUDADot[] command - there is a kind of
> "warm-up" needed for CUDA.  Also, you may wish to try the next example
> from the CUDALink user guide, where copying the matrix to GPU memory
> and running CUDADot[] on it are separated - this way, you'll be able
> to check the timing for the kernel execution only, which should be a
> better indication of actual CUDA speed.  But overall: I tried the same
> examples on similar hardware (CUDA Capability 1.1 generation), and
> indeed the timings seem inconsistent, and also less than impressive
> when compared to CUBLAS results.  So it'll probably take some time
> until the bugs in this initial CUDALink release are fixed...
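
For reference, a minimal sketch of the two suggestions above (a warm-up
call, then timing CUDADot on data already loaded onto the GPU). It
assumes CUDAMemoryLoad and CUDAMemoryUnload behave as described in the
CUDALink user guide; the variable name mem is only illustrative, and I
make no claim about the timings it will produce:

```mathematica
Needs["CUDALink`"]

randM = RandomReal[1, {3000, 3000}];

(* Warm-up: the first CUDA call also pays for one-time initialization *)
CUDADot[randM, randM];

(* Copy the matrix to GPU memory once, outside the timed region *)
mem = CUDAMemoryLoad[randM];

(* Time the multiplication on data that already lives on the GPU *)
AbsoluteTiming[CUDADot[mem, mem];]

(* Release the GPU memory when finished *)
CUDAMemoryUnload[mem];
```

Comparing this timing with the plain CUDADot[randM, randM] timing gives
a rough idea of how much of the total is transfer overhead.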

After figuring out that my Ge 320M chip is a.k.a. a 9600M GT, I finally
got CUDA working on my MacBook Pro.

I have no interest in image processing, so I was more curious about
some of the other CUDA functions, and I was disappointed with the
limitations of CUDAFold, CUDAMap, etc. CUDAMap is strange because its
function argument is limited to a small group of functions that are
Listable, i.e. functions that you wouldn't use Map for in any case.
The toy examples show CUDAMap running slower than Map (no Listable
example is given, but that wins easily when you try it with the same
code). If you increase the size of the toys:


CUDAMap comes out marginally ahead of using Cos directly, but only with
this sort of toy example. Unfortunately, the documentation gives no
indication of where the real benefit of using CUDAMap lies.
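
A minimal sketch of the kind of comparison described above. The list
size of 10^6 and the variable name data are my own assumptions for
illustration, not the figures from the test I ran, so treat any numbers
you get as specific to your own hardware:

```mathematica
Needs["CUDALink`"]

(* Hypothetical test size, chosen only for illustration *)
data = RandomReal[1, 10^6];

(* Listable call applied directly to the whole list *)
AbsoluteTiming[Cos[data];]

(* Explicit Map over the same data *)
AbsoluteTiming[Map[Cos, data];]

(* CUDAMap, run twice: the first call includes CUDA warm-up overhead *)
AbsoluteTiming[CUDAMap[Cos, data];]
AbsoluteTiming[CUDAMap[Cos, data];]
```

The second CUDAMap timing is the one to compare against Cos[data],
since the first also pays for initialization.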

