Re: disappointing CUDA speed
- To: mathgroup at smc.vnet.net
- Subject: [mg114209] Re: disappointing CUDA speed
- From: truxton spangler <truxtonspangler29 at gmail.com>
- Date: Sat, 27 Nov 2010 03:36:32 -0500 (EST)
- References: <iclfde$l9u$1@smc.vnet.net> <ico24m$nhu$1@smc.vnet.net>
On Nov 26, 9:28 pm, Crni Gorac <cgo... at gmail.com> wrote:
> On Nov 25, 11:56 am, Gianluca Gorni <gianluca.go... at fastwebnet.it>
> wrote:
>
> > Hi,
> >
> > I have a 1 year old Apple MacBookPro. I installed
> > the cudadriver_3.1.17_macos and then tried the first
> > examples in the documentation:
> >
> > Needs["CUDALink`"]
> > CUDAQ[]
> > True
> > randM = RandomReal[1, {3000, 3000}];
> > AbsoluteTiming[randM.randM;]
> > {2.688389,Null}
> >
> > AbsoluteTiming[CUDADot[randM, randM];]
> > {7.328353,Null}
> >
> > Quite a letdown.
> > Did I do something wrong?
>
> You may wish to re-run CUDADot[] command - there is kind of "warm-up"
> needed for CUDA. Also, you may wish to try the next example from
> CUDALink user guide, with copying matrix to GPU memory and running
> CUDADot[] on it separated - this way, you'll be able to check the
> timing for the kernel execution only, which should be better
> indication of actual CUDA speed. But overall: I tried the same
> examples on alike kind of hardware (CUDA Capability 1.1 generation),
> and indeed seems that timings are inconsistent, and also less than
> impressive when compared to CUBLAS results. So it'll probably take
> some time until eventual bugs of this initial CUDALink release
> fixed...

After figuring out that my GeForce 320M chip is a.k.a. a 9600M GT, I
finally got CUDA working on my MacBook Pro. I have no interest in
image processing, so I was more curious about some of the other CUDA
functions, and I was disappointed by the limitations of CUDAFold,
CUDAMap, etc. CUDAMap is strange because its function argument is
limited to a small group of functions that are Listable, i.e.
functions that you wouldn't use Map for in any case. The toy examples
show CUDA running slower than Map (no Listable example is given, but
the Listable form wins easily when you try it with the same code).

If you increase the size of the toys:

Timing[CUDAMap[Cos,1.0*Range[1000000]];]

Timing[Map[Cos,1.0*Range[1000000]];]

Timing[Cos[1.0*Range[1000000]];]

CUDA comes out marginally ahead of using Cos directly, but only in
this sort of toy example. Unfortunately, the documentation gives no
indication of where the real benefit of using CUDAMap lies.

T