Re: Speed of Mathematica on AMD machines
- To: mathgroup at
- Subject: [mg126443] Re: Speed of Mathematica on AMD machines
- From: "Oleksandr Rasputinov" <oleksandr_rasputinov at>
- Date: Fri, 11 May 2012 00:12:08 -0400 (EDT)
- References: <jog00v$g1d$>
On Thu, 10 May 2012 09:59:11 +0100, <einschlag at> wrote:

> We have recently bought an iBuyPower gaming PC for our research group:
>
> AMD FX 8 core, 3.6 GHz, 16 GB RAM
>
> MathematicaMark8 Benchmark 0.86 is not bad, considering the price ~$800
> of this PC but I was expecting much more.
>
> Apparently Intel's MKL library used by Mathematica is not optimized for
> AMD processors.
>
> A test program calculating exponentials of large matrices takes 13 s on
> the AMD PC and only 8 s on my Mac Pro (Mathematica benchmark 0.7) that
> has 8 Intel Xeon cores at 2.4 GHz. And on my Lenovo laptop the program
> runs 9 s. I blame it on the MKL inadequacy for AMD.
>
> TestProgram := Module[{},
>   NN = 1000;
>   AMatr = Table[RandomReal[], {i, 1, NN}, {j, 1, NN}];
>   NExec = 10;
>   For[i = 1, i < NExec, i++,
>     MatrixExp[AMatr];
>   ];
> ]
>
> Execution by iBuyPower PC (AMD FX 8 core, Linux Ubuntu 64 bit)
>
> TestProgram // AbsoluteTiming
>
> {13.230105, Null}
>
> Execution by Mac Pro (Intel Xeon 2 x 4 core)
>
> TestProgram // AbsoluteTiming
>
> {8.126944, Null}
>
> Execution by Lenovo laptop (Intel i7-QM2060, Windows 7 64 bit)
>
> TestProgram // AbsoluteTiming
>
> {9.4275392, Null}
>
>
> On the other hand, a program compiling in C from Mathematica's help runs
> very fast on the AMD PC:
>
> TestProgram2 := Module[{},
>   c = Compile[ {{x, _Real}, {n, _Integer}},
>     Module[ {sum, inc}, sum = 1.0; inc = 1.0;
>       Do[inc = inc*x/i; sum = sum + inc, {i, n}]; sum],
>     CompilationTarget -> "C"];
>   c[1.6, 10000000];
> ]
>
> Execution by iBuyPower PC (AMD FX 8 core, Linux Ubuntu 64 bit, GCC
> compiler)
>
> TestProgram2 // AbsoluteTiming
>
> {0.114427, Null}
>
> Execution by Mac Pro (Intel Xeon 2 x 4 core, GCC compiler)
>
> TestProgram2 // AbsoluteTiming
>
> {0.212875, Null}
>
> Execution by Lenovo laptop (Intel i7-QM2060, Windows 7 64 bit, Microsoft
> Visual C++)
>
> TestProgram2 // AbsoluteTiming
>
> {0.3540203, Null}
>
> It seems the second test program is not using MKL and thus AMD becomes
> very efficient.
>
> I will continue testing.
>
> Is there any way to improve Mathematica's performance on AMD machines?
>
> Dmitry
>

In the past, Intel was known to engage in anticompetitive practices with respect to AMD, and quite rightly was subject to legal penalties for this. (Specifically, they encouraged large computer manufacturers such as Dell to take up exclusive supply contracts by means of large discounts and availability guarantees.) As a result of this judgment there has been a lot of general hysteria that Intel may still be discriminating against AMD performance-wise in their library and compiler products, which has culminated in legal threats resulting in the large disclaimers posted all over Intel's products stating that they are not meant for anything other than Intel processors.

Suspicion and disclaimers are one thing, but actual performance is another. As you may be aware, AMD offers their own math library, ACML. What most people who level this criticism of MKL are not aware of, however, is that MKL actually performs better than ACML, *even on AMD processors*. So, even if it is not optimized as thoroughly as it might be for AMD processors (which is more than likely the case; Intel does not have an infinite development budget and there is no financial incentive for them to go to great lengths optimizing for other manufacturers' processors, which have performance characteristics very different to their own), MKL is still better than the alternatives.

Now, how then to explain the poor performance you observe? Unfortunately, the latest generation of AMD processors is simply not very good (the Bulldozer processors are actually worse than the previous-generation Phenom II processors in many applications), whereas Intel's products have been making dramatic gains lately despite AMD's reduced competitiveness.
The end result is that a Bulldozer core is "worth" about half a Sandy Bridge core, clock for clock, especially in floating-point workloads, since a single FP unit is shared between two of what AMD calls cores. (Indeed, many have said that AMD's "8 core" processors are more correctly described as having 4 cores due to the amount of shared apparatus but, for marketing reasons, AMD is obviously not buying that argument.)

In regard to your results from TestProgram2: sorry to say, these are invalid because the time taken to compile to C completely overwhelms the actual runtime, and you include both in the assessment. You also use AbsoluteTiming, which is not appropriate for single-threaded code with short runtimes executing inside the Mathematica kernel. A more valid test is:

c = Compile[{{x, _Real}, {n, _Integer}},
  Module[{sum, inc},
    sum = 1.0; inc = 1.0;
    Do[inc = inc*x/i; sum = sum + inc, {i, n}];
    sum],
  CompilationTarget -> "C"
];

Do[c[1.6, 10000000], {10}] // Timing

which on my computer (Intel Core 2, 3.2 GHz) takes about 0.65 seconds, i.e. 65 ms for a single evaluation of c[1.6, 10000000].

Your matrix exponential test would also be better posed as:

NN = 1000;
mat = RandomReal[{0, 1}, {NN, NN}];
Do[MatrixExp[mat], {10}] // AbsoluteTiming

(I get 9.5 seconds.)

However, I would be reluctant to draw any firm conclusion from these tests if I were you. Far better to look at published benchmarks for real applications, for instance: or which both show that Bulldozer performance is a very mixed bag in general. While there are a few applications in which it can match or only just outperform Intel's offerings, for the most part it falls behind them considerably.

Best,

O. R.
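
To make the point about compile time versus run time concrete, here is a minimal sketch that reports the two costs separately, so neither hides the other. The names compileTime, runTime and reps are illustrative only and do not appear in the thread. It assumes a working C compiler is available to Mathematica, as in the original TestProgram2.

```mathematica
(* Sketch only: report the one-off compilation cost and the per-call run cost separately. *)
(* compileTime, runTime and reps are illustrative names, not taken from the thread. *)
reps = 10;

(* Wall-clock time for the compilation step, which calls the external C compiler. *)
{compileTime, c} = AbsoluteTiming[
   Compile[{{x, _Real}, {n, _Integer}},
    Module[{sum, inc},
     sum = 1.0; inc = 1.0;
     Do[inc = inc*x/i; sum = sum + inc, {i, n}];
     sum],
    CompilationTarget -> "C"]];

(* CPU time for repeated calls to the already-compiled function. *)
runTime = First[Timing[Do[c[1.6, 10000000], {reps}]]];

(* Compilation cost in seconds, followed by the average cost of one call in seconds. *)
{compileTime, runTime/reps}
```

Averaging over several repetitions keeps the per-call figure from being dominated by timer resolution. Comparing the two numbers shows directly how much of a combined measurement is compilation rather than execution.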