SOLVED - Excess Mathematica CPU usage on Solaris.
- To: mathgroup at smc.vnet.net
- Subject: [mg65750] SOLVED - Excess Mathematica CPU usage on Solaris.
- From: "Dr. David Kirkby" <david.kirkby at onetel.net>
- Date: Sun, 16 Apr 2006 03:49:28 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
[This post has been delayed due to email problems - moderator]

If you don't use Mathematica 5.1 or 5.2 on Solaris 10, you might as well skip to the next message, as this will not affect you.

I've reported here before that Mathematica 5.2 was using excessive CPU time on Solaris 10. When the Mathematica GUI is used, computing 1+1 returns the answer 2 quite quickly, but the CPU usage then stays pegged at 100%, which slows the system considerably.

Following a post of mine on comp.unix.solaris

http://groups.google.co.uk/group/comp.unix.solaris/browse_frm/thread/28de6dd19027d8b1/fc1a6deee0c169fe?q=mathematica&rnum=2#fc1a6deee0c169fe

the ever helpful Casper Dik at Sun Microsystems took an interest. After I ran some tests using the UNIX tools lsof, truss, prstat and others, Casper was able to come up with a likely explanation and a workaround. When I implemented that workaround, the problem went away.

Sun have changed the minimum timeout of select() from 1 ms in Solaris 9 to 1 us in Solaris 10 (very short timeouts used to be rounded up to 1 ms). Mathematica was performing a task which, on slower processors, it is unable to complete within 1 us, whereas it could do it within 1 ms under Solaris 9 with no problems. So it times out on Solaris 10 unless the CPU is quite fast (how fast I don't know).

Casper feels this is a Solaris bug, and has submitted it as one:

"6404383 select() behavior changed in Solaris 10, breaking binary compatibility"

Casper Dik says he thinks Sun will round up the select timeout to 1 ms again in the library for select() - but his signature makes it clear he is speaking for himself, not Sun!

The Sun Ultra 60, 80 & Blade 100 have all exhibited this problem and I would expect it to be seen on the Ultra 1, 2, 5 and 10 too. Apparently a Sun Blade 1000 is not affected, but since I don't know the specs of the machine used, slower Blade 1000s might still be affected. I'm guessing, but perhaps faster machines will be affected too if they are more heavily loaded.
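To illustrate why the length of the select() timeout matters, here is a rough sketch of a polling loop. It is not taken from Mathematica (what Mathematica actually does internally is not known here); the function name wakeups_in_window is made up for illustration. It counts how many times a loop that sleeps in select() wakes up during a fixed window. With a 1 ms timeout the loop can wake at most about 1000 times a second; if the system honours a 1 us timeout, the same loop wakes far more often and burns correspondingly more CPU.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>

/* Microseconds elapsed between two gettimeofday() readings. */
static long elapsed_usec(const struct timeval *a, const struct timeval *b)
{
    return (long)(b->tv_sec - a->tv_sec) * 1000000L +
           (long)(b->tv_usec - a->tv_usec);
}

/*
 * Hypothetical helper (not from the original post): sleep repeatedly in
 * select() with a timeout of timeout_usec microseconds (must be below
 * 1000000) until window_usec microseconds have passed, and return the
 * number of times select() returned.
 */
long wakeups_in_window(long timeout_usec, long window_usec)
{
    struct timeval start, now, tv;
    long n = 0;

    gettimeofday(&start, NULL);
    do {
        /* Reset every time: some systems modify the timeout argument. */
        tv.tv_sec = 0;
        tv.tv_usec = timeout_usec;
        (void) select(0, NULL, NULL, NULL, &tv);
        n++;
        gettimeofday(&now, NULL);
    } while (elapsed_usec(&start, &now) < window_usec);

    return n;
}
```

Calling wakeups_in_window(1000, 1000000) and wakeups_in_window(1, 1000000) on the same machine and comparing the two counts should give a feel for the difference; the actual numbers depend on the operating system and hardware.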
WORKAROUND

Hopefully Sun will release a patch at some point to address bug id 6404383, but there is a workaround which prevents Mathematica pegging the CPU at 100%.

1) Download the C source code Casper wrote

http://groups.google.co.uk/group/comp.unix.solaris/browse_frm/thread/28de6dd19027d8b1/fc1a6deee0c169fe?q=mathematica&rnum=2#fc1a6deee0c169fe

(For completeness I have stuck it at the end of this post, so all the information is in the one post.)

2) Compile the C source code 'select_preload.c' to make a 64-bit shared library.

cc -xtarget=ultra -xarch=v9 -G -Kpic select_preload.c -o select_preload.so

This syntax uses Sun's compiler (gcc will be different). Note that the syntax is a bit different from what Casper posted on comp.unix.solaris, as I had to change it for the Mathematica kernel, which is a 64-bit binary.

3) Copy the shared library somewhere convenient - I used /usr/local/lib/

4) Edit the script that calls the Mathematica kernel (the script on the system here is /usr/local/Wolfram/Mathematica/5.2/Executables/math). These two lines:

LD_PRELOAD_64=/usr/local/lib/select_preload.so
export LD_PRELOAD_64

need to be added near the end of the script, just before the last line. (Obviously, change the LD_PRELOAD_64 line to point at wherever you have put the shared library you built.)

So the last 3 lines in the script /usr/local/Wolfram/Mathematica/5.2/Executables/math are:

LD_PRELOAD_64=/usr/local/lib/select_preload.so
export LD_PRELOAD_64
exec "${MathKernel}" "$@"

5) Fire up Mathematica, compute 1+1 or something else silly, then look at the CPU usage with prstat, top or similar. Hopefully it will not keep climbing all the time.

I hope that is useful to anyone affected. I will email it to Wolfram too so they can append it to a support request.

Here is the C source Casper wrote, which addresses the issue. I've edited his comment line on how to compile it, but other than that it is unchanged.
If there is any line wrapping of the C source, you might need to sort that out manually.

/*
 * Select roundup preload. (casper.... at you.know.where)
 * cc -G -Kpic select_preload.c -o select_preload.so
>> Note from David Kirkby - change to this for 64-bit:
>> cc -xtarget=ultra -xarch=v9 -G -Kpic select_preload.c -o select_preload.so
 */

#include <dlfcn.h>
#include <sys/time.h>

#define FUN_PROTO(type,internal,parms) \
	type internal parms

#define DECLARE(type,name, parms) static FUN_PROTO(type,(*name), parms)
#define CAST(type, parms) (FUN_PROTO(type,(*), parms))

DECLARE(int,next_select,(int, fd_set *, fd_set *, fd_set *, struct timeval *));

#ifdef __GNUC__
void loadit(void) __attribute__ ((constructor));
#else
#pragma init(loadit)
#endif

void
loadit(void)
{
	extern char **environ;
	char **env;
	int offset;

	next_select = CAST(int, (int, fd_set *, fd_set *, fd_set *,
	    struct timeval * ))dlsym(RTLD_NEXT, "select");
}

int
select(int nfds, fd_set *restrict readfds, fd_set *restrict writefds,
    fd_set *restrict errorfds, struct timeval *restrict timeout)
{
	if (timeout != NULL && timeout->tv_sec == 0 && timeout->tv_usec > 0 &&
	    timeout->tv_usec < 1000)
		timeout->tv_usec = 1000;

	return (next_select(nfds, readfds, writefds, errorfds, timeout));
}
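The rounding rule inside the select() wrapper above can be tried on its own, without preloading anything. The following is only a sketch: the names round_up_timeout and MIN_TIMEOUT_USEC are made up for illustration and are not part of Casper's source. It applies the same test as the wrapper, so only timeouts strictly between 0 and 1 ms are changed. A NULL timeout (block forever), a zero timeout (pure poll) and anything of 1 ms or longer are left alone.

```c
#include <stddef.h>
#include <sys/time.h>

/* Assumed name for the 1 ms floor used by the wrapper (1000 us). */
#define MIN_TIMEOUT_USEC 1000

/*
 * Hypothetical helper mirroring the condition in the select() wrapper:
 * round a non-zero sub-millisecond timeout up to 1 ms.
 * Returns 1 if the timeout was changed, 0 if it was left as it was.
 */
int round_up_timeout(struct timeval *timeout)
{
    if (timeout != NULL && timeout->tv_sec == 0 &&
        timeout->tv_usec > 0 && timeout->tv_usec < MIN_TIMEOUT_USEC) {
        timeout->tv_usec = MIN_TIMEOUT_USEC;
        return 1;
    }
    return 0;
}
```

For example, a timeout of {0, 1} becomes {0, 1000}, while {0, 0} and {1, 5} pass through unchanged.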