MathGroup Archive 2006

SOLVED - Excess Mathematica CPU usage on Solaris.

  • To: mathgroup at smc.vnet.net
  • Subject: [mg65750] SOLVED - Excess Mathematica CPU usage on Solaris.
  • From: "Dr. David Kirkby" <david.kirkby at onetel.net>
  • Date: Sun, 16 Apr 2006 03:49:28 -0400 (EDT)
  • Sender: owner-wri-mathgroup at wolfram.com

[This post has been delayed due to email problems - moderator]

If you don't use Mathematica 5.1 or 5.2 on Solaris 10, you might as well skip to 
the next message, as this will not bother you.

I've reported here before that Mathematica 5.2 uses excessive CPU time on
Solaris 10. When the Mathematica GUI is used, computing 1+1 returns the answer
2 quite quickly, but the CPU usage is then pegged at 100%, which slows the
system considerably.

Following a post of mine on comp.unix.solaris
http://groups.google.co.uk/group/comp.unix.solaris/browse_frm/thread/28de6dd19027d8b1/fc1a6deee0c169fe?q=mathematica&rnum=2#fc1a6deee0c169fe
the ever helpful Casper Dik at Sun Microsystems took an interest. After I ran
some tests using the UNIX tools lsof, truss, prstat and others, Casper was able
to come up with a likely explanation and a workaround. When I implemented that
workaround, the problem went away.

Sun have changed the minimum timeout of select() from 1 ms in Solaris 9 to 1 us
in Solaris 10: a timeout shorter than 1 ms used to be rounded up to 1 ms, but
is now honoured as given. Mathematica performs a task that slower processors
cannot complete within 1 us, whereas under Solaris 9 it completed within 1 ms
with no problems. So select() times out on Solaris 10 unless the CPU is quite
fast (how fast I don't know).
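
As a rough illustration of the behaviour described above (this is not part of
Casper's analysis), a single select() call can be timed with a sketch like the
one below. The helper name select_elapsed_usec is hypothetical, and the call
watches no descriptors, so it only waits for the timeout to expire.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: wall-clock microseconds spent in one select() call
 * that watches no descriptors and simply waits for the timeout.
 */
long select_elapsed_usec(long timeout_usec)
{
    struct timeval before, after, timeout;

    timeout.tv_sec = timeout_usec / 1000000;
    timeout.tv_usec = timeout_usec % 1000000;

    gettimeofday(&before, NULL);
    (void)select(0, NULL, NULL, NULL, &timeout);
    gettimeofday(&after, NULL);

    return (after.tv_sec - before.tv_sec) * 1000000L
        + (after.tv_usec - before.tv_usec);
}
```

Printing select_elapsed_usec(1) and select_elapsed_usec(1000) from a small
main() should show whether a system honours the 1 us request or rounds it up.
If the 1 us call returns almost at once, a program that loops around such a
call can be expected to keep a CPU busy.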

Casper feels this is a Solaris bug, and has submitted it as one:

"6404383 select() behavior changed in Solaris 10, breaking binary compatibility"

Casper Dik says he thinks Sun will round up the select timeout to 1ms again in 
the library for select() - but his signature makes it clear he is speaking for 
himself, not Sun!

The Sun Ultra 60, 80 & Blade 100 have all exhibited this problem and I would
expect it to be seen on the Ultra 1, 2, 5 and 10 too. Apparently a Sun Blade
1000 is not affected, but since I don't know the specs of the machine used,
slower Blade 1000s might still be affected. I'm guessing, but perhaps
faster machines will be affected if they are more heavily loaded.

WORKAROUND

Hopefully Sun will release a patch at some point to address bug id 6404383, but
there is a workaround which prevents Mathematica pegging the CPU at 100%.

1) Download the C source code Casper wrote

http://groups.google.co.uk/group/comp.unix.solaris/browse_frm/thread/28de6dd19027d8b1/fc1a6deee0c169fe?q=mathematica&rnum=2#fc1a6deee0c169fe

(For completeness I have included it at the end of this post, so all the
information is in the one post).

2) Compile the C source code 'select_preload.c' to make a
64-bit shared library.

cc -xtarget=ultra -xarch=v9 -G -Kpic select_preload.c -o select_preload.so

This syntax uses Sun's compiler (gcc will be different).

Note that this syntax differs slightly from what Casper posted on
comp.unix.solaris, as I had to change it for the Mathematica kernel, which is a
64-bit binary.

3) Copy the shared library somewhere convenient - I used /usr/local/lib/

4) Edit the script that calls the Mathematica kernel (the script on the system 
here is /usr/local/Wolfram/Mathematica/5.2/Executables/math).

These two lines:

LD_PRELOAD_64=/usr/local/lib/select_preload.so
export LD_PRELOAD_64

need to be added near the end of the script, just before the last line.
(Obviously, change the LD_PRELOAD_64 line to point at wherever you have put
the shared library you built.)

So the last 3 lines in the script 
/usr/local/Wolfram/Mathematica/5.2/Executables/math are:

LD_PRELOAD_64=/usr/local/lib/select_preload.so
export LD_PRELOAD_64
exec "${MathKernel}" "$@"

5) Fire up Mathematica, compute 1+1 or something else silly, then look with
prstat, top or similar at CPU usage. Hopefully it will not keep climbing all the
time.
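
Besides watching prstat, the effect of the preload can be checked on a small
test program rather than on Mathematica itself. The sketch below is only an
illustration under assumptions: the function name polling_cpu_usec is
hypothetical, and the polling loop is a stand-in, not what Mathematica
actually does. It reports the CPU time the process burns while calling
select() repeatedly for a fixed wall-clock window.

```c
#include <stddef.h>
#include <sys/resource.h>
#include <sys/select.h>
#include <sys/time.h>

/* Difference a - b in microseconds. */
static long tv_diff_usec(struct timeval a, struct timeval b)
{
    return (a.tv_sec - b.tv_sec) * 1000000L + (a.tv_usec - b.tv_usec);
}

/*
 * Hypothetical check: CPU microseconds (user + system) consumed while
 * calling select() with the given timeout (must be below 1000000 us)
 * over and over for window_ms milliseconds of wall-clock time.
 */
long polling_cpu_usec(long timeout_usec, long window_ms)
{
    struct rusage start_ru, end_ru;
    struct timeval start, now, timeout;

    getrusage(RUSAGE_SELF, &start_ru);
    gettimeofday(&start, NULL);
    do {
        /* Reset on every pass: select() is allowed to modify the timeout. */
        timeout.tv_sec = 0;
        timeout.tv_usec = timeout_usec;
        (void)select(0, NULL, NULL, NULL, &timeout);
        gettimeofday(&now, NULL);
    } while (tv_diff_usec(now, start) < window_ms * 1000L);
    getrusage(RUSAGE_SELF, &end_ru);

    return tv_diff_usec(end_ru.ru_utime, start_ru.ru_utime)
        + tv_diff_usec(end_ru.ru_stime, start_ru.ru_stime);
}
```

A main() that prints polling_cpu_usec(1, 1000) could be built as a 64-bit
binary and run once as is and once with LD_PRELOAD_64 set as in step 4. If the
explanation above is right, the CPU time reported with the preload in place
should be much lower, since each 1 us request is rounded up to 1 ms.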

I hope that is useful to anyone affected. I will email it to Wolfram too so they 
can append it to a support request.


Here is the C source Casper wrote, which addresses the issue. I've edited
his comment line on how to compile it, but other than that it is unchanged.

If there is any line wrapping of the C source, you might need to sort that out 
manually.

/*
   * Select roundup preload.  (casper.... at you.know.where)
   * cc -G -Kpic select_preload.c -o select_preload.so

   >> Note from David Kirkby - change to this for 64-bit:
   >> cc -xtarget=ultra -xarch=v9 -G -Kpic select_preload.c -o select_preload.so
   */

#include <dlfcn.h>
#include <sys/time.h>

#define FUN_PROTO(type,internal,parms) \
          type internal parms

#define DECLARE(type,name, parms) static FUN_PROTO(type,(*name), parms)
#define CAST(type, parms) (FUN_PROTO(type,(*), parms))

DECLARE(int,next_select,(int, fd_set *, fd_set *, fd_set *, struct timeval *));

#ifdef __GNUC__
void loadit(void) __attribute__ ((constructor));
#else
#pragma init(loadit)
#endif

void
loadit(void)
{
      extern char **environ;
      char **env;
      int offset;

      next_select = CAST(int, (int, fd_set *, fd_set *, fd_set *,
          struct timeval *))dlsym(RTLD_NEXT, "select");

}

int select(int nfds, fd_set *restrict readfds, fd_set  *restrict  writefds,
      fd_set  *restrict errorfds, struct timeval *restrict timeout)
{

          if (timeout != NULL && timeout->tv_sec == 0 && timeout->tv_usec > 0 &&
              timeout->tv_usec < 1000)
                  timeout->tv_usec = 1000;

          return (next_select(nfds, readfds, writefds, errorfds, timeout));

}
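
To make it easier to see which timeouts the wrapper above alters, the same
test can be pulled out into a stand-alone helper. This is only a sketch: the
name round_up_short_timeout and the 0/1 return value are assumptions added for
illustration and are not in Casper's code.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical helper applying the same test as the select() wrapper.
 * Returns 1 if the timeout was rounded up to 1000 us, 0 if left alone.
 */
int round_up_short_timeout(struct timeval *timeout)
{
    if (timeout != NULL && timeout->tv_sec == 0 &&
        timeout->tv_usec > 0 && timeout->tv_usec < 1000) {
        timeout->tv_usec = 1000;
        return 1;
    }
    return 0;
}
```

Only timeouts of 1 to 999 us are changed. A NULL timeout (block indefinitely),
a zero timeout (poll and return at once) and anything of 1 ms or longer are
passed through untouched.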

