MathGroup Archive 2011

Re: Mathematica 8 remote parallel kernels

  • To: mathgroup at smc.vnet.net
  • Subject: [mg120656] Re: Mathematica 8 remote parallel kernels
  • From: Iván Pulido Sanchez <ijpulido.s at gmail.com>
  • Date: Tue, 2 Aug 2011 07:13:11 -0400 (EDT)
  • Delivered-to: l-mathgroup@mail-archive0.wolfram.com
  • References: <j0truv$d50$1@smc.vnet.net>

On Sun, Jul 31, 2011 at 10:37 PM, Oleksandr Rasputinov <
oleksandr_rasputinov at hmamail.com> wrote:

> On Sun, 31 Jul 2011 12:25:50 +0100, Iván Pulido Sanchez
> <ijpulido.s at gmail.com> wrote:
>
> > On Sat, Jul 30, 2011 at 5:00 AM, Oleksandr Rasputinov <
> > oleksandr_rasputinov at hmamail.com> wrote:
> >
> >> Perhaps you are exceeding your quota of kernel licences? (Particularly
> >> likely on a shared server where others may or may not also be using some
> >> of the available licences at any given time.) Another possibility, if you
> >> are trying to launch a large number of parallel kernels, is that you are
> >> running into some kind of limitation on the number of concurrent TCP/IP
> >> connections.
> >>
> >> Other than that, it is likely platform-dependent, but you do not mention
> >> what platform you are using. Correctly setting up Windows
> >> client-to-Windows server parallel kernel operation is slightly more
> >> difficult than for the other platforms, for example.
> >>
> >
> > Thanks for the ideas. You are right, I should have mentioned the OS it's
> > running on. It's Linux (Debian Squeeze, to be exact).
> >
> > I don't think it's a license issue, since I can run 4 Mathematica kernels
> > on each node without problems; the problem is making them run as
> > parallel remote kernels.
> >
> > This used to work just fine with Mathematica 7 (over 20 kernels without
> > problems; now I can't get more than 10 in the best case). I have no idea
> > what could have changed with Mathematica 8.
> >
> > I've tried manually running the kernels with the ssh command that is shown
> > in the Parallel Kernel Configuration dialog, without success (link name
> > problems and such). If you could help me with this it would be very much
> > appreciated, since it could lead to solving the problem.
> >
> > Thanks again for your response.
> >
>
> I'm afraid I can't reproduce your problem--at least not on Xubuntu 10.04,
> where I can launch at least 24 subkernels via ssh without any
> difficulties. Perhaps a firewall or networking stack setting is limiting
> the number of simultaneous half-open TCP connections? Are you able to
> start all of the kernels if you launch them a few at a time, e.g. using
>
> LaunchKernels[
>  {SubKernels`RemoteKernels`RemoteMachine["hostname", 4]}
>  ]
>
>
I tried this command and it didn't work correctly. I did it twice for
every node and got 13 kernels running after that, even though I
should have 20 kernels running (4 for each of the 5 nodes).

The error message is the same as before using the kernel configuration
tool.

LinkConnect::linkc: Unable to connect to LinkObject[37263 at 192.168.1.100,
34767 at 192.168.1.100,45,19]. >>

Thanks for your help. I have no idea why this is still happening.

I checked the nodes' load and the network traffic, and there isn't a problem
there: the load is negligible and network traffic is only about 60kb/s.


> with the hostname of one of the nodes being substituted at each iteration?
>
> Alternatively, if the nodes are heavily loaded and take some time to start
> the kernels, you may run into the MathLink timeout of 15 seconds. Starting
> fewer kernels at once may also help in this situation.
>
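
For reference, the staged launch suggested above can be sketched as a loop
over the nodes. This is only a minimal sketch under assumptions: apart from
nodo2, the host names are placeholders, the 5-second pause is an arbitrary
choice, and the commented-out timeout setting is, as far as I know, the
variable behind the 15-second MathLink timeout mentioned above, which I have
not verified on this installation.

```mathematica
Needs["SubKernels`RemoteKernels`"]

(* Hypothetical host names; substitute the real node names. *)
nodes = {"nodo1", "nodo2", "nodo3", "nodo4", "nodo5"};
perNode = 4;  (* kernels to request from each node *)

(* Assumption: raising this lengthens the 15-second connection timeout. *)
(* Parallel`Settings`$MathLinkTimeout = 60; *)

Do[
  (* Launch the kernels for one node only, then report the running total. *)
  LaunchKernels[SubKernels`RemoteKernels`RemoteMachine[host, perNode]];
  Print[host, ": ", Length[Kernels[]], " subkernels running in total"];
  Pause[5],  (* arbitrary delay so connections are not all opened at once *)
  {host, nodes}
]
```

Printing Length[Kernels[]] after each node shows which host falls short of
its 4 kernels, which should help narrow down where the LinkConnect::linkc
failures come from.
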
> >> On Fri, 29 Jul 2011 09:45:19 +0100, Iván Pulido Sanchez
> >> <ijpulido.s at gmail.com> wrote:
> >>
> >> > Hello,
> >> >
> >> > When I try to configure remote kernels in Mathematica via
> >> > Evaluation > Parallel Kernel Configuration..., I go to "Remote Kernels"
> >> > and add hosts. After that I try to launch the remote kernels, but only
> >> > some of them get launched (the number varies), and finally I get a
> >> > message like the following.
> >> >
> >> > KernelObject::rdead: Subkernel connected through remote[nodo2] appears
> >> > dead. >>
> >> > LinkConnect::linkc: Unable to connect to
> >> > LinkObject[36154 at 192.168.1.104, 49648 at 192.168.1.104,38,12]. >>
> >> > General::stop: Further output of LinkConnect::linkc will be suppressed
> >> > during this calculation. >>
> >> >
> >> > Any ideas how to get this working?
> >> >
> >> > Take into account that it sometimes does load some of the remote
> >> > kernels, but never all of them. Thanks in advance; I hope you can help
> >> > me with this.
>
>


