Re: Mathematica 8 remote parallel kernels
- To: mathgroup at smc.vnet.net
- Subject: [mg120656] Re: Mathematica 8 remote parallel kernels
- From: Iván Pulido Sanchez <ijpulido.s at gmail.com>
- Date: Tue, 2 Aug 2011 07:13:11 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <j0truv$d50$1@smc.vnet.net>
On Sun, Jul 31, 2011 at 10:37 PM, Oleksandr Rasputinov
<oleksandr_rasputinov at hmamail.com> wrote:

> On Sun, 31 Jul 2011 12:25:50 +0100, Iván Pulido Sanchez
> <ijpulido.s at gmail.com> wrote:
>
> > On Sat, Jul 30, 2011 at 5:00 AM, Oleksandr Rasputinov
> > <oleksandr_rasputinov at hmamail.com> wrote:
> >
> >> Perhaps you are exceeding your quota of kernel licences? (Particularly
> >> likely on a shared server where others may or may not also be using
> >> some of the available licences at any given time.) Another possibility,
> >> if you are trying to launch a large number of parallel kernels, is that
> >> you are running into some kind of limitation on the number of
> >> concurrent TCP/IP connections.
> >>
> >> Other than that, it is likely platform-dependent, but you do not
> >> mention what platform you are using. Correctly setting up Windows
> >> client-to-Windows server parallel kernel operation is slightly more
> >> difficult than for the other platforms, for example.
> >
> > Thanks for the ideas. And you are right I should've mentioned the OS
> > it's running on. It's Linux (Debian Squeeze to be exact).
> >
> > I don't think it's a license issue since I can run on each node 4
> > Mathematica Kernels without problem, the problem is making them process
> > in parallel (meaning parallel remote kernels).
> >
> > This used to work just fine with mathematica7 (Over 20 kernels without
> > problem, now I can't get more than 10 in the best case) no idea what
> > could have changed with mathematica8.
> >
> > I've tried manually running the kernels with the ssh command that is
> > shown in the Parallel Kernel Configuration dialog without success (Link
> > names problems and such). If you could help me with this it would be
> > very much appreciated since this could lead to solve the problem.
> >
> > Thanks again for your response.
>
> I'm afraid I can't reproduce your problem--at least not on Xubuntu 10.04,
> where I can launch at least 24 subkernels via ssh without any
> difficulties. Perhaps a firewall or networking stack setting is limiting
> the number of simultaneous half-open TCP connections? Are you able to
> start all of the kernels if you launch them a few at a time, e.g. using
>
> LaunchKernels[
>   {SubKernels`RemoteKernels`RemoteMachine["hostname", 4]}
> ]
>

I tried this command and it didn't work correctly. I did it twice for
every node and got 13 kernels running after that, even though I should
have 20 kernels running (4 for each of the 5 nodes). The error message is
the same as the one I got before with the kernel configuration tool:

LinkConnect::linkc: Unable to connect to
LinkObject[37263@192.168.1.100, 34767@192.168.1.100,45,19]. >>

Thanks for your help. I have no idea why this is still happening. I
checked the nodes' load and the network traffic and there isn't a problem
there: there is no load and network traffic is just about 60kb/s.

> with the hostname of one of the nodes being substituted at each iteration?
>
> Alternatively, if the nodes are heavily loaded and take some time to start
> the kernels, you may run into the MathLink timeout of 15 seconds. Starting
> fewer kernels at once may also help in this situation.
>
> >> On Fri, 29 Jul 2011 09:45:19 +0100, Iván Pulido Sanchez
> >> <ijpulido.s at gmail.com> wrote:
> >>
> >> > Hello,
> >> >
> >> > When I try to configure remote kernels in Mathematica via
> >> > Evaluation>Parallel Kernel Configuration ... then I go to "Remote
> >> > Kernels" and add hosts, after that I try to Launch the remote kernels
> >> > and only some of them get launched (the number of them varies),and
> >> > finally I get a msg like the following.
> >> >
> >> > KernelObject::rdead: Subkernel connected through remote[nodo2] appears
> >> > dead. >>
> >> > LinkConnect::linkc: Unable to connect to
> >> > LinkObject[36154@192.168.1.104, 49648@192.168.1.104,38,12]. >>
> >> > General::stop: Further output of LinkConnect::linkc will be
> >> > suppressed during this calculation. >>
> >> >
> >> > Any ideas how to get this working?
> >> >
> >> > Take into account it sometimes does load some of the remote kernels
> >> > but never all of them. Thanks in advance I hope you can help me with
> >> > this.
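
For reference, the "few at a time" launch suggested in the quoted reply, with each node's hostname substituted in turn, might look roughly like the sketch below. Apart from "nodo2", which appears in the error message, the node names are hypothetical placeholders. The 5-second pause is an arbitrary assumption intended only to spread out the connection attempts, not a value taken from the thread.

```mathematica
(* Sketch only: launch 4 remote kernels per node, one node at a time. *)
Needs["SubKernels`RemoteKernels`"]

(* Hypothetical hostnames; replace with the real node names. *)
nodes = {"nodo1", "nodo2", "nodo3", "nodo4", "nodo5"};
perNode = 4;  (* kernels per node, as described in the thread *)

Do[
  LaunchKernels[
    {SubKernels`RemoteKernels`RemoteMachine[host, perNode]}
  ];
  (* Arbitrary delay so the connections are not all opened at once. *)
  Pause[5];
  Print[host, ": ", Length[Kernels[]], " kernels running so far"],
  {host, nodes}
]
```

Printing Length[Kernels[]] after each node shows which node's batch came up short, which may help tell a per-node problem apart from a limit on the total number of connections.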