giraph-user mailing list archives

From Bojan Babic <gba...@gmail.com>
Subject Issue with Giraph on multinode cluster
Date Fri, 17 Oct 2014 20:01:52 GMT
Hi guys,

I may be posting an issue that has already been reported, but I'll take the
risk of being ridiculed :)

I have a small Hadoop cluster on Digital Ocean (1 master, 4 nodes). I was
able to set up the cluster and run the word count example as well as the
single-node sample from the Quick Start.

As I bring more nodes into play, I hit an issue where the TaskTracker spawns
Child processes

hduser@hdnode-2:~# jps
> 13839 TaskTracker
> 13697 DataNode
> 14067 Jps
> 13962 Child
> *13961 Child*


that listen on the loopback interface

> Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode       PID/Program name
> tcp        0      0 127.0.0.1:1337          0.0.0.0:*               LISTEN      root       21544925    29912/python
> tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      hduser     21691552    13697/java
> tcp        0      0 127.0.0.1:30011         0.0.0.0:*               LISTEN      hduser     21693578    13962/java
> tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      hduser     21691554    13697/java
> tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      hduser     21691557    13697/java
> tcp        0      0 127.0.0.1:50118         0.0.0.0:*               LISTEN      hduser     21691870    13839/java
> tcp        0      0 0.0.0.0:41640           0.0.0.0:*               LISTEN      hduser     21691296    13697/java
> tcp        0      0 127.0.0.1:31337         0.0.0.0:*               LISTEN      root       20432660    1514/python
> tcp        0      0 0.0.0.0:50060           0.0.0.0:*               LISTEN      hduser     21692144    13839/java
> tcp        0      0 0.0.0.0:http-alt        0.0.0.0:*               LISTEN      root       20431897    1421/python
> *tcp        0      0 127.0.0.1:30001         0.0.0.0:*               LISTEN      hduser     21370004    7856/ssh*
> *tcp        0      0 127.0.0.1:30003         0.0.0.0:*               LISTEN      hduser     21693562    13961/java*
> tcp        0      0 127.0.0.1:58741         0.0.0.0:*               LISTEN      hduser     21370000    7856/ssh
> tcp        0      0 127.0.0.1:58742         0.0.0.0:*               LISTEN      hduser     21369982    7845/autossh
> tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      root       9130        834/sshd
> tcp6       0      0 ::1:30001               :::*                    LISTEN      hduser     21370003    7856/ssh
> tcp6       0      0 ::1:58741               :::*                    LISTEN      hduser     21369999    7856/ssh
> tcp6       0      0 :::ssh                  :::*                    LISTEN      root       9165        834/sshd


instead of all interfaces (0.0.0.0)
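
To illustrate the difference between a loopback bind and a bind on all interfaces, here is a minimal Python sketch (not Giraph code). It shows that the address a server binds to is the address reported under "Local Address", and it checks what the machine's own hostname resolves to. That the Child process picks its bind address from the local hostname is only an assumption here, not something confirmed above. The function names `bound_address`, `is_loopback` and `resolved_local_ip` are made up for this example.

```python
import ipaddress
import socket


def bound_address(host: str) -> str:
    """Bind a TCP socket to (host, any free port) and return the local IP
    the socket reports, i.e. what netstat lists under "Local Address"."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, 0))  # port 0 lets the OS pick a free port
        srv.listen(1)
        return srv.getsockname()[0]


def is_loopback(ip: str) -> bool:
    """True for any address in 127.0.0.0/8 (or ::1)."""
    return ipaddress.ip_address(ip).is_loopback


def resolved_local_ip() -> str:
    """IP address that this machine's own hostname resolves to."""
    return socket.gethostbyname(socket.gethostname())


if __name__ == "__main__":
    # A socket bound to 127.0.0.1 only accepts local connections;
    # one bound to 0.0.0.0 accepts connections on every interface.
    print("bind to 127.0.0.1 ->", bound_address("127.0.0.1"))
    print("bind to 0.0.0.0   ->", bound_address("0.0.0.0"))

    # Assumption: if the hostname resolves to a loopback address, a server
    # that binds to "its own hostname" would end up on loopback as well.
    ip = resolved_local_ip()
    kind = "loopback" if is_loopback(ip) else "non-loopback"
    print("hostname resolves to", ip, "(" + kind + ")")
```

Running this on an affected node would show whether the hostname maps to a loopback address there. That is one possible explanation for the bind address, not a confirmed one.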

Because these ports are bound to the loopback interface only, the node is
unreachable from the other nodes, e.g. hdnode-2:

>
> 2014-10-17 14:10:31,146 WARN org.apache.giraph.comm.netty.NettyClient:
> 2014-10-17 14:10:31,159 WARN org.apache.giraph.comm.netty.NettyClient:
> connectAllAddresses: Future failed to connect with
> hdnode-2/XXX.XXX.XXX.XXX:30003 with 1 failures because of
> java.net.ConnectException: Connection refused:
> *hdnode-2/XXX.XXX.XXX.XXX:30003*
> 2014-10-17 14:10:31,159 INFO org.apache.giraph.comm.netty.NettyClient:
> connectAllAddresses: Successfully added 1 connections, (1 total connected)
> 2 failed, 2 failures total.


If I stop all processes and start nc listening on port 30003, I can telnet to
hdnode-2 on that port.
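
The same reachability check can be scripted instead of using nc and telnet by hand. A minimal Python sketch, where `can_connect` is a hypothetical helper name and the host and port are simply the values from the log above:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Run from another node; the values are taken from the log excerpt.
    print("hdnode-2:30003 reachable:", can_connect("hdnode-2", 30003))
```

If this returns True with nc listening on all interfaces but False while the job is running, the network path is fine and the problem lies in the address the Child process binds to.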

My question: is there any setting that makes the Child process listen on
0.0.0.0 instead of the loopback interface?

Thanks in advance
