hama-dev mailing list archives

From "Martin Illecker (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-890) PipesApplication connect to ZooKeeperSyncClinetImpl always timeout
Date Sat, 15 Mar 2014 20:15:43 GMT

    [ https://issues.apache.org/jira/browse/HAMA-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936288#comment-13936288 ]

Martin Illecker commented on HAMA-890:
--------------------------------------

The following exception is thrown because no native C++ application connected to the task's server socket before the accept timeout expired.
{code}
14/03/15 16:21:08 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
java.net.SocketTimeoutException: Accept timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
at java.net.ServerSocket.implAccept(ServerSocket.java:478)
at java.net.ServerSocket.accept(ServerSocket.java:446)
at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
{code}
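
For illustration, the same exception can be reproduced outside Hama. The sketch below is not the PipesApplication code; the class name AcceptTimeoutDemo and the 200 ms timeout are made up for the example. It only shows the general mechanism: a ServerSocket with a timeout set throws SocketTimeoutException from accept() when no client connects in time, and accepts normally when a client does connect.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of Hama.
public class AcceptTimeoutDemo {

  /**
   * Opens a server socket on an ephemeral loopback port and waits up to
   * timeoutMillis for a client. If connectClient is true, a local client
   * connects first. Returns true if accept() timed out.
   */
  public static boolean acceptTimesOut(int timeoutMillis, boolean connectClient)
      throws IOException {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
      // Without this call accept() would block indefinitely.
      server.setSoTimeout(timeoutMillis);
      Socket client = null;
      if (connectClient) {
        // Plays the role of the native application connecting back.
        client = new Socket(loopback, server.getLocalPort());
      }
      try (Socket accepted = server.accept()) {
        return false; // a client connected in time
      } catch (SocketTimeoutException e) {
        return true; // nobody connected: "Accept timed out"
      } finally {
        if (client != null) {
          client.close();
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("no client   -> timed out: " + acceptTimesOut(200, false));
    System.out.println("with client -> timed out: " + acceptTimesOut(200, true));
  }
}
```

So the stack trace alone only says that the Java side waited and nothing arrived; it does not say why the native side failed to connect.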

Please enable debug logging in *conf/log4j.properties*
{code}hama.root.logger=DEBUG,console{code}

and in *c++/src/main/native/examples/conf/matrixmultiplication.xml*
{code}
  <property>
    <name>hama.pipes.logging</name>
    <value>true</value>
  </property>
{code}

Then run the matrixmultiplication example as described in \[1]. The debug messages will
appear in the logs.
{code}
$ cat logs/tasklogs/job_*/attempt_*.log
DEBUG pipes.PipesApplication: DEBUG: waiting for Client at 0.0.0.0/0.0.0.0:51342

$ cat /tmp/hadoop-YOUR_USER/bsp/local/groomServer/attempt_*/work/tasklogs/job_*/attempt_*.err
HamaPipes::runTask - logging is: true
HamaPipes::runTask - connected to GroomServer Port: ....
{code}

I believe that either the native matrixmultiplication application cannot connect because of
hostname problems, or the native application crashes before it connects.
Please submit your logs.
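
To rule out the hostname side, one generic check (not a Hama tool; the class name HostnameCheck is made up for this sketch) is to print what the node's hostname resolves to and compare it with the address shown in the task log. Run it on the problematic groomserver and on a working one:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic class, not part of Hama.
public class HostnameCheck {

  /** Returns every address the name resolves to, or an empty list if it does not resolve. */
  public static List<InetAddress> resolve(String host) {
    try {
      return new ArrayList<>(Arrays.asList(InetAddress.getAllByName(host)));
    } catch (UnknownHostException e) {
      return new ArrayList<>();
    }
  }

  public static void main(String[] args) throws UnknownHostException {
    // Use the first argument if given, otherwise this machine's own hostname.
    String host = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();
    List<InetAddress> addresses = resolve(host);
    if (addresses.isEmpty()) {
      System.out.println(host + " does not resolve");
    }
    for (InetAddress address : addresses) {
      System.out.println(host + " -> " + address.getHostAddress()
          + (address.isLoopbackAddress() ? " (loopback)" : ""));
    }
  }
}
```

If the problematic node prints a different address than the working nodes do for the same name, /etc/hosts or DNS on that node is the first thing to look at.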

\[1] https://github.com/apache/hama/blob/trunk/c%2B%2B/src/main/native/examples/README.txt#L84-125

> PipesApplication connect to ZooKeeperSyncClinetImpl always timeout
> ------------------------------------------------------------------
>
>                 Key: HAMA-890
>                 URL: https://issues.apache.org/jira/browse/HAMA-890
>             Project: Hama
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>         Environment: Hadoop 2.2.0 distribute mode
>            Reporter: lujing.zui
>
> I built a cluster that contains 4 groomservers.
> I ran a Pipes application, matrixmultiplication, and on one groomserver it fails to connect to ZooKeeperSyncClient, so the entire job fails. On the other groomservers, everything is fine.
> Rebooting the problematic node did not solve the problem.
> As I understand it, both sides of this connection are on the same node, so an accept timeout should be impossible. iptables is off, the network is normal, and every node responds to ping.
> I am confused. Can anyone help me or give me a hint or suggestion?
> Thanks so much!
> The log is listed below:
> 14/03/15 16:21:05 INFO ipc.Server: Starting Socket Reader #1 for port 61002
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server Responder: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server listener on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 0 on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 2 on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 1 on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 3 on 61002: starting
> 14/03/15 16:21:05 INFO message.HamaMessageManagerImpl: BSPPeer address:hd1.hadoop.lab port:61002
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 4 on 61002: starting
> 14/03/15 16:21:05 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
> 14/03/15 16:21:05 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 14/03/15 16:21:05 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At hd1.hadoop.lab/222.195.92.69:61002
> 14/03/15 16:21:08 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> java.net.SocketTimeoutException: Accept timed out
> 	at java.net.PlainSocketImpl.socketAccept(Native Method)
> 	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
> 	at java.net.ServerSocket.implAccept(ServerSocket.java:478)
> 	at java.net.ServerSocket.accept(ServerSocket.java:446)
> 	at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
> 	at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
> 	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
> 	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
> 	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> 14/03/15 16:21:08 ERROR bsp.BSPTask: Error cleaning up after bsp executed.
> java.lang.NullPointerException
> 	at org.apache.hama.pipes.PipesBSP.cleanup(PipesBSP.java:95)
> 	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:177)
> 	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
> 	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> 14/03/15 16:21:08 INFO ipc.Server: Stopping server on 61002
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 0 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 2 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: Stopping IPC Server listener on 61002
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 3 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 4 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 1 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: Stopping IPC Server Responder
> 14/03/15 16:21:08 ERROR bsp.BSPTask: Shutting down ping service.
> 14/03/15 16:21:08 FATAL bsp.GroomServer: Error running child
> java.net.SocketTimeoutException: Accept timed out
> 	at java.net.PlainSocketImpl.socketAccept(Native Method)
> 	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
> 	at java.net.ServerSocket.implAccept(ServerSocket.java:478)
> 	at java.net.ServerSocket.accept(ServerSocket.java:446)
> 	at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
> 	at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
> 	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
> 	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
> 	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> java.net.SocketTimeoutException: Accept timed out
> 	at java.net.PlainSocketImpl.socketAccept(Native Method)
> 	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
> 	at java.net.ServerSocket.implAccept(ServerSocket.java:478)
> 	at java.net.ServerSocket.accept(ServerSocket.java:446)
> 	at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
> 	at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
> 	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
> 	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
> 	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
