Date: Sat, 15 Mar 2014 20:15:43 +0000 (UTC)
From: "Martin Illecker (JIRA)"
To: dev@hama.apache.org
Reply-To: dev@hama.apache.org
Subject: [jira] [Commented] (HAMA-890) PipesApplication connect to ZooKeeperSyncClinetImpl always timeout

    [ https://issues.apache.org/jira/browse/HAMA-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936288#comment-13936288 ]

Martin Illecker commented on HAMA-890:
--------------------------------------

The following exception is thrown because no native C++ application connects within the accept timeout.

{code}
14/03/15 16:21:08 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
    at java.net.ServerSocket.implAccept(ServerSocket.java:478)
    at java.net.ServerSocket.accept(ServerSocket.java:446)
    at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
    at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
    at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
    at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
    at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
{code}

Please enable debug logging in *conf/log4j.properties*
{code}hama.root.logger=DEBUG,console{code}
and in *c++/src/main/native/examples/conf/matrixmultiplication.xml*
{code}
<property>
  <name>hama.pipes.logging</name>
  <value>true</value>
</property>
{code}

Then execute the matrixmultiplication example using \[1] and you will find debug messages in the logs.
{code}
$ cat logs/tasklogs/job_*/attempt_*.log
DEBUG pipes.PipesApplication: DEBUG: waiting for Client at 0.0.0.0/0.0.0.0:51342

$ cat /tmp/hadoop-YOUR_USER/bsp/local/groomServer/attempt_*/work/tasklogs/job_*/attempt_*.err
HamaPipes::runTask - logging is: true
HamaPipes::runTask - connected to GroomServer Port: ....
{code}

I believe the native matrixmultiplication application cannot connect, either because of hostname problems or because the native application crashes.

Please submit your logs.

\[1] https://github.com/apache/hama/blob/trunk/c%2B%2B/src/main/native/examples/README.txt#L84-125

> PipesApplication connect to ZooKeeperSyncClinetImpl always timeout
> ------------------------------------------------------------------
>
>                 Key: HAMA-890
>                 URL: https://issues.apache.org/jira/browse/HAMA-890
>             Project: Hama
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>         Environment: Hadoop 2.2.0 distributed mode
>            Reporter: lujing.zui
>
> I built a cluster that contains 4 groomservers.
> I run a Pipes application, matrixmultiplication, and on one groomserver a problem occurs when connecting to ZooKeeperSyncClient, so the entire job fails. On the other groomservers, everything is fine.
> I rebooted the problematic node, but that did not solve the problem.
> As I understand it, both sides of this connection are on the same node, so a connection accept timeout seems impossible. iptables is off and the network is normal; pinging every node works.
> I am so confused. Can anyone help me or give me a hint or suggestion?
> Thanks so much!
> The log is listed below:
> 14/03/15 16:21:05 INFO ipc.Server: Starting Socket Reader #1 for port 61002
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server Responder: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server listener on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 0 on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 2 on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 1 on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 3 on 61002: starting
> 14/03/15 16:21:05 INFO message.HamaMessageManagerImpl: BSPPeer address:hd1.hadoop.lab port:61002
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 4 on 61002: starting
> 14/03/15 16:21:05 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
> 14/03/15 16:21:05 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 14/03/15 16:21:05 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At hd1.hadoop.lab/222.195.92.69:61002
> 14/03/15 16:21:08 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> java.net.SocketTimeoutException: Accept timed out
>     at java.net.PlainSocketImpl.socketAccept(Native Method)
>     at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
>     at java.net.ServerSocket.implAccept(ServerSocket.java:478)
>     at java.net.ServerSocket.accept(ServerSocket.java:446)
>     at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
>     at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>     at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>     at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> 14/03/15 16:21:08 ERROR bsp.BSPTask: Error cleaning up after bsp executed.
> java.lang.NullPointerException
>     at org.apache.hama.pipes.PipesBSP.cleanup(PipesBSP.java:95)
>     at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:177)
>     at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> 14/03/15 16:21:08 INFO ipc.Server: Stopping server on 61002
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 0 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 2 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: Stopping IPC Server listener on 61002
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 3 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 4 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 1 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: Stopping IPC Server Responder
> 14/03/15 16:21:08 ERROR bsp.BSPTask: Shutting down ping service.
> 14/03/15 16:21:08 FATAL bsp.GroomServer: Error running child
> java.net.SocketTimeoutException: Accept timed out
>     at java.net.PlainSocketImpl.socketAccept(Native Method)
>     at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
>     at java.net.ServerSocket.implAccept(ServerSocket.java:478)
>     at java.net.ServerSocket.accept(ServerSocket.java:446)
>     at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
>     at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>     at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>     at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> java.net.SocketTimeoutException: Accept timed out
>     at java.net.PlainSocketImpl.socketAccept(Native Method)
>     at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
>     at java.net.ServerSocket.implAccept(ServerSocket.java:478)
>     at java.net.ServerSocket.accept(ServerSocket.java:446)
>     at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
>     at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>     at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>     at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
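
For reference, the "Accept timed out" in the traces above is what java.net.ServerSocket.accept() throws when a socket timeout is set and no client connects before it expires. The following is only a minimal sketch of that behaviour, not Hama's actual PipesApplication code: the class name AcceptTimeoutDemo, the helper acceptTimesOut, and the 500 ms timeout are made up for illustration.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class AcceptTimeoutDemo {

    // Hypothetical helper (not part of Hama): listens on a free port and
    // returns true if no client connected within timeoutMs milliseconds.
    public static boolean acceptTimesOut(int timeoutMs) throws IOException {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0)) {
            // With a non-zero SO_TIMEOUT, accept() blocks at most timeoutMs.
            server.setSoTimeout(timeoutMs);
            try (Socket client = server.accept()) {
                // A client connected in time (not expected in this sketch).
                return false;
            } catch (SocketTimeoutException e) {
                // Nobody connected: the same exception type as in the trace.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Nothing connects to the listening socket, so accept() gives up.
        System.out.println("timed out: " + acceptTimesOut(500));
    }
}
```

In the reported job the listening side plays the role of this server socket, and the native matrixmultiplication process is the client that never arrives, which is why the native .err log is the place to look next.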