hama-dev mailing list archives

From "lujing.zui (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HAMA-890) PipesApplication connect to ZooKeeperSyncClientImpl always timeout
Date Sat, 15 Mar 2014 08:52:42 GMT

     [ https://issues.apache.org/jira/browse/HAMA-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujing.zui updated HAMA-890:
----------------------------

    Description: 
I built a cluster containing 4 groomservers.
When I run a Pipes application (matrix multiplication), one groomserver has a problem connecting
to ZooKeeperSyncClient, so the entire job fails. On the other groomservers everything is fine.
I rebooted the problematic node, but this still did not solve the problem.

As I understand it, both sides of this connection are on the same node, so a connection accept
timeout seems impossible. iptables is off, the network is normal, and every node responds to ping.
I am quite confused. Can anyone help me or give me a hint or suggestion?
Thanks so much!
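Note that ping only exercises ICMP, not TCP. A minimal sketch of a TCP-level check, assuming Java is available on the problematic node: the hypothetical standalone program below (not part of Hama) binds an ephemeral port, connects to it from the same JVM, and reports whether accept() returned the connection. With no argument it uses the loopback address; given a hostname such as hd1.hadoop.lab it tests the address that name resolves to.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical diagnostic, not Hama code: checks a local TCP connect/accept round trip.
public class LocalAcceptCheck {
    // Binds an ephemeral port on the given host (loopback if null), connects to it
    // from this JVM, and reports whether accept() returned the connection.
    static boolean localAcceptWorks(String host, int timeoutMs) throws IOException {
        InetAddress addr = (host == null)
                ? InetAddress.getLoopbackAddress()
                : InetAddress.getByName(host);
        try (ServerSocket server = new ServerSocket(0, 1, addr)) {
            server.setSoTimeout(timeoutMs); // throw SocketTimeoutException instead of blocking forever
            // connect() completes once the kernel queues the connection in the
            // listen backlog, so it is safe to connect before calling accept().
            try (Socket client = new Socket(addr, server.getLocalPort());
                 Socket accepted = server.accept()) {
                return client.isConnected() && accepted.isConnected();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : null;
        System.out.println("accept on " + (host == null ? "loopback" : host)
                + " works: " + localAcceptWorks(host, 3000));
    }
}
```

For example, `java LocalAcceptCheck hd1.hadoop.lab` on the failing node: a BindException means the name does not resolve to an address of that node, and a SocketTimeoutException means local TCP connections are not being accepted.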

The log is below:
14/03/15 16:21:05 INFO ipc.Server: Starting Socket Reader #1 for port 61002
14/03/15 16:21:05 INFO ipc.Server: IPC Server Responder: starting
14/03/15 16:21:05 INFO ipc.Server: IPC Server listener on 61002: starting
14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 0 on 61002: starting
14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 2 on 61002: starting
14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 1 on 61002: starting
14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 3 on 61002: starting
14/03/15 16:21:05 INFO message.HamaMessageManagerImpl: BSPPeer address:hd1.hadoop.lab port:61002
14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 4 on 61002: starting
14/03/15 16:21:05 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
14/03/15 16:21:05 INFO sync.ZKSyncClient: Initializing ZK Sync Client
14/03/15 16:21:05 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At hd1.hadoop.lab/222.195.92.69:61002
14/03/15 16:21:08 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
java.net.SocketTimeoutException: Accept timed out
	at java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
	at java.net.ServerSocket.implAccept(ServerSocket.java:478)
	at java.net.ServerSocket.accept(ServerSocket.java:446)
	at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
	at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
14/03/15 16:21:08 ERROR bsp.BSPTask: Error cleaning up after bsp executed.
java.lang.NullPointerException
	at org.apache.hama.pipes.PipesBSP.cleanup(PipesBSP.java:95)
	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:177)
	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
14/03/15 16:21:08 INFO ipc.Server: Stopping server on 61002
14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 0 on 61002: exiting
14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 2 on 61002: exiting
14/03/15 16:21:08 INFO ipc.Server: Stopping IPC Server listener on 61002
14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 3 on 61002: exiting
14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 4 on 61002: exiting
14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 1 on 61002: exiting
14/03/15 16:21:08 INFO ipc.Server: Stopping IPC Server Responder
14/03/15 16:21:08 ERROR bsp.BSPTask: Shutting down ping service.
14/03/15 16:21:08 FATAL bsp.GroomServer: Error running child
java.net.SocketTimeoutException: Accept timed out
	at java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
	at java.net.ServerSocket.implAccept(ServerSocket.java:478)
	at java.net.ServerSocket.accept(ServerSocket.java:446)
	at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
	at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
java.net.SocketTimeoutException: Accept timed out
	at java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
	at java.net.ServerSocket.implAccept(ServerSocket.java:478)
	at java.net.ServerSocket.accept(ServerSocket.java:446)
	at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
	at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
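The stack trace shows that the SocketTimeoutException is thrown by ServerSocket.accept() inside PipesApplication.start(), that is, while the Java side is waiting for an incoming connection. The sketch below uses plain java.net classes (it is not Hama's code; the class and method names are hypothetical) to show that accept() on a server socket with a timeout set via setSoTimeout fails with a SocketTimeoutException whenever no client connects within that time, regardless of whether server and client would run on the same node.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo, not Hama code: accept() with a timeout on one host.
public class AcceptTimeoutDemo {
    // Waits up to timeoutMs for one client on an ephemeral loopback port.
    // If connectClient is true, a client from this same JVM connects first.
    static String acceptOnce(int timeoutMs, boolean connectClient) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            server.setSoTimeout(timeoutMs); // accept() gives up after this many ms
            // A null resource is allowed in try-with-resources; it is simply not closed.
            try (Socket client = connectClient ? new Socket(loopback, server.getLocalPort()) : null;
                 Socket accepted = server.accept()) {
                return "accepted";
            } catch (SocketTimeoutException e) {
                return "timed out: " + e.getMessage();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(acceptOnce(500, false)); // nobody ever connects
        System.out.println(acceptOnce(500, true));  // a client on the same host connects
    }
}
```

In other words, running both ends on one node does not rule out this exception; it only means the peer that was expected to connect (in a Pipes setup, presumably the launched child process, which is an assumption not confirmed by this log) did not connect before the timeout expired.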

  was:
I built a cluster containing 4 groomservers.
When I run a Pipes application (matrix multiplication), a problem occurs on one groomserver when
connecting to ZooKeeperSyncClient, so the entire job fails. On the other groomservers everything
is fine.
I rebooted the problematic node, but that cannot solve this problem.

As I understand it, both sides of this connection are on one node, so an accept timeout seems impossible.
iptables is off, the network is normal, and every node responds to ping.
I am quite confused. Can anyone help me or give me a hint or suggestion?
Thanks so much!


> PipesApplication connect to ZooKeeperSyncClientImpl always timeout
> ------------------------------------------------------------------
>
>                 Key: HAMA-890
>                 URL: https://issues.apache.org/jira/browse/HAMA-890
>             Project: Hama
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>         Environment: Hadoop 2.2.0 distributed mode
>            Reporter: lujing.zui
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
