incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: zookeeper connection issue
Date Fri, 23 Dec 2011 17:25:24 GMT
Yeah, those errors can seem a little scary, but I think they are 
mostly harmless.  Let's go over each one inline.

On 12/23/11 7:10 AM, "Christoph Böhm" wrote:
> Hi List,
>
> I'm about to get started with Giraph and have a few questions:
> when running the PageRank example with
>     hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500000 -w 10
> this finishes but I find the following in one worker's logs:
>
> *** Worker:
> 2011-12-23 15:36:09,468 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher
> java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201112231316_0010/_masterJobState
> 	at org.apache.giraph.graph.BspService.getJobState(BspService.java:564)
> 	at org.apache.giraph.graph.BspServiceWorker.processEvent(BspServiceWorker.java:1414)
> 	at org.apache.giraph.graph.BspService.process(BspService.java:1017)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201112231316_0010/_masterJobState
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
> 	at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:99)
> 	at org.apache.giraph.graph.BspService.getJobState(BspService.java:555)
> 	... 4 more

It depends on when this happens.  If it's after the worker has let the 
master know that it has finished with everything, this is fine.

> *** The Master says:
> 2011-12-23 15:45:40,564 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
> java.net.ConnectException: Connection refused
> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
> 	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
> 	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
> 	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> 	at java.net.Socket.connect(Socket.java:525)
> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:408)
> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>
>
>
> Also, when I'm trying to run my own Job I see the following. All firewalls etc. should be shutdown.
>
> *** Master (node09.de):
> 2011-12-23 15:57:47,140 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to node09.de:22181 with poll msecs = 3000
> 2011-12-23 15:57:47,143 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
> java.net.ConnectException: Connection refused
> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
> 	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
> 	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
> 	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> 	at java.net.Socket.connect(Socket.java:525)
> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:409)
> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>
>
>
> Thanks again.
> Christoph

These two exceptions on the master are also fine.  It takes some time 
for the master to start the ZooKeeper service (hence the multiple 
connection attempts).
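
To illustrate why a "Connection refused" on an early attempt is expected, 
here is a minimal sketch of a poll-and-retry connect loop.  This is not 
the actual ZooKeeperManager code; the class name PortProbe, the method 
waitForPort, and the 1000 ms connect timeout are made up for this 
example.  Only the attempt count (10), the poll interval (3000 ms), and 
the port (22181) come from the log lines above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper, not part of Giraph: polls a TCP port until it accepts connections. */
public class PortProbe {

    /**
     * Tries to open a TCP connection to host:port up to maxAttempts times,
     * sleeping pollMsecs between failed attempts.
     *
     * @return the zero-based index of the attempt that connected, or -1 if none did
     */
    public static int waitForPort(String host, int port, int maxAttempts, long pollMsecs)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try (Socket socket = new Socket()) {
                // Assumed 1000 ms connect timeout so a dead host cannot block forever.
                socket.connect(new InetSocketAddress(host, port), 1000);
                return attempt; // The server is up; the socket is closed on return.
            } catch (IOException e) {
                // "Connection refused" lands here while the server is still starting.
                // It is logged as a warning and the loop simply tries again.
                System.err.println("Connect attempt " + attempt + " of " + maxAttempts
                        + " max failed: " + e.getMessage());
            }
            if (attempt + 1 < maxAttempts) {
                Thread.sleep(pollMsecs);
            }
        }
        return -1;
    }
}
```

With the values from your log, the call would look like 
PortProbe.waitForPort("node09.de", 22181, 10, 3000).  A refused first 
attempt only becomes a real problem if every attempt fails and the 
method returns -1.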
