From: Avery Ching
Date: Fri, 23 Dec 2011 09:25:24 -0800
To: giraph-user@incubator.apache.org
Subject: Re: zookeeper connection issue

Yeah, some of those errors can seem a little scary, but I think they are mostly harmless. Let's go over each one inline.

On 12/23/11 7:10 AM, "Christoph Böhm" wrote:
> Hi List,
>
> I'm about to get started with Giraph and have a few questions:
> when running the PageRank example with
> hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500000 -w 10
> this finishes but I find the following in one worker's logs:
>
> *** Worker:
> 2011-12-23 15:36:09,468 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher
> java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201112231316_0010/_masterJobState
>     at org.apache.giraph.graph.BspService.getJobState(BspService.java:564)
>     at org.apache.giraph.graph.BspServiceWorker.processEvent(BspServiceWorker.java:1414)
>     at org.apache.giraph.graph.BspService.process(BspService.java:1017)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201112231316_0010/_masterJobState
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
>     at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:99)
>     at org.apache.giraph.graph.BspService.getJobState(BspService.java:555)
>     ... 4 more

It depends on when this happens. If it's after the worker has let the master know that it was finished with everything, this is fine.

> *** The Master says:
> 2011-12-23 15:45:40,564 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
> java.net.ConnectException: Connection refused
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>     at java.net.Socket.connect(Socket.java:525)
>     at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
>     at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:408)
>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>
>
>
> Also, when I'm trying to run my own Job I see the following. All firewalls etc. should be shut down.
>
> *** Master (node09.de):
> 2011-12-23 15:57:47,140 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to node09.de:22181 with poll msecs = 3000
> 2011-12-23 15:57:47,143 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
> java.net.ConnectException: Connection refused
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>     at java.net.Socket.connect(Socket.java:525)
>     at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
>     at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:409)
>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>
>
>
> Thanks again.
> Christoph

These two exceptions on the master are also fine. It takes some time for the master to start the zk service (hence the multiple connection attempts).
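
To illustrate why a "Connection refused" on an early attempt is expected, here is a rough sketch of the kind of poll loop the master log suggests ("Connect attempt 0 of 10 max ... with poll msecs = 3000"). This is not Giraph's actual ZooKeeperManager code: the class name PortPoller, the method waitForPort, and the 1-second connect timeout are all assumptions made up for this example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper (not part of Giraph): polls host:port until it accepts TCP connections. */
class PortPoller {
    /**
     * Tries to connect up to maxAttempts times, sleeping pollMsecs after each failure.
     * Returns the zero-based index of the attempt that succeeded, or -1 if none did.
     */
    static int waitForPort(String host, int port, int maxAttempts, long pollMsecs)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try (Socket socket = new Socket()) {
                // Assumed 1000 ms connect timeout; the real value in Giraph is not known here.
                socket.connect(new InetSocketAddress(host, port), 1000);
                return attempt; // the server is up and accepting connections
            } catch (IOException e) {
                // java.net.ConnectException ("Connection refused") is an IOException and
                // lands here while the server is still starting. Log it and retry.
                System.err.println("Connect attempt " + attempt + " of " + maxAttempts
                        + " max failed: " + e.getMessage());
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(pollMsecs);
                }
            }
        }
        return -1; // every attempt failed; only this case points to a real problem
    }
}
```

Calling PortPoller.waitForPort("node09.de", 22181, 10, 3000) would mirror the host, port, attempt count, and poll interval from the log above. A warning on attempt 0 followed by a later success is the harmless case; only running out of attempts would be worth investigating.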