From: Avery Ching
Date: Mon, 05 Mar 2012 23:53:32 -0800
To: giraph-user@incubator.apache.org
Subject: Re: PageRankBenchmark failing with zooKeeper.KeeperException
Message-ID: <4F55C27C.9000708@apache.org>
In-Reply-To: <4F559ABF.3060803@gmail.com>

Hi Abhishek,

Nice to meet you. Can you try it with fewer workers? For instance, -w 1 or -w 2? I think the likely issue is that you need to have as many map slots as the number of workers, plus at least one for the master. If you don't have enough slots, the job will fail.
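To make the slot arithmetic concrete, here is a rough sketch rather than anything official. It assumes a single TaskTracker with 2 map slots (the stock Hadoop 1.x value of mapred.tasktracker.map.tasks.maximum, which may differ on your machine); the variable names are made up for illustration. It only prints the adjusted command instead of running it.

```shell
#!/bin/sh
# Assumption: total map slots on the cluster. 2 is the stock Hadoop 1.x
# default for mapred.tasktracker.map.tasks.maximum; change it to match
# your own mapred-site.xml.
MAP_SLOTS=2

# At least one slot goes to the Giraph master; the rest can host workers.
MAX_WORKERS=$((MAP_SLOTS - 1))

# The -w value from the failing run.
REQUESTED_WORKERS=30

if [ "$REQUESTED_WORKERS" -gt "$MAX_WORKERS" ]; then
  echo "-w $REQUESTED_WORKERS needs at least $((REQUESTED_WORKERS + 1)) map slots, but only $MAP_SLOTS are configured"
fi

# Print (not run) the original command with a worker count that fits.
echo "hadoop jar target/giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 -w $MAX_WORKERS"
```

If you would rather keep more workers, the other direction is to raise the number of map slots on the node, memory permitting.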
Also, you might want to dial down the number of vertices a bit, unless you have oodles of memory. Please let us know if that helps.

Avery

On 3/5/12 9:03 PM, Abhishek Srivastava wrote:
> Hi All,
>
> I have been trying (quite unsuccessfully for a while now) to run the PageRankBenchmark
> to play around with Giraph. I got hadoop running in a single node setup and hadoop
> jobs and jars run just fine. When I try to run the PageRankBenchmark, I get this
> incomprehensible error which I'm not able to diagnose.
>
> -----------------------------------CUT HERE---------------------------------------------
> abhi@darkstar:trunk $ hadoop jar target/giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 -w 30
> Warning: $HADOOP_HOME is deprecated.
>
> Using org.apache.giraph.benchmark.PageRankBenchmark$PageRankVertex
> 12/03/04 03:44:08 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
> 12/03/04 03:44:09 INFO mapred.JobClient: Running job: job_201203031851_0004
> 12/03/04 03:44:10 INFO mapred.JobClient: map 0% reduce 0%
> 12/03/04 03:44:26 INFO mapred.JobClient: map 3% reduce 0%
> 12/03/04 10:43:52 INFO mapred.JobClient: map 0% reduce 0%
> 12/03/04 10:43:57 INFO mapred.JobClient: Task Id : attempt_201203031851_0004_m_000000_0, Status : FAILED
> Task attempt_201203031851_0004_m_000000_0 failed to report status for 24979 seconds. Killing!
> 12/03/04 10:44:00 INFO mapred.JobClient: Task Id : attempt_201203031851_0004_m_000001_0, Status : FAILED
> Task attempt_201203031851_0004_m_000001_0 failed to report status for 25159 seconds. Killing!
> 12/03/04 10:44:07 INFO mapred.JobClient: map 3% reduce 0%
> 12/03/04 10:49:07 INFO mapred.JobClient: map 0% reduce 0%
> 12/03/04 10:49:12 INFO mapred.JobClient: Task Id : attempt_201203031851_0004_m_000000_1, Status : FAILED
> java.lang.Throwable: Child Error
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>
> 12/03/04 10:49:22 INFO mapred.JobClient: map 3% reduce 0%
> 12/03/04 10:54:23 INFO mapred.JobClient: map 0% reduce 0%
> 12/03/04 10:54:28 INFO mapred.JobClient: Task Id : attempt_201203031851_0004_m_000000_2, Status : FAILED
> java.lang.Throwable: Child Error
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>
> 12/03/04 10:54:38 INFO mapred.JobClient: map 3% reduce 0%
> 12/03/04 10:59:10 INFO mapred.JobClient: Task Id : attempt_201203031851_0004_m_000001_1, Status : FAILED
> java.lang.IllegalStateException: unregisterHealth: KeeperException - Couldn't delete /_hadoopBsp/job_201203031851_0004/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/darkstar_1
>     at org.apache.giraph.graph.BspServiceWorker.unregisterHealth(BspServiceWorker.java:727)
>     at org.apache.giraph.graph.BspServiceWorker.failureCleanup(BspServiceWorker.java:735)
>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:648)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201203031851_0004/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/darkstar_1
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
>     at org.apache.giraph.graph.BspServiceWorker.unregisterHealth(BspServiceWorker.java:721)
>     ... 9 more
>
> Task attempt_201203031851_0004_m_000001_1 failed to report status for 601 seconds. Killing!
> attempt_201203031851_0004_m_000001_1: log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ClientCnxn).
> attempt_201203031851_0004_m_000001_1: log4j:WARN Please initialize the log4j system properly.
> 12/03/04 10:59:47 INFO mapred.JobClient: map 0% reduce 0%
> 12/03/04 10:59:58 INFO mapred.JobClient: Job complete: job_201203031851_0004
> 12/03/04 10:59:58 INFO mapred.JobClient: Counters: 6
> 12/03/04 10:59:58 INFO mapred.JobClient: Job Counters
> 12/03/04 10:59:58 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=977551
> 12/03/04 10:59:58 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> 12/03/04 10:59:58 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> 12/03/04 10:59:58 INFO mapred.JobClient: Launched map tasks=7
> 12/03/04 10:59:58 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
> 12/03/04 10:59:58 INFO mapred.JobClient: Failed map tasks=1
> -----------------------------------CUT HERE---------------------------------------------
>
> Thanks,
> Abhishek.