From: Rob Vesse
Date: Mon, 25 Nov 2013 09:42:03 +0000
Subject: Re: Giraph EC2 Map task fails
Reply-To: user@giraph.apache.org
Mailing-List: contact user-help@giraph.apache.org; run by ezmlm

I reported a bug about that the other day, GIRAPH-797. Someone has proposed a patch, which I believe has been or will soon be committed, so this issue should be avoided in future.

Rob

From: Young Han <young.han@uwaterloo.ca>
Reply-To: <user@giraph.apache.org>
Date: Sunday, 24 November 2013 19:19
To: <user@giraph.apache.org>, <gsalazar@ime.usp.br>
Subject: Re: Giraph EC2 Map task fails

> Actually, it turned out to be a dumber error than that... The name of the
> input file was wrong, so it was using an empty/non-existent graph.
>
> We'll keep the zookeeper bit in mind if we run into further problems.
>
> Thanks,
> Young
>
>
> On Sun, Nov 24, 2013 at 2:06 PM, Gustavo Enrique Salazar Torres
> <gsalazar@ime.usp.br> wrote:
>> I guess from your stacktrace that you didn't start the zookeeper cluster.
>>
>> Cheers
>> Gustavo
>>
>>
>> On Sunday, November 24, 2013, Young Han <young.han@uwaterloo.ca> wrote:
>>> > Hi,
>>> >
>>> > We are attempting to get Giraph running on EC2, using Hadoop 1.0.4. We are using page rank with the following command:
>>> >
>>> > hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimplePageRankVertex -c org.apache.giraph.combiner.DoubleSumCombiner -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/ubuntu/giraph-input/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/ubuntu/giraph-output/pagerank -w 1
>>> >
>>> >
>>> > The input graph is the sample graph provided on the website:
>>> >
>>> > [0,0,[[1,1],[3,3]]]
>>> > [1,0,[[0,1],[2,2],[3,1]]]
>>> > [2,0,[[1,2],[4,4]]]
>>> > [3,0,[[0,3],[1,1],[4,4]]]
>>> > [4,0,[[3,4],[2,4]]]
>>> >
>>> >
>>> > We've tried small, medium, and xlarge instances; 4 instances and 3 instances; and various numbers of workers (-w 1, -w 2, -w 5, -w 10, etc.). Hadoop has xmx (max Java heap size) set to 1024m.
>>> >
>>> > The pattern is that the *first* map task will always fail. The error appears in Hadoop's jobtracker log:
>>> >
>>> > 2013-11-24 03:07:43,414 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240306_0001: nMaps=2 nReduces=0 max=-1
>>> > 2013-11-24 03:07:43,417 INFO org.apache.hadoop.mapred.JobTracker: Job job_201311240306_0001 added successfully for user 'ubuntu' to queue 'default'
>>> > 2013-11-24 03:07:43,418 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201311240306_0001
>>> > 2013-11-24 03:07:43,419 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201311240306_0001
>>> > 2013-11-24 03:07:43,422 INFO org.apache.hadoop.mapred.AuditLogger: USER=ubuntu  IP=172.31.14.182  OPERATION=SUBMIT_JOB  TARGET=job_201311240306_0001  RESULT=SUCCESS
>>> > 2013-11-24 03:07:43,828 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /home/ubuntu/hadoop_data/hadoop_tmp-ubuntu/mapred/system/job_201311240306_0001/jobToken
>>> > 2013-11-24 03:07:43,846 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201311240306_0001 = 0. Number of splits = 2
>>> > 2013-11-24 03:07:43,846 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240306_0001 LOCALITY_WAIT_FACTOR=0.0
>>> > 2013-11-24 03:07:43,847 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201311240306_0001 initialized successfully with 2 map tasks and 0 reduce tasks.
>>> > 2013-11-24 03:07:45,152 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201311240306_0001_m_000003_0' to tip task_201311240306_0001_m_000003, for tracker 'tracker_cloud3:localhost/127.0.0.1:47021'
>>> > 2013-11-24 03:07:54,222 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311240306_0001_m_000003_0' has completed task_201311240306_0001_m_000003 successfully.
>>> > 2013-11-24 03:07:54,228 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201311240306_0001_m_000000
>>> > 2013-11-24 03:07:54,229 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201311240306_0001_m_000000_0' to tip task_201311240306_0001_m_000000, for tracker 'tracker_cloud3:localhost/127.0.0.1:47021'
>>> > 2013-11-24 03:07:54,361 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201311240306_0001_m_000001
>>> > 2013-11-24 03:07:54,362 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201311240306_0001_m_000001_0' to tip task_201311240306_0001_m_000001, for tracker 'tracker_cloud2:localhost/127.0.0.1:55161'
>>> > 2013-11-24 03:08:03,243 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311240306_0001_m_000000_0: java.lang.Throwable: Child Error
>>> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
>>> > Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>>> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>>> >
>>> >
>>> > Thereafter, all other workers will fail with:
>>> >
>>> > 2013-11-24 03:08:42,471 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311240306_0001_m_000001_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception exists: Failed to check /_hadoopBsp/job_201311240306_0001/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
>>> >         at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>>> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>> >         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>> >         at java.security.AccessController.doPrivileged(Native Method)
>>> >         at javax.security.auth.Subject.doAs(Subject.java:396)
>>> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>> >         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>> > Caused by: java.lang.IllegalStateException: exists: Failed to check /_hadoopBsp/job_201311240306_0001/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
>>> >         at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
>>> >         at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:689)
>>> >         at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:488)
>>> >         at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>>> >         at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>>> >         ... 7 more
>>> >
>>> >
>>> > Any suggestions about why this might be happening?
>>> >
>>> > Thanks,
>>> > Young
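
Since the root cause in this thread was an empty/non-existent input graph (the jobtracker log line "Input size for job job_201311240306_0001 = 0" is consistent with that), a local sanity check on the graph file before submitting the job may save a debugging round. The following is only a minimal sketch and is not part of Giraph or Hadoop: the function names parse_vertex_line and load_graph are hypothetical, and the line layout [vertex id, vertex value, [[target id, edge weight], ...]] is assumed from the sample graph quoted in the thread.

```python
import json


def parse_vertex_line(line):
    """Parse one line assumed to look like [id, value, [[target, weight], ...]]."""
    vertex_id, value, edges = json.loads(line)
    return int(vertex_id), float(value), [(int(t), float(w)) for t, w in edges]


def load_graph(lines):
    """Parse all non-blank lines; raise ValueError if no vertex is found."""
    vertices = [parse_vertex_line(line) for line in lines if line.strip()]
    if not vertices:
        raise ValueError("input graph is empty")
    return vertices


# The sample graph quoted in the thread, used here as stand-in input.
SAMPLE = """\
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
"""

graph = load_graph(SAMPLE.splitlines())
print(len(graph), "vertices parsed")
```

In practice the lines would come from a local copy of the file that is about to be uploaded to the path passed via -vip. This check only covers the local file; it does not confirm that the HDFS path in the command actually points at it.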