giraph-user mailing list archives

From Rob Vesse <rve...@dotnetrdf.org>
Subject Re: Giraph EC2 Map task fails
Date Mon, 25 Nov 2013 09:42:03 GMT
I reported a bug about that the other day: GIRAPH-797. Someone has proposed a
patch for it, and I believe it has been or will be committed soon, so this
issue should be avoided in future.

Rob

From:  Young Han <young.han@uwaterloo.ca>
Reply-To:  <user@giraph.apache.org>
Date:  Sunday, 24 November 2013 19:19
To:  <user@giraph.apache.org>, <gsalazar@ime.usp.br>
Subject:  Re: Giraph EC2 Map task fails

> Actually, it turned out to be a dumber error than that... The name of the
> input file was wrong, so it was using an empty/non-existent graph.
> 
> We'll keep the zookeeper bit in mind if we run into further problems.
> 
> Thanks,
> Young
> 
> 
> On Sun, Nov 24, 2013 at 2:06 PM, Gustavo Enrique Salazar Torres
> <gsalazar@ime.usp.br> wrote:
>> I guess from your stack trace that you didn't start the ZooKeeper cluster.
>> 
>> Cheers
>> Gustavo
>> 
>> 
>> On Sunday, November 24, 2013, Young Han <young.han@uwaterloo.ca> wrote:
>>> > Hi,
>>> >
>>> > We are attempting to get Giraph running on EC2, using Hadoop 1.0.4. We are
>>> > using PageRank with the following command:
>>> >
>>> > hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar \
>>> >   org.apache.giraph.GiraphRunner \
>>> >   org.apache.giraph.examples.SimplePageRankVertex \
>>> >   -c org.apache.giraph.combiner.DoubleSumCombiner \
>>> >   -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
>>> >   -vip /user/ubuntu/giraph-input/tiny_graph.txt \
>>> >   -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>> >   -op /user/ubuntu/giraph-output/pagerank \
>>> >   -w 1
>>> >
>>> >
>>> > The input graph is the sample graph provided on the website:
>>> >
>>> > [0,0,[[1,1],[3,3]]]
>>> > [1,0,[[0,1],[2,2],[3,1]]]
>>> > [2,0,[[1,2],[4,4]]]
>>> > [3,0,[[0,3],[1,1],[4,4]]]
>>> > [4,0,[[3,4],[2,4]]]
>>> >
>>> >
>>> > We've tried small, medium, and xlarge instances; 4 instances and 3
>>> > instances; and various numbers of workers (-w 1, -w 2, -w 5, -w 10, etc.).
>>> > Hadoop has xmx (max Java heap size) set to 1024m.
>>> >
>>> > The pattern is that the *first* map task will always fail. The error
>>> > appears in Hadoop's JobTracker log:
>>> >
>>> > 2013-11-24 03:07:43,414 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240306_0001: nMaps=2 nReduces=0 max=-1
>>> > 2013-11-24 03:07:43,417 INFO org.apache.hadoop.mapred.JobTracker: Job job_201311240306_0001 added successfully for user 'ubuntu' to queue 'default'
>>> > 2013-11-24 03:07:43,418 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201311240306_0001
>>> > 2013-11-24 03:07:43,419 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201311240306_0001
>>> > 2013-11-24 03:07:43,422 INFO org.apache.hadoop.mapred.AuditLogger: USER=ubuntu  IP=172.31.14.182        OPERATION=SUBMIT_JOB    TARGET=job_201311240306_0001    RESULT=SUCCESS
>>> > 2013-11-24 03:07:43,828 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /home/ubuntu/hadoop_data/hadoop_tmp-ubuntu/mapred/system/job_201311240306_0001/jobToken
>>> > 2013-11-24 03:07:43,846 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201311240306_0001 = 0. Number of splits = 2
>>> > 2013-11-24 03:07:43,846 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240306_0001 LOCALITY_WAIT_FACTOR=0.0
>>> > 2013-11-24 03:07:43,847 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201311240306_0001 initialized successfully with 2 map tasks and 0 reduce tasks.
>>> > 2013-11-24 03:07:45,152 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201311240306_0001_m_000003_0' to tip task_201311240306_0001_m_000003, for tracker 'tracker_cloud3:localhost/127.0.0.1:47021'
>>> > 2013-11-24 03:07:54,222 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311240306_0001_m_000003_0' has completed task_201311240306_0001_m_000003 successfully.
>>> > 2013-11-24 03:07:54,228 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201311240306_0001_m_000000
>>> > 2013-11-24 03:07:54,229 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201311240306_0001_m_000000_0' to tip task_201311240306_0001_m_000000, for tracker 'tracker_cloud3:localhost/127.0.0.1:47021'
>>> > 2013-11-24 03:07:54,361 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201311240306_0001_m_000001
>>> > 2013-11-24 03:07:54,362 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201311240306_0001_m_000001_0' to tip task_201311240306_0001_m_000001, for tracker 'tracker_cloud2:localhost/127.0.0.1:55161'
>>> > 2013-11-24 03:08:03,243 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311240306_0001_m_000000_0: java.lang.Throwable: Child Error
>>> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
>>> > Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>>> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>>> >
>>> >
>>> > Thereafter, all other workers will fail with:
>>> >
>>> > 2013-11-24 03:08:42,471 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311240306_0001_m_000001_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception exists: Failed to check /_hadoopBsp/job_201311240306_0001/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
>>> >         at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>>> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>> >         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>> >         at java.security.AccessController.doPrivileged(Native Method)
>>> >         at javax.security.auth.Subject.doAs(Subject.java:396)
>>> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>> >         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>> > Caused by: java.lang.IllegalStateException: exists: Failed to check /_hadoopBsp/job_201311240306_0001/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
>>> >         at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
>>> >         at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:689)
>>> >         at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:488)
>>> >         at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>>> >         at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>>> >         ... 7 more
>>> >
>>> >
>>> > Any suggestions about why this might be happening?
>>> >
>>> > Thanks,
>>> > Young
>>> > 
> 
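
Since the failure in this thread was traced to a wrong input file name, which left the job with an empty or non-existent graph, a local check of the input before submitting might have caught it earlier. The following is only a sketch in Python, not part of Giraph: the function names parse_vertex_line and load_graph are hypothetical, the line shape [id, value, [[target, weight], ...]] is inferred from the sample graph quoted above, and it inspects text you hand it rather than anything on HDFS.

```python
import json


def parse_vertex_line(line):
    """Parse one line shaped like [id, value, [[target, weight], ...]].

    The shape is an assumption inferred from the sample graph in the thread.
    """
    vertex_id, value, edges = json.loads(line)
    return int(vertex_id), float(value), [(int(t), float(w)) for t, w in edges]


def load_graph(lines):
    """Parse every non-blank line; fail loudly if no vertices were found."""
    vertices = [parse_vertex_line(line) for line in lines if line.strip()]
    if not vertices:
        raise ValueError("input graph is empty; check the input file name")
    return vertices


# The sample graph quoted in the original message.
sample = """\
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
"""

graph = load_graph(sample.splitlines())
vertex_count = len(graph)
edge_count = sum(len(edges) for _, _, edges in graph)
print(vertex_count, edge_count)
```

Passing an empty list of lines (what a mistyped file name effectively produces) raises ValueError instead of letting the job start with no vertices.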


