giraph-user mailing list archives

From Young Han <young....@uwaterloo.ca>
Subject Giraph EC2 Map task fails
Date Sun, 24 Nov 2013 03:17:26 GMT
Hi,

We are trying to get Giraph running on EC2 with Hadoop 1.0.4. We are
running the PageRank example with the following command:

hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.SimplePageRankVertex \
  -c org.apache.giraph.combiner.DoubleSumCombiner \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/ubuntu/giraph-input/tiny_graph.txt \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/ubuntu/giraph-output/pagerank \
  -w 1


The input graph is the sample graph provided on the website:

[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
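
To rule out a malformed input file, here is a minimal Python sketch that parses the sample graph and checks that every edge target has its own vertex line. It assumes each line is a JSON array of the form [vertex_id, vertex_value, [[target_id, edge_value], ...]], which is my reading of JsonLongDoubleFloatDoubleVertexInputFormat; the function names are hypothetical and are not part of Giraph.

```python
import json

# The sample graph from above, one vertex per line.
SAMPLE = """\
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
"""


def parse_vertex_line(line):
    """Parse one line, assumed to be [vertex_id, vertex_value, [[target_id, edge_value], ...]]."""
    vertex_id, value, edges = json.loads(line)
    return int(vertex_id), float(value), [(int(t), float(w)) for t, w in edges]


def check_graph(text):
    """Return (vertices, dangling); dangling lists edge targets that have no vertex line."""
    vertices = {}
    for line in text.splitlines():
        if line.strip():
            vertex_id, value, edges = parse_vertex_line(line)
            vertices[vertex_id] = (value, edges)
    targets = {t for _, edges in vertices.values() for t, _ in edges}
    dangling = sorted(targets - set(vertices))
    return vertices, dangling


vertices, dangling = check_graph(SAMPLE)
print(len(vertices), "vertices; dangling targets:", dangling)
```

For the sample graph this reports 5 vertices and no dangling targets, so the input itself looks well-formed under that assumption.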


We've tried small, medium, and xlarge instances; clusters of 3 and 4
instances; and various numbers of workers (-w 1, -w 2, -w 5, -w 10, etc.).
Hadoop has -Xmx (max Java heap size) set to 1024m.

The pattern is that the *first* map task always fails. The error
appears in Hadoop's JobTracker log:

2013-11-24 03:07:43,414 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240306_0001: nMaps=2 nReduces=0 max=-1
2013-11-24 03:07:43,417 INFO org.apache.hadoop.mapred.JobTracker: Job job_201311240306_0001 added successfully for user 'ubuntu' to queue 'default'
2013-11-24 03:07:43,418 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201311240306_0001
2013-11-24 03:07:43,419 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201311240306_0001
2013-11-24 03:07:43,422 INFO org.apache.hadoop.mapred.AuditLogger: USER=ubuntu  IP=172.31.14.182        OPERATION=SUBMIT_JOB    TARGET=job_201311240306_0001    RESULT=SUCCESS
2013-11-24 03:07:43,828 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /home/ubuntu/hadoop_data/hadoop_tmp-ubuntu/mapred/system/job_201311240306_0001/jobToken
2013-11-24 03:07:43,846 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201311240306_0001 = 0. Number of splits = 2
2013-11-24 03:07:43,846 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240306_0001 LOCALITY_WAIT_FACTOR=0.0
2013-11-24 03:07:43,847 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201311240306_0001 initialized successfully with 2 map tasks and 0 reduce tasks.
2013-11-24 03:07:45,152 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201311240306_0001_m_000003_0' to tip task_201311240306_0001_m_000003, for tracker 'tracker_cloud3:localhost/127.0.0.1:47021'
2013-11-24 03:07:54,222 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311240306_0001_m_000003_0' has completed task_201311240306_0001_m_000003 successfully.
2013-11-24 03:07:54,228 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201311240306_0001_m_000000
2013-11-24 03:07:54,229 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201311240306_0001_m_000000_0' to tip task_201311240306_0001_m_000000, for tracker 'tracker_cloud3:localhost/127.0.0.1:47021'
2013-11-24 03:07:54,361 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201311240306_0001_m_000001
2013-11-24 03:07:54,362 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201311240306_0001_m_000001_0' to tip task_201311240306_0001_m_000001, for tracker 'tracker_cloud2:localhost/127.0.0.1:55161'
2013-11-24 03:08:03,243 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311240306_0001_m_000000_0: java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)


Thereafter, all other workers will fail with:

2013-11-24 03:08:42,471 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311240306_0001_m_000001_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception exists: Failed to check /_hadoopBsp/job_201311240306_0001/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.IllegalStateException: exists: Failed to check /_hadoopBsp/job_201311240306_0001/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
        at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
        at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:689)
        at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:488)
        at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
        ... 7 more
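
For anyone comparing against their own JobTracker log, here is a rough Python sketch that lists which task attempts failed and with which top-level exception. It assumes every log entry starts with a "YYYY-MM-DD HH:MM:SS,mmm " timestamp and that failures are reported as "Error from <attempt id>: <exception class>", as in the excerpts above; the helper names are hypothetical, and the embedded sample is trimmed from those excerpts.

```python
import re

# Trimmed sample of the two failures above, with wrapped lines left as they were mailed.
LOG = """\
2013-11-24 03:08:03,243 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201311240306_0001_m_000000_0: java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
2013-11-24 03:08:42,471 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201311240306_0001_m_000001_0: java.lang.IllegalStateException:
run: Caught an unrecoverable exception exists: Failed to check
"""

# A new entry begins wherever a line starts with a timestamp.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ", re.MULTILINE)
# Attempt ID and exception class of a reported task failure.
ERROR = re.compile(r"Error from (attempt_\w+): ([\w.$]+)")


def entries(log):
    """Yield each log entry as a single line, re-joining wrapped continuation lines."""
    starts = [m.start() for m in TIMESTAMP.finditer(log)]
    for begin, end in zip(starts, starts[1:] + [len(log)]):
        yield " ".join(log[begin:end].split())


def failed_attempts(log):
    """Return (attempt_id, exception_class) for every 'Error from' entry, in log order."""
    found = []
    for entry in entries(log):
        match = ERROR.search(entry)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


for attempt, exception in failed_attempts(LOG):
    print(attempt, exception)
```

On the sample this separates the first attempt (m_000000, the "Child Error" from a child JVM exiting with status 1) from the later worker failure (m_000001, the IllegalStateException).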


Any suggestions about why this might be happening?

Thanks,
Young
