hadoop-common-user mailing list archives

From Mori Bellamy <mbell...@apple.com>
Subject scaling issue, please help
Date Tue, 01 Jul 2008 22:20:03 GMT
Hey all,
I've got a MapReduce task that works on small (~1G) input. When I try
to run the same task on large (~100G) input, I get the following output,
repeating, around when the map tasks are almost done (~98%):

2008-07-01 13:10:59,231 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Got 0 known map output location(s); scheduling...
2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Need 1 map output(s)
2008-07-01 13:11:00,231 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-07-01 13:11:00,231 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Got 0 known map output location(s); scheduling...
2008-07-01 13:11:00,231 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2008-07-01 13:11:05,232 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Need 1 map output(s)
2008-07-01 13:11:05,232 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-07-01 13:11:05,232 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Got 0 known map output location(s); scheduling...
2008-07-01 13:11:05,233 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2008-07-01 13:11:10,233 INFO org.apache.hadoop.mapred.ReduceTask: task_200807011005_0005_r_000000_0 Need 1 map output(s)
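
The excerpt above shows the reducer polling repeatedly without receiving any new map outputs. As a rough sketch (not part of Hadoop; the function names `join_wrapped` and `summarize` are hypothetical, and the regular expressions assume the exact message wording shown in this excerpt), a log like this could be summarized as follows to confirm that every poll came back empty:

```python
import re

# Assumed record layout, taken from the excerpt above: each record starts
# with a timestamp such as "2008-07-01 13:10:59,231 ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
NEED = re.compile(r"Need (\d+) map output\(s\)")
GOT_NEW = re.compile(r"Got (\d+) new map-outputs")


def join_wrapped(lines):
    """Re-join records that were hard-wrapped across several lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not records:
            records.append(line)          # start of a new record
        else:
            records[-1] += " " + line     # continuation of the previous one
    return records


def summarize(lines):
    """Count poll rounds and how many of them fetched no new map outputs."""
    stats = {"need_polls": 0, "got_polls": 0, "empty_polls": 0,
             "last_needed": None}
    for record in join_wrapped(lines):
        need = NEED.search(record)
        got = GOT_NEW.search(record)
        if need:
            stats["need_polls"] += 1
            stats["last_needed"] = int(need.group(1))
        if got:
            stats["got_polls"] += 1
            if int(got.group(1)) == 0:
                stats["empty_polls"] += 1
    return stats


if __name__ == "__main__":
    sample = [
        "2008-07-01 13:10:59,231 INFO org.apache.hadoop.mapred.ReduceTask:",
        "task_200807011005_0005_r_000000_0: Got 0 new map-outputs & 0 obsolete",
        "map-outputs from tasktracker and 0 map-outputs from previous failures",
        "2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask:",
        "task_200807011005_0005_r_000000_0 Need 1 map output(s)",
    ]
    print(summarize(sample))
```

If `empty_polls` equals `got_polls` and `last_needed` never drops, the reducer is waiting on a map output that is not being reported to it, which matches what the excerpt shows.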

I'm running the task on a cluster of 5 workers, one DFS master, and
one task tracker. I'm chaining MapReduce tasks, so I'm using
SequenceFileOutputFormat and SequenceFileInputFormat. This error happens
before the first link in the chain successfully reduces.

Does anyone have any insight? Thanks!