hadoop-common-user mailing list archives

From Amar Kamat <ama...@yahoo-inc.com>
Subject Re: scaling issue, please help
Date Wed, 02 Jul 2008 05:06:28 GMT
Mori Bellamy wrote:
> hey all,
> i've got a mapreduce task that works on small (~1G) input. when i try 
> to run the same task on large (~100G) input, i get the following error 
> around when the map tasks are almost done (~98%)
>
> 2008-07-01 13:10:59,231 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200807011005_0005_r_000000_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200807011005_0005_r_000000_0 Got 0 known map output location(s); 
> scheduling...
> 2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200807011005_0005_r_000000_0 Scheduled 0 of 0 known outputs (0 
> slow hosts and 0 dup hosts)
> 2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200807011005_0005_r_000000_0 Need 1 map output(s)
...
...
These are not error messages. The reducers are waiting because not all 
maps have completed. Mori, could you let us know what is happening to 
the remaining 2% of the maps? Are they running? Are they still pending 
(waiting to run)? Were they killed or did they fail? Is there a lost 
tracker?
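To see whether a reducer is making progress or just waiting, one option is to pull the "Need N map output(s)" counts out of the reduce task log. The following is only a rough sketch: the function name `pending_map_outputs` and the regular expression are my own assumptions, based solely on the log lines quoted above, and the real log format may differ between Hadoop versions.

```python
import re
from typing import Dict, Iterable

# Assumed pattern, derived only from the quoted log lines, e.g.
#   "... ReduceTask: task_200807011005_0005_r_000000_0 Need 1 map output(s)"
NEED_RE = re.compile(r"(task_\d+_\d+_r_\d+_\d+):? Need (\d+) map output\(s\)")


def pending_map_outputs(lines: Iterable[str]) -> Dict[str, int]:
    """Return the most recent 'Need N map output(s)' count per reduce attempt."""
    pending: Dict[str, int] = {}
    for line in lines:
        match = NEED_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last count wins.
            pending[match.group(1)] = int(match.group(2))
    return pending
```

If the count for a reduce attempt stays the same over a long stretch of the log, the reducer is most likely idle and waiting on maps that have not finished, which is consistent with the messages above being informational rather than errors.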
> I'm running the task on a cluster of 5 workers, one DFS master, and 
> one task tracker.
What do you mean by 5 workers and 1 task tracker?
> i'm chaining mapreduce tasks, so i'm using SequenceFileOutput and 
> SequenceFileInput. this error happens before the first link in the 
> chain successfully reduces.
Can you elaborate on this a bit? Are you chaining MR jobs?
Amar
>
> does anyone have any insight? thanks!

