hadoop-common-user mailing list archives

From Mori Bellamy <mbell...@apple.com>
Subject Re: scaling issue, please help
Date Wed, 02 Jul 2008 18:04:05 GMT
i discovered that some of my code was throwing out-of-bounds
exceptions. i cleaned up that code and the map tasks seemed to work.
that confuses me -- i'm pretty sure hadoop is resilient to a few map
tasks failing (5 out of 13k). before this fix, the remaining 2% of my
map tasks were getting killed.
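
for anyone hitting the same thing, here is a minimal sketch of the kind of bounds check that keeps a bad record from throwing inside a map function. the original code wasn't posted, so the tab-separated record layout and the SafeFields helper below are assumptions made purely for illustration; neither is part of hadoop.

```java
import java.util.Optional;

// Hypothetical helper (not a Hadoop class): bounds-checked access to the
// tab-separated fields of one input record. The record layout is assumed.
final class SafeFields {
    private SafeFields() {}

    // Returns the field at `index`, or an empty Optional when the record is
    // null, the index is negative, or the record has too few fields.
    static Optional<String> fieldAt(String record, int index) {
        if (record == null || index < 0) {
            return Optional.empty();
        }
        // A limit of -1 keeps trailing empty fields, so field positions
        // do not shift when the last columns are blank.
        String[] fields = record.split("\t", -1);
        return index < fields.length
                ? Optional.of(fields[index])
                : Optional.empty();
    }
}
```

inside map(), a record that comes back empty can be skipped or counted instead of letting the exception kill the task attempt. a record that throws deterministically will throw again on every retry of that task, so retries alone don't help with this kind of bug.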


On Jul 1, 2008, at 10:06 PM, Amar Kamat wrote:

> Mori Bellamy wrote:
>> hey all,
>> i've got a mapreduce task that works on small (~1G) input. when i  
>> try to run the same task on large (~100G) input, i get the  
>> following error around when the map tasks are almost done (~98%)
>>
>> 2008-07-01 13:10:59,231 INFO org.apache.hadoop.mapred.ReduceTask:  
>> task_200807011005_0005_r_000000_0: Got 0 new map-outputs & 0  
>> obsolete map-outputs from tasktracker and 0 map-outputs from  
>> previous failures
>> 2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask:  
>> task_200807011005_0005_r_000000_0 Got 0 known map output  
>> location(s); scheduling...
>> 2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask:  
>> task_200807011005_0005_r_000000_0 Scheduled 0 of 0 known outputs (0  
>> slow hosts and 0 dup hosts)
>> 2008-07-01 13:10:59,232 INFO org.apache.hadoop.mapred.ReduceTask:  
>> task_200807011005_0005_r_000000_0 Need 1 map output(s)
> ...
> ...
> These are not error messages. The reducers are stuck because not all maps
> have completed. Mori, could you let us know what is happening to the
> other 2% of the maps? Are they getting executed? Are they still pending
> (waiting to run)? Were they killed/failed? Are there any lost trackers?
>> I'm running the task on a cluster of 5 workers, one DFS master, and  
>> one task tracker.
> What do you mean by 5 workers and 1 task tracker?
>> i'm chaining mapreduce tasks, so i'm using SequenceFileOutput and
>> SequenceFileInput. this error happens before the first link in the
>> chain successfully reduces.
> Can you elaborate on this a bit? Are you chaining MR jobs?
> Amar
>>
>> does anyone have any insight? thanks!
>

