hadoop-common-dev mailing list archives

From "Devaraj Das" <d...@yahoo-inc.com>
Subject RE: some reducers stock in copying stage
Date Wed, 28 Feb 2007 06:24:01 GMT
It looks like all the hosts from which map outputs haven't yet been fetched
are classified as "slow". That is because earlier attempts to fetch outputs
from those hosts failed. When a fetch fails (perhaps because of insufficient
jetty server threads), the reducer backs off from that host, and no outputs
are fetched from it until the back-off expires. The system should recover
from this on its own, though. Another thing you might want to try is to
lower mapred.reduce.copy.backoff to a value like 5 (the unit is seconds; the
default is 300). This ensures that the back-off is never more than
1 min 5 secs (1 min is the hardcoded minimum back-off).
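
To make the arithmetic concrete, here is a rough sketch in Python. It assumes
the upper bound is simply the hardcoded minute plus the configured value,
which is what the 5-second example implies; the function and constant names
are made up for illustration and are not taken from the Hadoop source.

```python
# Sketch only: the formula and names below are assumptions inferred from
# the explanation above, not code from Hadoop itself.
HARDCODED_MIN_BACKOFF_SECS = 60   # "1 min is the minimum hardcoded backoff"
DEFAULT_COPY_BACKOFF_SECS = 300   # default for mapred.reduce.copy.backoff


def max_backoff_secs(copy_backoff_secs=DEFAULT_COPY_BACKOFF_SECS):
    """Assumed upper bound on how long a failed host is skipped."""
    if copy_backoff_secs < 0:
        raise ValueError("back-off must be non-negative")
    return HARDCODED_MIN_BACKOFF_SECS + copy_backoff_secs


def is_host_penalized(now_secs, penalized_until_secs):
    """A host counts as "slow" until its back-off deadline has passed."""
    return now_secs < penalized_until_secs


if __name__ == "__main__":
    print(max_backoff_secs(5))   # 65, i.e. 1 min 5 secs
    print(max_backoff_secs())    # 360 with the default of 300
```

Under that assumption, the default setting could keep a host out of rotation
for up to 6 minutes after a failure, which would explain reducers appearing
to hang while every remaining host is still in its back-off window.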

> -----Original Message-----
> From: Mike Smith [mailto:mike.smith.dev@gmail.com]
> Sent: Wednesday, February 28, 2007 8:45 AM
> To: hadoop-dev@lucene.apache.org
> Subject: some reducers stock in copying stage
> 
> After updating the hadoop trunk today, I am having a problem in the reduce
> phase. Some of the reducers are stuck in the copying stage (at the very end
> of copying) and keep reporting the same status. Even when I kill the
> related tasktracker, the job tracker still reports them as copying. Here is
> the log:
> 
> 2007-02-27 22:08:26,388 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000224_0 Got 24 known map output location(s); scheduling...
> 2007-02-27 22:08:26,388 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000224_0 Scheduled 0 of 24 known outputs (24 slow hosts and 0
> dup hosts)
> 2007-02-27 22:08:27,204 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s)
> >
> 2007-02-27 22:08:27,204 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s)
> >
> 2007-02-27 22:08:28,214 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s)
> >
> 2007-02-27 22:08:28,214 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s)
> >
> 2007-02-27 22:08:29,224 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s)
> >
> 2007-02-27 22:08:29,224 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s)
> >
> 2007-02-27 22:08:30,114 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000111_0 Need 11 map output(s)
> 2007-02-27 22:08:30,114 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000111_0 Need 234 map output location(s)
> 2007-02-27 22:08:30,116 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000111_0 Got 0 new map outputs from jobtracker and 0 map
> outputs
> from previous failures
> 2007-02-27 22:08:30,116 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000111_0 Got 11 known map output location(s); scheduling...
> 2007-02-27 22:08:30,116 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000111_0 Scheduled 0 of 11 known outputs (11 slow hosts and 0
> dup hosts)
> 2007-02-27 22:08:30,234 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s)
> >
> 2007-02-27 22:08:30,234 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s)
> >
> 2007-02-27 22:08:31,244 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s)
> >
> 2007-02-27 22:08:31,244 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s)
> >
> 2007-02-27 22:08:31,394 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000224_0 Need 24 map output(s)
> 2007-02-27 22:08:31,394 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000224_0 Need 133 map output location(s)
> 2007-02-27 22:08:31,395 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000224_0 Got 0 new map outputs from jobtracker and 0 map
> outputs
> from previous failures
> 2007-02-27 22:08:31,395 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000224_0 Got 24 known map output location(s); scheduling...
> 2007-02-27 22:08:31,395 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000224_0 Scheduled 0 of 24 known outputs (24 slow hosts and 0
> dup hosts)
> 2007-02-27 22:08:32,254 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s)
> >
> 2007-02-27 22:08:32,254 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s)
> >
> 2007-02-27 22:08:33,264 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s)
> >
> 2007-02-27 22:08:33,264 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s)
> >
> 2007-02-27 22:08:34,274 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s)
> >
> 2007-02-27 22:08:34,274 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s)
> >
> 2007-02-27 22:08:35,124 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000111_0 Need 11 map output(s)
> 2007-02-27 22:08:35,124 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000111_0 Need 234 map output location(s)
> 2007-02-27 22:08:35,219 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000111_0 Got 0 new map outputs from jobtracker and 0 map
> outputs
> from previous failures
> 2007-02-27 22:08:35,219 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000111_0 Got 11 known map output location(s); scheduling...
> 2007-02-27 22:08:35,219 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000111_0 Scheduled 0 of 11 known outputs (11 slow hosts and 0
> dup hosts)
> 2007-02-27 22:08:35,284 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s)
> >
> 2007-02-27 22:08:35,284 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s)
> >
> 2007-02-27 22:08:36,294 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s)
> >
> 2007-02-27 22:08:36,294 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s)
> >
> 2007-02-27 22:08:36,404 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000224_0 Need 24 map output(s)
> 2007-02-27 22:08:36,404 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000224_0 Need 133 map output location(s)
> 2007-02-27 22:08:36,422 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000224_0 Got 0 new map outputs from jobtracker and 0 map
> outputs
> from previous

