hadoop-common-dev mailing list archives

From "Devaraj Das" <d...@yahoo-inc.com>
Subject RE: some reducers stock in copying stage
Date Wed, 28 Feb 2007 06:43:51 GMT
Mike,
The patches for h-1042 and h-1043 should address your situation better. They
have not been committed yet. Please apply the patches manually and see
whether the situation improves.
Thanks,
Devaraj.

> -----Original Message-----
> From: Devaraj Das [mailto:ddas@yahoo-inc.com]
> Sent: Wednesday, February 28, 2007 11:54 AM
> To: 'hadoop-dev@lucene.apache.org'
> Subject: RE: some reducers stock in copying stage
> 
> Looks like all hosts (from which map outputs haven't yet been fetched) are
> classified as being "slow". That is because there were earlier failures
> while fetching outputs from those hosts. When a failure happens (maybe due to
> insufficient jetty server threads), there is a back-off for that host, and
> until the back-off expires no outputs are fetched from that host. The system
> should recover from this, though. Another thing you might want to try is to
> reduce the value of mapred.reduce.copy.backoff to something like 5 (the value
> is in seconds; the default is 300). This will ensure that the back-off is
> always less than or equal to 1 min 5 secs (1 min is the minimum hardcoded
> backoff).
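
The arithmetic behind that bound can be sketched as follows. This is only an illustration of the description above, not Hadoop's actual code: the function name and the constant are hypothetical, and it assumes the bound is simply the hardcoded one-minute minimum plus the configured mapred.reduce.copy.backoff value.

```python
# Hypothetical illustration of the back-off bound described in the message.
# Assumption: upper bound = hardcoded 1-minute minimum + configured value.
MIN_BACKOFF_SECS = 60  # the "1 min" hardcoded minimum mentioned above


def max_backoff_secs(configured_backoff_secs: int) -> int:
    """Return the assumed upper bound, in seconds, on the per-host back-off."""
    return MIN_BACKOFF_SECS + configured_backoff_secs


print(max_backoff_secs(5))    # 65 seconds, i.e. 1 min 5 secs
print(max_backoff_secs(300))  # 360 seconds with the default of 300
```

Under this reading, lowering the setting from 300 to 5 shortens the longest wait before a "slow" host is retried from about six minutes to just over one.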
> 
> > -----Original Message-----
> > From: Mike Smith [mailto:mike.smith.dev@gmail.com]
> > Sent: Wednesday, February 28, 2007 8:45 AM
> > To: hadoop-dev@lucene.apache.org
> > Subject: some reducers stock in copying stage
> >
> > After updating the hadoop trunk today, I am having a problem in the reduce
> > phase. Some of the reducers get stuck in the copying stage (at the very end
> > of copying) and keep reporting the same status. Even when I kill the related
> > tasktracker, the jobtracker still reports the task as copying. Here is the log:
> >
> > 2007-02-27 22:08:26,388 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000224_0 Got 24 known map output location(s); scheduling...
> > 2007-02-27 22:08:26,388 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000224_0 Scheduled 0 of 24 known outputs (24 slow hosts and 0 dup hosts)
> > 2007-02-27 22:08:27,204 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s) >
> > 2007-02-27 22:08:27,204 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s) >
> > 2007-02-27 22:08:28,214 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s) >
> > 2007-02-27 22:08:28,214 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s) >
> > 2007-02-27 22:08:29,224 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s) >
> > 2007-02-27 22:08:29,224 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s) >
> > 2007-02-27 22:08:30,114 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000111_0 Need 11 map output(s)
> > 2007-02-27 22:08:30,114 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000111_0 Need 234 map output location(s)
> > 2007-02-27 22:08:30,116 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000111_0 Got 0 new map outputs from jobtracker and 0 map outputs from previous failures
> > 2007-02-27 22:08:30,116 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000111_0 Got 11 known map output location(s); scheduling...
> > 2007-02-27 22:08:30,116 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000111_0 Scheduled 0 of 11 known outputs (11 slow hosts and 0 dup hosts)
> > 2007-02-27 22:08:30,234 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s) >
> > 2007-02-27 22:08:30,234 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s) >
> > 2007-02-27 22:08:31,244 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s) >
> > 2007-02-27 22:08:31,244 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s) >
> > 2007-02-27 22:08:31,394 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000224_0 Need 24 map output(s)
> > 2007-02-27 22:08:31,394 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000224_0 Need 133 map output location(s)
> > 2007-02-27 22:08:31,395 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000224_0 Got 0 new map outputs from jobtracker and 0 map outputs from previous failures
> > 2007-02-27 22:08:31,395 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000224_0 Got 24 known map output location(s); scheduling...
> > 2007-02-27 22:08:31,395 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000224_0 Scheduled 0 of 24 known outputs (24 slow hosts and 0 dup hosts)
> > 2007-02-27 22:08:32,254 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s) >
> > 2007-02-27 22:08:32,254 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s) >
> > 2007-02-27 22:08:33,264 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s) >
> > 2007-02-27 22:08:33,264 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s) >
> > 2007-02-27 22:08:34,274 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s) >
> > 2007-02-27 22:08:34,274 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s) >
> > 2007-02-27 22:08:35,124 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000111_0 Need 11 map output(s)
> > 2007-02-27 22:08:35,124 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000111_0 Need 234 map output location(s)
> > 2007-02-27 22:08:35,219 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000111_0 Got 0 new map outputs from jobtracker and 0 map outputs from previous failures
> > 2007-02-27 22:08:35,219 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000111_0 Got 11 known map output location(s); scheduling...
> > 2007-02-27 22:08:35,219 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000111_0 Scheduled 0 of 11 known outputs (11 slow hosts and 0 dup hosts)
> > 2007-02-27 22:08:35,284 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s) >
> > 2007-02-27 22:08:35,284 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s) >
> > 2007-02-27 22:08:36,294 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000224_0 0.33083335% reduce > copy (3176 of 3200 at 1.94 MB/s) >
> > 2007-02-27 22:08:36,294 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000111_0 0.3321875% reduce > copy (3189 of 3200 at 0.40 MB/s) >
> > 2007-02-27 22:08:36,404 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000224_0 Need 24 map output(s)
> > 2007-02-27 22:08:36,404 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000224_0 Need 133 map output location(s)
> > 2007-02-27 22:08:36,422 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000224_0 Got 0 new map outputs from jobtracker and 0 map outputs from previous

