hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-299) maps from second jobs will not run until the first job finishes completely
Date Wed, 14 Jun 2006 14:42:30 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-299?page=comments#action_12416204 ] 

Owen O'Malley commented on HADOOP-299:

*smile* I don't have that much free time! I certainly agree with you that the right long term
solution would involve:
  1. Using task killing to make slots available when they are needed.
  2. Not starting reduces until at least one of the maps feeding it has finished and generated output.
  3. For bonus points, you could prefer to run a reduce that has local input. (In general that
won't help, but there is a sub-class of problems where most of the input for each reduce comes
from a small number of maps.)

> maps from second jobs will not run until the first job finishes completely
> --------------------------------------------------------------------------
>          Key: HADOOP-299
>          URL: http://issues.apache.org/jira/browse/HADOOP-299
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.3.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.4.0
>  Attachments: map-schedule.patch
> Because of the logic in the JobTracker's pollForNewTask, a second job will rarely start
> running maps until the first job finishes completely. The JobTracker leaves room to re-run
> failed maps from the first job, and it reserves a number of slots equal to the first job's
> total map count. Thus, if the first job has more maps than your cluster's capacity, none of
> the second job's maps will ever run.
> I propose setting the reserve to 1% of the first job's maps.
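The effect of the proposed change can be sketched in a few lines of Java. This is an illustrative toy, not the actual JobTracker code: the method names (`oldReserve`, `proposedReserve`) and the slot/map counts are made up for the example, but the arithmetic matches the description above.

```java
// Hypothetical sketch of the reserve calculation described in HADOOP-299.
// These names and numbers are illustrative, not taken from the JobTracker source.
public class MapReserveSketch {

    // Old behavior: the reserve equals the first job's total map count,
    // so a job larger than the cluster holds every slot until it finishes.
    static int oldReserve(int totalMaps) {
        return totalMaps;
    }

    // Proposed behavior: reserve 1% of the first job's maps for re-running
    // failures (at least one slot), freeing the rest for later jobs.
    static int proposedReserve(int totalMaps) {
        return Math.max(1, totalMaps / 100);
    }

    public static void main(String[] args) {
        int clusterSlots = 200;   // assumed cluster map capacity
        int firstJobMaps = 1000;  // first job has more maps than capacity

        // Slots left for a second job under each policy.
        int freeOld = Math.max(0, clusterSlots - oldReserve(firstJobMaps));
        int freeNew = Math.max(0, clusterSlots - proposedReserve(firstJobMaps));

        System.out.println(freeOld + " " + freeNew); // prints "0 190"
    }
}
```

With the old reserve, the 1000-map job leaves zero slots for anyone else; with a 1% reserve, 190 of the 200 slots open up for the second job as the first job's maps complete.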

