hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client
Date Mon, 26 Nov 2012 19:04:58 GMT
Jason Lowe created MAPREDUCE-4819:

             Summary: AM can rerun job after reporting final job status to the client
                 Key: MAPREDUCE-4819
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4819
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mr-am
    Affects Versions: 2.0.1-alpha, 0.23.3
            Reporter: Jason Lowe
            Priority: Critical

If the ApplicationMaster (AM) reports final job status to the client but then crashes before
unregistering with the ResourceManager (RM), the RM can launch another AM attempt.  Currently,
AM re-attempts assume that the previous attempts did not reach a final job state, which causes
the job to rerun (from scratch, if the output format doesn't support recovery).

Re-running the job when we've already told the client its final status is bad for a number of
reasons.  If the job failed, it's confusing at best, since the client was already told the job
failed but the subsequent attempt could succeed.  If the job succeeded, there could be data
loss: a subsequent job launched by the client may try to consume the job's output as input
just as the re-attempt starts removing output files in preparation for the output commit.
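
The sequence described above can be modelled with a small, self-contained sketch. Everything in
it is hypothetical: the class, enum, and method names are made up for illustration and are not
taken from the Hadoop code base, and the "guarded" variant is only one conceivable way a
re-attempt could avoid rerunning the job, not a description of an actual or planned fix. It
assumes some record of the reported final status survives the AM crash.

```java
// Hypothetical model of the scenario; no name here comes from the Hadoop code base.
final class AmReattemptSketch {

    enum FinalStatus { SUCCEEDED, FAILED }

    enum Action { RUN_JOB, REPLAY_FINAL_STATUS }

    /** Stands in for state that survives an AM crash (an assumption of this sketch). */
    static final class DurableJobState {
        // null until an attempt has reported a final status to the client
        private FinalStatus reported;

        void recordReported(FinalStatus status) { reported = status; }

        FinalStatus reported() { return reported; }
    }

    /** Behaviour described in the report: a re-attempt assumes the job never finished. */
    static Action currentBehaviour(DurableJobState state) {
        return Action.RUN_JOB;
    }

    /** One possible guard (an assumption): check for a reported final status first. */
    static Action guardedBehaviour(DurableJobState state) {
        return state.reported() == null ? Action.RUN_JOB : Action.REPLAY_FINAL_STATUS;
    }

    public static void main(String[] args) {
        DurableJobState state = new DurableJobState();

        // Attempt 1 reports SUCCEEDED to the client, then crashes before unregistering.
        state.recordReported(FinalStatus.SUCCEEDED);

        // The RM launches attempt 2, which must decide what to do.
        System.out.println("current behaviour: " + currentBehaviour(state));
        System.out.println("guarded behaviour: " + guardedBehaviour(state));
    }
}
```

In this model the unguarded re-attempt chooses RUN_JOB even though the client has already been
told the job succeeded, which is the window in which a downstream job could read output that
the re-attempt is about to remove.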
