hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client
Date Thu, 29 Nov 2012 17:42:58 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506616#comment-13506616 ]

Jason Lowe commented on MAPREDUCE-4819:
---------------------------------------

bq. As far as YARN-244 is concerned the comments around the code seem to suggest that it was
an explicit decision to cleanup before unregistering. 

Yes, it was explicitly done as a workaround to ensure the staging directory was cleaned up
before the RM shot the AM.  Previously there was a race where the AM was trying to delete
the staging directory while the RM was shooting the AM after it unregistered, and often the
AM lost and the staging directory was left behind.  Since then we've added a FINISHING state
to allow the AM to clean up before the RM tries to shoot it.  Given that, we should move the
staging directory cleanup back to after we unregister (but not in this JIRA).

bq. I am not quite clear why the commit would be repeated if the job does not execute any
task at all?

The job will only avoid committing if it sees the job completion event written to the history
file, but that occurs *after* committing.  Therefore if we commit then crash before we sync
that completion event to disk, the second attempt will try to commit again.  And we're seeing
a number of cases where the AM crashed after committing but before completing job history.
It should be relatively rare, but it can happen and is happening.
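
To make the ordering problem concrete, here is a minimal toy model of it.  This is only a
sketch: the class, the field names and the "JOB_FINISHED" string are hypothetical stand-ins,
not the actual MR AM code or history event names.  It assumes the only durable evidence of
completion is an event synced after the commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the commit/history ordering; every name here is hypothetical. */
public class CommitReplaySketch {
    /** Stand-in for the job history file: only synced events survive a crash. */
    final List<String> syncedHistory = new ArrayList<>();
    /** Counts how many times the (possibly non-repeatable) commit ran. */
    int commitCount = 0;

    /** Runs one AM attempt; crashAfterCommit simulates dying before the history sync. */
    void runAttempt(boolean crashAfterCommit) {
        if (syncedHistory.contains("JOB_FINISHED")) {
            return; // an earlier attempt provably finished, so skip the commit
        }
        commitCount++; // the commit happens first
        if (crashAfterCommit) {
            return; // the AM dies before the completion event reaches disk
        }
        syncedHistory.add("JOB_FINISHED"); // the completion event is synced afterwards
    }

    public static void main(String[] args) {
        CommitReplaySketch job = new CommitReplaySketch();
        job.runAttempt(true);  // first attempt commits, then crashes
        job.runAttempt(false); // second attempt sees no completion event and commits again
        System.out.println("commits performed: " + job.commitCount); // prints 2
    }
}
```

The window between the commit and the synced event is what lets the second attempt repeat
the commit.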

bq. the commit code seems to be user pluggable code. In that case, how can we ensure that
every commit implementation can be made into a singleton operation? Can it be as simple as
a committer refusing to commit if the output file already exists? Are committers allowed to
delete an output file if it exists? In that case how does it differentiate between a checkpointed
commit from a previous crashed run vs an old commit from a successful job?

The committer is user-pluggable code and therefore can do *arbitrary* things.  It doesn't
have to be files.  It can be a database commit, a web service transaction, a custom job-end
notification mechanism, or whatever.  Therefore we cannot assume the commit is recoverable
-- there are reasons why the job fails when the committer says it failed, because we can't
retry it.  In the future maybe we can extend the committer API to allow the committer to say
it can attempt to recover from a job commit failure, but for now we can't tell.  That's why
re-running a commit is Not Good.
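
As a rough illustration of what such an API extension might look like, here is a sketch.
Everything in it is an assumption for discussion: the JobCommitter interface, the
isCommitRecoverable() method and the priorCommitStarted flag are hypothetical and are not
part of the existing committer API.  How an attempt would learn that an earlier attempt
started committing is deliberately left out.

```java
/** Sketch of a hypothetical "recoverable commit" opt-in; not the real committer API. */
public class RecoverableCommitSketch {
    /** Hypothetical committer contract with an opt-in recovery capability. */
    interface JobCommitter {
        void commitJob() throws Exception;

        /** Hypothetical: true if a later attempt may safely repeat or resume the commit. */
        default boolean isCommitRecoverable() {
            return false; // conservative default: arbitrary user code is assumed non-repeatable
        }
    }

    enum Decision { COMMIT, RETRY_COMMIT, FAIL_JOB }

    /** Decides what a new attempt does, given whether an earlier attempt began committing. */
    static Decision decide(JobCommitter committer, boolean priorCommitStarted) {
        if (!priorCommitStarted) {
            return Decision.COMMIT; // nothing was committed yet, so a normal commit is safe
        }
        // A commit was at least started: repeat it only if the committer opted in.
        return committer.isCommitRecoverable() ? Decision.RETRY_COMMIT : Decision.FAIL_JOB;
    }

    public static void main(String[] args) {
        JobCommitter opaque = () -> { }; // e.g. a database or web service commit
        JobCommitter recoverable = new JobCommitter() {
            @Override public void commitJob() { }
            @Override public boolean isCommitRecoverable() { return true; }
        };
        System.out.println(decide(opaque, true));      // FAIL_JOB
        System.out.println(decide(recoverable, true)); // RETRY_COMMIT
        System.out.println(decide(opaque, false));     // COMMIT
    }
}
```

Defaulting to "not recoverable" keeps existing committers on the current behavior, where a
failed or interrupted commit fails the job.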

bq. On a side note, we should be encouraging projects that depend on output markers for job
completion polling, to stop doing that and start using API's. Perhaps in the next version
change. Continuing to support these kind of use cases could make solutions more complex and
fragile than they need to be.

The file marker thing is just one committer's way of handling things.  Committers can do *arbitrary*
things.  The job doesn't even have to produce output as files, for example.  It's pluggable
for a reason, and we can't know or assume what it's doing.  We can only give it interfaces
and restrictions (hopefully as few as possible) to govern how it interoperates with the rest
of the job framework.
                
> AM can rerun job after reporting final job status to the client
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4819
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4819
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: Jason Lowe
>            Assignee: Bikas Saha
>            Priority: Critical
>         Attachments: MAPREDUCE-4819.1.patch, MAPREDUCE-4819.2.patch
>
>
> If the AM reports final job status to the client but then crashes before unregistering
> with the RM then the RM can run another AM attempt.  Currently AM re-attempts assume that
> the previous attempts did not reach a final job state, and that causes the job to rerun (from
> scratch, if the output format doesn't support recovery).
> Re-running the job when we've already told the client the final status of the job is
> bad for a number of reasons.  If the job failed, it's confusing at best since the client was
> already told the job failed but the subsequent attempt could succeed.  If the job succeeded
> there could be data loss, as a subsequent job launched by the client tries to consume the
> job's output as input just as the re-attempt starts removing output files in preparation for
> the output commit.

