hadoop-mapreduce-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
Date Mon, 09 Nov 2015 17:40:11 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996979#comment-14996979

Junping Du commented on MAPREDUCE-5485:

bq. doing ++retries here can remove code duplication for the < check in the while?
Sorry, I missed this comment in the patch I just uploaded. I will address it in the next patch.

bq. Even for a non-repeatable committer, if there is a classpath issue (which can get fixed
by retrying the AM) then the AM should retry, right?
I agree this could be treated as a separate topic. However, it would take more time and
effort to make sure that retrying with a non-repeatable committer does not risk producing a
commit that reports success with an incorrect result, when the job should have failed earlier.
For a repeatable committer there seems to be no such risk: it may pay the price of an
unnecessary retry in some cases, but it gains a better chance of a successful commit in
others, especially since we cannot tell which of the two cases we are in. For example, when
deleting the temp directory fails, the cause could be a problem with the AM's connection to
HDFS (we should retry) or HDFS being down permanently (we shouldn't retry). I would prefer
the current trade-off: keep it simple and make a best effort for commit success in the
repeatable case.
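
A minimal sketch of the trade-off described above, not the code in the attached patches.
The interface SketchCommitter, its method isCommitRepeatable(), and the helper
commitWithRetry() are hypothetical names introduced only for illustration. The sketch assumes
a non-repeatable committer gets exactly one attempt, while a repeatable one is retried up to
a fixed limit without trying to classify the failure as transient or permanent.

```java
import java.io.IOException;

/** Hypothetical stand-in for an output committer; this is not the Hadoop API. */
interface SketchCommitter {
    /** True if running commit again after a failed attempt is safe. */
    boolean isCommitRepeatable();

    /** Performs the job commit; may fail for transient or permanent reasons. */
    void commit() throws IOException;
}

final class CommitRetrySketch {
    private CommitRetrySketch() {}

    /**
     * Tries to commit and returns the number of attempts used on success.
     * A non-repeatable committer is tried once; a repeatable committer is
     * tried up to maxAttempts times (best effort, at least one attempt).
     * If every attempt fails, the last IOException is rethrown.
     */
    static int commitWithRetry(SketchCommitter committer, int maxAttempts)
            throws IOException {
        int allowed = committer.isCommitRepeatable() ? Math.max(1, maxAttempts) : 1;
        IOException last = null;
        for (int attempt = 1; attempt <= allowed; attempt++) {
            try {
                committer.commit();
                return attempt;
            } catch (IOException e) {
                // The failure may be transient (worth retrying) or permanent
                // (retry is wasted); the sketch does not try to tell them apart.
                last = e;
            }
        }
        throw last;
    }
}
```

With a permanent failure, the repeatable path wastes up to maxAttempts - 1 extra attempts
before failing; with a transient one, those same retries are what let the commit succeed.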

> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, MAPREDUCE-5485-v1.patch,
> There is a chance that MRAppMaster crashes during job commit, or that a NodeManager restart causes
> the committing AM to exit because its container expires. In these cases, the job will fail.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether to allow redoing the commit is a better choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819.

This message was sent by Atlassian JIRA
