hadoop-mapreduce-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
Date Fri, 06 Nov 2015 12:01:27 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993559#comment-14993559 ]

Bikas Saha commented on MAPREDUCE-5485:

bq. I don't think so. Can you take a look at it again?
{code}
+    while (jobCommitNotFinished && (retries++ < retriesOnFailure)) {
+      try {
+        commitJobInternal(context);
+        jobCommitNotFinished = false;
+      } catch (Exception e) {
+        if (retries >= retriesOnFailure) { <<<<< doing ++retries here can remove code duplication for the < check in the while?
+          throw e;
+        } else {
+          LOG.warn("Exception get thrown in job commit, retry (" + retries +
+              ") time.", e);
+        }
+      }
+    }
{code}
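
To make the suggestion concrete, here is a minimal stand-alone sketch of a loop that increments and checks the counter only in the catch block. It is not the patch code: {{CommitRetrySketch}}, {{CommitAction}} and {{commitWithRetries}} are hypothetical names, and {{CommitAction}} merely stands in for {{commitJobInternal(context)}}. It assumes at least one attempt is always made.

```java
/** Hypothetical stand-alone sketch; not the code from the patch. */
final class CommitRetrySketch {

    /** Stand-in for commitJobInternal(context) in the patch. */
    interface CommitAction {
        void run() throws Exception;
    }

    /**
     * Runs the action until it succeeds or maxAttempts attempts have failed.
     * The attempt counter is incremented and compared in one place only
     * (the catch block), so the loop condition needs no second check.
     *
     * @return the number of attempts that were made, including the successful one
     */
    static int commitWithRetries(CommitAction action, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        int failedAttempts = 0;
        while (true) {
            try {
                action.run();
                return failedAttempts + 1;
            } catch (Exception e) {
                if (++failedAttempts >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                System.err.println("Job commit failed, retrying (failed attempt "
                    + failedAttempts + " of " + maxAttempts + "): " + e);
            }
        }
    }
}
```

With a budget of N >= 1 this makes the same number of attempts as the loop in the patch; the only behavioural difference in this sketch is that a budget below 1 is rejected instead of silently skipping the commit.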

bq. There are still AM-specific reasons for failure, e.g. the previous AM cannot connect to the
FS (FS or other CloudFS), or the committer misbehaves because it was loaded incorrectly (due
to a classpath problem or another defect). I think it makes sense to make a best effort to retry
the failed commit (as with other causes of AM failure), given that the commit is repeatable and
all tasks completed successfully.
Sure. But then for such cases commitIsRepeatable may not be strictly needed. Even for a non-repeatable
committer, if there is a classpath issue (which can be fixed by retrying the AM), then the
AM should retry, right? The scope of that change seems related to this one but is perhaps large
enough to deserve its own JIRA as a follow-up. E.g. if the committer has written
a failed file, then the commit has failed. Maybe we need an extension or API exception that lets
us know whether the committer error was fatal or non-fatal, and write a retry or failed file based
on that?
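
A rough sketch of what such a fatal/non-fatal distinction could look like, purely as an illustration. Nothing here exists in Hadoop: {{FatalCommitException}}, the {{Marker}} values and {{commitAndChooseMarker}} are all hypothetical names, and the sketch only returns which marker file would be written rather than touching a file system.

```java
import java.io.IOException;

/** Hypothetical sketch of classifying committer errors; not a Hadoop API. */
final class CommitFailureClassifierSketch {

    /** Hypothetical: thrown by a committer when retrying cannot help. */
    static class FatalCommitException extends IOException {
        private static final long serialVersionUID = 1L;

        FatalCommitException(String message) {
            super(message);
        }
    }

    /** Hypothetical marker files the AM could leave behind after a commit attempt. */
    enum Marker { COMMIT_SUCCESS, COMMIT_RETRY, COMMIT_FAILED }

    /** Stand-in for the committer's job commit call. */
    interface CommitAction {
        void run() throws IOException;
    }

    /**
     * Runs the commit and decides which marker to write:
     * a fatal error yields a failed marker, any other I/O error
     * yields a retry marker so that a later AM attempt may try again.
     */
    static Marker commitAndChooseMarker(CommitAction commit) {
        try {
            commit.run();
            return Marker.COMMIT_SUCCESS;
        } catch (FatalCommitException e) {
            return Marker.COMMIT_FAILED;
        } catch (IOException e) {
            return Marker.COMMIT_RETRY;
        }
    }
}
```

The more specific catch clause has to come first, since the hypothetical fatal exception is modelled here as a subclass of {{IOException}}. Whether an unchecked exception such as a class-loading error should count as retryable is left open in this sketch.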

> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, MAPREDUCE-5485-v1.patch
> The MRAppMaster may crash during job commit, or a NodeManager restart may cause
> the committing AM to exit because its container expires. In these cases, the job fails.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether redoing the commit is allowed is a better choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819.
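
The behaviour requested in the description can be sketched as a decision a restarted AM makes from the state left by the previous attempt. This is an assumption-laden illustration, not the AM's actual recovery code: {{CommitRecoverySketch}}, {{Decision}}, {{onAmRestart}} and the four boolean inputs are hypothetical names for "a commit was started", "a success file exists", "a failed file exists" and "the committer declares its commit repeatable".

```java
/** Hypothetical sketch of the restart decision; not the MRAppMaster code. */
final class CommitRecoverySketch {

    enum Decision { START_COMMIT, REDO_COMMIT, JOB_SUCCEEDED, FAIL_JOB }

    /**
     * Decides what a restarted AM should do about the job commit.
     *
     * @param commitStarted    a previous AM attempt began committing
     * @param commitSucceeded  a previous attempt recorded a successful commit
     * @param commitFailed     a previous attempt recorded a failed commit
     * @param commitRepeatable the committer says its commit may be redone
     */
    static Decision onAmRestart(boolean commitStarted, boolean commitSucceeded,
                                boolean commitFailed, boolean commitRepeatable) {
        if (commitSucceeded) {
            return Decision.JOB_SUCCEEDED;
        }
        if (commitFailed) {
            return Decision.FAIL_JOB; // an explicit failure record is final
        }
        if (!commitStarted) {
            return Decision.START_COMMIT;
        }
        // The commit was interrupted: redo it only if the committer allows it.
        return commitRepeatable ? Decision.REDO_COMMIT : Decision.FAIL_JOB;
    }
}
```

Only the last branch changes with this issue: an interrupted commit fails the job unless the committer opts in to being repeated.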

