hadoop-mapreduce-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry
Date Fri, 22 Aug 2014 17:34:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107144#comment-14107144 ]

Zhijie Shen commented on MAPREDUCE-5956:
----------------------------------------

bq. When it comes to leaking staging directories, there are far more common cases where that
occurs than this scenario. e.g.: application killed before AM starts or in-between AM retries,
AM is misconfigured and fails every time, etc. It seems like the scenario we're worried about
is highly unlikely, so I don't think it'd be a big deal to put into 2.5.1 from that standpoint.

Hm... It makes sense to me. [~vinodkv], what do you think?


> MapReduce AM should not use maxAttempts to determine if this is the last retry
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5956
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5956
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 2.4.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Wangda Tan
>            Priority: Blocker
>             Fix For: 2.6.0
>
>         Attachments: MR-5956.patch, MR-5956.patch
>
>
> Found this while reviewing YARN-2074. The problem is that after YARN-2074, we don't count
> AM preemption towards AM failures on the RM side, but the MapReduce AM itself checks the attempt id
> against the max-attempt count to determine if this is the last attempt.
> {code}
>     public void computeIsLastAMRetry() {
>       isLastAMRetry = appAttemptID.getAttemptId() >= maxAppAttempts;
>     }
> {code}
> This causes issues w.r.t. deletion of the staging directory, etc.
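
To make the mismatch described above concrete, here is a minimal, self-contained sketch. It is not the Hadoop code or the attached patch: the class `LastRetrySketch`, the method `isLastRetryByCountedFailures`, and the `countedFailures` value are hypothetical names introduced only for illustration. The first method mirrors the quoted check. The second assumes, purely for comparison, that the AM could learn how many earlier attempts the RM actually counted as failures.

```java
// Illustration only: hypothetical names, not the Hadoop API or the MR-5956 patch.
class LastRetrySketch {

    // Mirrors the check quoted in the issue: compare the raw attempt id
    // with the configured maximum number of attempts.
    public static boolean isLastRetryByAttemptId(int attemptId, int maxAppAttempts) {
        return attemptId >= maxAppAttempts;
    }

    // Hypothetical alternative for comparison: only attempts that the RM
    // counted as failures (preempted attempts excluded) use up the budget.
    // The current attempt is the last one if one more failure exhausts it.
    public static boolean isLastRetryByCountedFailures(int countedFailures, int maxAppAttempts) {
        return countedFailures + 1 >= maxAppAttempts;
    }

    public static void main(String[] args) {
        int maxAppAttempts = 2;

        // Scenario: attempt 1 was preempted, so the RM did not count it
        // as a failure. Attempt 2 is now running.
        int attemptId = 2;
        int countedFailures = 0;

        System.out.println("by attempt id:       "
                + isLastRetryByAttemptId(attemptId, maxAppAttempts));
        System.out.println("by counted failures: "
                + isLastRetryByCountedFailures(countedFailures, maxAppAttempts));
    }
}
```

In this scenario the attempt-id check reports that attempt 2 is the last retry, while the RM would still allow another attempt after a failure. If the AM acts on that answer when deciding whether to clean up, the staging directory could presumably be deleted before a later attempt needs it, which is the kind of issue the description refers to.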



--
This message was sent by Atlassian JIRA
(v6.2#6252)
