hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6153) keepContainer does not work when AM retry window is set
Date Fri, 17 Feb 2017 19:29:42 GMT

    [ https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872368#comment-15872368 ]

Jian He commented on YARN-6153:
-------------------------------

Thanks for updating the patch:

- Could you add a brief comment at the head of testAMRestartNotLostContainerAfterAttemptFailuresValidityInterval to explain what the test mostly does?
- In RMAppAttemptImpl, why is getStartTime used for checking validityInterval? Also, given that shouldCountTowardsMaxAttemptRetry internally already checks the validity interval, is the code below needed at all? The check is already done in the preceding {{if (!appAttempt.shouldCountTowardsMaxAttemptRetry()) {}}.
{code}
            } else {
              // After AM reset window time, it is no longer the last attempt.
              long attemptFailuresValidityInterval = appAttempt.submissionContext.getAttemptFailuresValidityInterval();
              long end = System.currentTimeMillis();
              if (attemptFailuresValidityInterval > 0
                && appAttempt.getStartTime() < (end - attemptFailuresValidityInterval)) {
                keepContainersAcrossAppAttempts = true;
              }
{code}
- A couple of places exceed the 80-column limit; please fix those.


> keepContainer does not work when AM retry window is set
> -------------------------------------------------------
>
>                 Key: YARN-6153
>                 URL: https://issues.apache.org/jira/browse/YARN-6153
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: kyungwan nam
>         Attachments: YARN-6153.001.patch, YARN-6153.002.patch
>
>
> yarn.resourcemanager.am.max-attempts is configured to 2 in my cluster.
> I submitted a YARN application (a Slider app) with keepContainers=true and attemptFailuresValidityInterval=300000.
> It worked properly when the AM failed the first time:
> all containers launched by the previous AM were resynced with the new AM (attempt 2) without being killed.
> After 10 minutes, I expected the AM failure count to have been reset by attemptFailuresValidityInterval (5 minutes).
> However, all containers were killed when the AM failed a second time. (A new AM, attempt 3, was launched properly.)
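
The timeline in the report can be worked through with a small model. The sketch below is hypothetical and is not taken from the YARN source or from the attached patches: the class name AttemptWindowSketch and both of its methods are invented for illustration. It assumes that a failure counts against max-attempts only if it finished inside the validity interval, and that containers can be kept whenever the failing attempt is not the last allowed one.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of windowed AM-failure counting; not YARN source code. */
public class AttemptWindowSketch {

    /** Counts failures whose finish time lies inside the validity window. */
    public static int failuresInWindow(List<Long> failureFinishTimesMs,
                                       long nowMs, long validityIntervalMs) {
        int count = 0;
        for (long finishedAt : failureFinishTimesMs) {
            // Assumption: an interval <= 0 means "no window", so every
            // failure counts.
            if (validityIntervalMs <= 0
                    || finishedAt > nowMs - validityIntervalMs) {
                count++;
            }
        }
        return count;
    }

    /** Assumption: containers are kept only if another attempt is allowed. */
    public static boolean canKeepContainers(List<Long> failureFinishTimesMs,
                                            long nowMs,
                                            long validityIntervalMs,
                                            int maxAttempts) {
        return failuresInWindow(failureFinishTimesMs, nowMs,
                validityIntervalMs) < maxAttempts;
    }

    public static void main(String[] args) {
        long interval = 300_000L;       // 5 minutes, as in the report
        int maxAttempts = 2;            // yarn.resourcemanager.am.max-attempts
        long firstFailure = 0L;         // first AM failure
        long secondFailure = 600_000L;  // second AM failure, 10 minutes later

        // The list includes the failure that has just happened.
        List<Long> failures = Arrays.asList(firstFailure, secondFailure);

        // Only the second failure is inside the 5-minute window.
        System.out.println(
                failuresInWindow(failures, secondFailure, interval));
        // With the window applied, the second failure is not the last attempt.
        System.out.println(canKeepContainers(
                failures, secondFailure, interval, maxAttempts));
        // Without a window, both failures count and the limit is reached.
        System.out.println(canKeepContainers(
                failures, secondFailure, 0L, maxAttempts));
    }
}
```

Under these assumptions the second failure leaves one counted failure against a limit of two, so the model agrees with the reporter's expectation that the containers should have survived.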



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

