hadoop-yarn-issues mailing list archives

From "Chandni Singh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5015) Support sliding window retry capability for container restart
Date Thu, 08 Mar 2018 00:18:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390490#comment-16390490 ]

Chandni Singh edited comment on YARN-5015 at 3/8/18 12:17 AM:
--------------------------------------------------------------

 [~leftnoteasy] Please find my answers to some of your questions below:
{quote}2) mv org.apache.hadoop.yarn.server.retry.SlidingWindowRetryPolicy to org.apache.hadoop.yarn.server.nodemanager.containermanager.container:
Why it is in server-common?
{quote}
It is in server-common so that we can later use it for AM restart as well. Eventually we have to unify
the AM and container restart code, so this class needs to be accessible to the RM too.
{quote}4) calculatePendingRetries

return retryContext.getRemainingRetries() == -1 ? retryContext.getMaxRetries() : retryContext.getRemainingRetries();

 Why check {{retryContext.getRemainingRetries() == -1}}? Should this be getMaxRetries()
== -1?
{quote}
The default value of {{remainingRetries}} is -1; that is, it stays at -1 until it has been set.

If {{remainingRetries}} is not set, then {{pendingRetries}} = {{maxRetries}}; otherwise, {{pendingRetries}}
= {{remainingRetries}}.
Right after this, we update {{remainingRetries}} to {{pendingRetries}} - 1.
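
To make the -1 sentinel concrete, here is a minimal sketch of the logic described above. It is not the code from the patch: {{RetryContextSketch}} and {{consumeRetry}} are hypothetical names used only for illustration, and the {{pending > 0}} guard is an assumption added so that the counter never falls back to -1 once retries are exhausted.

```java
/** Hypothetical stand-in for the retry context; not the real YARN class. */
final class RetryContextSketch {
    /** Sentinel meaning "remainingRetries has not been set yet". */
    static final int NOT_SET = -1;

    private final int maxRetries;
    private int remainingRetries = NOT_SET;

    RetryContextSketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    int getMaxRetries() {
        return maxRetries;
    }

    int getRemainingRetries() {
        return remainingRetries;
    }

    void setRemainingRetries(int remainingRetries) {
        this.remainingRetries = remainingRetries;
    }

    /** Falls back to maxRetries while remainingRetries is still unset. */
    static int calculatePendingRetries(RetryContextSketch ctx) {
        return ctx.getRemainingRetries() == NOT_SET
            ? ctx.getMaxRetries()
            : ctx.getRemainingRetries();
    }

    /**
     * Returns the pending retries before this attempt and then stores
     * pending - 1. The pending > 0 guard is an assumption of this sketch.
     */
    static int consumeRetry(RetryContextSketch ctx) {
        int pending = calculatePendingRetries(ctx);
        if (pending > 0) {
            ctx.setRemainingRetries(pending - 1);
        }
        return pending;
    }
}
```

With {{maxRetries}} = 3, the first call to {{consumeRetry}} sees 3 pending retries (the fallback) and stores 2; later calls read the stored value directly.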
{quote}1) Instead of adding getRestartTimes/getRemainingRetries to {{ContainerRetryContext}},
I suggest to have a separate class like NMContainerRetryContext which includes:
{quote}
As with 2, should I create a {{SlidingContainerRetryContext}} in server-common? It will also
need to be accessible to the RM later, when we change the AM retry code to use this common class.

 

 



> Support sliding window retry capability for container restart 
> --------------------------------------------------------------
>
>                 Key: YARN-5015
>                 URL: https://issues.apache.org/jira/browse/YARN-5015
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: oct16-medium
>         Attachments: YARN-5015.01.patch, YARN-5015.02.patch, YARN-5015.03.patch
>
>
> We support a sliding-window retry policy for AM restarts (introduced in YARN-611). A similar sliding-window retry policy is needed for container restarts.
> With this change, we can introduce a common class for SlidingWindowRetryPolicy (suggested by [~vvasudev] in the comments) and integrate it into container restart.
> In a subsequent jira, we can modify the AM code to use SlidingWindowRetryPolicy, which will unify the AM and container restart code.
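
For readers unfamiliar with the idea, the following is a generic sketch of a sliding-window retry check, in which only failures inside a recent time window count against the retry limit. It is not the SlidingWindowRetryPolicy from the attached patches: the class name, the {{shouldRetry}} signature, and the exact window boundary are all assumptions made for illustration, and timestamps are assumed to be non-decreasing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Generic sliding-window retry sketch; hypothetical, not the YARN class. */
final class SlidingWindowRetrySketch {
    private final int maxRetries;
    private final long windowMillis;
    // Timestamps of failures that still fall inside the window, oldest first.
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    SlidingWindowRetrySketch(int maxRetries, long windowMillis) {
        this.maxRetries = maxRetries;
        this.windowMillis = windowMillis;
    }

    /**
     * Records a failure at nowMillis and reports whether a restart is still
     * allowed, counting only failures newer than the window.
     */
    boolean shouldRetry(long nowMillis) {
        // Forget failures that have aged out of the window.
        while (!failureTimes.isEmpty()
            && nowMillis - failureTimes.peekFirst() >= windowMillis) {
            failureTimes.pollFirst();
        }
        failureTimes.addLast(nowMillis);
        return failureTimes.size() <= maxRetries;
    }
}
```

With {{maxRetries}} = 2 and a one-second window, a third failure in quick succession is refused, but a later failure is retried again once the earlier ones have aged out of the window.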



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


