hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5015) Support sliding window retry capability for container restart
Date Wed, 07 Mar 2018 23:24:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390428#comment-16390428
] 

Wangda Tan commented on YARN-5015:
----------------------------------

Thanks [~csingh], see my comments below:

1) Instead of adding getRestartTimes/getRemainingRetries to {{ContainerRetryContext}}, I suggest
to have a separate class like NMContainerRetryContext which includes:
- ContainerRetryContext
- getRestartTimes/getRemainingRetries

Since we should not add runtime information to protocol/api classes.

2) mv org.apache.hadoop.yarn.server.retry.SlidingWindowRetryPolicy to org.apache.hadoop.yarn.server.nodemanager.containermanager.container:
Why it is in server-common? 

3) {{shouldRetry}}:
- It's better to return true at the begining of the method when {{getMaxRetries() == ContainerRetryContext.RETRY_FOREVER}},
which can avoid lots of checks in the following functions like calculatePendingRetries.

4) {{calculatePendingRetries}}
{code}
      return retryContext.getRemainingRetries() == -1 ?
          retryContext.getMaxRetries() :
          retryContext.getRemainingRetries();
{code} 
Why check {{retryContext.getRemainingRetries() == -1}}? Should this be getMaxRetries() ==
-1? 

5) {{updateRetryContext}}:
{code}
    retryContext.setRemainingRetries(pendingRetries -1);
{code} 

6) In ContainerImpl: 
{code}
          int n = container.containerRetryContext.getMaxRetries()
              - container.containerRetryContext.getRemainingRetries();
          container.addDiagnostics("Diagnostic message from attempt "
              + n + " : ", "\n");
{code} 
Under the context of SlidingWindowRetry, this n may keep changing. To avoid introducing more
logics, I suggest to remove {{n}} from the diagnostics. 


> Support sliding window retry capability for container restart 
> --------------------------------------------------------------
>
>                 Key: YARN-5015
>                 URL: https://issues.apache.org/jira/browse/YARN-5015
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: oct16-medium
>         Attachments: YARN-5015.01.patch, YARN-5015.02.patch, YARN-5015.03.patch
>
>
> We support sliding window retry policy for AM restarts (Introduced in YARN-611). Similar
sliding window retry policy is needed for container restarts.
> With this change, we can introduce a common class for SlidingWindowRetryPolicy ( suggested
by [~vvasudev] in the comments) and integrate it to container restart. 
> In a subsequent jira, we can modify the AM code to use SlidingWindowRetryPolicy which
will unify the AM and container restart code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message