hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart
Date Mon, 11 Aug 2014 16:02:14 GMT

    [ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092916#comment-14092916 ]

Junping Du commented on YARN-1337:
----------------------------------

Thanks [~jlowe] for updating the patch. A few trivial issues to fix are listed below; everything
else looks good to me:

{code}
+  public void addCompletedContainer(ContainerId containerId);
+
{code}
It would be better to add Javadoc for the newly added (or formerly private) public method.
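
A minimal sketch of what such Javadoc could look like. The enclosing interface name, the placeholder `ContainerId` class, the in-memory implementation, and the comment wording are all assumptions made for illustration; the patch's actual enclosing type and method semantics are not shown in this message.

```java
import java.util.ArrayList;
import java.util.List;

public class JavadocSketch {
  /** Placeholder standing in for YARN's ContainerId; not the real class. */
  static final class ContainerId {
    private final String id;

    ContainerId(String id) {
      this.id = id;
    }

    @Override
    public String toString() {
      return id;
    }
  }

  /** Hypothetical interface; the real enclosing type is not shown in the comment. */
  interface CompletedContainerTracker {
    /**
     * Records that the given container has completed.
     * (Assumed wording, inferred only from the method name.)
     *
     * @param containerId the ID of the container that completed
     */
    void addCompletedContainer(ContainerId containerId);
  }

  /** Trivial in-memory implementation, present only so the sketch runs. */
  static final class InMemoryTracker implements CompletedContainerTracker {
    private final List<ContainerId> completedContainers = new ArrayList<>();

    @Override
    public void addCompletedContainer(ContainerId containerId) {
      completedContainers.add(containerId);
    }

    int size() {
      return completedContainers.size();
    }
  }

  public static void main(String[] args) {
    InMemoryTracker tracker = new InMemoryTracker();
    tracker.addCompletedContainer(new ContainerId("container_1"));
    System.out.println(tracker.size());
  }
}
```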

{code}
-  private volatile AtomicBoolean shouldLaunchContainer = new AtomicBoolean(false);
-  private volatile AtomicBoolean completed = new AtomicBoolean(false);
+  protected volatile AtomicBoolean shouldLaunchContainer =
+      new AtomicBoolean(false);
+  protected volatile AtomicBoolean completed = new AtomicBoolean(false);
{code}
volatile is unnecessary because these fields are already AtomicBoolean; volatile applies only to the
field reference, not to the value the AtomicBoolean holds.
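
The point can be sketched as follows, assuming the fields are assigned once and never reassigned (the message does not confirm this). The class name `ContainerFlagsSketch` and its helper methods are hypothetical; only the two field names come from the patch. Declaring the references `final` is one way to express that assumption, while AtomicBoolean itself provides cross-thread visibility of the value.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ContainerFlagsSketch {
  // final instead of volatile: the reference never changes (assumption),
  // and AtomicBoolean already guarantees visibility of the boolean value.
  protected final AtomicBoolean shouldLaunchContainer =
      new AtomicBoolean(false);
  protected final AtomicBoolean completed = new AtomicBoolean(false);

  /** Returns true only for the first caller that marks completion. */
  public boolean markCompleted() {
    return completed.compareAndSet(false, true);
  }

  /** Reads the current value; safe to call from any thread. */
  public boolean isCompleted() {
    return completed.get();
  }

  public static void main(String[] args) throws InterruptedException {
    ContainerFlagsSketch flags = new ContainerFlagsSketch();

    // Another thread sets the flag; the main thread reads it after join().
    Thread worker = new Thread(flags::markCompleted);
    worker.start();
    worker.join();

    System.out.println(flags.isCompleted());
  }
}
```

If a field were in fact reassigned to a new AtomicBoolean instance somewhere, `final` would not compile and the reference itself would need its own synchronization.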

> Recover containers upon nodemanager restart
> -------------------------------------------
>
>                 Key: YARN-1337
>                 URL: https://issues.apache.org/jira/browse/YARN-1337
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch
>
>
> To support work-preserving NM restart we need to recover the state of the containers
> when the nodemanager went down.  This includes informing the RM of containers that have exited
> in the interim and a strategy for dealing with the exit codes from those containers along
> with how to reacquire the active containers and determine their exit codes when they terminate.
> The state of finished containers also needs to be recovered.



--
This message was sent by Atlassian JIRA
(v6.2#6252)