hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-1337) Recover containers upon nodemanager restart
Date Tue, 12 Aug 2014 01:54:12 GMT

     [ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-1337:
-----------------------------

    Attachment: YARN-1337-v3.patch

Thanks for taking another look, Junping.

.bq Better to add javadoc for new added (or move from private) public method.

I documented all of the NodeStatusUpdater methods and also the NMStateStoreService public
methods that didn't already have javadocs.

.bq volatile is unnecessary as it was using AtomicBoolean already.

Fixed.
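
For anyone following the volatile remark, here is a minimal sketch of why the modifier is redundant on an AtomicBoolean field. It is not taken from the patch; the class and method names are hypothetical and only illustrate the general Java idiom.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical class for illustration only; not part of the YARN-1337 patch.
final class StoppableWorker {
    // The reference is assigned once and never changes, so 'final' is enough.
    // AtomicBoolean already gives get()/set()/compareAndSet() volatile
    // memory semantics, so marking the field 'volatile' adds nothing.
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    /** Returns true only for the first caller that flips the flag. */
    boolean stop() {
        return stopped.compareAndSet(false, true);
    }

    /** Reads the current flag value; visible across threads. */
    boolean isStopped() {
        return stopped.get();
    }
}
```

A volatile modifier would only matter if the field itself were reassigned to a different AtomicBoolean instance after construction, which this pattern avoids by declaring it final.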

> Recover containers upon nodemanager restart
> -------------------------------------------
>
>                 Key: YARN-1337
>                 URL: https://issues.apache.org/jira/browse/YARN-1337
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch, YARN-1337-v3.patch
>
>
> To support work-preserving NM restart we need to recover the state of the containers
> when the nodemanager went down.  This includes informing the RM of containers that have exited
> in the interim and a strategy for dealing with the exit codes from those containers along
> with how to reacquire the active containers and determine their exit codes when they terminate.
> The state of finished containers also needs to be recovered.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
