hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-1354) Recover applications upon nodemanager restart
Date Sat, 02 Aug 2014 23:40:12 GMT

     [ https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jason Lowe updated YARN-1354:

    Attachment: YARN-1354-v6.patch

Thanks for the comments, Junping!

bq. One way is to wrapper it as PB object (keep writable fields as bytes)

The patch already wraps the credentials in a protobuf, specifically ContainerManagerApplicationProto.
So if/when credentials are stored differently after YARN-668, we can obsolete this field
and move to other new fields.

The core problem is that at this level Credentials and Tokens are effectively opaque: we
can't piece them together ourselves and must delegate to them for load/store.  Unfortunately
they are Writables, which are notoriously problematic when it comes to updating and dealing
with different versions.  It's possible if they explicitly handle it themselves (e.g., write
out a schema version and switch on it during load); however, they don't typically do that.
I agree that YARN-668 is the proper place to discuss how best to migrate Credentials and Tokens
to a more manageable infrastructure for supporting upgrades.
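
To illustrate the "write out schema versions and switch on it during load" approach mentioned above, here is a minimal sketch using only java.io. VersionedRecord, its fields, and the version numbers are hypothetical and are not part of Hadoop's Credentials or Token classes; the sketch only shows the general pattern a Writable-style class would need to follow to stay readable across versions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record (not a Hadoop class) that carries an explicit schema version.
final class VersionedRecord {
    static final int CURRENT_VERSION = 2;

    final String owner;       // present since hypothetical version 1
    final long expiryMillis;  // added in hypothetical version 2

    VersionedRecord(String owner, long expiryMillis) {
        this.owner = owner;
        this.expiryMillis = expiryMillis;
    }

    // Always writes the current layout, prefixed by its version number.
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(CURRENT_VERSION);
            out.writeUTF(owner);
            out.writeLong(expiryMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads the version first, then switches on it to pick the matching layout.
    static VersionedRecord fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            switch (version) {
                case 1:
                    // Old layout has no expiry field; fall back to a default.
                    return new VersionedRecord(in.readUTF(), 0L);
                case 2:
                    return new VersionedRecord(in.readUTF(), in.readLong());
                default:
                    throw new IllegalStateException("Unsupported schema version " + version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the leading version field, a reader built against the newer layout has no way to tell an old payload from a corrupt one, which is the upgrade hazard described above.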

bq. Shall we change the name of finishApplication() to storeFinishedApplication() which sounds
more precisely to actual work in store layer?

Sounds good.  I updated the method name to storeFinishedApplication.

I also noticed that the change in YARN-1885 was only wired into the new node registration
path.  When an NM restarts it goes through the reconnected node path, which was _not_
expecting applications to be out of sync with the RM.  I therefore updated the reconnected
node path to forward the applications running on the node and have the RM tell the NM to
finish any application that is no longer active.
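
The reconciliation on reconnect can be sketched roughly as follows. This is not the code from the patch: ReconnectReconciler, appsToFinish, and the use of plain strings for application IDs are assumptions made purely for illustration of the idea that the RM compares what the node reports against what it still considers active.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not an RM class): decides which applications a
// reconnecting node should be told to finish.
final class ReconnectReconciler {
    // Returns the applications the node reported as running that the RM
    // no longer considers active, in the order the node reported them.
    static List<String> appsToFinish(List<String> reportedByNode, Set<String> activeOnRm) {
        List<String> toFinish = new ArrayList<>();
        for (String appId : reportedByNode) {
            if (!activeOnRm.contains(appId)) {
                toFinish.add(appId);
            }
        }
        return toFinish;
    }
}
```

In this sketch an application that is still active on the RM is left alone, so only stale applications recovered by the restarted NM would be cleaned up.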

> Recover applications upon nodemanager restart
> ---------------------------------------------
>                 Key: YARN-1354
>                 URL: https://issues.apache.org/jira/browse/YARN-1354
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1354-v1.patch, YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch, YARN-1354-v4.patch, YARN-1354-v5.patch, YARN-1354-v6.patch
>
> The set of active applications in the nodemanager context needs to be recovered for work-preserving nodemanager restart.
