hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart
Date Thu, 19 Jun 2014 01:32:25 GMT

     [ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jason Lowe updated YARN-1341:

    Attachment: YARN-1341v6.patch

Thanks for reviewing, Junping!

bq. The change in BaseContainerTokenSecretManager.java is not necessary and I believe that
belongs to YARN-1342.

Good catch, removed.

bq. Can we consolidate the code in a separated method together with NMContainerTokenSecretManager
as we will do similar thing to recover ContainerToken staff which make code have duplicated

I'm not sure I understand what you're requesting.  Recovering the NM tokens is one line of
code (3 if we count the "if canRecover" part), and recovering the container tokens in YARN-1342
will add one more line for that (inside the same "if canRecover" block).  I went ahead and
factored this into a separate method, however I'm not sure it matches what you were expecting
as I don't see where we're saving duplicated code.  If what's in the updated patch isn't what
you expected, please provide some sample pseudo-code to demonstrate how we can avoid duplication
of code.

bq. Does log error here is just enough in case of failure in store? If Master key is updated
but not persistent, then it could cause some inconsistency when recover it. I think we should
throw some exception here if store get failed and rollback the key just set.

The problem with throwing an exception is what to do with the exception -- do we take down
the NM?  That seems like a drastic answer since the NM will likely chug along just fine without
the key stored.  It only becomes a problem when the NM restarts and restores an old key. 
However if we rollback the old key here then we take that only-breaks-if-we-happened-to-restart
case and make it an always-breaks scenario.  Eventually the old key will no longer be valid
to the RM, and none of the AMs will be able to authenticate to the NM.  Therefore I thought
it would be better to log the error, press onward, and hope we don't restart before we store
a valid key again (maybe store error was transient) rather than either take down the NM or
have things start failing even without a restart.

> Recover NMTokens upon nodemanager restart
> -----------------------------------------
>                 Key: YARN-1341
>                 URL: https://issues.apache.org/jira/browse/YARN-1341
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch,
YARN-1341v5.patch, YARN-1341v6.patch

This message was sent by Atlassian JIRA

View raw message