hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-60) NMs rejects all container tokens after secret key rolls
Date Wed, 29 Aug 2012 23:51:08 GMT

     [ https://issues.apache.org/jira/browse/YARN-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-60:
----------------------------------------

    Attachment: YARN-60-20120829.txt

The bug is in {{ResourceTrackerService}}, which was incorrectly storing _lastKnownMasterKey_ for a node
in {{RMNode}}. After each update, it memorized _nextMasterKey_, which is reset to null immediately
on roll-over. Because of this, NMs get NPEs, don't update their master-keys, and hence reject all
container tokens.
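A minimal Java sketch of the failure mode described above, under a heavily simplified model: {{MasterKey}}, {{KeyRoller}}, {{BuggyNodeTracker}} and {{NodeKeyHolder}} are hypothetical classes written for illustration, not the actual YARN classes or APIs. It shows how memorizing _nextMasterKey_ leaves a null behind once the roll-over resets it, and how an NM-side handler that dereferences that remembered key then fails with an NPE.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a master key; only its id matters in this sketch.
final class MasterKey {
  final int keyId;
  MasterKey(int keyId) { this.keyId = keyId; }
}

// Hypothetical RM-side roll-over: rollMasterKey() stages a next key and
// activateNextMasterKey() promotes it to current, resetting next to null.
class KeyRoller {
  MasterKey currentMasterKey = new MasterKey(1);
  MasterKey nextMasterKey = null;
  private int nextId = 2;

  void rollMasterKey() { nextMasterKey = new MasterKey(nextId++); }

  void activateNextMasterKey() {
    currentMasterKey = nextMasterKey;
    nextMasterKey = null; // reset immediately on roll-over
  }
}

// Buggy per-node bookkeeping: after each update it memorizes nextMasterKey
// as the node's last known key, which is null once the roll-over completes.
class BuggyNodeTracker {
  final Map<String, MasterKey> lastKnownMasterKey = new HashMap<>();

  void recordUpdate(String nodeId, KeyRoller roller) {
    lastKnownMasterKey.put(nodeId, roller.nextMasterKey);
  }
}

// Hypothetical NM-side handler that installs whatever key it is handed.
class NodeKeyHolder {
  int currentKeyId = -1;

  void applyMasterKey(MasterKey key) {
    currentKeyId = key.keyId; // throws NullPointerException if key is null
  }
}
```

In this model, recording an update after {{activateNextMasterKey()}} stores null for the node, and handing that value to {{applyMasterKey}} throws, so the NM's key state stops advancing.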

In the attached patch
 - I removed storing _currentKey_ in RMNode altogether. Now the NM always reports its _lastKnownMasterKey_
and the RM checks whether any updates need to be sent.
 - Added {{TestRMNMSecretKeys}} to make sure that the RM sends correct key-updates to the NM. This
was the test that caught the issue for me; it fails without the code changes and passes
after the fix.
 - Over at YARN-39, I had earlier made a bogus fix (without really knowing it :) ) for
{{TestContainerManagerSecurity}}. Corrected that too.
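A minimal sketch of the stateless check described in the first bullet, assuming the NM reports only the id of its last known key on each heartbeat and the RM compares it against its own current and next keys. {{KeyUpdateDecider}} and {{keyToSend}} are hypothetical names; this illustrates the idea and is not the code in the attached patch.

```java
// Hypothetical RM-side decision made on each heartbeat: no per-node key state
// is stored; the NM-reported key id is compared with the RM's own keys.
final class KeyUpdateDecider {
  private final int currentKeyId;
  private final Integer nextKeyId; // null when no roll-over is pending

  KeyUpdateDecider(int currentKeyId, Integer nextKeyId) {
    this.currentKeyId = currentKeyId;
    this.nextKeyId = nextKeyId;
  }

  /** Returns the id of the key to send to the NM, or null if it is up to date. */
  Integer keyToSend(int nmLastKnownKeyId) {
    if (nextKeyId != null) {
      // Roll-over pending: the NM must learn the next key before it is activated.
      if (nmLastKnownKeyId == nextKeyId) {
        return null;
      }
      return nextKeyId;
    }
    // No roll-over pending: only an NM that missed the current key needs it.
    if (nmLastKnownKeyId == currentKeyId) {
      return null;
    }
    return currentKeyId;
  }
}
```

Because the decision depends only on what the NM reports, there is no per-node copy on the RM side that can go stale or become null. In this simplified version an NM that knows neither key is sent only the next key; which keys the real patch sends in that case is not spelled out above.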

We also need to modify {{NodeStatusUpdaterImpl}} to react properly to exceptions from the RM.
Will do so separately.
                
> NMs rejects all container tokens after secret key rolls
> -------------------------------------------------------
>
>                 Key: YARN-60
>                 URL: https://issues.apache.org/jira/browse/YARN-60
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.2.0-alpha, 0.23.3
>            Reporter: Daryn Sharp
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>         Attachments: YARN-60-20120829.txt
>
>
> The NM's token secret manager will reject all container tokens after the secret key is
> activated, which means the NM will not launch _any_ containers, including AMs. The whole YARN
> cluster becomes inoperable within 1 day.
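To see why a missed key update stops _all_ container launches, here is a minimal sketch of an NM-side check. It assumes the NM accepts a container token only when the token's key id matches its current or previous master key; {{NmKeyState}} and its methods are hypothetical, not the real NM token secret manager API.

```java
// Hypothetical NM-side key state: a token is accepted only if it was signed
// with the NM's current or previous master key.
final class NmKeyState {
  private Integer currentKeyId;  // null until the first key arrives
  private Integer previousKeyId; // the key that was current before the last update

  void setMasterKey(int keyId) {
    if (currentKeyId != null && currentKeyId == keyId) {
      return; // already installed
    }
    previousKeyId = currentKeyId;
    currentKeyId = keyId;
  }

  boolean acceptsToken(int tokenKeyId) {
    return (currentKeyId != null && currentKeyId == tokenKeyId)
        || (previousKeyId != null && previousKeyId == tokenKeyId);
  }
}
```

If the RM activates a new key but the NM never installs it, every token the RM issues from then on carries an id the NM does not know, so each one is rejected, AM containers included.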

--
This message is automatically generated by JIRA.
