hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-60) NMs rejects all container tokens after secret key rolls
Date Wed, 29 Aug 2012 23:51:08 GMT

     [ https://issues.apache.org/jira/browse/YARN-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Vinod Kumar Vavilapalli updated YARN-60:

    Attachment: YARN-60-20120829.txt

The bug is in {{ResourceTrackerService}}, which was incorrectly storing the _lastKnownMasterKey_ for a node
in {{RMNode}}. After each update, it memorized _nextMasterKey_, which is reset to null immediately
on roll-over. Because of this, NMs get NPEs, fail to update their master keys, and hence reject
all container tokens.
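The failure mode can be modelled in a few lines. This is a minimal sketch, not the actual {{ResourceTrackerService}}/{{RMNode}} code: {{RmKeys}}, {{BuggyKeyMemo}}, {{KeyRollBugDemo}} and the integer key ids are hypothetical stand-ins. It shows how memorizing the "next" key slot yields null right after roll-over, and how any code that then unboxes that value fails with an NPE.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the RM's key slots (not the real Hadoop classes).
final class RmKeys {
    Integer currentKeyId;
    Integer nextKeyId; // staged key; reset to null once it is activated

    RmKeys(int currentKeyId) {
        this.currentKeyId = currentKeyId;
    }

    // Roll-over: the staged key becomes current and the "next" slot is cleared.
    void rollOver() {
        currentKeyId = nextKeyId;
        nextKeyId = null;
    }
}

// Models the buggy pattern: after every update, remember the *next* key per node.
final class BuggyKeyMemo {
    private final Map<String, Integer> lastKnownKeyByNode = new HashMap<>();

    void recordUpdate(String nodeId, RmKeys keys) {
        lastKnownKeyByNode.put(nodeId, keys.nextKeyId); // null right after a roll-over
    }

    Integer lastKnownKey(String nodeId) {
        return lastKnownKeyByNode.get(nodeId);
    }
}

final class KeyRollBugDemo {
    // Returns true if reading the memorized key after a roll-over throws an NPE.
    static boolean npeAfterRollOver() {
        RmKeys keys = new RmKeys(1);
        keys.nextKeyId = 2;              // RM stages a new key
        BuggyKeyMemo memo = new BuggyKeyMemo();
        memo.recordUpdate("nm-1", keys); // memorizes 2
        keys.rollOver();                 // key 2 is now current, next is null
        memo.recordUpdate("nm-1", keys); // memorizes null
        try {
            int keyId = memo.lastKnownKey("nm-1"); // unboxing null throws an NPE
            return false;                          // not reached in this scenario
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```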

In the attached patch:
 - I removed the storing of _currentKey_ in {{RMNode}} altogether. The NM now always reports its _lastKnownMasterKey_, and the RM checks whether any updates need to be sent.
 - Added {{TestRMNMSecretKeys}} to make sure that the RM sends correct key updates to the NM. This is the test that caught the issue for me: it fails without the code changes and passes with the fix.
 - I had also earlier made a bogus fix (without really knowing it :) ) for {{TestContainerManagerSecurity}} over at YARN-39. Corrected that too.
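The reworked exchange in the first bullet can be sketched as a stateless check. This is a hypothetical rule assumed for illustration, not the code in the patch: {{KeyUpdateDecider}}, {{keyToSend}} and integer key ids are made up, and sending only the newest key (the staged next key if one exists, else the current key) is an assumption. Because the RM derives its answer from what the NM reports on every heartbeat rather than from per-node state, nothing can go stale when the key rolls over.

```java
import java.util.Optional;

final class KeyUpdateDecider {
    // Hypothetical rule: the NM reports its last known key id on each heartbeat;
    // the RM replies with its newest key only if the NM does not already have it.
    static Optional<Integer> keyToSend(int nmLastKnownKeyId, int rmCurrentKeyId, Integer rmNextKeyId) {
        // Prefer the staged key while a roll is pending, otherwise the current key.
        int newest = (rmNextKeyId != null) ? rmNextKeyId : rmCurrentKeyId;
        return (newest == nmLastKnownKeyId) ? Optional.empty() : Optional.of(newest);
    }
}
```

For example, {{keyToSend(1, 1, 2)}} yields key 2 while a roll is pending, and {{keyToSend(2, 2, null)}} yields nothing once the NM has caught up.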

We also need to modify {{NodeStatusUpdaterImpl}} to react properly to exceptions from the RM.
Will do so separately.
> NMs rejects all container tokens after secret key rolls
> -------------------------------------------------------
>                 Key: YARN-60
>                 URL: https://issues.apache.org/jira/browse/YARN-60
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.2.0-alpha, 0.23.3
>            Reporter: Daryn Sharp
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>         Attachments: YARN-60-20120829.txt
> The NM's token secret manager will reject all container tokens after the secret key is
activated, which means the NM will not launch _any_ containers, including AMs. The whole YARN
cluster becomes inoperable within a day.
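Why a stale key makes the NM reject every token: Hadoop's token secret managers compute a token's password as an HMAC of the token identifier under a master key, so the NM can only verify a container token if it holds the key the RM signed it with. The sketch below is illustrative only: {{TokenCheck}} is hypothetical, the key bytes are arbitrary, and HmacSHA1 is assumed as the algorithm.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class TokenCheck {
    private static final String ALGORITHM = "HmacSHA1"; // assumed for illustration

    // Password = HMAC(masterKey, identifier), as computed when the token is issued.
    static byte[] password(byte[] identifier, byte[] masterKey) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(masterKey, ALGORITHM));
            return mac.doFinal(identifier);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The NM recomputes the password with the key it holds and compares in constant time.
    static boolean verify(byte[] identifier, byte[] presentedPassword, byte[] nmKey) {
        return MessageDigest.isEqual(password(identifier, nmKey), presentedPassword);
    }
}
```

A token signed with the new master key verifies only against that key; an NM still holding the old key rejects it.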

