hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3104) RM generates new AMRM tokens every heartbeat between rolling and activation
Date Thu, 12 Feb 2015 15:59:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318432#comment-14318432 ]

Jason Lowe commented on YARN-3104:
----------------------------------

bq. The only concern is if we don't do anything, it is possible for AMs to get authentication failures

If we don't do anything, AMs that fail to update their token will continue to work as long
as their RM connection isn't dropped.  Force-closing the connection or reauthenticating
over the same connection just makes the auth failure happen sooner.  Apps will work better
if we change nothing, since an app with a recently expired token will likely be able
to talk with the RM for hours or days and usually avoid the auth failures we're worried about.
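The argument above rests on the token being verified only when the connection is established. A toy model makes that concrete; `ToyRM`, `ToyConnection`, and `AuthError` are hypothetical names for illustration, not YARN or Hadoop RPC classes, and key ids stand in for real AMRM master keys.

```python
class AuthError(Exception):
    """Raised when a connection presents a token signed with an unknown key."""


class ToyRM:
    """Toy model (hypothetical, not YARN code): tokens are checked only in connect()."""

    def __init__(self) -> None:
        self.live_key_ids = {1}

    def add_key(self, key_id: int) -> None:
        self.live_key_ids.add(key_id)

    def retire_key(self, key_id: int) -> None:
        self.live_key_ids.discard(key_id)

    def connect(self, token_key_id: int) -> "ToyConnection":
        # Connection-time handshake: the only place the token is verified.
        if token_key_id not in self.live_key_ids:
            raise AuthError(f"token signed with unknown or retired key {token_key_id}")
        return ToyConnection(self, token_key_id)


class ToyConnection:
    def __init__(self, rm: ToyRM, key_id: int) -> None:
        self.rm = rm
        self.key_id = key_id  # fixed for the lifetime of the connection

    def heartbeat(self) -> str:
        # No re-authentication per call, so a retired key goes unnoticed.
        return "allocate-response"


rm = ToyRM()
conn = rm.connect(token_key_id=1)  # AM authenticates with key 1
rm.add_key(2)                      # new key rolled and activated
rm.retire_key(1)                   # old key eventually expires
print(conn.heartbeat())            # prints "allocate-response"
try:
    rm.connect(token_key_id=1)     # reconnect with the stale token
except AuthError as e:
    print("reconnect failed:", e)  # prints "reconnect failed: token signed with unknown or retired key 1"
```

In this model, force-closing `conn` would only move the `AuthError` earlier, which is the point made in the comment.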

I agree we should find a way to "fail fast" for this scenario, but also agree it's probably
non-trivial to do so if we can't force-close the connection via the existing RPC API.  If
it's not going to be fixed for 2.7, I'd rather put in a fix so the RM stops regenerating
the same token (and logging it) on every heartbeat.
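One way to stop the per-heartbeat regeneration is to remember the token already minted for each app under the next key and resend it. The sketch below assumes that approach; `TokenIssuer`, `on_heartbeat`, and the integer key ids are hypothetical and are not the contents of the attached patches or the real `ApplicationMasterService`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AMRMToken:
    app_id: str
    key_id: int
    serial: int  # distinguishes freshly minted tokens


class TokenIssuer:
    """Hypothetical RM-side helper (not the real YARN API).

    Between a key roll and its activation, an AM whose connection still uses
    the old key is handed a token for the next key. The token minted for each
    app is cached so later heartbeats resend it instead of minting again.
    """

    def __init__(self) -> None:
        self.current_key_id = 1
        self.next_key_id: Optional[int] = None
        self.minted = 0  # number of tokens generated (each would be logged)
        self._last_issued: Dict[str, AMRMToken] = {}

    def roll(self) -> None:
        self.next_key_id = self.current_key_id + 1

    def activate(self) -> None:
        if self.next_key_id is not None:
            self.current_key_id = self.next_key_id
            self.next_key_id = None

    def on_heartbeat(self, app_id: str, conn_key_id: int) -> Optional[AMRMToken]:
        if self.next_key_id is None or conn_key_id == self.next_key_id:
            return None  # no roll in progress, or already on the next key
        cached = self._last_issued.get(app_id)
        if cached is not None and cached.key_id == self.next_key_id:
            return cached  # resend: no new token, no new log line
        self.minted += 1
        token = AMRMToken(app_id, self.next_key_id, self.minted)
        self._last_issued[app_id] = token
        return token


issuer = TokenIssuer()
issuer.roll()
# The AM never reconnects, so its connection key stays at 1 for every heartbeat.
tokens = [issuer.on_heartbeat("app_1", conn_key_id=1) for _ in range(60)]
print(issuer.minted)     # prints 1
print(len(set(tokens)))  # prints 1
```

This removes the repeated minting and logging but, as the comment notes, does nothing to make a stale AM fail fast.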

> RM generates new AMRM tokens every heartbeat between rolling and activation
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3104
>                 URL: https://issues.apache.org/jira/browse/YARN-3104
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-3104.001.patch, YARN-3104.002.patch, YARN-3104.003.patch
>
>
> When the RM rolls a new AMRM secret, it conveys this to the AMs when it notices they
> are still connected with the old key.  However, neither the RM nor the AM explicitly closes
> the connection or otherwise tries to reconnect with the new secret.  Therefore the RM keeps
> thinking the AM doesn't have the new token on every heartbeat and keeps sending new tokens
> for the period between the key roll and the key activation.  Once the key is activated, the RM no longer
> squawks in its logs about needing to generate a new token every heartbeat (i.e., every second) for
> every app, but the apps can still be using the old token.  The token is only checked upon
> connection to the RM.  The apps don't reconnect when sent a new token, and the RM doesn't
> force them to reconnect by closing the connection.
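The behavior described above can be reproduced with a toy model in which the RM only ever sees the key the connection authenticated with. `NaiveRM`, its methods, and the one-heartbeat-per-second loop are hypothetical simplifications, not YARN code.

```python
from typing import List, Optional


class NaiveRM:
    """Toy model of the reported behavior (hypothetical names, not YARN code)."""

    def __init__(self) -> None:
        self.current_key_id = 1
        self.next_key_id: Optional[int] = None
        self.log: List[str] = []

    def roll(self) -> None:
        self.next_key_id = self.current_key_id + 1

    def activate(self) -> None:
        if self.next_key_id is not None:
            self.current_key_id = self.next_key_id
            self.next_key_id = None

    def heartbeat(self, app_id: str, conn_key_id: int) -> None:
        # The connection's key never changes, so this check keeps firing
        # and a fresh token is minted (and logged) on every heartbeat.
        if self.next_key_id is not None and conn_key_id != self.next_key_id:
            self.log.append(f"minted new AMRM token for {app_id}")


rm = NaiveRM()
conn_key_id = 1  # fixed: the AM never reconnects
rm.roll()
for _ in range(30):  # 30 heartbeats between roll and activation
    rm.heartbeat("app_1", conn_key_id)
rm.activate()
for _ in range(30):  # after activation the logs go quiet
    rm.heartbeat("app_1", conn_key_id)
print(len(rm.log))                        # prints 30
print(conn_key_id != rm.current_key_id)   # prints True: still on the old key
```

The quiet logs after activation hide the fact that the connection is still authenticated with the old key.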



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
