Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Thu, 12 Feb 2015 15:59:13 +0000 (UTC)
From: "Jason Lowe (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12770400.1422378085000.28172.1423756753845@Atlassian.JIRA>
In-Reply-To: <JIRA.12770400.1422378085000@Atlassian.JIRA>
References: <JIRA.12770400.1422378085000@Atlassian.JIRA>
 <JIRA.12770400.1422378085139@arcas>
Subject: [jira] [Commented] (YARN-3104) RM generates new AMRM tokens every
 heartbeat between rolling and activation
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318432#comment-14318432 ] 

Jason Lowe commented on YARN-3104:
----------------------------------

bq. The only concern is if we don't do anything, it is possible for AMs to get authentication failures

If we don't do anything, AMs that fail to update their token will continue to work as long as their RM connection isn't dropped.  Force-closing the connection or reauthenticating over the same connection just makes the auth failure happen sooner.  Apps will work better if we change nothing, since an app with a recently expired token will likely be able to keep talking to the RM for hours or days and will usually avoid the auth failures we're worried about.

I agree we should find a way to "fail fast" for this scenario, but I also agree it's probably non-trivial to do so if we can't force-close the connection via the existing RPC API.  If that's not going to be fixed for 2.7, I'd rather at least put in a fix so the RM stops regenerating the same token on every heartbeat and emitting the corresponding log messages.
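
To illustrate the narrower fix, here is a minimal sketch of how the RM could remember which key it last issued a token for and reuse that token on later heartbeats. This is only a model under stated assumptions, not the attached patch: RolledTokenCache, tokenForHeartbeat, the string attempt ids, and the integer key ids are all hypothetical names invented for this example, and the token is a placeholder string rather than a real AMRM token.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not ResourceManager code and not the YARN-3104 patch.
final class RolledTokenCache {

    // Token last issued to an attempt, and the master key id it was built from.
    private static final class Issued {
        final int keyId;
        final String token;

        Issued(int keyId, String token) {
            this.keyId = keyId;
            this.token = token;
        }
    }

    private final Map<String, Issued> issuedByAttempt = new HashMap<>();
    private int tokensGenerated = 0;

    /**
     * Called once per heartbeat. Returns null when the AM already connected
     * with the newest key; otherwise returns a token for the next key,
     * generating it only the first time that key is seen for this attempt.
     */
    synchronized String tokenForHeartbeat(String attemptId, int connectionKeyId, int nextKeyId) {
        if (connectionKeyId == nextKeyId) {
            return null; // nothing to convey
        }
        Issued issued = issuedByAttempt.get(attemptId);
        if (issued == null || issued.keyId != nextKeyId) {
            // First heartbeat since this roll: generate (and log) exactly once.
            tokensGenerated++;
            String token = "token-" + attemptId + "-key" + nextKeyId + "-#" + tokensGenerated;
            issued = new Issued(nextKeyId, token);
            issuedByAttempt.put(attemptId, issued);
        }
        return issued.token; // later heartbeats reuse the cached token
    }

    synchronized int tokensGenerated() {
        return tokensGenerated;
    }
}
```

In this model an AM that stays connected with the old key still receives a token on every heartbeat, but the generation step (and whatever logging goes with it) happens once per key roll per attempt instead of once per heartbeat.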

> RM generates new AMRM tokens every heartbeat between rolling and activation
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3104
>                 URL: https://issues.apache.org/jira/browse/YARN-3104
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-3104.001.patch, YARN-3104.002.patch, YARN-3104.003.patch
>
>
> When the RM rolls a new AMRM secret, it conveys this to the AMs when it notices they are still connected with the old key.  However, neither the RM nor the AM explicitly closes the connection or otherwise tries to reconnect with the new secret.  Therefore the RM keeps thinking the AM doesn't have the new token on every heartbeat and keeps sending new tokens for the period between the key roll and the key activation.  Once the new key is activated, the RM no longer squawks in its logs about needing to generate a new token every heartbeat (i.e., every second) for every app, but the apps can still be using the old token.  The token is only checked upon connection to the RM.  The apps don't reconnect when sent a new token, and the RM doesn't force them to reconnect by closing the connection.
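
The connection-time check described above can be modeled with a small sketch. It is a deliberately simplified illustration, not Hadoop's RPC or SASL code: ConnectionTimeAuthModel, its Connection class, and the integer key ids are hypothetical names assumed for this example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of "the token is only checked upon connection".
final class ConnectionTimeAuthModel {

    // Ids of the master keys the server currently accepts.
    private final Set<Integer> validKeyIds = new HashSet<>();

    void activateKey(int keyId) {
        validKeyIds.add(keyId);
    }

    void retireKey(int keyId) {
        validKeyIds.remove(keyId);
    }

    /** The only place the token's key is validated. */
    Connection connect(int tokenKeyId) {
        if (!validKeyIds.contains(tokenKeyId)) {
            throw new SecurityException("token built from unknown key " + tokenKeyId);
        }
        return new Connection(tokenKeyId);
    }

    static final class Connection {
        final int keyIdAtConnect;

        Connection(int keyIdAtConnect) {
            this.keyIdAtConnect = keyIdAtConnect;
        }

        /** Heartbeats ride the already-authenticated connection: no re-check. */
        boolean heartbeat() {
            return true;
        }
    }
}
```

In this model a connection opened under key 1 keeps heartbeating after key 1 is retired, while a fresh connect with key 1 is rejected, which is why an app holding a stale token only fails once its connection drops.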

