hadoop-yarn-issues mailing list archives

From "Subru Krishnan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6093) Invalid AMRM token exception when using FederationRMFailoverProxyProvider at AMRMtoken renewal during a RM failover
Date Wed, 18 Jan 2017 01:52:26 GMT

    [ https://issues.apache.org/jira/browse/YARN-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827238#comment-15827238 ]

Subru Krishnan commented on YARN-6093:
--------------------------------------

Thanks [~botong] for the patch. At a high level it looks good, but it would be great if we
could test it end-to-end in a cluster, as this is a nuanced issue.

Also, the checkstyle issue seems fairly trivial to fix.

> Invalid AMRM token exception when using FederationRMFailoverProxyProvider at AMRMtoken renewal during a RM failover
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6093
>                 URL: https://issues.apache.org/jira/browse/YARN-6093
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: amrmproxy, federation
>    Affects Versions: YARN-2915
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Minor
>         Attachments: YARN-6093.v1.patch, YARN-6093-YARN-2915.v1.patch, YARN-6093-YARN-2915.v2.patch
>
>
> AMRMProxy uses an expired AMRMToken to talk to the RM, leading to the "Invalid AMRMToken" exception. The bug is triggered when both of the following conditions are met:
> 1. The RM rolls its master key and renews the AMRMToken for a running AM.
> 2. The existing RPC connection between AMRMProxy and the RM drops, and AMRMProxy attempts to reconnect via failover in FederationRMFailoverProxyProvider.
> Here's what happens:
> In DefaultRequestInterceptor.init(), we create a proxy ugi, load it with the initial AMRMToken issued by the RM, and use it to initialize rmClient. Then, in FederationRMFailoverProxyProvider.init(), we save a full copy of the ugi tokens locally, create the actual RM proxy, and set up the RPC connection.
> Later, when the RM rolls its master key and issues a new AMRMToken, DefaultRequestInterceptor.updateAMRMToken() updates it in the proxy ugi.
> However, the new token is never used until the existing RPC connection between AMRMProxy and the RM drops for some other reason (say, the master RM crashes).
> When we try to reconnect, the service name of the new AMRMToken has not yet been set correctly in DefaultRequestInterceptor.updateAMRMToken(), so RPC finds no valid AMRMToken when trying to set up a new connection. We first hit a "Client cannot authenticate via:[TOKEN]" exception. This is expected.
> Next, FederationRMFailoverProxyProvider fails over; we reset the service token via ClientRMProxy.getRMAddress() and reconnect. In principle, this should have worked.
> However, since DefaultRequestInterceptor does not use the proxy user for later calls to rmClient, we are not running as the proxy user when performing failover in FederationRMFailoverProxyProvider. Currently the code solves this by reloading the current ugi with all the tokens saved locally in originalTokens, in the method addOriginalTokens(). The problem is that the original AMRMToken loaded this way is no longer accepted by the RM, so we keep hitting the "Invalid AMRMToken" exception until the AM fails.
> The correct approach is to save the original ugi itself, rather than saving the original tokens from the proxy ugi. Every time we perform failover and create a new RM proxy, we use the original ugi, which is always loaded with the up-to-date AMRMToken.
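
The difference between saving a copy of the tokens and saving the ugi itself can be sketched with a toy model. This is only an illustration under stated assumptions, not the patch: ToyUgi, SnapshotProvider and LiveUgiProvider are hypothetical classes that use no Hadoop APIs, and a plain string stands in for an AMRMToken.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model only. None of these are Hadoop classes: ToyUgi stands in for a
 * ugi, and a String stands in for an AMRMToken.
 */
public class TokenFailoverSketch {

  /** Hypothetical stand-in for a ugi: a mutable set of tokens keyed by kind. */
  static final class ToyUgi {
    private final Map<String, String> tokens = new HashMap<>();

    void addToken(String kind, String token) {
      tokens.put(kind, token); // a renewed token replaces the old one
    }

    String getToken(String kind) {
      return tokens.get(kind);
    }

    Map<String, String> copyTokens() {
      return new HashMap<>(tokens); // point-in-time copy
    }
  }

  /** Problematic shape: copies the tokens once at init and replays them on failover. */
  static final class SnapshotProvider {
    private final Map<String, String> originalTokens;

    SnapshotProvider(ToyUgi ugi) {
      this.originalTokens = ugi.copyTokens();
    }

    String tokenUsedOnFailover() {
      return originalTokens.get("AMRM");
    }
  }

  /** Proposed shape: keeps the ugi itself, so failover sees later token updates. */
  static final class LiveUgiProvider {
    private final ToyUgi originalUgi;

    LiveUgiProvider(ToyUgi ugi) {
      this.originalUgi = ugi;
    }

    String tokenUsedOnFailover() {
      return originalUgi.getToken("AMRM");
    }
  }

  public static void main(String[] args) {
    ToyUgi ugi = new ToyUgi();
    ugi.addToken("AMRM", "token-v1"); // initial token

    SnapshotProvider snapshot = new SnapshotProvider(ugi);
    LiveUgiProvider live = new LiveUgiProvider(ugi);

    ugi.addToken("AMRM", "token-v2"); // master key rolled, token renewed

    System.out.println("snapshot: " + snapshot.tokenUsedOnFailover());
    System.out.println("live: " + live.tokenUsedOnFailover());
  }
}
```

In this model the snapshot provider still presents token-v1 after the renewal, which corresponds to the stale token the RM rejects, while the provider that holds the ugi presents token-v2.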




