hadoop-yarn-dev mailing list archives

From "Botong Huang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-6093) Invalid AMRM token exception when RM renew AMRMtoken and FederationRMFailoverProxyProvider failover
Date Fri, 13 Jan 2017 17:52:26 GMT
Botong Huang created YARN-6093:

             Summary: Invalid AMRM token exception when RM renew AMRMtoken and FederationRMFailoverProxyProvider failover
                 Key: YARN-6093
                 URL: https://issues.apache.org/jira/browse/YARN-6093
             Project: Hadoop YARN
          Issue Type: Bug
          Components: federation
            Reporter: Botong Huang
            Assignee: Botong Huang
            Priority: Minor
             Fix For: YARN-2915

AMRMProxy uses an expired AMRMToken to talk to the RM, leading to the "Invalid AMRMToken" exception.
The bug is triggered when both of the following conditions are met:
1. The RM rolls its master key and renews the AMRMToken for a running AM.
2. The existing RPC connection between AMRMProxy and the RM drops, and AMRMProxy attempts to reconnect
via failover in FederationRMFailoverProxyProvider.

Here's what happened: 

In DefaultRequestInterceptor.init(), we create a proxy ugi, load it with the initial AMRMToken
issued by the RM, and use it to initialize rmClient.

Then, in FederationRMFailoverProxyProvider.init(), we save a full copy of the ugi's tokens
locally, create the actual RM proxy, and set up the RPC connection.

Later, when the RM rolls its master key and issues a new AMRMToken, DefaultRequestInterceptor.updateAMRMToken()
adds the new token to the proxy ugi.

However, the new token is never used until the existing RPC connection between AMRMProxy and
the RM drops for some other reason (say, the master RM crashes).

At this point, because DefaultRequestInterceptor.updateAMRMToken() does not yet set the service name
of the new AMRMToken correctly, RPC finds no valid AMRMToken when trying to set up a new connection.

We first hit a "Client cannot authenticate via:[TOKEN]" exception. This is expected. 

Next, FederationRMFailoverProxyProvider fails over; we reset the token's service name via
ClientRMProxy.getRMAddress() and reconnect. This should have worked.

However, since DefaultRequestInterceptor does not use the proxy user for later calls to rmClient,
we are not running as the proxy user when FederationRMFailoverProxyProvider performs the failover.

Currently the code works around this by reloading the current ugi with all the tokens saved locally
in originalTokens, in the addOriginalTokens() method.

The problem is that the original AMRMToken loaded this way is no longer accepted by the RM, and thus
we keep hitting the "Invalid AMRMToken" exception until the AM fails.
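
The failure mode above can be reproduced in a toy model. The following is only a sketch
under stated assumptions: ToyUgi, ToyRm and the token strings are hypothetical stand-ins
written for this illustration, not Hadoop classes, and the toy RM is assumed to reject the
old token immediately after the key roll. Only the name originalTokens mirrors the
description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the stale-token failure. None of these types are Hadoop classes. */
final class StaleTokenSnapshotDemo {

    /** Hypothetical stand-in for a ugi: a mutable map from token kind to token value. */
    static final class ToyUgi {
        private final Map<String, String> tokens = new HashMap<>();
        void addToken(String kind, String value) { tokens.put(kind, value); }
        String getToken(String kind) { return tokens.get(kind); }
        Map<String, String> copyTokens() { return new HashMap<>(tokens); }
    }

    /** Hypothetical stand-in for the RM: accepts only the most recently issued token. */
    static final class ToyRm {
        private String currentToken = "amrm-token-v1";
        String rollMasterKey() { currentToken = "amrm-token-v2"; return currentToken; }
        boolean accepts(String token) { return currentToken.equals(token); }
    }

    /** Returns whether a reconnect that reloads the init-time snapshot is accepted. */
    static boolean reconnectWithSnapshot() {
        ToyRm rm = new ToyRm();

        // init(): the proxy ugi is loaded with the initial token,
        // and the provider takes a full copy of its tokens.
        ToyUgi proxyUgi = new ToyUgi();
        proxyUgi.addToken("AMRM", "amrm-token-v1");
        Map<String, String> originalTokens = proxyUgi.copyTokens();

        // The RM rolls its master key; only the proxy ugi receives the new token.
        proxyUgi.addToken("AMRM", rm.rollMasterKey());

        // Failover: we are not running as the proxy user, so the current ugi
        // is reloaded from the snapshot, which still holds the old token.
        ToyUgi currentUgi = new ToyUgi();
        originalTokens.forEach(currentUgi::addToken);

        return rm.accepts(currentUgi.getToken("AMRM"));
    }

    public static void main(String[] args) {
        System.out.println("reconnect with snapshot accepted: " + reconnectWithSnapshot());
    }
}
```

In this model the reconnect is rejected because the copy taken at init time never sees the
renewed token.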

The correct fix is to save the original ugi itself, rather than a copy of the tokens it held
at init time.

Every time we perform failover and create the new RM proxy, we use the original ugi, which
is always loaded with the up-to-date AMRMToken. 
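
The proposed approach can be sketched in a toy model as well. Again, this is an
illustration under assumptions, not the actual patch: ToyUgi, ToyRm and ToyFailoverProvider
are hypothetical stand-ins rather than Hadoop classes, and the toy RM is assumed to accept
only the most recently issued token.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the proposed fix. None of these types are Hadoop classes. */
final class LiveUgiFailoverDemo {

    /** Hypothetical stand-in for a ugi: a mutable map from token kind to token value. */
    static final class ToyUgi {
        private final Map<String, String> tokens = new HashMap<>();
        void addToken(String kind, String value) { tokens.put(kind, value); }
        String getToken(String kind) { return tokens.get(kind); }
    }

    /** Hypothetical stand-in for the RM: accepts only the most recently issued token. */
    static final class ToyRm {
        private String currentToken = "amrm-token-v1";
        String rollMasterKey() { currentToken = "amrm-token-v2"; return currentToken; }
        boolean accepts(String token) { return currentToken.equals(token); }
    }

    /** Hypothetical provider that keeps a reference to the ugi, not a copy of its tokens. */
    static final class ToyFailoverProvider {
        private final ToyUgi originalUgi;
        ToyFailoverProvider(ToyUgi ugi) { this.originalUgi = ugi; }

        /** On failover, read the token from the live ugi at reconnect time. */
        boolean reconnect(ToyRm rm) { return rm.accepts(originalUgi.getToken("AMRM")); }
    }

    /** Returns whether a reconnect through the saved ugi reference is accepted. */
    static boolean reconnectWithLiveUgi() {
        ToyRm rm = new ToyRm();

        // init(): the proxy ugi is loaded with the initial token,
        // and the provider saves the ugi itself.
        ToyUgi proxyUgi = new ToyUgi();
        proxyUgi.addToken("AMRM", "amrm-token-v1");
        ToyFailoverProvider provider = new ToyFailoverProvider(proxyUgi);

        // The RM rolls its master key; the proxy ugi is updated in place.
        proxyUgi.addToken("AMRM", rm.rollMasterKey());

        // Failover: the provider sees the renewed token through its reference.
        return provider.reconnect(rm);
    }

    public static void main(String[] args) {
        System.out.println("reconnect with live ugi accepted: " + reconnectWithLiveUgi());
    }
}
```

Holding the reference means a token renewal needs no extra synchronization step with the
provider: whatever the ugi holds when the failover happens is what gets used.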

This message was sent by Atlassian JIRA