hadoop-yarn-issues mailing list archives

From "Botong Huang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-7630) Fix AMRMToken handling in AMRMProxy
Date Fri, 08 Dec 2017 23:13:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Botong Huang updated YARN-7630:
    Attachment: YARN-7630.v1.patch

> Fix AMRMToken handling in AMRMProxy
> -----------------------------------
>                 Key: YARN-7630
>                 URL: https://issues.apache.org/jira/browse/YARN-7630
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Minor
>         Attachments: YARN-7630.v1.patch
> Symptom: after the RM rolls over the master key for the AMRMToken, whenever the RPC connection
from FederationInterceptor to the RM breaks due to a transient network issue and reconnects, heartbeats
to the RM start failing with the “Invalid AMRMToken” exception. When this occurs, it
affects both the home RM and the secondary RMs.
> Related facts: 
> 1. When the RM issues a new AMRMToken, it always sends it with the service name field set to an empty string.
The RPC layer on the AM side sets the field properly before it starts using the token.
> 2. UGI keeps all tokens in a map from serviceName->Token. Initially, AMRMClientUtils.createRMProxy()
is used to load the first token and start the RM connection.
> 3. When the RM renews the token, YarnServerSecurityUtils.updateAMRMToken() is used to load
it into UGI and replace the existing token (with the same serviceName key).
> Bug: 
> The bug is that AMRMClientUtils.createRMProxy() (fact 2) and YarnServerSecurityUtils.updateAMRMToken() (fact 3)
do not handle this sequence consistently. We always need to load the token (with an empty service
name) into UGI first, before setting the serviceName, so that the previous AMRMToken is
overridden. But createRMProxy() does it in the reverse order. That is why, after the RM rolls the AMRMToken, the UGI
ends up with two tokens. Whenever the RPC connection breaks and reconnects, the wrong token may
be picked, which triggers the exception.
> Fix: 
> Load the AMRMToken into UGI first, and only then update the service name field for RPC.
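
The ordering problem described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the Hadoop code or the attached patch: `Token`, `TokenStore`, `rollOver` and the address `rm-host:8030` are hypothetical stand-ins, and the store is assumed to key each token by whatever service name it carries at the moment it is added, which is how the description says UGI behaves.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AmrmTokenOrderingSketch {

    /** Hypothetical stand-in for a security token: a mutable service name plus the signing key id. */
    static final class Token {
        String service;          // empty when the RM first issues the token
        final String masterKey;  // which RM master key produced this token

        Token(String service, String masterKey) {
            this.service = service;
            this.masterKey = masterKey;
        }
    }

    /** Hypothetical stand-in for the UGI token map: keyed by the service name at insertion time. */
    static final class TokenStore {
        final Map<String, Token> tokens = new LinkedHashMap<>();

        void addToken(Token token) {
            tokens.put(token.service, token);
        }
    }

    static final String RM_ADDRESS = "rm-host:8030"; // hypothetical RM service name

    /** Models the update path: the renewed token arrives with an empty service name and is loaded first. */
    static void rollOver(TokenStore store) {
        Token renewed = new Token("", "key-2");
        store.addToken(renewed);       // stored under ""
        renewed.service = RM_ADDRESS;  // service name set afterwards, for RPC use
    }

    /** Reverse order on the initial load: set the service name, then add the token. */
    static TokenStore setServiceThenLoad() {
        TokenStore store = new TokenStore();
        Token first = new Token("", "key-1");
        first.service = RM_ADDRESS;
        store.addToken(first);         // stored under RM_ADDRESS
        rollOver(store);               // renewed token lands under "", old one survives
        return store;
    }

    /** Consistent order on the initial load: add the token, then set the service name. */
    static TokenStore loadThenSetService() {
        TokenStore store = new TokenStore();
        Token first = new Token("", "key-1");
        store.addToken(first);         // stored under ""
        first.service = RM_ADDRESS;
        rollOver(store);               // renewed token replaces the old one under ""
        return store;
    }

    public static void main(String[] args) {
        System.out.println("set-then-load: " + setServiceThenLoad().tokens.size() + " token(s)");
        System.out.println("load-then-set: " + loadThenSetService().tokens.size() + " token(s)");
    }
}
```

In this model the reverse order leaves the stale `key-1` token in the map next to the renewed one, while the consistent order leaves only the renewed token, which matches the "two tokens" outcome the description attributes to the bug.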

This message was sent by Atlassian JIRA

