hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
Date Thu, 13 Aug 2015 14:46:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695316#comment-14695316 ]

Jason Lowe commented on YARN-4041:
----------------------------------

bq. IIRR, synchronous recovery was to fail-fast if recovery doesn't work. With the proposed
change, what happens when the recovery fails?
Arguably the same thing that happens when the RM goes to renew tokens on a live application
and fails without a restart.  IIRC this is not fatal to either the RM or the application
when this occurs today.  In general I think we should make restarting as orthogonal as possible
to token renewals, and ideally RM restart should not cause an out-of-band token renewal storm.
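A minimal sketch of the asynchronous alternative, under stated assumptions: the class `AsyncRecoverySketch`, the `TokenRenewer` interface, and the app IDs below are hypothetical and are not the ResourceManager's actual API. Recovery restores each application right away and hands its token renewal to a thread pool, so a slow or unreachable token server no longer blocks the restart. A failed renewal is recorded and logged rather than aborting recovery, following the point above that a renewal failure on a live application is not fatal today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not actual ResourceManager code): recovery submits
 * token renewals to a thread pool instead of renewing inline, so a slow or
 * down token server cannot stall the restart.
 */
public class AsyncRecoverySketch {

    /** Hypothetical stand-in for a delegation-token renewal call. */
    interface TokenRenewer {
        void renew(String appId) throws Exception;
    }

    private final ExecutorService renewalPool;
    private final TokenRenewer renewer;
    // Apps whose renewal failed; the sketch treats such failures as non-fatal.
    final ConcurrentLinkedQueue<String> failedRenewals = new ConcurrentLinkedQueue<>();

    AsyncRecoverySketch(TokenRenewer renewer, int threads) {
        this.renewer = renewer;
        this.renewalPool = Executors.newFixedThreadPool(threads);
    }

    /** Restores every app without waiting for its token renewal to finish. */
    List<String> recover(List<String> appIds) {
        List<String> recovered = new ArrayList<>();
        for (String appId : appIds) {
            recovered.add(appId); // app state restored immediately
            renewalPool.submit(() -> {
                try {
                    renewer.renew(appId);
                } catch (Exception e) {
                    // Non-fatal: record the failure and keep the app running.
                    failedRenewals.add(appId);
                    System.err.println("Renewal failed for " + appId + ": " + e.getMessage());
                }
            });
        }
        return recovered; // returns before the renewals complete
    }

    /** Stops accepting work and waits (bounded) for queued renewals. */
    void shutdown() throws InterruptedException {
        renewalPool.shutdown();
        renewalPool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        // Simulated token server: unreachable for app_2, slow for the others.
        TokenRenewer flaky = appId -> {
            if (appId.equals("app_2")) {
                throw new Exception("token server unreachable");
            }
            Thread.sleep(200);
        };
        AsyncRecoverySketch rm = new AsyncRecoverySketch(flaky, 4);
        long start = System.nanoTime();
        List<String> recovered = rm.recover(List.of("app_1", "app_2", "app_3"));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Recovered " + recovered.size() + " apps in " + elapsedMs + " ms");
        rm.shutdown();
        System.out.println("Failed renewals: " + rm.failedRenewals);
    }
}
```

Note that this sketch still renews every token at restart, only off the recovery path. Avoiding the out-of-band renewal storm entirely would instead mean scheduling each recovered token's renewal for the time it would normally have been renewed.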


> Slow delegation token renewal can severely prolong RM recovery
> --------------------------------------------------------------
>
>                 Key: YARN-4041
>                 URL: https://issues.apache.org/jira/browse/YARN-4041
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Sunil G
>
> When the RM does a work-preserving restart, it synchronously tries to renew delegation
> tokens for every active application.  If a token server happens to be down or is running slow,
> and a lot of the active apps were using tokens from that server, then it can have a huge impact
> on the time it takes the RM to process the restart.
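To make the cost described above concrete, here is a rough model, assuming renewals run one after another on the recovery path: the time recovery blocks is the sum of every app's renewal latency, so a single slow token server shared by many apps dominates. `SyncRecoveryCost`, the server names, and the latency figures are hypothetical illustrations, not measurements.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical back-of-the-envelope model (not RM code): with synchronous
 * renewal during recovery, the restart blocks for the sum of every app's
 * renewal latency.
 */
public class SyncRecoveryCost {

    /** An active app and the (hypothetical) token server its token came from. */
    static final class App {
        final String id;
        final String tokenServer;

        App(String id, String tokenServer) {
            this.id = id;
            this.tokenServer = tokenServer;
        }
    }

    /** Total blocking time if each renewal runs inline, one after another. */
    static long synchronousBlockingMs(List<App> apps, Map<String, Long> latencyMsByServer) {
        long total = 0;
        for (App app : apps) {
            // Unknown servers are assumed to cost nothing in this model.
            total += latencyMsByServer.getOrDefault(app.tokenServer, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: one healthy server, one slow one.
        Map<String, Long> latency = Map.of("nn-healthy", 50L, "nn-slow", 30_000L);
        List<App> apps = List.of(
                new App("app_1", "nn-healthy"),
                new App("app_2", "nn-slow"),
                new App("app_3", "nn-slow"),
                new App("app_4", "nn-slow"));
        // prints "Recovery blocked for ~90050 ms"
        System.out.println("Recovery blocked for ~" + synchronousBlockingMs(apps, latency) + " ms");
    }
}
```

Three apps behind the slow server account for all but 50 ms of the delay, which is why the cost scales with how many active apps hold tokens from the affected server.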



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
