hadoop-yarn-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8865) RMStateStore contains large number of expired RMDelegationToken
Date Thu, 11 Oct 2018 14:38:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646515#comment-16646515 ]

Daryn Sharp commented on YARN-8865:

Good job.  That explains why the secret manager doesn't remove them.  What's interesting
is that secret keys are supposed to outlive their tokens.  Were secret keys manually deleted?
Regardless, the secret manager should be able to recover its state.

The patch is a high-risk change for a common class.  Not all secret managers are equipped
to handle mutation during loading.  Case in point: the NN generates an edit to remove tokens.
Edits cannot be generated while replaying edits (restoring state).  Fundamentally, an HA
standby cannot modify state.  Similar issues probably exist for other secret managers.

Perhaps the lowest-risk change is to add tokens with an invalid key anyway, with the password
set to null.  Authentication will fail, and this should allow the expiration thread to correctly
remove the tokens.
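
To make the idea concrete, here is a minimal sketch in plain Java.  It is a toy model, not
the Hadoop AbstractDelegationTokenSecretManager API: the class ToySecretManager and all of
its methods are hypothetical names invented for illustration.  It only shows that a token
restored with a null password can never authenticate but is still visible to the expiry sweep.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Toy model only; names and structure are assumptions, not Hadoop code.
class ToySecretManager {
    // Per-token state: expiry time and password (null when the master key is missing).
    static final class TokenInfo {
        final long expiryMillis;
        final byte[] password;

        TokenInfo(long expiryMillis, byte[] password) {
            this.expiryMillis = expiryMillis;
            this.password = password;
        }
    }

    private final Map<Integer, byte[]> masterKeys = new HashMap<>();
    private final Map<String, TokenInfo> tokens = new HashMap<>();

    void addKey(int keyId, byte[] secret) {
        masterKeys.put(keyId, secret.clone());
    }

    // Restore a persisted token. If its master key is gone, keep the token
    // with a null password instead of rejecting it, so it stays tracked.
    void restoreToken(String tokenId, int keyId, long expiryMillis) {
        byte[] key = masterKeys.get(keyId);
        byte[] password = (key == null) ? null : derivePassword(tokenId, key);
        tokens.put(tokenId, new TokenInfo(expiryMillis, password));
    }

    // A null password never matches, so tokens without a key cannot authenticate.
    boolean authenticate(String tokenId, byte[] presented, long nowMillis) {
        TokenInfo info = tokens.get(tokenId);
        if (info == null || info.password == null || info.expiryMillis < nowMillis) {
            return false;
        }
        return MessageDigest.isEqual(info.password, presented);
    }

    // Stand-in for the expiration thread: drop every token past its expiry time.
    int removeExpired(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, TokenInfo>> it = tokens.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().expiryMillis < nowMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return tokens.size();
    }

    // Illustrative password derivation: HMAC-SHA256 of the token id under the key.
    static byte[] derivePassword(String tokenId, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(tokenId.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the restore path never mutates the persisted store; cleanup is left entirely
to the normal expiry sweep.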

Alternatively, the lowest-risk change may be to modify the RMDTSM to handle removal while
restoring state.
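
One way to picture removal during restore is to defer the store mutation until recovery has
finished.  The sketch below is a toy model in plain Java, assuming RMDTSM refers to
RMDelegationTokenSecretManager; it does not use that class or any Hadoop API, and
ToyRecoveringTokenManager, its fields, and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only; names and structure are assumptions, not Hadoop code.
class ToyRecoveringTokenManager {
    private final Set<String> store;  // stands in for the persistent state store
    private final Map<String, Long> liveTokens = new HashMap<>();
    private final List<String> pendingRemovals = new ArrayList<>();
    private boolean recovering = false;

    ToyRecoveringTokenManager(Set<String> store) {
        this.store = store;
    }

    // Replay persisted tokens (token id -> expiry time in millis).
    // Expired ones are routed through removeToken, which only queues
    // the store deletion while recovery is in progress.
    void recover(Map<String, Long> persisted, long nowMillis) {
        recovering = true;
        try {
            for (Map.Entry<String, Long> e : persisted.entrySet()) {
                if (e.getValue() < nowMillis) {
                    removeToken(e.getKey());
                } else {
                    liveTokens.put(e.getKey(), e.getValue());
                }
            }
        } finally {
            recovering = false;
        }
        flushPendingRemovals();
    }

    // Remove from memory at once; touch the store only when not recovering.
    void removeToken(String tokenId) {
        liveTokens.remove(tokenId);
        if (recovering) {
            pendingRemovals.add(tokenId);
        } else {
            store.remove(tokenId);
        }
    }

    // Apply the deletions that were queued during recovery.
    private void flushPendingRemovals() {
        for (String tokenId : pendingRemovals) {
            store.remove(tokenId);
        }
        pendingRemovals.clear();
    }

    int liveCount() {
        return liveTokens.size();
    }
}
```

Here the flush runs as soon as recovery ends.  A real manager would presumably have to hold
the queue until it is allowed to write, for example after becoming active.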

> RMStateStore contains large number of expired RMDelegationToken
> ---------------------------------------------------------------
>                 Key: YARN-8865
>                 URL: https://issues.apache.org/jira/browse/YARN-8865
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>         Attachments: YARN-8865.001.patch, YARN-8865.002.patch
> When the RM state store is restored expired delegation tokens are restored and added
> to the system. These expired tokens do not get cleaned up or removed. The exact reason why
> the tokens are still in the store is not clear. We have seen as many as 250,000 tokens in
> the store some of which were 2 years old.
> This has two side effects:
> * for the zookeeper store this leads to a jute buffer exhaustion issue and prevents the
>   RM from becoming active.
> * restore takes longer than needed and heap usage is higher than it should be
> We should not restore already expired tokens since they cannot be renewed or used.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
