hadoop-common-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception
Date Tue, 13 Sep 2016 08:00:41 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HADOOP-13590:
-------------------------------
    Attachment: HADOOP-13590.01.patch

Attaching a patch for the idea.

Current behavior is to just print an exception message on the first failure. It feels to me
that we should retry here, since the UGI had a successful login. This would help in scenarios
where the renewal failure happens to be intermittent.

Patch 1 simply retries every {{kerberosMinSecondsBeforeRelogin}} interval, until the renewal
succeeds or the TGT expires. We could add more sophisticated retry logic (e.g. a max # of
retries with exponential backoff) if people like this direction.
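
To make the idea concrete, here is a rough sketch of the retry loop described above. It is
not the code in the attached patch: the class {{RenewalRetrySketch}}, the {{Renewal}} and
{{Sleeper}} interfaces, and all parameter names are hypothetical stand-ins, and the clock and
sleep are injected only so the loop can be exercised without real waiting.

```java
import java.util.function.LongSupplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper (not part of Hadoop): fixed-interval retry until the TGT end time. */
final class RenewalRetrySketch {
  private static final Logger LOG = Logger.getLogger("RenewalRetrySketch");

  /** Stand-in for the real renewal step; assumed to throw on failure. */
  public interface Renewal {
    void renew() throws Exception;
  }

  /** Stand-in for Thread.sleep, so callers can substitute a fake. */
  public interface Sleeper {
    void sleep(long millis) throws InterruptedException;
  }

  /**
   * Calls renewal.renew() until it succeeds, waiting retryIntervalMs between attempts.
   * Gives up once the next attempt would fall at or after tgtEndTimeMs.
   *
   * @return true if a renewal succeeded, false if the TGT would expire first
   */
  static boolean renewUntilExpiry(Renewal renewal, long tgtEndTimeMs, long retryIntervalMs,
      LongSupplier clock, Sleeper sleeper) throws InterruptedException {
    while (true) {
      try {
        renewal.renew();
        return true;
      } catch (InterruptedException ie) {
        // Do not swallow interruption; let the caller shut the thread down.
        throw ie;
      } catch (Exception e) {
        long now = clock.getAsLong();
        if (now + retryIntervalMs >= tgtEndTimeMs) {
          // The next attempt would land at or past expiry, so stop retrying.
          LOG.log(Level.SEVERE, "Renewal failed and TGT is about to expire; giving up", e);
          return false;
        }
        // Passing the throwable logs the full stack trace, not only the message.
        LOG.log(Level.WARNING, "Renewal failed; retrying in " + retryIntervalMs + " ms", e);
        sleeper.sleep(retryIntervalMs);
      }
    }
  }
}
```

With the real clock and sleep, a caller would pass {{System::currentTimeMillis}} and
{{Thread::sleep}}. A retry cap or exponential backoff would only change how the wait
between attempts is computed.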

The patch also adds the exception stack trace to the log message, instead of the exception
message alone.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13590
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13590
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: security
>    Affects Versions: 2.8.0, 2.7.3, 2.6.4
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HADOOP-13590.01.patch
>
>
> The UGI has a background thread to renew the TGT. On exception, it [terminates itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014].
> If something temporarily goes wrong and results in an IOE, no renewal will be done even
> after the error clears, and the client will eventually fail to authenticate. We should retry
> on a best-effort basis until the TGT expires, in the hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

