hadoop-yarn-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-7450) ATS Client should retry on intermittent Kerberos issues.
Date Mon, 13 Nov 2017 16:55:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-7450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated YARN-7450:
-------------------------------
    Description: 
We saw a stack trace (posted in the first comment) in the ResourceManager logs for the TimelineClientImpl
not being able to relogin from keytab.

I'm guessing there was an intermittent issue that caused the Kerberos relogin from keytab
to fail. However, I'm assuming this was *not* retried because I only saw one instance of
this stack trace.  I propose that this operation should have been retried.

It seems this caused events at the ResourceManager to queue up, and the ResourceManager
eventually stopped responding to even basic {{yarn application -list}} commands.

  was:
We saw a stack trace (posted in the first comment) in the ResourceManager logs for the TimelineClientImpl
not being able to relogin from keytab.

I'm guessing there was an intermittent network issue that caused the Kerberos relogin from
keytab to fail. However, I'm assuming this was *not* retried because I only saw one instance
of this stack trace.  I propose that this operation should have been retried.

It seems this caused events at the ResourceManager to queue up, and the ResourceManager
eventually stopped responding to even basic {{yarn application -list}} commands.


> ATS Client should retry on intermittent Kerberos issues.
> --------------------------------------------------------
>
>                 Key: YARN-7450
>                 URL: https://issues.apache.org/jira/browse/YARN-7450
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: ATSv2
>    Affects Versions: 2.7.3
>         Environment: Hadoop-2.7.3
>            Reporter: Ravi Prakash
>
> We saw a stack trace (posted in the first comment) in the ResourceManager logs for the
> TimelineClientImpl not being able to relogin from keytab.
> I'm guessing there was an intermittent issue that caused the Kerberos relogin from keytab
> to fail. However, I'm assuming this was *not* retried because I only saw one instance of
> this stack trace.  I propose that this operation should have been retried.
> It seems this caused events at the ResourceManager to queue up, and the ResourceManager
> eventually stopped responding to even basic {{yarn application -list}} commands.
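
For illustration only, the retry proposed above could look roughly like the sketch below. This is not the TimelineClientImpl code and not a patch for this issue. The class name RetryingRelogin, the helper withRetries, and its parameters are hypothetical, and the relogin itself is represented by a generic Callable rather than any real Hadoop call. It assumes a transient failure surfaces as an IOException and that a bounded number of attempts with a doubling backoff is acceptable.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch: bounded retry with exponential backoff around an
// operation that may fail transiently (for example, a keytab relogin).
public class RetryingRelogin {

    /**
     * Runs op, retrying on IOException. At most maxAttempts calls are made in
     * total; the sleep between attempts starts at initialBackoffMillis and
     * doubles after each failure. The last IOException is rethrown if every
     * attempt fails. Other exception types are not retried.
     */
    public static <T> T withRetries(Callable<T> op, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff); // wait before the next attempt
                    backoff *= 2;
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for the relogin: fails twice, then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated transient Kerberos failure");
            }
            return "relogin ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Bounding the attempts matters here: an unbounded retry loop on the event-handling path could itself cause the kind of queue build-up described in this report.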



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

