hadoop-common-dev mailing list archives

From "Henry Robinson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-13604) Abort retry loop when RPC has an unrecoverable error
Date Tue, 13 Sep 2016 17:55:22 GMT
Henry Robinson created HADOOP-13604:

             Summary: Abort retry loop when RPC has an unrecoverable error
                 Key: HADOOP-13604
                 URL: https://issues.apache.org/jira/browse/HADOOP-13604
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Henry Robinson

I've seen an issue where, after an RPC client hits an error obtaining a TGT from Kerberos,
it continues to retry even though there is no chance of success (the no-login window
is set to 600s).

In this particular deployment, the client retries 15 times at 15s intervals (15 x 15s = 225s),
so more than three minutes pass before the RPC ultimately fails and the failure is bubbled
up to the caller.

Unrecoverable errors (like failures to log in to Kerberos) should lead to fast aborts of the
retry loop.
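
To illustrate the requested behaviour, here is a minimal sketch of a retry loop that gives up
as soon as it sees an error it cannot recover from. It is not the Hadoop RPC client code and
not a proposed patch: FailFastRetry, UnrecoverableException and callWithRetries are
hypothetical names invented for this example, and the assumption is that the caller wraps
failures such as a Kerberos login error in UnrecoverableException.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class FailFastRetry {

    /** Hypothetical marker for errors that retrying cannot fix (e.g. a failed login). */
    public static class UnrecoverableException extends IOException {
        public UnrecoverableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the call once, then retries up to maxRetries more times, sleeping
     * sleepMillis between attempts. An UnrecoverableException is rethrown
     * immediately, without sleeping and without using the remaining retries.
     */
    public static <T> T callWithRetries(Callable<T> call, int maxRetries, long sleepMillis)
            throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (UnrecoverableException e) {
                // No chance of success on a later attempt: abort the loop now.
                throw e;
            } catch (Exception e) {
                // Assumed transient: remember it and retry after the interval.
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        // All attempts failed with errors treated as transient.
        throw last;
    }
}
```

With 15 retries at a 15s interval, a call that fails with UnrecoverableException on its
first attempt would surface the error right away in this sketch, instead of after the full
retry schedule has run.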
