hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4321) Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
Date Tue, 03 Nov 2015 07:57:27 GMT

    [ https://issues.apache.org/jira/browse/YARN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986867#comment-14986867 ]

Varun Saxena commented on YARN-4321:
------------------------------------

Thanks [~jianhe] for the review and commit.

> Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
> --------------------------------------------------------------------------
>
>                 Key: YARN-4321
>                 URL: https://issues.apache.org/jira/browse/YARN-4321
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>             Fix For: 2.7.2
>
>         Attachments: YARN-4321-branch-2.7.01.patch
>
>
> This applies only to branch-2.7 and earlier code.
> When a {{NoAuthException}} is thrown in non-HA mode (as in the scenario of YARN-4127), the RM keeps retrying the ZK operation indefinitely.
> {noformat}
> 2015-10-23 09:22:10,209 DEBUG [SyncThread:0] server.DataTree (DataTree.java:processTxn(949)) - Ignoring processTxn failure hdr: -1 : error: -102
> 2015-10-23 09:22:10,210 DEBUG [main-SendThread(127.0.0.1:11221)] zookeeper.ClientCnxn (ClientCnxn.java:readResponse(818)) - Reading reply sessionid:0x15092d1ebe10001, packet:: clientPath:null serverPath:null finished:false header:: 7591,1  replyHeader:: 7591,7610,-102  request:: '/rmstore/ZKRMStateRoot/RMAppRoot,,v{s{31,s{'world,'anyone}}},0  response::
> 2015-10-23 09:22:10,210 INFO  [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(645)) - Got user-level KeeperException when processing sessionid:0x15092d1ebe10001 type:create cxid:0x1da8 zxid:0x1dbb txntype:-1 reqpath:n/a Error Path:null Error:KeeperErrorCode = NoAuth
> {noformat}
> This happens because branch-2.7 code does not handle {{NoAuthException}} properly when HA is not enabled.
> {{ZKRMStateStore#runWithRetries}} contains the code shown below. If HA is not enabled, the {{NoAuthException}} handler neither rethrows the exception nor increments the retry count, so the loop never backs out once retries are maxed out.
> {code}
> T runWithRetries() throws Exception {
>   int retry = 0;
>   while (true) {
>     try {
>       return runWithCheck();
>     } catch (KeeperException.NoAuthException nae) {
>       if (HAUtil.isHAEnabled(getConfig())) {
>         // NoAuthException possibly means that this store is fenced due to
>         // another RM becoming active. Even if not,
>         // it is safer to assume we have been fenced
>         throw new StoreFencedException();
>       }
>     } catch (KeeperException ke) {
>       .............
>     }
>   }
> }
> {code}
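
For illustration only, the loop quoted above can be made to terminate in non-HA mode by counting failed attempts in the NoAuthException handler and rethrowing once a limit is reached. The sketch below is not the committed patch and does not use ZooKeeper or Hadoop classes: RetrySketch, its nested NoAuthException and StoreFencedException, the haEnabled flag, and the maxRetries parameter are all hypothetical stand-ins chosen so the example compiles on its own.

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    /** Hypothetical stand-in for KeeperException.NoAuthException. */
    static class NoAuthException extends Exception {}

    /** Hypothetical stand-in for StoreFencedException. */
    static class StoreFencedException extends Exception {}

    /**
     * Runs op, retrying on NoAuthException. In HA mode the store is
     * treated as fenced immediately; in non-HA mode the exception is
     * rethrown once maxRetries retries have been used up.
     */
    static <T> T runWithRetries(Callable<T> op, boolean haEnabled,
                                int maxRetries) throws Exception {
        int retry = 0;
        while (true) {
            try {
                return op.call();
            } catch (NoAuthException nae) {
                if (haEnabled) {
                    throw new StoreFencedException();
                }
                // Non-HA: count the failure and back out when the
                // retry budget is exhausted instead of looping forever.
                if (++retry > maxRetries) {
                    throw nae;
                }
            }
        }
    }

    /** Counts how often an always-failing operation is invoked in non-HA mode. */
    static int attemptsBeforeGivingUp(int maxRetries) {
        int[] attempts = {0};
        try {
            runWithRetries(() -> {
                attempts[0]++;
                throw new NoAuthException();
            }, false, maxRetries);
        } catch (Exception expected) {
            // The operation always fails, so an exception is expected here.
        }
        return attempts[0];
    }

    public static void main(String[] args) {
        System.out.println("attempts: " + attemptsBeforeGivingUp(3));
    }
}
```

With this shape, an operation that always fails is invoked once plus maxRetries times before the exception reaches the caller. Whether the right behaviour is to retry a bounded number of times or to rethrow at once is a design choice this sketch does not settle.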



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
