Date: Wed, 18 Mar 2015 02:39:38 +0000 (UTC)
From: "Aaron T. Myers (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm
Subject: [jira] [Updated] (HADOOP-11722) Some Instances of Services using ZKDelegationTokenSecretManager go down when old token cannot be deleted

     [ https://issues.apache.org/jira/browse/HADOOP-11722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HADOOP-11722:
------------------------------------
     Target Version/s: 2.7.0
    Affects Version/s: 2.6.0

+1, the latest patch looks good to me. I'm confident the test failure is unrelated, and as mentioned before I don't think this change warrants a new test. I'm going to commit this momentarily.

> Some Instances of Services using ZKDelegationTokenSecretManager go down when old token cannot be deleted
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11722
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: HADOOP-11722.1.patch, HADOOP-11722.2.patch
>
>
> The delete node code in {{ZKDelegationTokenSecretManager}} is as follows:
> {noformat}
>        while(zkClient.checkExists().forPath(nodeRemovePath) != null){
>           zkClient.delete().guaranteed().forPath(nodeRemovePath);
>        }
> {noformat}
> When several instances of a service using {{ZKDelegationTokenSecretManager}} try to delete the same node simultaneously, all of them can pass the existence check and enter the while loop, in which case every peer tries to delete the node. Only one delete succeeds; the rest throw an exception, which brings down those service instances.
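
The check-then-delete race described above can be reproduced without a ZooKeeper ensemble. The following is a minimal sketch, not the code from the attached patches: `FakeNodeStore`, its `NoNodeException`, `deleteIfPresent` and `raceDeleters` are hypothetical names invented for illustration, and the in-memory store only mimics the behaviour that matters here (deleting a missing node throws). It shows one common way to make the delete tolerant of peers: skip the separate existence check and treat "node already gone" as success.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RaceTolerantDelete {

    /** Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("NoNode for " + path);
        }
    }

    /** Hypothetical in-memory node store; delete() fails if the node is already gone. */
    static class FakeNodeStore {
        private final Set<String> nodes = ConcurrentHashMap.newKeySet();

        void create(String path) {
            nodes.add(path);
        }

        boolean exists(String path) {
            return nodes.contains(path);
        }

        void delete(String path) throws NoNodeException {
            // Set.remove is atomic, so exactly one concurrent caller sees true.
            if (!nodes.remove(path)) {
                throw new NoNodeException(path);
            }
        }
    }

    /** Returns true if this caller removed the node, false if a peer had already removed it. */
    static boolean deleteIfPresent(FakeNodeStore store, String path) {
        try {
            store.delete(path);
            return true;
        } catch (NoNodeException e) {
            // A peer got there first; the node is gone, which is the desired end state.
            return false;
        }
    }

    /** Starts `peers` concurrent deleters on one node and returns how many actually removed it. */
    static int raceDeleters(int peers) {
        FakeNodeStore store = new FakeNodeStore();
        String path = "/zkdtsm/ZKDTSMRoot/ZKDTSMTokensRoot/DT_28";
        store.create(path);

        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(peers);
        for (int i = 0; i < peers; i++) {
            pool.submit(() -> {
                try {
                    start.await(); // line all peers up so they delete at the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (deleteIfPresent(store, path)) {
                    winners.incrementAndGet();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("deleters did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for deleters", e);
        }
        if (store.exists(path)) {
            throw new IllegalStateException("node still exists after all deleters ran");
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("peers that removed the node: " + raceDeleters(8));
    }
}
```

With eight peers, one of them removes the node and the other seven take the `catch` branch without failing. In the real service the equivalent decision would be made around the Curator delete call, and whether to log, ignore, or retry in that branch is a design choice this sketch does not settle.
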
> The exception is as follows:
> {noformat}
> 2015-03-15 10:24:54,000 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover thread received unexpected exception
> java.lang.RuntimeException: Could not remove Stored Token ZKDTSMDelegationToken_28
> 	at org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.removeStoredToken(ZKDelegationTokenSecretManager.java:770)
> 	at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:605)
> 	at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54)
> 	at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /zkdtsm/ZKDTSMRoot/ZKDTSMTokensRoot/DT_28
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
> 	at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:238)
> 	at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:233)
> 	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> 	at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
> 	at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:214)
> 	at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:41)
> 	at org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.removeStoredToken(ZKDelegationTokenSecretManager.java:764)
> 	... 4 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)