Date: Mon, 2 Dec 2013 23:52:36 +0000 (UTC)
From: "Jimmy Xiang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-5555) CacheAdmin commands fail when first listed NameNode is in Standby

    [ https://issues.apache.org/jira/browse/HDFS-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837092#comment-13837092 ]

Jimmy Xiang commented on HDFS-5555:
-----------------------------------

In this case the DFSClient call goes through RetryInvocationHandler, but the iterator it returns makes its RPCs through ProtobufRpcEngine#Invoker, which doesn't handle failover. One fix is to throw StandbyException at the beginning instead of returning an iterator. The other fix is to make sure the iterator supports failover as well.
If we throw StandbyException at the beginning, the issue will come up again if the NN fails over in the middle of the iteration.

> CacheAdmin commands fail when first listed NameNode is in Standby
> -----------------------------------------------------------------
>
>                 Key: HDFS-5555
>                 URL: https://issues.apache.org/jira/browse/HDFS-5555
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 3.0.0
>            Reporter: Stephen Chu
>            Assignee: Jimmy Xiang
>
> I am on an HA-enabled cluster. The NameNodes are on host-1 and host-2.
> In the configurations, we specify the host-1 NN first and the host-2 NN afterwards in the _dfs.ha.namenodes.ns1_ property (where _ns1_ is the name of the nameservice).
> If the host-1 NN is Standby and the host-2 NN is Active, some CacheAdmin commands fail, complaining that the operation is not supported in standby state.
> e.g.
> {code}
> bash-4.1$ hdfs cacheadmin -removeDirectives -path /user/hdfs2
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
>     at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1501)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1082)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.listCacheDirectives(FSNamesystem.java:6892)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$ServerSideCacheEntriesIterator.makeRequest(NameNodeRpcServer.java:1263)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$ServerSideCacheEntriesIterator.makeRequest(NameNodeRpcServer.java:1249)
>     at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequest(BatchedRemoteIterator.java:77)
>     at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequestIfNeeded(BatchedRemoteIterator.java:85)
>     at org.apache.hadoop.fs.BatchedRemoteIterator.hasNext(BatchedRemoteIterator.java:99)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.listCacheDirectives(ClientNamenodeProtocolServerSideTranslatorPB.java:1087)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1348)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1301)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy9.listCacheDirectives(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB$CacheEntriesIterator.makeRequest(ClientNamenodeProtocolTranslatorPB.java:1079)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB$CacheEntriesIterator.makeRequest(ClientNamenodeProtocolTranslatorPB.java:1064)
>     at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequest(BatchedRemoteIterator.java:77)
>     at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequestIfNeeded(BatchedRemoteIterator.java:85)
>     at org.apache.hadoop.fs.BatchedRemoteIterator.hasNext(BatchedRemoteIterator.java:99)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$32.hasNext(DistributedFileSystem.java:1704)
>     at org.apache.hadoop.hdfs.tools.CacheAdmin$RemoveCacheDirectiveInfosCommand.run(CacheAdmin.java:372)
>     at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:84)
>     at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:89)
> {code}
> After manually failing over from host-2 to host-1, the CacheAdmin commands succeed.
> The affected commands are:
> -listPools
> -listDirectives
> -removeDirectives

--
This message was sent by Atlassian JIRA
(v6.1#6144)
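The second fix proposed in the comment (making the iterator itself failover-aware) can be sketched in plain Java, under stated assumptions. The client-side frames in the trace go from `ClientNamenodeProtocolTranslatorPB$CacheEntriesIterator.makeRequest` straight into `ProtobufRpcEngine$Invoker.invoke`, so the lazy batch fetch inside `hasNext()` never passes through the retrying proxy. The sketch below instead routes every batch fetch through a failover-aware proxy. Everything in it is hypothetical: `ListingProtocol`, `FakeNameNode`, `failoverProxy`, `BatchedIterator` and the local `StandbyException` are illustrative stand-ins, not Hadoop classes, and the `java.lang.reflect.Proxy` handler only imitates the role `RetryInvocationHandler` plays rather than reproducing it. It also assumes that resuming from the last key seen (`prevKey`) on the newly active NameNode returns the rest of the listing.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class FailoverIterationDemo {
  // Hypothetical stand-in for org.apache.hadoop.ipc.StandbyException.
  static class StandbyException extends RuntimeException {
    StandbyException() { super("Operation category READ is not supported in state standby"); }
  }

  // Hypothetical listing RPC: returns the next batch of keys after prevKey (empty when done).
  public interface ListingProtocol {
    List<Long> listBatch(long prevKey);
  }

  // Simulated NameNode; flipping `active` models an HA state transition.
  static class FakeNameNode implements ListingProtocol {
    boolean active;
    final long maxKey;
    FakeNameNode(boolean active, long maxKey) { this.active = active; this.maxKey = maxKey; }
    public List<Long> listBatch(long prevKey) {
      if (!active) throw new StandbyException();
      List<Long> batch = new ArrayList<>(); // at most 2 keys per batch
      for (long k = prevKey + 1; k <= maxKey && batch.size() < 2; k++) batch.add(k);
      return batch;
    }
  }

  // Hypothetical failover proxy: on StandbyException, switch to the next NameNode and retry.
  static ListingProtocol failoverProxy(List<ListingProtocol> nameNodes) {
    InvocationHandler handler = new InvocationHandler() {
      private int current = 0;
      @Override
      public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        for (int attempt = 0; attempt < nameNodes.size(); attempt++) {
          try {
            return method.invoke(nameNodes.get(current), args);
          } catch (InvocationTargetException e) {
            if (!(e.getCause() instanceof StandbyException)) throw e.getCause();
            current = (current + 1) % nameNodes.size(); // fail over, then retry
          }
        }
        throw new StandbyException(); // every NameNode reported standby
      }
    };
    return (ListingProtocol) Proxy.newProxyInstance(
        ListingProtocol.class.getClassLoader(), new Class<?>[] {ListingProtocol.class}, handler);
  }

  // Batched iterator: each batch is fetched lazily through whichever proxy it was given.
  static class BatchedIterator implements Iterator<Long> {
    private final ListingProtocol namenode;
    private List<Long> batch = new ArrayList<>();
    private int pos = 0;
    private long prevKey = 0;
    private boolean exhausted = false;
    BatchedIterator(ListingProtocol namenode) { this.namenode = namenode; }
    @Override
    public boolean hasNext() {
      if (pos < batch.size()) return true;
      if (exhausted) return false;
      batch = namenode.listBatch(prevKey); // the RPC happens here, not at construction
      pos = 0;
      exhausted = batch.isEmpty();
      return !exhausted;
    }
    @Override
    public Long next() {
      if (!hasNext()) throw new NoSuchElementException();
      prevKey = batch.get(pos++);
      return prevKey;
    }
  }

  static List<Long> runDemo() {
    FakeNameNode nn1 = new FakeNameNode(false, 5); // listed first, but standby
    FakeNameNode nn2 = new FakeNameNode(true, 5);
    Iterator<Long> it = new BatchedIterator(failoverProxy(List.of(nn1, nn2)));
    List<Long> seen = new ArrayList<>();
    while (it.hasNext()) {
      seen.add(it.next());
      if (seen.size() == 2) { nn2.active = false; nn1.active = true; } // failover mid-iteration
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(runDemo()); // prints [1, 2, 3, 4, 5]
    try {
      new BatchedIterator(new FakeNameNode(false, 5)).hasNext(); // no failover proxy
    } catch (StandbyException e) {
      System.out.println("without failover: " + e.getMessage());
    }
  }
}
```

Because the retry sits at the per-batch call, the demo lists all five keys even though the first-listed NameNode is standby and a failover happens between batches. The same iterator built directly on a standby node fails on its first `hasNext()`, which mirrors the reported bug. Checking the HA state once before returning the iterator (the first fix) would not cover the mid-iteration case.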