hadoop-hdfs-issues mailing list archives

From "Jimmy Xiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5555) CacheAdmin commands fail when first listed NameNode is in Standby
Date Mon, 02 Dec 2013 23:52:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837092#comment-13837092 ]

Jimmy Xiang commented on HDFS-5555:
-----------------------------------

DFSClient uses RetryInvocationHandler in this case, while the iterator uses
ProtobufRpcEngine#Invoker, which doesn't handle failover. One fix is to throw
StandbyException at the beginning instead of returning an iterator. The other fix is to
make sure the iterator supports failover as well. If we only throw StandbyException at
the beginning, the issue will come up again when the NN fails over in the middle of
iterating.
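
For illustration, here is a minimal sketch of the second approach: a batch iterator
whose hasNext() retries after failing over. The FailoverAwareIterator/IteratorFactory
names and the performFailover hook are simplified stand-ins for the real
RetryInvocationHandler/FailoverProxyProvider machinery, and in practice the
StandbyException arrives wrapped in a RemoteException:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.ipc.StandbyException;

/** Sketch only: retries a batched listing after a failover, so hitting a
 *  standby NN mid-iteration is not fatal. */
public class FailoverAwareIterator<T> implements RemoteIterator<T> {

  /** Supplies a fresh iterator bound to the current (post-failover) proxy. */
  public interface IteratorFactory<E> {
    RemoteIterator<E> newIterator() throws IOException;
  }

  private final IteratorFactory<T> factory;
  private final Runnable performFailover; // stand-in for proxy failover
  private final int maxFailovers;
  private RemoteIterator<T> delegate;

  public FailoverAwareIterator(IteratorFactory<T> factory,
      Runnable performFailover, int maxFailovers) throws IOException {
    this.factory = factory;
    this.performFailover = performFailover;
    this.maxFailovers = maxFailovers;
    this.delegate = factory.newIterator();
  }

  @Override
  public boolean hasNext() throws IOException {
    for (int failovers = 0; ; failovers++) {
      try {
        return delegate.hasNext(); // may issue the next batched RPC
      } catch (StandbyException e) {
        if (failovers >= maxFailovers) {
          throw e;
        }
        performFailover.run();            // switch to the other NN
        // Restart from scratch for simplicity; a real fix would resume
        // from BatchedRemoteIterator's prevKey instead of re-listing.
        delegate = factory.newIterator();
      }
    }
  }

  @Override
  public T next() throws IOException {
    return delegate.next();
  }
}
{code}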

> CacheAdmin commands fail when first listed NameNode is in Standby
> -----------------------------------------------------------------
>
>                 Key: HDFS-5555
>                 URL: https://issues.apache.org/jira/browse/HDFS-5555
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 3.0.0
>            Reporter: Stephen Chu
>            Assignee: Jimmy Xiang
>
> I am on an HA-enabled cluster. The NameNodes are on host-1 and host-2.
> In the configuration, we specify the host-1 NN first and the host-2 NN afterwards in
> the _dfs.ha.namenodes.ns1_ property (where _ns1_ is the name of the nameservice).
> If the host-1 NN is Standby and the host-2 NN is Active, some CacheAdmin commands will
> fail, complaining that the operation is not supported in standby state.
> e.g.
> {code}
> bash-4.1$ hdfs cacheadmin -removeDirectives -path /user/hdfs2
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
> Operation category READ is not supported in state standby
> 	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1501)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1082)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.listCacheDirectives(FSNamesystem.java:6892)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$ServerSideCacheEntriesIterator.makeRequest(NameNodeRpcServer.java:1263)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$ServerSideCacheEntriesIterator.makeRequest(NameNodeRpcServer.java:1249)
> 	at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequest(BatchedRemoteIterator.java:77)
> 	at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequestIfNeeded(BatchedRemoteIterator.java:85)
> 	at org.apache.hadoop.fs.BatchedRemoteIterator.hasNext(BatchedRemoteIterator.java:99)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.listCacheDirectives(ClientNamenodeProtocolServerSideTranslatorPB.java:1087)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1348)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1301)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	at com.sun.proxy.$Proxy9.listCacheDirectives(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB$CacheEntriesIterator.makeRequest(ClientNamenodeProtocolTranslatorPB.java:1079)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB$CacheEntriesIterator.makeRequest(ClientNamenodeProtocolTranslatorPB.java:1064)
> 	at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequest(BatchedRemoteIterator.java:77)
> 	at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequestIfNeeded(BatchedRemoteIterator.java:85)
> 	at org.apache.hadoop.fs.BatchedRemoteIterator.hasNext(BatchedRemoteIterator.java:99)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$32.hasNext(DistributedFileSystem.java:1704)
> 	at org.apache.hadoop.hdfs.tools.CacheAdmin$RemoveCacheDirectiveInfosCommand.run(CacheAdmin.java:372)
> 	at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:84)
> 	at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:89)
> {code}
> After manually failing over from host-2 to host-1, the CacheAdmin commands succeed.
> The affected commands are:
> -listPools
> -listDirectives
> -removeDirectives
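
For reference, the failure is also reproducible outside the CLI. Below is a minimal
sketch against the public DistributedFileSystem API, assuming the ns1 nameservice from
the description; the hasNext() call drives the same batched listCacheDirectives RPC
shown in the stack trace above:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveEntry;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;

public class ListDirectivesRepro {
  public static void main(String[] args) throws Exception {
    // Picks up the HA client config (dfs.nameservices, dfs.ha.namenodes.ns1,
    // ...) from hdfs-site.xml on the classpath.
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem)
        new Path("hdfs://ns1/").getFileSystem(conf);

    CacheDirectiveInfo filter = new CacheDirectiveInfo.Builder()
        .setPath(new Path("/user/hdfs2"))
        .build();

    // hasNext() issues the first batched listCacheDirectives RPC; with the
    // first-listed NN in standby, this is where StandbyException surfaces.
    RemoteIterator<CacheDirectiveEntry> it = dfs.listCacheDirectives(filter);
    while (it.hasNext()) {
      System.out.println(it.next().getInfo().getPath());
    }
  }
}
{code}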



