hadoop-hdfs-issues mailing list archives

From Íñigo Goiri (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HDFS-14230) RBF: Throw RetriableException instead of IOException when no namenodes available
Date Fri, 01 Feb 2019 17:48:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758534#comment-16758534 ]

Íñigo Goiri commented on HDFS-14230:
------------------------------------

[^HDFS-14230-HDFS-13891.005.patch] LGTM.
+1 pending Yetus.
Does anybody else want to take a look at it?

> RBF: Throw RetriableException instead of IOException when no namenodes available
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-14230
>                 URL: https://issues.apache.org/jira/browse/HDFS-14230
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.2.0, 3.1.1, 2.9.2, 3.0.3
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>         Attachments: HDFS-14230-HDFS-13891.001.patch, HDFS-14230-HDFS-13891.002.patch, HDFS-14230-HDFS-13891.003.patch, HDFS-14230-HDFS-13891.004.patch, HDFS-14230-HDFS-13891.005.patch
>
>
> Failover usually happens when namenodes are upgraded. For a few seconds there is no active namenode, and accessing HDFS through the router fails during that window. This can cause jobs to fail or hang. Some Hive job logs follow:
> {code:java}
> 2019-01-03 16:12:08,337 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 133.33 sec
> MapReduce Total cumulative CPU time: 2 minutes 13 seconds 330 msec
> Ended Job = job_1542178952162_24411913
> Launching Job 4 out of 6
> Exception in thread "Thread-86" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): No namenode available under nameservice Cluster3
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.shouldRetry(RouterRpcClient.java:328)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:488)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:495)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:385)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:760)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
>     at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1804)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1338)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3925)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1014)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
> {code}
> Looking deeper into the code, maybe we can throw StandbyException when no namenodes are available, so that the client retries and fails only after some retries.
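
The idea in the description can be illustrated with a small, self-contained sketch. It is not the attached patch and does not use Hadoop classes: `RetriableException` here is a local stand-in for `org.apache.hadoop.ipc.RetriableException`, and `RouterCall`, `invoke`, and `callWithRetries` are hypothetical names chosen only for illustration. The point is that a dedicated retriable exception type lets the caller tell a transient "no namenode available" condition apart from a hard `IOException` and retry for a bounded number of attempts.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: every name in this class is a hypothetical stand-in, not Hadoop code. */
class NoNamenodeRetrySketch {

  /** Local stand-in for org.apache.hadoop.ipc.RetriableException. */
  static class RetriableException extends IOException {
    RetriableException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical remote call, for example getFileInfo through the router. */
  interface RouterCall<T> {
    T run() throws IOException;
  }

  /** Router side: signal a transient condition instead of a plain IOException. */
  static String invoke(String nameservice, boolean activeAvailable) throws IOException {
    if (!activeAvailable) {
      throw new RetriableException(
          "No namenode available under nameservice " + nameservice);
    }
    return "ok";
  }

  /** Client side: retry only on RetriableException, at most maxAttempts times. */
  static <T> T callWithRetries(RouterCall<T> call, int maxAttempts, long sleepMs)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    RetriableException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.run();
      } catch (RetriableException e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMs); // fixed pause between attempts (assumed policy)
        }
      }
    }
    throw last; // all attempts hit the transient condition
  }

  public static void main(String[] args) throws Exception {
    // Simulated failover: the first two attempts find no active namenode.
    AtomicInteger attempts = new AtomicInteger();
    String result = callWithRetries(
        () -> invoke("Cluster3", attempts.incrementAndGet() >= 3), 5, 10L);
    System.out.println(result + " after " + attempts.get() + " attempts");
  }
}
```

Any other `IOException` thrown by the call propagates immediately, so only the condition marked as retriable is retried. The fixed sleep and the attempt limit are assumptions of this sketch, not values taken from the issue.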



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


