hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
Date Sun, 22 Jun 2014 21:47:24 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040264#comment-14040264 ]

Yongjun Zhang commented on HDFS-6475:
-------------------------------------

Hi [~daryn], [~jingzhao] and [~atm],

Many thanks for your earlier reviews and comments. I just uploaded a new revision (008)
to address the comments and test failures.
In summary:

- Per Daryn's suggestion, I attempted to remove the getTrueCause() method from Server.java
entirely, but ran into test failures. After spending quite some time looking into them, I
personally think removing the existing getTrueCause() method deserves a new JIRA, so I filed
HDFS-6588 with details and questions. I hope you'd agree based on the information I provided
there, but I'm certainly open to further discussion.

- The new patch I just uploaded for HDFS-6475 is limited to handling the case reported in this
JIRA. It changes only a few lines in ExceptionHandler.java, plus the testcase I added. It no
longer calls the getTrueCause() method defined in Server.java.
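
As a rough illustration of the idea (this is not the code from the 008 patch), a handler
can walk the cause chain of a SecurityException and surface a nested StandbyException so
that the client sees it and can fail over. The class name StandbyUnwrapSketch, the method
unwrapStandby(), and the nested exception types below are hypothetical stand-ins so that
the sketch compiles without Hadoop on the classpath:

```java
import java.io.IOException;

public class StandbyUnwrapSketch {
    // Hypothetical stand-in for org.apache.hadoop.ipc.StandbyException.
    public static class StandbyException extends IOException {
        public StandbyException(String msg) { super(msg); }
    }

    // Hypothetical stand-in for SecretManager.InvalidToken.
    public static class InvalidToken extends IOException {
        public InvalidToken(String msg) { super(msg); }
    }

    // Returns the first StandbyException found in the cause chain of e,
    // or e itself when the chain contains none.
    public static Throwable unwrapStandby(Throwable e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof StandbyException) {
                return t;
            }
        }
        return e;
    }

    public static void main(String[] args) {
        // Rebuild the nesting described in this JIRA:
        // SecurityException -> InvalidToken -> StandbyException.
        StandbyException standby = new StandbyException("NN is in standby state");
        InvalidToken token = new InvalidToken("StandbyException");
        token.initCause(standby);
        SecurityException wrapped =
            new SecurityException("Failed to obtain user group information", token);

        System.out.println(unwrapStandby(wrapped).getClass().getSimpleName());
    }
}
```

Exceptions that do not carry a StandbyException are returned unchanged, so unrelated
security failures would still be reported as they are today.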

Thanks a lot for following up.


> WebHdfs clients fail without retry because incorrect handling of StandbyException
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-6475
>                 URL: https://issues.apache.org/jira/browse/HDFS-6475
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, webhdfs
>    Affects Versions: 2.4.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch,
> HDFS-6475.004.patch, HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, HDFS-6475.008.patch
>
>
> With WebHdfs clients connected to an HA HDFS service, the delegation token is previously
> initialized with the active NN.
> When a client issues a request, the NNs it may contact are stored in a map returned by
> DFSUtil.getNNServiceRpcAddresses(conf). The client contacts the NNs in that order,
> so the first one it runs into is likely the StandbyNN. If the StandbyNN doesn't have the updated
> client credential, it throws a SecurityException that wraps a StandbyException.
> The client is expected to retry another NN, but due to the insufficient handling of the
> SecurityException mentioned above, it fails.
> Example message:
> {code}
> {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}}
> org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
>         at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
>         at kclient1.kclient$1.run(kclient.java:64)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:356)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
>         at kclient1.kclient.main(kclient.java:58)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}
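
The retry behaviour the description expects from the client can be sketched roughly as
follows. This is a simplified illustration, not the WebHdfsFileSystem implementation: the
names FailoverRetrySketch, NameNodeCall and callWithFailover() are hypothetical, and
StandbyException here is a local stand-in for the Hadoop class. The point is that failover
only works if the client actually sees a StandbyException rather than a SecurityException:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class FailoverRetrySketch {
    // Hypothetical stand-in for org.apache.hadoop.ipc.StandbyException.
    public static class StandbyException extends IOException {
        public StandbyException(String msg) { super(msg); }
    }

    // Hypothetical callback representing one request against one NameNode.
    public interface NameNodeCall<T> {
        T call(String nnAddress) throws IOException;
    }

    // Tries each NameNode in order and moves on to the next one only when
    // the current one reports that it is in standby state.
    public static <T> T callWithFailover(List<String> nnAddresses, NameNodeCall<T> call)
            throws IOException {
        IOException last = null;
        for (String addr : nnAddresses) {
            try {
                return call.call(addr);
            } catch (StandbyException e) {
                last = e; // standby NN: remember the error and try the next address
            }
        }
        throw last != null ? last : new IOException("no NameNode addresses configured");
    }

    public static void main(String[] args) throws IOException {
        List<String> nns = Arrays.asList("nn1", "nn2");
        String status = callWithFailover(nns, addr -> {
            if (addr.equals("nn1")) {
                throw new StandbyException(addr + " is standby");
            }
            return "file status from " + addr;
        });
        System.out.println(status);
    }
}
```

Any other IOException propagates immediately in this sketch, which mirrors the failure
reported here: a SecurityException reaching the client would not trigger a retry.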



--
This message was sent by Atlassian JIRA
(v6.2#6252)
