hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
Date Mon, 02 Jun 2014 22:47:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016002#comment-14016002 ]

Yongjun Zhang commented on HDFS-6475:

Hi Jing,

Thanks a lot for the info! I took a quick look; the issue is similar, but there seems to be an
important difference here.
In the HDFS-5322 fix, the method (and its entire caller hierarchy)
  private void saslProcess(RpcSaslProto saslMessage)
        throws WrappedRpcServerException, IOException, InterruptedException {
is allowed to throw IOException, so your HDFS-5322 solution works well.

For HDFS-6475, the class involved, UserProvider, is not allowed to throw IOException. In fact,
UserProvider can only throw unchecked exceptions; here it throws SecurityException, which
carries the StandbyException info in its message and cause:
/** Inject user information to http operations. */
public class UserProvider
    extends AbstractHttpContextInjectable<UserGroupInformation>
    implements InjectableProvider<Context, Type> {
  @Context HttpServletRequest request;
  @Context ServletContext servletcontext;
  public UserGroupInformation getValue(final HttpContext context) {
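    final Configuration conf = (Configuration) servletcontext
        .getAttribute(JspHelper.CURRENT_CONF);
    try {
      return JspHelper.getUGI(servletcontext, request, conf,
          AuthenticationMethod.KERBEROS, false);
    } catch (IOException e) {
      // IOException is checked and cannot propagate from getValue(),
      // so it is rethrown wrapped in an unchecked SecurityException.
      throw new SecurityException(
          "Failed to obtain user group information: " + e, e);
    }
  }
  // ... remaining methods omitted
}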

This means we can't throw StandbyException (a subclass of IOException) from here.
So my uploaded patch tried to parse the message string of the SecurityException thrown

The UserProvider class inherits from classes in the Jersey package, whose interface spec
we won't be able to change.
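
To illustrate why the client ends up looking at the message text: by the time the error
reaches a WebHdfs client, it is a RemoteException whose class name is
java.lang.SecurityException, so the only trace of the StandbyException is inside the
message string. Below is a minimal sketch of that kind of check. It is not the uploaded
patch; StandbyMessageCheck and looksLikeStandby are hypothetical names, and only the
message format is taken from the example in the issue description.

```java
// Hypothetical helper; not part of Hadoop and not the uploaded patch.
final class StandbyMessageCheck {
  // Simple name of the exception we look for in the remote message text.
  private static final String STANDBY_MARKER = "StandbyException";

  private StandbyMessageCheck() {}

  /**
   * Returns true when a remote error reported as SecurityException
   * mentions StandbyException in its message.
   */
  static boolean looksLikeStandby(String remoteClassName, String message) {
    if (remoteClassName == null || message == null) {
      return false;
    }
    return "java.lang.SecurityException".equals(remoteClassName)
        && message.contains(STANDBY_MARKER);
  }
}
```

String matching like this is fragile, since it depends on the exact wording of the
server-side message.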

We might be able to change the client/server interface: we detect this kind of case
at the interface and then, instead of throwing the RemoteException that wraps SecurityException,
we throw a RemoteException that wraps the underlying StandbyException cause. I'm not sure
whether we should go this route, though.
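
The alternative described above could look roughly like the following sketch: walk the
cause chain of the SecurityException and, if a StandbyException is found, report that
instead. StandbyException here is a local stand-in for the Hadoop class (only the
"extends IOException" relationship matters), and CauseUnwrapper, findCause, and toReport
are hypothetical names. Where such a check would sit in the WebHdfs server code is left open.

```java
import java.io.IOException;

// Local stand-in for Hadoop's StandbyException (assumption for this sketch).
class StandbyException extends IOException {
  StandbyException(String msg) {
    super(msg);
  }
}

// Hypothetical helper; not part of Hadoop.
final class CauseUnwrapper {
  private CauseUnwrapper() {}

  /** Returns the first throwable of the given type in the cause chain, or null. */
  static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
    // Bound the walk so a cyclic cause chain cannot loop forever.
    for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
      if (type.isInstance(t)) {
        return type.cast(t);
      }
    }
    return null;
  }

  /** Picks the exception to report to the client. */
  static Throwable toReport(SecurityException e) {
    StandbyException standby = findCause(e, StandbyException.class);
    return standby != null ? standby : e;
  }
}
```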

Would you please comment again? Thanks.

> WebHdfs clients fail without retry because incorrect handling of StandbyException
> ---------------------------------------------------------------------------------
>                 Key: HDFS-6475
>                 URL: https://issues.apache.org/jira/browse/HDFS-6475
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, webhdfs
>    Affects Versions: 2.4.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6475.001.patch
> With WebHdfs clients connected to an HA HDFS service, the delegation token has previously
> been initialized with the active NN.
> When a client issues a request, the NNs it can contact are stored in a map returned by
> DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts the NNs in that order,
> so the first one it reaches is likely the standby NN. If the standby NN doesn't have the
> updated client credentials, it throws a SecurityException that wraps StandbyException.
> The client is expected to retry with another NN, but due to the insufficient handling of
> the SecurityException mentioned above, it fails.
> Example message:
> {code}
> {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}}
> org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
>         at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
>         at kclient1.kclient$1.run(kclient.java:64)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:356)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
>         at kclient1.kclient.main(kclient.java:58)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}
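
The behaviour the issue description expects from the client, moving on to the next NN
when one reports that it is in standby, can be sketched as below. This is a simplified
illustration and not the WebHdfsFileSystem retry code: FailoverSketch, NnCall, and
callWithFailover are hypothetical names, and StandbyException is a local stand-in for
the Hadoop class.

```java
import java.io.IOException;
import java.util.List;

// Local stand-in for Hadoop's StandbyException (assumption for this sketch).
class StandbyException extends IOException {
  StandbyException(String msg) {
    super(msg);
  }
}

// Hypothetical helper; not part of Hadoop.
final class FailoverSketch {
  /** One request against a single NameNode address. */
  interface NnCall<T> {
    T run(String nnAddress) throws IOException;
  }

  private FailoverSketch() {}

  /** Tries each NameNode in order, skipping those that report standby. */
  static <T> T callWithFailover(List<String> nnAddresses, NnCall<T> call)
      throws IOException {
    IOException last = null;
    for (String addr : nnAddresses) {
      try {
        return call.run(addr);
      } catch (StandbyException e) {
        // This NN is in standby; remember the error and try the next one.
        last = e;
      }
    }
    if (last != null) {
      throw last;
    }
    throw new IOException("No NameNode addresses configured");
  }
}
```

The loop only works if the standby condition reaches the client as a recognizable
StandbyException, which is exactly what the SecurityException wrapping prevents.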

This message was sent by Atlassian JIRA
