hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5322) HDFS delegation token not found in cache errors seen on secure HA clusters
Date Thu, 10 Oct 2013 16:57:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791674#comment-13791674 ]

Jing Zhao commented on HDFS-5322:

bq. Again, the basic question driving this change is why FSNamesystem#checkOperation(OperationCategory.WRITE)
is not throwing during a transition to active?

During the transition (Standby -> Active), the current code first sets the NN's state to Active
and then starts the active services, during which the NN still needs to tail the remaining
editlog. If a delegation token is contained in that last part of the editlog, 1)
FSNamesystem#checkOperation(OperationCategory.WRITE) will not throw anything, since the NN's
state has already been changed to Active, and 2) the new ANN cannot find the token in its cache,
since it has not finished applying the editlog. We should allow clients to retry in this case,
because once the NN finishes reading the editlog the delegation token will be recognized.
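To make the race concrete, here is a minimal, self-contained model of it. Everything in it is hypothetical: `ToyNameNode`, `RetriableTokenException`, and the `catchingUpOnEdits` flag are illustrative stand-ins, not Hadoop's `FSNamesystem`, `SecretManager$InvalidToken`, or the actual HDFS-5322 patch. It shows the state flipping to Active before the tail of the editlog is applied, and a token miss during that catch-up being reported as retriable instead of as an invalid token.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/**
 * Hypothetical, heavily simplified model of the Standby -> Active race.
 * None of these classes are Hadoop's real API; all names are illustrative.
 */
public class TransitionRaceSketch {

    enum HAState { STANDBY, ACTIVE }

    // Stand-in for SecretManager$InvalidToken (hypothetical).
    static class InvalidTokenException extends Exception {
        InvalidTokenException(String msg) { super(msg); }
    }

    // Stand-in for a "please retry later" signal to the client (hypothetical).
    static class RetriableTokenException extends Exception {
        RetriableTokenException(String msg) { super(msg); }
    }

    static class ToyNameNode {
        HAState state = HAState.STANDBY;
        boolean catchingUpOnEdits = false;  // true while the remaining editlog is being tailed
        final Set<Integer> tokenCache = new HashSet<>();
        final Queue<Integer> pendingTokenEdits = new ArrayDeque<>();

        // Current ordering: the state flips to ACTIVE first, edits are applied afterwards.
        void beginTransitionToActive() {
            state = HAState.ACTIVE;
            catchingUpOnEdits = true;
        }

        // Apply the remaining "get delegation token" edits, then leave catch-up mode.
        void finishTailingEdits() {
            while (!pendingTokenEdits.isEmpty()) {
                tokenCache.add(pendingTokenEdits.poll());
            }
            catchingUpOnEdits = false;
        }

        // Token lookup with the retry idea: a miss during catch-up is reported as retriable.
        void verifyToken(int tokenId) throws InvalidTokenException, RetriableTokenException {
            if (tokenCache.contains(tokenId)) {
                return;
            }
            if (catchingUpOnEdits) {
                throw new RetriableTokenException("token " + tokenId + " not loaded yet; retry");
            }
            throw new InvalidTokenException("token " + tokenId + " can't be found in cache");
        }
    }

    public static void main(String[] args) throws Exception {
        ToyNameNode nn = new ToyNameNode();
        nn.pendingTokenEdits.add(11);    // token 11 lives in the not-yet-applied tail of the editlog
        nn.beginTransitionToActive();    // a WRITE check would pass now: state is already ACTIVE

        try {
            nn.verifyToken(11);
        } catch (RetriableTokenException e) {
            System.out.println("retriable: " + e.getMessage());
        }

        nn.finishTailingEdits();
        nn.verifyToken(11);              // succeeds once the editlog has been fully applied
        System.out.println("token 11 verified");
    }
}
```

In this model the first lookup of token 11 is rejected as retriable and a later one succeeds after `finishTailingEdits()`; a token that is still missing after catch-up is reported as invalid, as before.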

Alternatively, if we let the NN first start the active services and only then change its state
to Active, your original hack in HADOOP-9880 could work, since a StandbyException would be
thrown during that window. But this change would 1) extend the failover time, and 2) trigger
unnecessary client failovers. I'm also not sure whether it would break other code.
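The alternative ordering can be sketched the same way, again with hypothetical names (`ToyNameNode`, `ToyStandbyException`) standing in for the real NameNode classes: the active services start, and the remaining edits are applied, while the state is still Standby, so a write check in that window is rejected, which is what the HADOOP-9880 approach relies on.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the alternative ordering: start active services
 * (including tailing the remaining editlog) while still STANDBY, then flip to ACTIVE.
 * Names are illustrative, not Hadoop's real classes.
 */
public class AlternativeOrderingSketch {

    enum HAState { STANDBY, ACTIVE }

    // Stand-in for Hadoop's StandbyException (hypothetical).
    static class ToyStandbyException extends Exception {
        ToyStandbyException(String msg) { super(msg); }
    }

    static class ToyNameNode {
        HAState state = HAState.STANDBY;
        final List<String> events = new ArrayList<>();

        // Mirrors the idea of checkOperation(WRITE): reject writes unless ACTIVE.
        void checkWriteOperation() throws ToyStandbyException {
            if (state != HAState.ACTIVE) {
                throw new ToyStandbyException("WRITE not allowed in state " + state);
            }
        }

        // duringStartup simulates client requests that arrive while active services start.
        void transitionToActive(Runnable duringStartup) {
            events.add("start active services (tail remaining edits)");
            duringStartup.run();
            state = HAState.ACTIVE;  // flip only after the editlog is fully applied
            events.add("state -> ACTIVE");
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.transitionToActive(() -> {
            try {
                nn.checkWriteOperation();
                nn.events.add("write accepted");
            } catch (ToyStandbyException e) {
                // The client sees a standby error and fails over or retries.
                nn.events.add("write rejected: " + e.getMessage());
            }
        });
        nn.events.forEach(System.out::println);
    }
}
```

Every request that lands in the startup window is bounced, which is the cost pointed out above: the window adds to the failover time, and each bounced client fails over even though this NN is about to become active.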

> HDFS delegation token not found in cache errors seen on secure HA clusters
> --------------------------------------------------------------------------
>                 Key: HDFS-5322
>                 URL: https://issues.apache.org/jira/browse/HDFS-5322
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.1.1-beta
>            Reporter: Arpit Gupta
>            Assignee: Jing Zhao
>         Attachments: HDFS-5322.000.patch, HDFS-5322.000.patch, HDFS-5322.001.patch, HDFS-5322.002.patch, HDFS-5322.003.patch, HDFS-5322.004.patch
> While running HA tests we have seen issues where "HDFS delegation token not found in cache" errors cause running jobs to fail.
> {code}
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> 2013-10-06 20:14:51,193 INFO  [main] mapreduce.Job: Task Id : attempt_1381090351344_0001_m_000007_0, Status : FAILED
> Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 11 for hrt_qa) can't be found in cache
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
> {code}

This message was sent by Atlassian JIRA
