hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
Date Tue, 02 May 2017 21:54:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993858#comment-15993858 ]

Andrew Wang commented on HDFS-11741:
------------------------------------

Hi Wei-chiu, the patch looks good overall. A few quick questions:

* For other tokens, we renew before the token expires, e.g. after half the token lifetime has elapsed. This guards against clock skew and TOCTOU issues. Should we do this here too?
* Is it possible to write a unit test using a FakeTimer rather than using Thread.sleep?
* The test uses JUnit 3 asserts; please use JUnit 4's asserts instead.
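
The first two suggestions can be sketched together. This is a minimal sketch under simplified assumptions, not the attached patch or Hadoop's KeyManager: `CachedKey`, `RenewingKeyCache` and `ManualClock` are hypothetical names. The cache refetches its key once half of the key's lifetime has elapsed, and it reads time from an injectable clock, so a test can advance time explicitly (the role a FakeTimer plays) instead of calling Thread.sleep.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified model; not Hadoop's KeyManager API.
public class RenewingKeyCacheDemo {

    /** Stand-in for a DataEncryptionKey: an id plus issue and expiry times (ms). */
    static final class CachedKey {
        final long keyId;
        final long issuedAtMs;
        final long expiryMs;

        CachedKey(long keyId, long issuedAtMs, long expiryMs) {
            this.keyId = keyId;
            this.issuedAtMs = issuedAtMs;
            this.expiryMs = expiryMs;
        }
    }

    /** Clock that only moves when a test advances it, in the spirit of FakeTimer. */
    static final class ManualClock implements LongSupplier {
        private long nowMs;
        ManualClock(long startMs) { this.nowMs = startMs; }
        void advance(long deltaMs) { nowMs += deltaMs; }
        @Override public long getAsLong() { return nowMs; }
    }

    /**
     * Caches a key and fetches a new one once half of the cached key's
     * lifetime has elapsed, leaving headroom for clock skew and for the
     * gap between checking the key and using it (TOCTOU).
     */
    static final class RenewingKeyCache {
        private final LongSupplier clock;
        private final Supplier<CachedKey> fetcher;
        private CachedKey current;

        RenewingKeyCache(LongSupplier clock, Supplier<CachedKey> fetcher) {
            this.clock = clock;
            this.fetcher = fetcher;
        }

        synchronized CachedKey get() {
            long now = clock.getAsLong();
            if (current == null || now >= renewalPoint(current)) {
                current = fetcher.get();
            }
            return current;
        }

        /** Halfway point between issue and expiry. */
        private static long renewalPoint(CachedKey k) {
            return k.issuedAtMs + (k.expiryMs - k.issuedAtMs) / 2;
        }
    }

    public static void main(String[] args) {
        final long lifetimeMs = 10L * 60 * 60 * 1000; // assumed 10-hour key lifetime
        ManualClock clock = new ManualClock(0);
        long[] nextId = {1};
        RenewingKeyCache cache = new RenewingKeyCache(clock, () -> {
            long now = clock.getAsLong();
            return new CachedKey(nextId[0]++, now, now + lifetimeMs);
        });

        long first = cache.get().keyId;       // fetched at t = 0
        clock.advance(lifetimeMs / 2 - 1);
        long beforeHalf = cache.get().keyId;  // still the cached key
        clock.advance(1);
        long atHalf = cache.get().keyId;      // renewed at half lifetime

        System.out.println(first + " " + beforeHalf + " " + atHalf);
    }
}
```

Running `main` prints `1 1 2`: the cached key is reused just before the halfway point and replaced exactly at it. Renewing at half the lifetime is a policy choice; any fraction below 1 leaves a margin before the real expiry.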

> Long running balancer may fail due to expired DataEncryptionKey
> ---------------------------------------------------------------
>
>                 Key: HDFS-11741
>                 URL: https://issues.apache.org/jira/browse/HDFS-11741
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>         Environment: CDH5.8.2, Kerberos, data transfer encryption enabled. Balancer logs in using a keytab
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-11741.001.patch
>
>
> We found that a long-running balancer may fail despite using a keytab, because KeyManager returns an expired DataEncryptionKey, and the balancer fails with the following exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: Can't re-compute encryption key for nonce, since the required block key (keyID=1005215027) doesn't exist. Current key: 1005215030
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While the balancer's KeyManager actively synchronizes itself with the NameNode with respect to block keys, it does not update its DataEncryptionKey accordingly.
> On one cluster, with a Kerberos ticket lifetime of 10 hours and the default block token lifetime of 10 hours, a long-running balancer failed after 20 to 30 hours.
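
The failure in the log can be modelled in a few lines. This is a hypothetical, heavily simplified sketch: `BlockKeyStore`, `roll` and `lookup` are illustrative names, not Hadoop classes, and the times are arbitrary rather than Hadoop's real key-rotation schedule. The DataNode side holds block keys by ID and evicts expired ones, while a balancer that cached its DataEncryptionKey keeps presenting the old key ID. The key IDs are taken from the log above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of the failure; not Hadoop's actual classes.
public class StaleEncryptionKeyDemo {

    /** DataNode-side view of block keys: keyId -> expiry time (ms). */
    static final class BlockKeyStore {
        private final Map<Integer, Long> expiryById = new TreeMap<>();
        private int currentKeyId;

        /** Installs a new current key, then evicts every key already expired at nowMs. */
        void roll(int newKeyId, long expiryMs, long nowMs) {
            expiryById.put(newKeyId, expiryMs);
            currentKeyId = newKeyId;
            expiryById.values().removeIf(expiry -> expiry <= nowMs);
        }

        /** Fails, with the same wording as the log, if the key ID is no longer known. */
        void lookup(int keyId) {
            if (!expiryById.containsKey(keyId)) {
                throw new IllegalStateException("Can't re-compute encryption key for nonce, "
                    + "since the required block key (keyID=" + keyId + ") doesn't exist. "
                    + "Current key: " + currentKeyId);
            }
        }
    }

    public static void main(String[] args) {
        final long hour = 60L * 60 * 1000;  // arbitrary illustrative times below
        BlockKeyStore store = new BlockKeyStore();
        store.roll(1005215027, 20 * hour, 0); // key the balancer's cached DEK was built from
        int cachedDekKeyId = 1005215027;      // never refreshed by the balancer

        // A later roll installs a new key and evicts the expired one.
        store.roll(1005215030, 40 * hour, 25 * hour);

        try {
            store.lookup(cachedDekKeyId);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        store.lookup(1005215030); // a DEK fetched after the roll would succeed
    }
}
```

The first lookup fails with the same message as the log, while a DataEncryptionKey obtained after the roll carries the current key ID and succeeds. In this model, refreshing the cached DataEncryptionKey whenever the block keys are updated avoids the failure.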



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

