hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
Date Tue, 09 May 2017 21:49:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003619#comment-16003619 ]

Wei-Chiu Chuang commented on HDFS-11741:
----------------------------------------

[~zhz] [~shahrs87] would you mind chiming in on this observation?
{quote}
I just realized that the client-side BlockTokenSecretManager computes the DataEncryptionKey
expiration time as now + token lifetime. I am not sure whether that is intended, as I would
have assumed the key expiration time equals the expiration time of the current BlockKey
(which is determined by the NameNode).

So it is entirely possible for the balancer to hold an unexpired DataEncryptionKey that
corresponds to an expired BlockKey. When the balancer talks to the other side, the expired
BlockKey causes the connection to fail. Because of this mismatch, my rev 01 patch does not
fix all of the problems.
{quote}
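The mismatch described in the quote can be illustrated with a minimal Java sketch. It is a hypothetical model, not Hadoop code: `ExpiryMismatch` and its methods are illustrative names, times are in milliseconds, the hour values are made up, and it assumes the DataNode can only re-compute the encryption key while the block key it was derived from has not yet expired.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of the DataEncryptionKey / BlockKey expiry mismatch.
 * Names are illustrative only; they are not Hadoop APIs.
 */
final class ExpiryMismatch {
    /** Client-side view described above: expiry = generation time + token lifetime. */
    static long dekExpiry(long generatedAtMs, long tokenLifetimeMs) {
        return generatedAtMs + tokenLifetimeMs;
    }

    /** The client keeps using its cached key until the key's own expiry passes. */
    static boolean clientThinksValid(long nowMs, long dekExpiryMs) {
        return nowMs < dekExpiryMs;
    }

    /** Assumption: the DataNode only accepts the key while its block key still exists. */
    static boolean dataNodeAccepts(long nowMs, long blockKeyExpiryMs) {
        return nowMs < blockKeyExpiryMs;
    }

    public static void main(String[] args) {
        long hour = TimeUnit.HOURS.toMillis(1);
        long tokenLifetime = 10 * hour;   // illustrative 10-hour token lifetime
        long blockKeyExpiry = 10 * hour;  // block key created at t=0h, expires at t=10h
        long dekCreated = 9 * hour;       // DEK generated late in the block key's life
        long dekExpiry = dekExpiry(dekCreated, tokenLifetime); // t=19h

        long now = 12 * hour;             // after the block key expired, before the DEK did
        System.out.println("client thinks valid: " + clientThinksValid(now, dekExpiry));
        System.out.println("DataNode accepts:    " + dataNodeAccepts(now, blockKeyExpiry));
    }
}
```

Under these assumptions, at t=12h the client still considers its key valid while the DataNode rejects it, which is the window the comment describes.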

Thanks!

> Long running balancer may fail due to expired DataEncryptionKey
> ---------------------------------------------------------------
>
>                 Key: HDFS-11741
>                 URL: https://issues.apache.org/jira/browse/HDFS-11741
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>         Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. Balancer login using keytab
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-11741.001.patch, HDFS-11741.002.patch, HDFS-11741.003.patch
>
>
> We found that a long-running balancer may fail despite logging in with a keytab, because KeyManager returns an expired DataEncryptionKey, which causes the following exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: Can't re-compute encryption key for nonce, since the required block key (keyID=1005215027) doesn't exist. Current key: 1005215030
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While the balancer's KeyManager actively synchronizes its block keys with the NameNode, it does not update its DataEncryptionKey accordingly.
> In one cluster, with a Kerberos ticket lifetime of 10 hours and the default block token expiration/lifetime of 10 hours, a long-running balancer failed after 20 to 30 hours.
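One way to keep the encryption key in step with the block keys can be sketched in Java. This is a hypothetical illustration, not the HDFS-11741 patch or the Hadoop KeyManager API: `BlockKeyInfo`, `EncKey` and `RefreshingKeyCache` are made-up names, and the sketch assumes the cached key should be regenerated whenever it has expired or the current block key has rolled over, with its expiry tied to the block key's expiry as suggested in the comment above.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a block key: an id plus an expiry time in ms. */
final class BlockKeyInfo {
    final int keyId;
    final long expiryMs;
    BlockKeyInfo(int keyId, long expiryMs) { this.keyId = keyId; this.expiryMs = expiryMs; }
}

/** Hypothetical stand-in for an encryption key derived from a block key. */
final class EncKey {
    final int keyId;
    final long expiryMs;
    EncKey(int keyId, long expiryMs) { this.keyId = keyId; this.expiryMs = expiryMs; }
}

/**
 * Sketch of a key cache that refreshes the encryption key when it has expired
 * or when the current block key changes. Not a Hadoop API.
 */
final class RefreshingKeyCache {
    private final LongSupplier clock;      // injectable clock, in ms
    private BlockKeyInfo currentBlockKey;  // latest block key from the NameNode
    private EncKey cached;                 // lazily generated encryption key

    RefreshingKeyCache(LongSupplier clock, BlockKeyInfo initial) {
        this.clock = clock;
        this.currentBlockKey = initial;
    }

    /** Called when a new current block key is received from the NameNode. */
    synchronized void updateBlockKey(BlockKeyInfo k) {
        currentBlockKey = k;
    }

    synchronized EncKey getEncryptionKey() {
        long now = clock.getAsLong();
        if (cached == null
                || cached.expiryMs <= now
                || cached.keyId != currentBlockKey.keyId) {
            // Tie the expiry to the block key's own expiry so the two cannot diverge.
            cached = new EncKey(currentBlockKey.keyId, currentBlockKey.expiryMs);
        }
        return cached;
    }
}
```

Checking the block key id, and not only the cached key's own expiry, is what would close the gap described in the quoted comment: a key generated near the end of a block key's life is discarded as soon as that block key is replaced.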



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


