hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3637) Add support for encrypting the DataTransferProtocol
Date Tue, 07 Aug 2012 06:12:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429961#comment-13429961

Aaron T. Myers commented on HDFS-3637:

Thanks a lot for the very thorough review, Eli. Updated patch incoming.

bq. Testing?

In addition to the included automated tests, I've tested this on a 4-node cluster, reading
and writing files, running MR jobs (tera gens/sorts), etc. I've seen no issues.

bq. What's the latest performance slowdown for the basic HDFS read/write path with RC4 enabled?

I haven't done a really thorough benchmark, but my testing indicates about a 1.8-2.2x slowdown
with RC4, and a much higher slowdown with 3DES. I think this description of the relative speed
of cipher algorithms in Java is pretty accurate:


bq. Seems like DFSOutputStream#newBlockReader in the conf.useLegacyBlockReader conditional
should use a precondition or throw an RTE (eg AssertionError) if encryptionKey is null, otherwise
the client will just consider this a dead DN and keep trying.

Good point. Changed to a RuntimeException.

bq. In the other case it should blow up if encryptionKey is null right, otherwise we can have
it enabled server side but allow a client not to use it?

Not quite sure what you mean by this. In which case should we blow up if encryptionKey is
null? Note that the client will never be allowed to not use encryption if the DN is configured
to use it. The error message won't be nice, but no data will ever be transmitted in the clear.

bq. The dfs.encrypt.data.transfer description that this is a server-side config


bq. Add dfs.encrypt.data.transfer.algorithm with out a default and list two supported values?

Added the following:
    This value may be set to either "3des" or "rc4". If nothing is set, then
    the configured JCE default on the system is used (usually 3DES.) It is
    widely believed that 3DES is more cryptographically secure, but RC4 is
    substantially faster.

bq. Shouldn't shouldEncryptData throw an exception if server defaults is null instead of assume
it shouldn't encrypt? Seems more secure, eg if we ever introduce a bug that results in the
NN returning a null server default (should never happen currently).

No, for compatibility purposes. With the current implementation, an upgraded client talking
to an older server (without encryption support) will correctly conclude that it does not need
to encrypt data. Again, if we ever were to introduce a bug like you describe, nothing would
be sent in the clear, and the client would blow up eventually.

bq. Consider pulling out the block manager not setting the block pool ID bug to a separate

Sorry, it's not a bug. It's because I changed BlockTokenSecretManager to take the BlockPoolId
at creation time, instead of every time a BlockToken is created. This is a reasonable change
to make since a single BlockTokenSecretManager cannot actually issue valid BlockTokens for
anything but a single BlockPoolId. Sorry, I should have mentioned this change in my description
of the patch.


This wouldn't be right, since we've lowered the key update interval and token lifetime earlier
in the test. It also needs to be a few multiples of the block token lifetime, since several
block tokens are valid at any given time (the current and the last two, by default.)

bq. Also perhaps update the relevant NN java doc to indicate that "getting" the key generates
a new key with this timeout.

I called it "getEncryptionKey" to be in keeping with "getDelegationToken". More appropriate
for these would probably be "generate" instead of "get". What are your thoughts on this?

bq. Jira for supporting encryption or remove this TODO?

Well, since we're sort of phasing out support for RemoteBlockReader, I doubt such a JIRA will
actually ever be implemented. Perhaps we should just remove the TODO?

bq. Are the sendReadResult write timeout and DFSOutputStream#flush a separate issue or something
introduced here?

It's no functional change - just a refactor so that RemoteBlockReader2#writeReadResult takes
a stream as an argument, instead of always creating a new stream from the given socket.
> Add support for encrypting the DataTransferProtocol
> ---------------------------------------------------
>                 Key: HDFS-3637
>                 URL: https://issues.apache.org/jira/browse/HDFS-3637
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, hdfs client, security
>    Affects Versions: 2.0.0-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-3637.patch, HDFS-3637.patch, HDFS-3637.patch
> Currently all HDFS RPCs performed by NNs/DNs/clients can be optionally encrypted. However,
actual data read or written between DNs and clients (or DNs to DNs) is sent in the clear.
When processing sensitive data on a shared cluster, confidentiality of the data read/written
from/to HDFS may be desired.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message