hadoop-hdfs-issues mailing list archives

From "Chen Liang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13541) NameNode Port based selective encryption
Date Thu, 10 May 2018 17:23:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470781#comment-16470781 ]

Chen Liang commented on HDFS-13541:

Thanks a lot for taking a look [~benoyantony]! 
bq. We could also think of passing additional parameters based on the connection
This is a good point. My first thought is to pass the entire Server#Connection instance
to the resolver (it also includes the IP address and the ingress port). Beyond these,
other potentially relevant fields include user, remote, and hostAddress (though I am not sure
hostAddress is set correctly at all). What do you think?
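
To make the resolver idea concrete, here is a minimal Java sketch of choosing a QOP from connection details. It is only an illustration under assumptions: `ConnectionInfo`, `PortBasedQopSketch`, and the port numbers 8020/8021 are hypothetical stand-ins, not Hadoop's actual Server#Connection or resolver API, and only the ingress port is consulted even though the other fields are carried along.

```java
import java.net.InetAddress;
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

/** Hypothetical sketch of a port-aware resolver; not the Hadoop resolver API. */
final class PortBasedQopSketch {

    /** Hypothetical stand-in for the Server#Connection fields discussed above. */
    static final class ConnectionInfo {
        final InetAddress clientAddress;
        final int ingressPort;
        final String user;

        ConnectionInfo(InetAddress clientAddress, int ingressPort, String user) {
            this.clientAddress = clientAddress;
            this.ingressPort = ingressPort;
            this.user = user;
        }
    }

    private final Map<Integer, String> qopByPort = new HashMap<>();
    private final String defaultQop;

    /** The default QOP applies to any port without an explicit mapping. */
    PortBasedQopSketch(String defaultQop) {
        this.defaultQop = defaultQop;
    }

    void setQopForPort(int port, String qop) {
        qopByPort.put(port, qop);
    }

    /** Builds SASL properties for one connection, keyed on its ingress port. */
    Map<String, String> getServerProperties(ConnectionInfo conn) {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, qopByPort.getOrDefault(conn.ingressPort, defaultQop));
        return props;
    }
}
```

With an assumed default of `auth-conf` and port 8020 mapped to `auth`, a connection arriving on 8020 would negotiate authentication only, while a connection on any other port would fall back to the encrypted default. Extending the decision to `user` or `clientAddress` would only change the body of `getServerProperties`.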

bq. If so, what prevents the external  client from replaying the encrypted message from a
different connection between an internal client and datanode ?
As of now, the main thing that prevents a replay attack is the fact that the key expires after
10 min by default. After the key expires, NN and DN will be using different keys and the encrypted
message is invalidated. Namely, the attacker has a window of at most 10 min to replay the encrypted
message. We consider this sufficient for now. If I understood this correctly, I think it
is the same rationale as behind the block access token, i.e. without talking to NN, someone may connect
to DN directly and replay a block access token, but only within that 10 min window.
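
The window described above can be modelled in a few lines of Java. This is a rough sketch, not the POC code: it swaps in an HMAC tag over the QOP string in place of the actual encryption, and `KeyRolloverSketch`, `protect`, and `rollKey` are hypothetical names. The point it illustrates is that a captured message is still accepted while the receiver holds the same key, and is rejected once the key has rolled (every 10 min by default, per the comment above).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical model of a receiver (e.g. a DN) whose shared key is rolled periodically. */
final class KeyRolloverSketch {
    private byte[] currentKey;

    KeyRolloverSketch(byte[] initialKey) {
        this.currentKey = initialKey.clone();
    }

    /** Called when the old key expires and a new one is distributed. */
    void rollKey(byte[] newKey) {
        this.currentKey = newKey.clone();
    }

    /** Stand-in for protecting the QOP string: an HMAC-SHA256 tag, not encryption. */
    static byte[] protect(byte[] key, String qop) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(qop.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Accepts a message only if it was protected with the key currently held. */
    boolean accepts(String qop, byte[] tag) {
        return MessageDigest.isEqual(protect(currentKey, qop), tag);
    }
}
```

Nothing in this model ties the message to a particular connection, which is why a replay succeeds until `rollKey` is called; the key lifetime is the only bound on the attack.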

We considered adding more identifying info in addition to the QOP string, such as the client IP
or some timestamp-based info. This adds more variability to the message itself, but it also
adds more encryption overhead (because the message is larger). Also, while adding the IP address might
be relatively straightforward, other info such as a timestamp may be very tricky to manage here.
Currently we are inclined not to go with this optimization. Comments on this?
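
For reference, binding the client IP to the QOP string could look roughly like the Java sketch below. It assumes a simple `qop|clientIp` layout, which is purely hypothetical (as are the class and method names), and it shows only the payload that would then be encrypted, not the encryption itself.

```java
/** Hypothetical payload layout "qop|clientIp"; not part of the POC. */
final class BoundQopSketch {
    private static final char SEPARATOR = '|';

    /** Builds the payload that would be encrypted instead of the bare QOP string. */
    static String bind(String qop, String clientIp) {
        return qop + SEPARATOR + clientIp;
    }

    /** Extracts the QOP part of a bound payload. */
    static String qopOf(String payload) {
        int sep = payload.indexOf(SEPARATOR);
        return sep < 0 ? payload : payload.substring(0, sep);
    }

    /** True only if the payload was bound to the address the receiver observes. */
    static boolean matchesClient(String payload, String observedIp) {
        int sep = payload.indexOf(SEPARATOR);
        return sep >= 0 && payload.substring(sep + 1).equals(observedIp);
    }
}
```

A payload bound to one address would fail `matchesClient` when replayed from another, at the cost of a longer message to encrypt on every connection.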

bq. Another side effect of derived QOP for data transfer protection is that one cannot enable
RPC protection alone with this approach.
This is true in my current POC, because in our environment NN and DN always apply the same
protection. But we can add a configuration to allow enforcing only RPC protection. We just need
to be able to configure DN to ignore the derived QOP.
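
Such a switch might be as small as the Java sketch below. The property name `example.datanode.ignore.derived.qop` is made up for illustration and does not exist in Hadoop; `java.util.Properties` stands in for the real configuration object.

```java
import java.util.Properties;

/** Hypothetical sketch of a DN-side switch for ignoring the QOP derived from the NN. */
final class DataTransferQopSketch {
    // Made-up property name, used only for this illustration.
    static final String IGNORE_DERIVED_QOP_KEY = "example.datanode.ignore.derived.qop";

    /**
     * Returns the QOP the DN should apply to data transfer: its own locally
     * configured value when the switch is on, otherwise the derived one.
     */
    static String effectiveQop(Properties conf, String derivedQop, String localQop) {
        boolean ignoreDerived =
            Boolean.parseBoolean(conf.getProperty(IGNORE_DERIVED_QOP_KEY, "false"));
        return ignoreDerived ? localQop : derivedQop;
    }
}
```

Defaulting the switch to `false` keeps the POC behaviour, where data transfer follows the RPC protection level, and lets a deployment opt in to RPC-only protection.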

bq. As mentioned in the document, Encrypting the entire data pipeline is not necessary. I
believe, it should be optimized
Sure, will work on that.

bq. I prefer the approach where datanode also listens on two ports, as it makes the entire
approach easy to understand
On implementation complexity, it means we will need to change NN-DN communication such
that DN informs NN about the new port it has. The DN maintenance code logic seems a bit convoluted
already. On the practical side, in our environment cross data center traffic actually makes up
a small fraction of all traffic, so having an additional DataXceiverServer thread sitting and listening
on every single datanode, but being idle most of the time, does not seem ideal. [~shv] may
have more comments on this; he is on vacation until next week. In the meantime, I will
re-evaluate my other POC patch on this alternative approach.

> NameNode Port based selective encryption
> ----------------------------------------
>                 Key: HDFS-13541
>                 URL: https://issues.apache.org/jira/browse/HDFS-13541
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode, security
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>         Attachments: NameNode Port based selective encryption-v1.pdf
> Here at LinkedIn, one issue we face is that we need to enforce different security requirements
based on the location of the client and the cluster. Specifically, for clients from outside
of the data center, it is required by regulation that all traffic must be encrypted. But for
clients within the same data center, unencrypted connections are preferred to avoid the
high encryption overhead.
> HADOOP-10221 introduced a pluggable SASL resolver, based on which HADOOP-10335 introduced
WhitelistBasedResolver, which solves the same problem. However, we found it difficult to fit
into our environment for several reasons. In this JIRA, on top of the pluggable SASL resolver,
*we propose a different approach of running RPC on two ports on NameNode, where the two ports
enforce encrypted and unencrypted connections respectively, and the subsequent DataNode
access simply follows the same encrypted or unencrypted behaviour*. Then, by blocking the
unencrypted port on the datacenter firewall, we can completely block unencrypted external access.

