hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6134) Transparent data at rest encryption
Date Tue, 20 May 2014 01:52:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002704#comment-14002704 ]

Todd Lipcon commented on HDFS-6134:

I think two of Owen's questions may not be addressed in the docs. I'll do my best to answer
them here:

bq. For release in the Hadoop 2.x line, you need to preserve both forward and backwards wire
compatibility. How do you plan to address that?

For data which has been marked encrypted, we obviously can't provide backwards-compatibility.
I think the most sane behavior is probably that, if an old client tries to access encrypted
data, they should receive the ciphertext instead of the decrypted plaintext. Another option
might be to return an error. Either would be achievable by having the new client provide some
flag in the OP_READ_BLOCK request which indicates "I am reading encrypted data and I am aware
of it." If the new server sees that a client is reading encrypted data and does _not_ have
that flag, it could respond appropriately with either of the above two options.
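
To make the two options concrete, here is a minimal sketch of the server-side decision. It is illustrative only and is not taken from the HDFS code or the design doc: the names decide_read, LegacyPolicy and Decision are hypothetical, and the "aware" flag is modelled as a plain boolean that an old client would simply never set.

```python
from enum import Enum


class LegacyPolicy(Enum):
    """Hypothetical server setting: what to do for a client without the flag."""
    SERVE_CIPHERTEXT = "ciphertext"
    RETURN_ERROR = "error"


class Decision(Enum):
    """Hypothetical outcomes of handling a read request."""
    SERVE_NORMALLY = "serve_normally"
    SERVE_CIPHERTEXT = "serve_ciphertext"
    REJECT = "reject"


def decide_read(block_is_encrypted, client_is_encryption_aware, legacy_policy):
    """Pick how a new server answers a read, per the two options above.

    client_is_encryption_aware stands in for the proposed flag in the
    read request; it is False for any old client, which never sends it.
    """
    if not block_is_encrypted:
        # Unencrypted data is unaffected, so old and new clients behave alike.
        return Decision.SERVE_NORMALLY
    if client_is_encryption_aware:
        # The client declared it knows the data is encrypted.
        return Decision.SERVE_NORMALLY
    # Old client reading encrypted data: apply whichever option was chosen.
    if legacy_policy is LegacyPolicy.SERVE_CIPHERTEXT:
        return Decision.SERVE_CIPHERTEXT
    return Decision.REJECT
```

The only point of the sketch is that the flag's absence is distinguishable from its presence, so a new server can treat "old client plus encrypted data" as its own case without changing behavior for anything else.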

A new client accessing an old cluster should not be problematic, as we would only add new
fields to RPCs. The NN RPCs to set up encryption zones, etc, would fail with the usual "not
implemented" type exceptions (same as any other new feature).

bq. It seems that the additional datanode and client complexity is prohibitive. Making changes
to the HDFS write and read pipeline is extremely touchy.

I think prohibitive is a strong word. Adding new features may add complexity, but per the
design docs that Alejandro pointed to, we think the advantages are worth it. There are several
experienced HDFS developers working on this branch (alongside the newer folks) so you
can be sure we understand the areas of code being worked on and the associated risks. Having
done much of the work required to support the checksum type changeover in Hadoop 2, I feel
it's pretty likely the complexity of encryption is actually less than that project.

> Transparent data at rest encryption
> -----------------------------------
>                 Key: HDFS-6134
>                 URL: https://issues.apache.org/jira/browse/HDFS-6134
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.3.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: HDFSDataAtRestEncryption.pdf
> Because of privacy and security regulations, for many industries, sensitive data at rest
must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the
card payment industry (PCI DSS regulations) or the US government (FISMA regulations).
> This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently
by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library,
> The resulting implementation should be able to be used in compliance with different regulation

This message was sent by Atlassian JIRA
