hadoop-hdfs-issues mailing list archives

From "Yi Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5143) Hadoop cryptographic file system
Date Thu, 05 Sep 2013 00:54:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758596#comment-13758596 ]

Yi Liu commented on HDFS-5143:

Steve, thanks for your comments.
>>> Is there going to be a difference between the listable length of a file (FileSystem.listStatus(),
and the user-code visible length of a file

In our design the user will see no difference between the two: both report the length of the
original file.

As you know, for many encryption modes of various encryption algorithms, the length of the
cipher text differs from the length of the original plain text. But in our design, the cipher
text has the same length as the plain text and, more importantly, the bytes have a 1:1
correspondence.

To make the encryption more secure, we use a different IV (Initialization Vector) for each
encrypted file, and the IV has a fixed size of 16 bytes. We store the IV in the header of the
encrypted file, so length of encrypted file = length of original file + 16 bytes. However, we
will implement listStatus/getFileStatus and the other related FileSystem interfaces in CFS so
that the length returned is always the original length of the file.
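
The length rule above can be sketched as plain arithmetic. This is only an illustration in
Python, not code from the CFS patch: the names IV_HEADER_BYTES, plain_length and stored_offset
are hypothetical, and the only fact taken from the comment is the fixed 16-byte IV header.

```python
# Sketch only: these names are hypothetical and not part of the CFS patch.
IV_HEADER_BYTES = 16  # fixed IV size stated in the comment above


def plain_length(stored_length: int) -> int:
    """Length that a CFS-style listStatus/getFileStatus would report."""
    if stored_length < IV_HEADER_BYTES:
        raise ValueError("stored file is too short to contain the IV header")
    return stored_length - IV_HEADER_BYTES


def stored_offset(plain_offset: int) -> int:
    """Map a position seen by the application to a position in the stored file."""
    if plain_offset < 0:
        raise ValueError("offset must be non-negative")
    return plain_offset + IV_HEADER_BYTES


if __name__ == "__main__":
    print(plain_length(1016))  # a 1016-byte stored file is reported as 1000 bytes
    print(stored_offset(0))    # application offset 0 lives at stored offset 16
```

A wrapping file system would apply the first mapping to every reported length and the second
to every seek or positioned read before delegating to the wrapped file system.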

The key point is that the length of the encrypted file equals the length of the plain text
file + 16 bytes, the bytes have a 1:1 correspondence, and our design allows random access
during decryption. So we can easily get the length of the plain text file and easily handle
the other file operations.
Actually, if we put an “encryption” flag and the IV in the namenode, then the length of the
encrypted file would equal the length of the plain text file. That would be great for HDFS,
but many people may not like the idea of modifying the namenode inodes and code. Furthermore,
CFS can decorate other file systems besides HDFS, so we propose not to modify the structure
of the namenode.
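
To illustrate why a 1:1 byte correspondence gives random access, here is a minimal Python
sketch. The comment does not name the cipher or mode, so this is an assumption: it uses a
counter-style keystream derived from SHA-256 (standard library only) as a stand-in for a real
cipher. It is not AES, it is not the CFS implementation, and it must not be used to protect
data. All function names are hypothetical; only the layout (16-byte IV header followed by
cipher text of the same length as the plain text) comes from the text above.

```python
import hashlib

BLOCK = 32           # keystream bytes produced per counter value (SHA-256 digest size)
IV_HEADER_BYTES = 16  # IV stored at the head of the file, as described above


def keystream(key: bytes, iv: bytes, offset: int, length: int) -> bytes:
    """Keystream bytes for plain text positions [offset, offset + length)."""
    out = bytearray()
    counter = offset // BLOCK   # jump straight to the block that holds `offset`
    start = offset % BLOCK
    while len(out) < start + length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[start:start + length])


def xor_at(key: bytes, iv: bytes, offset: int, data: bytes) -> bytes:
    """Encrypt or decrypt `data` located at plain text position `offset`."""
    ks = keystream(key, iv, offset, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


def encrypt_file(key: bytes, iv: bytes, plain: bytes) -> bytes:
    """Stored layout: IV header, then cipher text with the same length as `plain`."""
    return iv + xor_at(key, iv, 0, plain)


def read_range(key: bytes, stored: bytes, offset: int, length: int) -> bytes:
    """Decrypt only the requested range, without reading the bytes before it."""
    iv = stored[:IV_HEADER_BYTES]
    begin = IV_HEADER_BYTES + offset
    return xor_at(key, iv, offset, stored[begin:begin + length])
```

Because byte i of the plain text depends only on the key, the IV and the position i, a seek
followed by a read needs only the IV header and the requested range of the stored file.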

>>> Is it that the cfs:// view is consistent across all file stat operations, seek()

Right, it’s consistent. All of these operations refer to the plain text file, since encryption
is transparent and upper layer applications should be unaware of it.
Furthermore, for du, df and other related file system commands: since length of encrypted
file = length of original file + 16 bytes, “du” will count the plain text file size, which is
consistent with the file size listed by “ls”, but “df”, for example, will count the
encrypted file size.
>>> I’m curious about how this interacts with quotas.

This is a good question. HDFS quotas include name quotas and space quotas. We only need to
discuss space quotas. As described above, the length of an encrypted file equals the length of
the plain text file + 16 bytes, so an encrypted directory requires slightly more space than an
unencrypted one, but I don’t think this affects usage. When copying files from an unencrypted
directory to an encrypted one, if the space quota is insufficient and the directory being
copied contains encrypted files, we will prompt with a message like “The directory contains
encrypted files; since 16 additional bytes are required per encrypted file, the space quota
for the target directory is insufficient”.
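
As a rough illustration of the quota arithmetic, the following Python sketch checks whether a
set of files fits into the remaining space quota of an encrypted target directory. The
function names are hypothetical and the check is a simplification: it counts only the 16-byte
per-file overhead described above and ignores replication and block allocation, which a real
HDFS space quota also takes into account.

```python
from typing import List, Optional

IV_HEADER_BYTES = 16  # per-file overhead stated in the comment above


def stored_bytes_needed(plain_sizes: List[int]) -> int:
    """Space the files occupy once each one carries its IV header."""
    return sum(plain_sizes) + IV_HEADER_BYTES * len(plain_sizes)


def check_copy(plain_sizes: List[int], remaining_quota: int) -> Optional[str]:
    """Return None if the copy fits, otherwise a message for the user."""
    if stored_bytes_needed(plain_sizes) <= remaining_quota:
        return None
    return (
        "The directory contains encrypted files; since 16 additional bytes are "
        "required per encrypted file, the space quota for the target directory "
        "is insufficient"
    )


if __name__ == "__main__":
    sizes = [100, 200, 300]            # 600 bytes of plain text in 3 files
    print(stored_bytes_needed(sizes))  # 600 + 3 * 16 = 648
    print(check_copy(sizes, 600))      # plain text would fit, stored form does not
```

The second call shows the case the message is meant for: the plain text alone would fit into
the quota, but the per-file IV headers push the total over it.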
>>> Are all operations that are atomic today, e.g. renaming one directory under another
going to remain atomic?

It depends. If one directory is renamed under another and both the source and the target are
unencrypted directories, the operation is still atomic. However, we do not intend to allow
renaming an unencrypted directory to an encrypted one; instead, the user should create the
encrypted directory first and then copy the files into it.
> Hadoop cryptographic file system
> --------------------------------
>                 Key: HDFS-5143
>                 URL: https://issues.apache.org/jira/browse/HDFS-5143
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>              Labels: rhino
>             Fix For: 3.0.0
>         Attachments: HADOOP cryptographic file system.pdf
> There is an increasing need for securing data when Hadoop customers use various upper
layer applications, such as Map-Reduce, Hive, Pig, HBase and so on.
> HADOOP CFS (HADOOP Cryptographic File System) is used to secure data, based on HADOOP
“FilterFileSystem” decorating DFS or other file systems, and transparent to upper layer
applications. It’s configurable, scalable and fast.
> High level requirements:
> 1.	Transparent to and no modification required for upper layer applications.
> 2.	“Seek”, “PositionedReadable” are supported for input stream of CFS if the
wrapped file system supports them.
> 3.	Very high performance for encryption and decryption, they will not become bottleneck.
> 4.	Can decorate HDFS and all other file systems in Hadoop, and will not modify existing
structure of file system, such as namenode and datanode structure if the wrapped file system
is HDFS.
> 5.	Admin can configure encryption policies, such as which directory will be encrypted.
> 6.	A robust key management framework.
> 7.	Support Pread and append operations if the wrapped file system supports them.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
