hadoop-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject RE: Encryption in HDFS
Date Tue, 26 Feb 2013 19:52:09 GMT

I am also interested in your research. Could you share some insight on the following questions?
1) When you use a CompressionCodec, can the encrypted file be split? From my understanding,
no encryption scheme allows a file to be decrypted independently block by block, right? For
example, if I have a 1 GB file encrypted with AES, can you decrypt it block by block (and if
so, how), instead of using a single mapper to decrypt the whole file?
2) In your CompressionCodec implementation, do you use DecompressorStream or BlockDecompressorStream?
If BlockDecompressorStream, can you share some examples? Right now I am having problems using
BlockDecompressorStream to do exactly the same thing you did.
3) Do you have any plans to share your code, especially if you did use BlockDecompressorStream
to decrypt the encrypted file block by block in a Hadoop MapReduce job?
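To make question 1 concrete: whether an AES-encrypted file can be decrypted from the middle depends on the cipher mode. The sketch below uses plain JCA (no Hadoop classes) and assumes AES in CTR mode, where the keystream for any 16-byte block can be derived from the IV plus the block index. The class name `CtrSeekSketch` and its helper methods are hypothetical names made up for this illustration; this is not the original poster's implementation, and it says nothing about how a CompressionCodec would have to report split points.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical illustration: decrypt an arbitrary byte range of an AES/CTR ciphertext.
public class CtrSeekSketch {
    static final int AES_BLOCK = 16; // AES block size in bytes

    // Counter value for the AES block at blockIndex: (iv + blockIndex) mod 2^128, big-endian.
    static byte[] counterFor(byte[] iv, long blockIndex) {
        BigInteger ctr = new BigInteger(1, iv)
                .add(BigInteger.valueOf(blockIndex))
                .mod(BigInteger.ONE.shiftLeft(128));
        byte[] raw = ctr.toByteArray();
        byte[] out = new byte[AES_BLOCK];
        int len = Math.min(raw.length, AES_BLOCK);
        // Right-align the value; drops a possible leading sign byte.
        System.arraycopy(raw, raw.length - len, out, AES_BLOCK - len, len);
        return out;
    }

    // Encrypt the whole input in one pass, as a single client would.
    static byte[] encryptAll(byte[] key, byte[] iv, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypt only ciphertext[offset, offset + length) without touching earlier bytes.
    static byte[] decryptRange(byte[] key, byte[] iv, byte[] ciphertext, int offset, int length) {
        try {
            long blockIndex = offset / AES_BLOCK;
            int skip = offset % AES_BLOCK;
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(counterFor(iv, blockIndex)));
            if (skip > 0) {
                // Burn the keystream bytes that belong to the part of the block before offset.
                cipher.update(new byte[skip]);
            }
            return cipher.doFinal(ciphertext, offset, length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // demo key only, all zeros
        byte[] iv = new byte[16];  // demo IV only; a real IV must be unique per file
        byte[] plain = new byte[4096];
        for (int i = 0; i < plain.length; i++) plain[i] = (byte) i;

        byte[] ct = encryptAll(key, iv, plain);
        byte[] middle = decryptRange(key, iv, ct, 1000, 500);
        System.out.println(Arrays.equals(middle, Arrays.copyOfRange(plain, 1000, 1500)));
    }
}
```

With a chained mode such as CBC, a reader starting mid-file would at least need the preceding ciphertext block, and padding is only applied at the end of the stream, so the mode choice matters for any split-by-block design.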
From: renderaid@gmail.com
Date: Tue, 26 Feb 2013 14:10:08 +0900
Subject: Encryption in HDFS
To: user@hadoop.apache.org

Hello, I'm a university student.
I implemented AES and Triple DES as a CompressionCodec using the Java Cryptography Architecture
(JCA). The encryption is performed by a client node using the Hadoop API.
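For readers unfamiliar with JCA, the following is a minimal sketch of the stream-wrapping part only, assuming the codec's output and input streams delegate to `CipherOutputStream` and `CipherInputStream`. The class `CipherStreamSketch`, its method names, the CBC/PKCS5Padding transformations, and the fixed demo keys and IVs are all assumptions made for illustration; the actual codec code was not posted, and the Hadoop CompressionCodec plumbing is left out so the example runs with the JDK alone.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical illustration: wrap plain streams with JCA ciphers (AES or Triple DES).
public class CipherStreamSketch {

    static Cipher newCipher(int mode, String transformation, SecretKey key, byte[] iv) {
        try {
            Cipher cipher = Cipher.getInstance(transformation);
            cipher.init(mode, key, new IvParameterSpec(iv));
            return cipher;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // What a client-side writer could wrap around the stream it writes to storage.
    static OutputStream encrypting(OutputStream raw, String transformation, SecretKey key, byte[] iv) {
        return new CipherOutputStream(raw, newCipher(Cipher.ENCRYPT_MODE, transformation, key, iv));
    }

    // What a reader (for example a map task) could wrap around the stream it reads.
    static InputStream decrypting(InputStream raw, String transformation, SecretKey key, byte[] iv) {
        return new CipherInputStream(raw, newCipher(Cipher.DECRYPT_MODE, transformation, key, iv));
    }

    // Encrypt into memory, then decrypt again, returning the recovered bytes.
    static byte[] roundTrip(String transformation, SecretKey key, byte[] iv, byte[] data) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (OutputStream out = encrypting(sink, transformation, key, iv)) {
                out.write(data); // closing the stream writes the final padded block
            }
            ByteArrayOutputStream recovered = new ByteArrayOutputStream();
            try (InputStream in = decrypting(new ByteArrayInputStream(sink.toByteArray()),
                    transformation, key, iv)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    recovered.write(buf, 0, n);
                }
            }
            return recovered.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "some text that would normally go to HDFS".getBytes();

        // Demo keys and IVs only: fixed values like these are not safe for real data.
        byte[] aesKeyBytes = new byte[16];
        byte[] desKeyBytes = new byte[24];
        for (int i = 0; i < aesKeyBytes.length; i++) aesKeyBytes[i] = (byte) (i + 1);
        for (int i = 0; i < desKeyBytes.length; i++) desKeyBytes[i] = (byte) (i + 1);
        SecretKey aesKey = new SecretKeySpec(aesKeyBytes, "AES");
        SecretKey desKey = new SecretKeySpec(desKeyBytes, "DESede");

        byte[] viaAes = roundTrip("AES/CBC/PKCS5Padding", aesKey, new byte[16], data);
        byte[] viaDes = roundTrip("DESede/CBC/PKCS5Padding", desKey, new byte[8], data);
        System.out.println(Arrays.equals(viaAes, data) && Arrays.equals(viaDes, data));
    }
}
```

Only the transformation string, key length, and IV length differ between the AES and Triple DES cases (16-byte versus 8-byte blocks), which is one reason both ciphers can share a single codec skeleton.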

Map tasks read blocks from HDFS, and each map task decrypts the blocks it reads. I tested
my implementation against generic HDFS. My cluster consists of 3 nodes (1 master node, 3 worker
nodes), and each machine has a quad-core processor (i7-2600) and 4 GB of memory.

The test input is 1 TB of text, made up of 32 text files of 32 GB each.
I expected the encrypted setup to take much more time than generic HDFS, but the performance
does not differ significantly.

The decryption step takes about 5-7% longer than generic HDFS. The encryption step takes about
20-30% longer than generic HDFS because it is implemented with a single thread and executed
on 1 client node.

So the encryption performance could still be improved.
Could there be an error in my test?
I know there are several implementations for encrypting files in HDFS. Are these implementations
enough to secure HDFS?

best regards,
* Sorry for my bad English