From: java8964 <java8964@hotmail.com>
To: user@hadoop.apache.org
Subject: RE: Encryption in HDFS
Date: Tue, 26 Feb 2013 14:52:09 -0500
I am also interested in your research. Can you share some insight on the following questions?

1) When you use a CompressionCodec, is the encrypted file splittable? As far as I understand, there is no encryption scheme that lets a file be decrypted block by block independently, right? For example, if I have a 1 GB file encrypted with AES, how do you (or can you) decrypt the file block by block, instead of using just one mapper to decrypt the whole file?

2) In your CompressionCodec implementation, do you use DecompressorStream or BlockDecompressorStream? If BlockDecompressorStream, can you share some examples? Right now I am having trouble getting BlockDecompressorStream to do exactly what you did.

3) Do you have any plan to share your code, especially if you did use BlockDecompressorStream and made the encrypted file decryptable block by block in a Hadoop MapReduce job?

Thanks

Yong

From: renderaid@gmail.com
Date: Tue, 26 Feb 2013 14:10:08 +0900
Subject: Encryption in HDFS
To: user@hadoop.apache.org

Hello, I'm a university student.

I implemented AES and Triple DES as a CompressionCodec using the Java Cryptography Architecture (JCA). The encryption is performed by a client node using the Hadoop API. Map tasks read blocks from HDFS, and those blocks are decrypted by each map task. I tested my implementation against stock HDFS. My cluster consists of 1 master node and 3 worker nodes, and each machine has a quad-core processor (i7-2600) and 4 GB of memory.

The test input is 1 TB of text, made up of 32 text files of 32 GB each.

I expected the encryption to take much more time than stock HDFS, but the performance does not differ significantly. The decryption step takes about 5-7% longer than stock HDFS. The encryption step takes about 20-30% longer than stock HDFS because it is implemented as a single thread and executed by one client node, so the encryption still has room for improvement.

Could there be any error in my test?

I know there are several implementations for encrypting files in HDFS. Are these implementations enough to secure HDFS?

Best regards,

seonpark

* Sorry for my bad English
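[Editorial note on question 1: one standard answer is AES in CTR mode, where the keystream is derived from a counter, so any 16-byte block can be decrypted independently by advancing the IV. This is a minimal, self-contained JCA sketch of that idea (class and method names are illustrative, not from this thread's code):]

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class CtrSeekDemo {
    // Advance a 16-byte CTR IV by `blocks` AES blocks (big-endian add, wraps mod 2^128).
    static byte[] advanceIv(byte[] iv, long blocks) {
        byte[] raw = new BigInteger(1, iv).add(BigInteger.valueOf(blocks)).toByteArray();
        byte[] out = new byte[16];
        int copy = Math.min(raw.length, 16);
        System.arraycopy(raw, raw.length - copy, out, 16 - copy, copy);
        return out;
    }

    // Returns true if blocks 2-3 of the ciphertext decrypt correctly on their own.
    static boolean tailMatches() throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] key = new byte[16], iv = new byte[16], plain = new byte[64];
        rnd.nextBytes(key); rnd.nextBytes(iv); rnd.nextBytes(plain);
        SecretKeySpec k = new SecretKeySpec(key, "AES");

        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, k, new IvParameterSpec(iv));
        byte[] cipherText = enc.doFinal(plain);

        // "Seek" to byte offset 32 = AES block 2: decrypt only the tail,
        // never touching blocks 0-1 of the ciphertext.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, k, new IvParameterSpec(advanceIv(iv, 2)));
        byte[] tail = dec.doFinal(Arrays.copyOfRange(cipherText, 32, 64));
        return Arrays.equals(tail, Arrays.copyOfRange(plain, 32, 64));
    }

    public static void main(String[] args) throws Exception {
        if (!tailMatches()) throw new AssertionError("random-access CTR decryption failed");
        System.out.println("decrypted blocks 2-3 independently of blocks 0-1");
    }
}
```

A split could therefore start at any 16-byte-aligned offset, with the mapper computing the IV for its offset; note that CTR provides confidentiality only, not integrity.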
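[Editorial note on the 20-30% encryption overhead attributed to a single client thread: with CTR mode the keystream is position-addressable, so client-side encryption can also be parallelized across a thread pool, each worker starting from an IV advanced to its chunk's block offset. A hedged sketch under that assumption (not the poster's implementation):]

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class ParallelCtrEncrypt {
    // Advance a 16-byte CTR IV by `blocks` AES blocks (big-endian add, wraps mod 2^128).
    static byte[] advanceIv(byte[] iv, long blocks) {
        byte[] raw = new BigInteger(1, iv).add(BigInteger.valueOf(blocks)).toByteArray();
        byte[] out = new byte[16];
        int copy = Math.min(raw.length, 16);
        System.arraycopy(raw, raw.length - copy, out, 16 - copy, copy);
        return out;
    }

    static byte[] encryptChunk(byte[] key, byte[] iv, byte[] data,
                               int off, int len, long startBlock) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
               new IvParameterSpec(advanceIv(iv, startBlock)));
        return c.doFinal(data, off, len);
    }

    // Encrypt `data` with `threads` workers; chunks are aligned to 16-byte AES blocks
    // so each worker's counter start is simply its byte offset / 16.
    static byte[] encrypt(byte[] key, byte[] iv, byte[] data, int threads) throws Exception {
        int chunk = Math.max(16, ((data.length / threads + 15) / 16) * 16);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<byte[]>> parts = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunk) {
            final int o = off, len = Math.min(chunk, data.length - off);
            parts.add(pool.submit(() -> encryptChunk(key, iv, data, o, len, o / 16)));
        }
        byte[] out = new byte[data.length];
        int pos = 0;
        for (Future<byte[]> f : parts) {
            byte[] p = f.get();
            System.arraycopy(p, 0, out, pos, p.length);
            pos += p.length;
        }
        pool.shutdown();
        return out;
    }

    // Sanity check: parallel output must be byte-identical to single-threaded CTR.
    static boolean matchesSingleThread() throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] key = new byte[16], iv = new byte[16], data = new byte[1 << 20];
        rnd.nextBytes(key); rnd.nextBytes(iv); rnd.nextBytes(data);
        return java.util.Arrays.equals(encrypt(key, iv, data, 1),
                                       encrypt(key, iv, data, 4));
    }

    public static void main(String[] args) throws Exception {
        if (!matchesSingleThread()) throw new AssertionError("parallel CTR mismatch");
        System.out.println("4-thread CTR output matches single-thread output");
    }
}
```

Because CTR chunks are independent, the ciphertext is identical regardless of thread count, so a multi-threaded client writer would remain compatible with single-threaded readers.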