Subject: Re: Encrypting files in Hadoop - Using the io.compression.codecs
From: Michael Segel
Date: Tue, 7 Aug 2012 07:40:36 -0500
To: user@hadoop.apache.org

There is a bit of a difference between encryption and compression.
You're better off using coprocessors to encrypt the data as it's being
written than trying to encrypt the actual HFile.

On Aug 7, 2012, at 3:31 AM, Harsh J wrote:

> Farrokh,
>
> I do not know of a way to plug in a codec that applies to all files on
> HDFS transparently yet. Check out
> https://issues.apache.org/jira/browse/HDFS-2542 and friends for some
> work that may arrive in future.
>
> For HBase, by default, your choices are limited. You get only what
> HBase has tested to offer (None, LZO, GZ, Snappy) and adding in
> support for a new codec requires modification of sources. This is
> because HBase uses an Enum of codec identifiers (to save space in its
> HFiles). But yes it can be done, and there are hackier ways of doing
> this too (renaming your CryptoCodec to SnappyCodec for instance, to
> have HBase unknowingly use it, ugly ugly ugly).
> So yes, it is indeed better to discuss this need with the HBase
> community than with the Hadoop one here.
>
> On Tue, Aug 7, 2012 at 1:43 PM, Farrokh Shahriari wrote:
>> Thanks,
>> What if I want to use this encryption in a cluster with HBase running
>> on top of Hadoop? Can't Hadoop be configured to automatically encrypt
>> each file that is written to it?
>> If not, I should probably be asking how to enable encryption on HBase,
>> and asking this question on the HBase mailing list, right?
>>
>> On Tue, Aug 7, 2012 at 12:32 PM, Harsh J wrote:
>>>
>>> Farrokh,
>>>
>>> The codec org.apache.hadoop.io.compress.crypto.CyptoCodec needs to be
>>> used.
>>> What you've done so far is merely add it to be loaded by Hadoop
>>> at runtime, but you will need to use it in your programs if you wish
>>> for it to get applied.
>>>
>>> For example, for MapReduce outputs to be compressed, you may run an
>>> MR job with the following option set on its configuration:
>>>
>>> "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.crypto.CyptoCodec"
>>>
>>> You should then see that your output files were all properly
>>> encrypted with the above codec.
>>>
>>> Likewise, if you're using direct HDFS writes, you will need to wrap
>>> your output stream with this codec. Look at the CompressionCodec API
>>> to see how:
>>> http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/io/compress/CompressionCodec.html#createOutputStream(java.io.OutputStream)
>>> (Where your CompressionCodec must be the
>>> org.apache.hadoop.io.compress.crypto.CyptoCodec instance).
>>>
>>> On Tue, Aug 7, 2012 at 1:11 PM, Farrokh Shahriari wrote:
>>>>
>>>> Hello
>>>> I use "Hadoop Crypto Compressor" from this site,
>>>> "https://github.com/geisbruch/HadoopCryptoCompressor", to encrypt
>>>> HDFS files.
>>>> I've downloaded the complete code, created the jar file, and changed
>>>> the properties in core-site.xml as the site says.
>>>> But when I add a new file, nothing happens and encryption isn't
>>>> working.
>>>> What can I do to encrypt HDFS files? Does anyone know how I should
>>>> use this class?
>>>>
>>>> Thanks
>>>
>>> --
>>> Harsh J
>>
>
> --
> Harsh J
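
To make the "wrap your output stream" point in the quoted reply concrete, here is a minimal sketch of the idea. It does not use Hadoop at all: the JDK's CipherOutputStream stands in for whatever stream a codec's createOutputStream(OutputStream) would return, and the class name, method names, AES/CTR choice, and hard-coded demo key are all assumptions of mine for illustration, not anything taken from HadoopCryptoCompressor. The point is only that bytes get encrypted when the program writes through the wrapper; registering a codec in the configuration does nothing by itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical stand-in for a codec: wraps raw streams so data is
// encrypted on write and decrypted on read.
public class WrapStreamSketch {
    private static final int IV_LEN = 16; // AES block size in bytes

    // Returns a stream that AES-encrypts everything written through it.
    // A fresh random IV is written in the clear ahead of the ciphertext.
    public static OutputStream wrapForWrite(OutputStream raw, byte[] key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        raw.write(iv);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new CipherOutputStream(raw, cipher);
    }

    // Reads the IV back from the head of the stream, then returns a
    // stream that decrypts the remaining bytes.
    public static InputStream wrapForRead(InputStream raw, byte[] key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new DataInputStream(raw).readFully(iv);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new CipherInputStream(raw, cipher);
    }

    // Writes 'plain' through the wrapper and returns what lands in storage.
    public static byte[] encrypt(byte[] key, byte[] plain)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = wrapForWrite(sink, key)) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    // Reads stored bytes back through the wrapper to recover the plaintext.
    public static byte[] decrypt(byte[] key, byte[] stored)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = wrapForRead(new ByteArrayInputStream(stored), key)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Demo key only; a real setup would load the key from a key store.
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
        byte[] plain = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encrypt(key, plain);
        System.out.println("stored differs from plain: " + !Arrays.equals(stored, plain));
        System.out.println("round trip ok: " + Arrays.equals(plain, decrypt(key, stored)));
    }
}
```

In a Hadoop program the raw stream would presumably be the one returned by the file system's create call, and the wrapper would come from the codec instance rather than from a helper like wrapForWrite; I have not tested that against the CyptoCodec class discussed above.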