hadoop-mapreduce-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Encrypting files in Hadoop - Using the io.compression.codecs
Date Tue, 07 Aug 2012 12:40:36 GMT
There is a bit of a difference between encryption and compression. 

You're better off using coprocessors to encrypt the data as it's being written than trying
to encrypt the actual HFile. 

On Aug 7, 2012, at 3:31 AM, Harsh J <harsh@cloudera.com> wrote:

> Farrokh,
> I do not know of a way to plug in a codec that applies to all files on
> HDFS transparently yet. Check out
> https://issues.apache.org/jira/browse/HDFS-2542 and friends for some
> work that may arrive in future.
> For HBase, by default, your choices are limited. You get only what
> HBase has tested to offer (None, LZO, GZ, Snappy) and adding in
> support for a new codec requires modification of sources. This is
> because HBase uses an Enum of codec identifiers (to save space in its
> HFiles). But yes it can be done, and there are hackier ways of doing
> this too (Renaming your CryptoCodec to SnappyCodec for instance, to
> have HBase unknowingly use it, ugly ugly ugly).
> So yes, it is indeed best to discuss this need with the HBase
> community rather than the Hadoop one here.
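For reference, the per-column-family codec choice described above is made at table creation time in the HBase shell (the table and family names here are only examples):

```
create 'mytable', {NAME => 'cf', COMPRESSION => 'SNAPPY'}
```

This is also why a codec outside the tested set (None, LZO, GZ, Snappy) cannot simply be named here without source changes.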
> On Tue, Aug 7, 2012 at 1:43 PM, Farrokh Shahriari
> <mohandes.zebeleh.67@gmail.com> wrote:
>> Thanks,
>> What if I want to use this encryption in a cluster with HBase running on top
>> of Hadoop? Can't Hadoop be configured to automatically encrypt each file
>> that is going to be written to it?
>> If not I probably should be asking how to enable encryption on hbase, and
>> asking this question on the hbase mailing list, right?
>> On Tue, Aug 7, 2012 at 12:32 PM, Harsh J <harsh@cloudera.com> wrote:
>>> Farrokh,
>>> The codec org.apache.hadoop.io.compress.crypto.CyptoCodec needs to be
>>> used. What you've done so far is merely add it to be loaded by Hadoop
>>> at runtime, but you will need to use it in your programs if you wish
>>> for it to get applied.
>>> For example, for MapReduce outputs to be compressed, you may run an MR
>>> job with the following option set on its configuration:
>>> "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.crypto.CyptoCodec"
>>> And then you can notice that your output files were all properly
>>> encrypted with the above codec.
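As a concrete sketch (the jar and driver names here are hypothetical, and this assumes the driver parses generic options via ToolRunner), note that `mapred.output.compress` must also be set to true for the codec option to take effect:

```shell
# Hypothetical job invocation: enable output compression and select the codec.
hadoop jar my-job.jar com.example.MyDriver \
  -Dmapred.output.compress=true \
  -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.crypto.CyptoCodec \
  /input /output
```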
>>> Likewise, if you're using direct HDFS writes, you will need to wrap
>>> your outputstream with this codec. Look at the CompressionCodec API to
>>> see how:
>>> http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/io/compress/CompressionCodec.html#createOutputStream(java.io.OutputStream)
>>> (Where your CompressionCodec must be the
>>> org.apache.hadoop.io.compress.crypto.CyptoCodec instance).
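The wrapping pattern described above is not specific to Hadoop. As a rough analogy only (plain Python, with gzip standing in for the codec; the real code would use Hadoop's Java CompressionCodec API), the idea is that the application writes through a stream that transforms bytes before they reach the underlying file:

```python
import gzip
import io

# The raw stream stands in for FileSystem.create(); gzip.GzipFile stands in
# for codec.createOutputStream() -- the application writes plain bytes, and
# the wrapper transforms them before they reach the underlying stream.
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as wrapped:
    wrapped.write(b"sensitive record")

stored = raw.getvalue()  # what actually lands on "disk" (transformed bytes)
assert stored != b"sensitive record"

# Reading back wraps the input stream the same way (createInputStream analog).
restored = gzip.GzipFile(fileobj=io.BytesIO(stored), mode="rb").read()
assert restored == b"sensitive record"
```

The same round-trip shape applies with a crypto codec: nothing is transformed unless every read and write path goes through the wrapping stream.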
>>> On Tue, Aug 7, 2012 at 1:11 PM, Farrokh Shahriari
>>> <mohandes.zebeleh.67@gmail.com> wrote:
>>>> Hello
>>>> I use "Hadoop Crypto Compressor" from this
>>>> site "https://github.com/geisbruch/HadoopCryptoCompressor" for encrypting
>>>> HDFS files.
>>>> I've downloaded the complete code & created the jar file, and changed the
>>>> properties in core-site.xml as the site says.
>>>> But when I add a new file, nothing happens & encryption isn't
>>>> working.
>>>> What can I do to encrypt HDFS files? Does anyone know how I should
>>>> use this class?
>>>> Tnx
>>> --
>>> Harsh J
> -- 
> Harsh J
