hadoop-hdfs-dev mailing list archives

From Larry McCay <lmc...@hortonworks.com>
Subject Re: Hadoop encryption module as Apache Chimera incubator project
Date Thu, 21 Jan 2016 02:43:19 GMT
That’s a good point, Kai.

If what we are looking for is some level of autonomy, then it would need to be a module with
its own release train, or at least be able to have one.

On Jan 20, 2016, at 9:18 PM, Zheng, Kai <kai.zheng@intel.com> wrote:

> Just a question. Does becoming a separate jar/module in Apache Commons mean that Chimera (or
> the module) can be released separately and in a timely manner, without being coupled to the
> release of other modules in the project? Thanks.
> Regards,
> Kai
> -----Original Message-----
> From: Aaron T. Myers [mailto:atm@cloudera.com] 
> Sent: Thursday, January 21, 2016 9:44 AM
> To: hdfs-dev@hadoop.apache.org
> Subject: Re: Hadoop encryption module as Apache Chimera incubator project
> +1 for Hadoop depending upon Chimera, assuming Chimera can get
> hosted/released under some Apache project umbrella. If that's Apache Commons (which makes
> a lot of sense to me) then I'm also a big +1 on Andrew's suggestion that we make it a separate
> JAR.
>
> Uma, would you be up for approaching the Apache Commons folks saying that you'd like
> to contribute Chimera? I'd recommend saying that Hadoop and Spark are both on board to depend
> on this.
> --
> Aaron T. Myers
> Software Engineer, Cloudera
> On Wed, Jan 20, 2016 at 4:31 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>> Thanks Uma for putting together this proposal. Overall sounds good to
>> me, +1 for these improvements. A few comments/questions:
>> * If it becomes part of Apache Commons, could we make Chimera a 
>> separate JAR? We have real difficulties bumping dependency versions 
>> right now, so ideally we don't need to bump our existing Commons 
>> dependencies to use Chimera.
>> * With this refactoring, do we have confidence that we can get our
>> desired changes merged and released in a timely fashion? For example, if we
>> find another bug like HADOOP-11343, we'll first need to get the fix
>> into Chimera, have a new Chimera release, then bump Hadoop's Chimera
>> dependency. This also relates to the previous point: it's easier to do
>> this dependency bump if Chimera is a separate JAR.
>> Best,
>> Andrew
>> On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma <uma.gangumalla@intel.com> wrote:
>>> Hi Devs,
>>>
>>> Some of our Hadoop developers are working with the Spark community to
>>> implement shuffle encryption. While implementing it, they realized that
>>> some or most of the Hadoop encryption code and their implementation code
>>> would have to be duplicated. This led to the idea of creating a separate
>>> library, named Chimera (https://github.com/intel-hadoop/chimera). It is an
>>> optimized cryptographic library. It provides Java APIs at both the cipher
>>> level and the Java stream level to help developers implement
>>> high-performance AES encryption/decryption with minimal code and effort.
>>> Chimera was originally based on the Hadoop crypto code, but it has been
>>> improved and generalized a lot to support a wider scope of data encryption
>>> needs for more components in the community.
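
A minimal sketch of what a "cipher level" versus "stream level" split can look like, written against the JDK's own javax.crypto classes. This is not Chimera's API: the class name AesCtrSketch and its method names are hypothetical, and the fixed AES/CTR/NoPadding transformation is chosen only because AES/CTR is the mode the thread mentions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: plain JDK crypto, not the Chimera or Hadoop API.
final class AesCtrSketch {
    private static final String TRANSFORMATION = "AES/CTR/NoPadding";

    // Cipher level: build an initialized Cipher for the given mode, key and IV.
    static Cipher newCipher(int mode, byte[] key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    // Stream level: everything written to the wrapped stream is encrypted.
    static byte[] encrypt(byte[] plain, byte[] key, byte[] iv)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out =
                new CipherOutputStream(sink, newCipher(Cipher.ENCRYPT_MODE, key, iv))) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    // Stream level: everything read from the wrapped stream is decrypted.
    static byte[] decrypt(byte[] encrypted, byte[] key, byte[] iv)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(
                new ByteArrayInputStream(encrypted), newCipher(Cipher.DECRYPT_MODE, key, iv))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // 128-bit AES key
        byte[] iv = new byte[16];  // CTR mode takes a 16-byte initial counter block
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] plain = "shuffle block".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(plain, key, iv);
        byte[] decrypted = decrypt(encrypted, key, iv);
        System.out.println("round trip ok: " + Arrays.equals(plain, decrypted));
    }
}
```

With CTR and no padding, the ciphertext has the same length as the plaintext, which is part of why it fits a stream-style API. A key and IV pair must never be reused for different data.
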
>>> So, the team is now thinking of making this library an open source
>>> project via Apache Incubation. The proposal is for Chimera to join Apache
>>> as an incubating project, or to join Apache Commons, to facilitate its adoption.
>>> In general, this would bring the following advantages:
>>> 1. Because Chimera embeds the native library in the jar (similar to Snappy java),
>>> it solves the current issue in Hadoop that an HDFS client has to
>>> depend on libhadoop.so if the client needs to read an encryption zone in
>>> HDFS. This means an HDFS client may have to depend on a Hadoop
>>> installation on the local machine. For example, HBase depends on the
>>> HDFS client jar rather than a Hadoop installation, and so has no
>>> access to libhadoop.so. As a result, HBase cannot use an encryption zone,
>>> or doing so causes an error.
>>> 2. Apache Spark shuffle and spill encryption could be another
>>> example where we can use Chimera. We see that stream
>>> encryption for Spark shuffle and spill doesn’t require a stream
>>> cipher like AES/CTR, although the code shares the common
>>> characteristics of a stream-style API.
>>> We also see the need for an optimized cipher for non-stream-style use
>>> cases, such as network encryption (for example, RPC). These improvements
>>> can be shared by more projects that need them.
>>> 3. Simplified code in Hadoop through use of a dedicated library, which also drives
>>> more improvements. For example, the current Hadoop crypto code API
>>> is entirely based on AES/CTR, although it has cipher suite configurations.
>>> AES/CTR is used for HDFS data encryption at rest, but it doesn’t
>>> necessarily need to be AES/CTR for all cases, such as data transfer
>>> encryption and intermediate file encryption.
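
To make the "not only AES/CTR" point concrete, here is a rough sketch in which the transformation is a parameter rather than hard-wired. It uses only the JDK's javax.crypto classes. The class CipherSuiteSketch and its methods are hypothetical, and AES/GCM is my own pick of a non-CTR mode for message-style data such as an RPC payload; the thread itself does not name an alternative.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.spec.AlgorithmParameterSpec;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: plain JDK crypto, not the Chimera or Hadoop API.
final class CipherSuiteSketch {
    // Pick the parameter spec that matches the configured transformation.
    static AlgorithmParameterSpec paramsFor(String transformation, byte[] iv) {
        if (transformation.startsWith("AES/GCM/")) {
            return new GCMParameterSpec(128, iv); // 128-bit authentication tag
        }
        return new IvParameterSpec(iv);
    }

    // One-shot (non-stream) encrypt or decrypt of a whole message.
    static byte[] apply(int mode, String transformation, byte[] key, byte[] iv, byte[] input)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(transformation);
        cipher.init(mode, new SecretKeySpec(key, "AES"), paramsFor(transformation, iv));
        return cipher.doFinal(input);
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16];
        byte[] ctrIv = new byte[16]; // CTR: IV length equals the AES block size
        byte[] gcmIv = new byte[12]; // GCM: 12-byte nonce is the usual choice
        random.nextBytes(key);
        random.nextBytes(ctrIv);
        random.nextBytes(gcmIv);

        byte[] message = "rpc payload".getBytes(StandardCharsets.UTF_8);
        String[] suites = {"AES/CTR/NoPadding", "AES/GCM/NoPadding"};
        for (String suite : suites) {
            byte[] iv = suite.startsWith("AES/GCM/") ? gcmIv : ctrIv;
            byte[] encrypted = apply(Cipher.ENCRYPT_MODE, suite, key, iv, message);
            byte[] decrypted = apply(Cipher.DECRYPT_MODE, suite, key, iv, encrypted);
            System.out.println(suite + ": " + message.length + " -> " + encrypted.length
                    + " bytes, round trip ok: " + Arrays.equals(message, decrypted));
        }
    }
}
```

The trade-off shown is that CTR keeps the data length unchanged, while GCM appends an authentication tag and rejects tampered input on decryption.
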
>>> So, we wanted to check with the Hadoop community about this proposal.
>>> Please provide your feedback on it.
>>> Regards,
>>> Uma
