hadoop-hdfs-dev mailing list archives

From "Gangumalla, Uma" <uma.ganguma...@intel.com>
Subject Hadoop encryption module as Apache Chimera incubator project
Date Tue, 19 Jan 2016 07:46:36 GMT
Hi Devs,

  Some of our Hadoop developers are working with the Spark community to implement shuffle encryption.
While implementing it, they realized that much of the Hadoop encryption code would have to be
duplicated in their implementation. This led to the idea of creating a separate library,
named Chimera (https://github.com/intel-hadoop/chimera). It is an optimized cryptographic
library. It provides Java APIs at both the cipher level and the Java stream level to help developers
implement high-performance AES encryption/decryption with minimal code and effort. Chimera
was originally based on the Hadoop crypto code, but it has been improved and generalized considerably
to support a wider range of data encryption needs for more components in the community.
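
To illustrate what a stream-level AES API looks like in practice, here is a minimal sketch that uses
only the standard JDK javax.crypto classes. It is not Chimera's API, which is not shown in this
message; the class and method names (CtrStreamSketch, encrypt, decrypt) are hypothetical and chosen
only for this example. It assumes a 16-byte key and a 16-byte IV.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration class; plain JDK crypto, not the Chimera library.
public class CtrStreamSketch {

    // Builds an AES/CTR cipher for the given mode (Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE).
    static Cipher newCipher(int mode, byte[] key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    // Stream-style encryption: bytes written to the wrapper are encrypted on the way to the sink.
    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out =
                new CipherOutputStream(sink, newCipher(Cipher.ENCRYPT_MODE, key, iv))) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    // Stream-style decryption: bytes read from the wrapper are decrypted as they are consumed.
    static byte[] decrypt(byte[] key, byte[] iv, byte[] encrypted)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(
                new ByteArrayInputStream(encrypted), newCipher(Cipher.DECRYPT_MODE, key, iv))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // 128-bit AES key, generated only for this demo
        byte[] iv = new byte[16];  // initial counter block; must not repeat for the same key
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] message = "shuffle block".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(key, iv, message);
        byte[] decrypted = decrypt(key, iv, encrypted);
        System.out.println("round trip ok: " + Arrays.equals(message, decrypted));
    }
}
```

The point of the sketch is only that callers work with ordinary Java streams while the cipher is
applied underneath; a dedicated library would presumably keep that shape and swap in an optimized
native cipher.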

So, the team is now thinking of making this library an open source project via Apache Incubation.
The proposal is for Chimera to join Apache as an incubating project or as part of Apache Commons, for facilitating its

In general, this would bring the following advantages:
1. Because Chimera embeds the native library in the jar (similar to Snappy Java), it solves the current issue
in Hadoop that an HDFS client has to depend on libhadoop.so if the client needs to read an encryption
zone in HDFS. This means an HDFS client may have to depend on a Hadoop installation on the local machine.
For example, HBase depends on the HDFS client jar rather than on a Hadoop installation, and so
has no access to libhadoop.so. As a result, HBase cannot use an encryption zone, or it causes errors.
2. Apache Spark shuffle and spill encryption could be another example where we can use Chimera.
We see that stream encryption for Spark shuffle and spill doesn’t require a
stream cipher like AES/CTR, although the code shares the common characteristics of a
stream-style API. We also see the need for an optimized cipher for non-stream use cases, such as
network encryption like RPC. These improvements can be shared by more projects
that need them.

3. Simplified code in Hadoop by using a dedicated library, which also drives more improvements. For example,
currently the Hadoop crypto code API is based entirely on AES/CTR, although it has cipher suite

AES/CTR is for HDFS data encryption at rest, but it doesn’t necessarily have to be AES/CTR for
all cases, such as data transfer encryption and intermediate file encryption.

So, we wanted to check with the Hadoop community about this proposal. Please provide your feedback
on it.

