From: "Zheng, Kai"
To: hdfs-dev@hadoop.apache.org
Subject: RE: Hadoop encryption module as Apache Chimera incubator project
Date: Fri, 22 Jan 2016 01:11:01 +0000

Thanks Chris for the pointer and Uma for the confirmation!
I'm happy to learn about HADOOP-11127; there are already many solid discussions in it. I will go through it, do my own investigation, and see how I can help in the effort. Sure, let's get back to Chimera, and sorry for the interruption.

Regards,
Kai

-----Original Message-----
From: Gangumalla, Uma [mailto:uma.gangumalla@intel.com]
Sent: Friday, January 22, 2016 8:38 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

>Uma and everyone, thank you for the proposal. +1 to proceed.
Thanks Chris for your feedback.

Kai wrote:
I believe Haifeng had mentioned the problem in a call when discussing erasure coding work, but only now do I understand what the problem is and how Chimera or Snappy Java solved it. It looks like there can be thin clients that don't rely on a Hadoop installation, so no libhadoop.so is available on the client host. The approach mentioned here is to bundle the library file (*.so) into a jar and dynamically extract the file when loading it. When no library file is contained in the jar, it falls back to the normal case, loading it from an installation. It's smart and nice! My question is, could we consider adopting the approach for the libhadoop.so library? It might be worth discussing because we're bundling more and more things into the library (recently we just put Intel ISA-L support into it), and such things may be desired for such clients. It may also be helpful for development, because sometimes when running unit tests that involve native code, an error may occur complaining that libhadoop.so cannot be found. Thanks.

[UMA] Good points, Kai. It is good to think about and invest some effort in solving the libhadoop.so part. As Chris suggested, taking this discussion into that JIRA, HADOOP-11127, is the more appropriate thing to do.

Regards,
Uma

On 1/21/16, 12:18 PM, "Chris Nauroth" wrote:

>> My question is, could we consider adopting the approach for the
>>libhadoop.so library?
>
>This is something that I have already proposed in HADOOP-11127. There
>is no consensus on proceeding with it from the contributors in that
>discussion. There are some big challenges around how it would impact
>the release process. I also have not had availability to prototype an
>implementation to make a stronger case for feasibility. Kai, if this
>is something that you're interested in, then I encourage you to join
>the discussion in HADOOP-11127 or even pick up the prototyping work if you'd like.
> Since we have that existing JIRA, let's keep this mail thread focused
>just on Chimera. Thank you!
>
>Uma and everyone, thank you for the proposal. +1 to proceed.
>
>--Chris Nauroth
>
>
>
>On 1/20/16, 11:16 PM, "Zheng, Kai" wrote:
>
>>Thanks Uma.
>>
>>I have a question, by the way; it's not about the Chimera project, but
>>about the mentioned advantage 1 and the libhadoop.so installation problem.
>>I copied the text below for convenience.
>>
>>>>1. As Chimera embeds the native code in the jar (similar to Snappy Java),
>>>>it solves the current issue in Hadoop that an HDFS client has to
>>>>depend on libhadoop.so if the client needs to read an encryption zone in
>>>>HDFS. This means an HDFS client may have to depend on a Hadoop
>>>>installation on the local machine. For example, HBase depends on the
>>>>HDFS client jar rather than a Hadoop installation and then has no
>>>>access to libhadoop.so, so HBase cannot use an encryption zone, or it causes an error.
>>
>>I believe Haifeng had mentioned the problem in a call when discussing
>>erasure coding work, but only now do I understand what the problem is
>>and how Chimera or Snappy Java solved it. It looks like there
>>can be thin clients that don't rely on a Hadoop installation, so no
>>libhadoop.so is available on the client host. The approach
>>mentioned here is to bundle the library file (*.so) into a jar and
>>dynamically extract the file when loading it.
>>When no library file is
>>contained in the jar, it falls back to the normal case, loading it from
>>an installation. It's smart and nice! My question is, could we
>>consider adopting the approach for the libhadoop.so library? It might be
>>worth discussing because we're bundling more and more things into the
>>library (recently we just put Intel ISA-L support into it), and such
>>things may be desired for such clients. It may also be helpful for
>>development, because sometimes when running unit tests that involve native
>>code, an error may occur complaining that libhadoop.so cannot be found. Thanks.
>>
>>Regards,
>>Kai
>>
>>-----Original Message-----
>>From: Gangumalla, Uma [mailto:uma.gangumalla@intel.com]
>>Sent: Thursday, January 21, 2016 11:20 AM
>>To: hdfs-dev@hadoop.apache.org
>>Subject: Re: Hadoop encryption module as Apache Chimera incubator
>>project
>>
>>Hi All,
>>Thanks Andrew, ATM, Yi, Kai, Larry. Thanks Haifeng for clarifying the
>>release stuff.
>>
>>Please find my responses below.
>>
>>Andrew wrote:
>>If it becomes part of Apache Commons, could we make Chimera a separate
>>JAR? We have real difficulties bumping dependency versions right now,
>>so ideally we don't need to bump our existing Commons dependencies to
>>use Chimera.
>>[UMA] Yes, we plan to make a separate jar.
>>
>>Andrew wrote:
>>With this refactoring, do we have confidence that we can get our
>>desired changes merged and released in a timely fashion? E.g. if we
>>find another bug like HADOOP-11343, we'll first need to get the fix
>>into Chimera, have a new Chimera release, then bump Hadoop's Chimera
>>dependency. This also relates to the previous point; it's easier to do
>>this dependency bump if Chimera is a separate JAR.
>>[UMA] Yes, and the main target users for this project are Hadoop and
>>Spark right now.
>>So, Hadoop requirements would be the priority tasks for it.
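[Editor's note] The jar-bundled native loading scheme described above (as Snappy Java does it) can be sketched as below. This is only an illustration of the general technique; the class name and resource path are hypothetical, not actual Chimera or Hadoop code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch of Snappy-Java-style native loading: extract the bundled .so
 *  from the jar if present, otherwise fall back to java.library.path. */
public final class NativeLoader {

    /** resourcePath is e.g. "/native/libexample.so" (hypothetical). */
    public static void load(String resourcePath, String fallbackLibName) {
        try (InputStream in = NativeLoader.class.getResourceAsStream(resourcePath)) {
            if (in != null) {
                // Library is bundled in the jar: copy it to a temp file and load it.
                Path tmp = Files.createTempFile("native-", ".so");
                tmp.toFile().deleteOnExit();
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                System.load(tmp.toAbsolutePath().toString());
                return;
            }
        } catch (IOException e) {
            // Extraction failed; fall through to the normal lookup below.
        }
        // Normal case: resolve via java.library.path (e.g. a local installation).
        System.loadLibrary(fallbackLibName);
    }
}
```

A thin client carrying only this jar would then load the native cipher with no installed libhadoop.so, while a cluster node without the bundled resource still resolves the library the usual way.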
>>
>>
>>ATM wrote:
>>Uma, would you be up for approaching the Apache Commons folks, saying
>>that you'd like to contribute Chimera? I'd recommend saying that
>>Hadoop and Spark are both on board to depend on this.
>>[UMA] Yes, will do that.
>>
>>
>>Kai wrote:
>>Just a question. Does becoming a separate jar/module in Apache Commons
>>mean Chimera can be released separately and in a timely
>>manner, not coupled to other modules' releases in the project? Thanks.
>>
>>[Haifeng] From the Apache Commons project site
>>(https://commons.apache.org/), we see there is already a long list of
>>components in its Apache Commons Proper list. Each component has its
>>own release version and date. To join and become one of that list is the target.
>>
>>Larry wrote:
>>If what we are looking for is some level of autonomy, then it would
>>need to be a module with its own release train - or at least be able to have one.
>>
>>[UMA] Yes, agreed.
>>
>>Kai wrote:
>>So far I see it's mainly about AES-256. I suggest the scope be
>>expanded a little bit, perhaps to a dedicated high-performance encryption
>>library; then we would have quite a lot to contribute to it, like other
>>ciphers, MACs, PRNGs and so on. Then both Hadoop and Spark can benefit
>>from it.
>>
>>[UMA] Yes, once development starts as a separate project, it is free
>>to evolve and provide more improvements to support more customer/user
>>needs for encryption, based on demand.
>>Haifeng, would you add some points here?
>>
>>
>>Regards,
>>Uma
>>
>>On 1/20/16, 4:31 PM, "Andrew Wang" wrote:
>>
>>>Thanks Uma for putting together this proposal. Overall it sounds good to
>>>me,
>>>+1 for these improvements. A few comments/questions:
>>>
>>>* If it becomes part of Apache Commons, could we make Chimera a
>>>separate JAR? We have real difficulties bumping dependency versions
>>>right now, so ideally we don't need to bump our existing Commons
>>>dependencies to use Chimera.
>>>* With this refactoring, do we have confidence that we can get our
>>>desired changes merged and released in a timely fashion? E.g. if we
>>>find another bug like HADOOP-11343, we'll first need to get the fix
>>>into Chimera, have a new Chimera release, then bump Hadoop's Chimera
>>>dependency. This also relates to the previous point; it's easier to
>>>do this dependency bump if Chimera is a separate JAR.
>>>
>>>Best,
>>>Andrew
>>>
>>>On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma
>>>
>>>wrote:
>>>
>>>> Hi Devs,
>>>>
>>>> Some of our Hadoop developers have been working with the Spark community to
>>>>implement shuffle encryption. While implementing that, they
>>>>realized that some or most of the Hadoop encryption code and their
>>>>implementation code would have to be duplicated. This led to the idea of
>>>>creating a separate library, named Chimera
>>>>(https://github.com/intel-hadoop/chimera). It is an optimized
>>>>cryptographic library. It provides a Java API at both the cipher level
>>>>and the Java stream level to help developers implement high-performance
>>>>AES encryption/decryption with minimal code and effort. Chimera
>>>>was originally based on Hadoop crypto code but has been improved and
>>>>generalized a lot to support a wider scope of data encryption
>>>>needs for more components in the community.
>>>>
>>>> So, now the team is thinking of making this library an open source
>>>>project via Apache incubation. The proposal is for Chimera to join
>>>>Apache as an incubating project, or Apache Commons, to facilitate its adoption.
>>>>
>>>> In general this has the following advantages:
>>>> 1. As Chimera embeds the native code in the jar (similar to Snappy Java),
>>>>it solves the current issue in Hadoop that an HDFS client has to
>>>>depend on libhadoop.so if the client needs to read an encryption zone in
>>>>HDFS. This means an HDFS client may have to depend on a Hadoop
>>>>installation on the local machine.
For example, HBase depends on the
>>>>HDFS client jar rather than a Hadoop installation and then has no
>>>>access to libhadoop.so, so HBase cannot use an encryption zone, or it causes an error.
>>>> 2. Apache Spark shuffle and spill encryption could be another
>>>>example where we can use Chimera. We see that the stream
>>>>encryption for Spark shuffle and spill doesn't require a stream
>>>>cipher like AES/CTR, although the code shares the common
>>>>characteristics of a stream-style API.
>>>> We also see the need for an optimized cipher for non-stream-style use
>>>>cases such as network encryption like RPC. These improvements
>>>>can actually be shared by more projects in need.
>>>>
>>>> 3. Simplified code in Hadoop by using a dedicated library, which also
>>>> drives more improvements. For example, currently the Hadoop crypto code API
>>>> is based entirely on AES/CTR, although it has cipher suite configurations.
>>>>
>>>> AES/CTR is for HDFS data encryption at rest, but it doesn't
>>>> need to be AES/CTR for all cases, such as data transfer
>>>> encryption and intermediate file encryption.
>>>>
>>>>
>>>>
>>>> So, we wanted to check with the Hadoop community about this proposal.
>>>>Please
>>>> provide your feedback on it.
>>>>
>>>> Regards,
>>>> Uma
>>>>
>>
>>
>
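[Editor's note] The stream-level AES/CTR pattern the proposal refers to can be illustrated with plain JCE streams. This is only a sketch of the general pattern (the class name is hypothetical), not Chimera's actual API, which adds optimized native implementations on top of the same idea:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Minimal stream-style AES/CTR encrypt/decrypt round trip using the JCE. */
public final class CtrStreamDemo {

    public static byte[] roundTrip(byte[] plaintext, byte[] key, byte[] iv)
            throws Exception {
        // Encrypt through a stream, the way an HDFS client writes
        // into an encryption zone.
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                 new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (CipherOutputStream out = new CipherOutputStream(sink, enc)) {
            out.write(plaintext);
        }
        // Decrypt through a stream with the same key and counter IV.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                 new IvParameterSpec(iv));
        try (CipherInputStream in = new CipherInputStream(
                new ByteArrayInputStream(sink.toByteArray()), dec)) {
            return in.readAllBytes();
        }
    }
}
```

Because CTR turns AES into a seekable keystream, the same cipher-level primitive also serves non-stream uses such as RPC payload encryption, which is the generalization the proposal argues for.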