From: "Zheng, Kai"
To: hdfs-dev@hadoop.apache.org
Subject: RE: Hadoop encryption module as Apache Chimera incubator project
Date: Fri, 22 Jan 2016 01:11:01 +0000

Thanks Chris for the pointer and Uma for the confirmation!
I'm happy to learn about HADOOP-11127; there are already many solid discussions in it. I will go through it, do my own investigation, and see how I can help in the effort. Sure, let's get back to Chimera, and sorry for the interruption.

Regards,
Kai

-----Original Message-----
From: Gangumalla, Uma [mailto:uma.gangumalla@intel.com]
Sent: Friday, January 22, 2016 8:38 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

>Uma and everyone, thank you for the proposal. +1 to proceed.
Thanks Chris for your feedback.

Kai wrote:
I believe Haifeng had mentioned the problem in a call when discussing erasure coding work, but only now do I understand what the problem is and how Chimera or Snappy Java solved it. It looks like there can be thin clients that don't rely on a Hadoop installation, so no libhadoop.so is available on the client host. The approach mentioned here is to bundle the library file (*.so) into a jar and dynamically extract the file when loading it. When no library file is contained in the jar, it falls back to the normal case, loading it from an installation. It's smart and nice! My question is, could we consider adopting the approach for the libhadoop.so library? It might be worth discussing because we're bundling more and more things into the library (recently we just put Intel ISA-L support into it), and such things may be desired for such clients. It may also be helpful for development, because sometimes when running unit tests that involve native code, an error may occur complaining that libhadoop.so cannot be found. Thanks.

[UMA] Good points, Kai. It is good to think about and invest some effort in solving the libhadoop.so part. As Chris suggested, taking this discussion into that JIRA, HADOOP-11127, is the more appropriate thing to do.

Regards,
Uma

On 1/21/16, 12:18 PM, "Chris Nauroth" wrote:

>> My question is, could we consider adopting the approach for the
>>libhadoop.so library?
>
>This is something that I have already proposed in HADOOP-11127. There
>is no consensus on proceeding with it from the contributors in that
>discussion. There are some big challenges around how it would impact
>the release process. I also have not had availability to prototype an
>implementation to make a stronger case for feasibility. Kai, if this
>is something that you're interested in, then I encourage you to join
>the discussion in HADOOP-11127 or even pick up the prototyping work if you'd like.
> Since we have that existing JIRA, let's keep this mail thread focused
>just on Chimera. Thank you!
>
>Uma and everyone, thank you for the proposal. +1 to proceed.
>
>--Chris Nauroth
>
>
>
>On 1/20/16, 11:16 PM, "Zheng, Kai" wrote:
>
>>Thanks Uma.
>>
>>I have a question, by the way; it's not about the Chimera project, but
>>about the mentioned advantage 1 and the libhadoop.so installation problem.
>>I copied the text below for convenience.
>>
>>>>1. As Chimera embeds the native code in the jar (similar to Snappy Java),
>>>>it solves the current issue in Hadoop that an HDFS client has to
>>>>depend on libhadoop.so if the client needs to read an encryption zone in
>>>>HDFS. This means an HDFS client may have to depend on a Hadoop
>>>>installation on the local machine. For example, HBase depends on the
>>>>HDFS client jar rather than a Hadoop installation and then has no
>>>>access to libhadoop.so, so HBase cannot use an encryption zone, or it causes an error.
>>
>>I believe Haifeng had mentioned the problem in a call when discussing
>>erasure coding work, but only now do I understand what the problem is
>>and how Chimera or Snappy Java solved it. It looks like there
>>can be thin clients that don't rely on a Hadoop installation, so no
>>libhadoop.so is available on the client host. The approach
>>mentioned here is to bundle the library file (*.so) into a jar and
>>dynamically extract the file when loading it.
>>When no library file is
>>contained in the jar, it falls back to the normal case, loading it from
>>an installation. It's smart and nice! My question is, could we
>>consider adopting the approach for the libhadoop.so library? It might be
>>worth discussing because we're bundling more and more things into the
>>library (recently we just put Intel ISA-L support into it), and such
>>things may be desired for such clients. It may also be helpful for
>>development, because sometimes when running unit tests that involve native
>>code, an error may occur complaining that libhadoop.so cannot be found. Thanks.
>>
>>Regards,
>>Kai
>>
>>-----Original Message-----
>>From: Gangumalla, Uma [mailto:uma.gangumalla@intel.com]
>>Sent: Thursday, January 21, 2016 11:20 AM
>>To: hdfs-dev@hadoop.apache.org
>>Subject: Re: Hadoop encryption module as Apache Chimera incubator
>>project
>>
>>Hi All,
>>Thanks Andrew, ATM, Yi, Kai, Larry. Thanks Haifeng for clarifying the
>>release stuff.
>>
>>Please find my responses below.
>>
>>Andrew wrote:
>>If it becomes part of Apache Commons, could we make Chimera a separate
>>JAR? We have real difficulties bumping dependency versions right now,
>>so ideally we don't need to bump our existing Commons dependencies to
>>use Chimera.
>>[UMA] Yes, we plan to make a separate jar.
>>
>>Andrew wrote:
>>With this refactoring, do we have confidence that we can get our
>>desired changes merged and released in a timely fashion? E.g. if we
>>find another bug like HADOOP-11343, we'll first need to get the fix
>>into Chimera, have a new Chimera release, then bump Hadoop's Chimera
>>dependency. This also relates to the previous point; it's easier to do
>>this dependency bump if Chimera is a separate JAR.
>>[UMA] Yes, and the main target users for this project are Hadoop and
>>Spark right now.
>>So, Hadoop requirements would be the priority tasks for it.
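[Editor's note] The jar-bundled native loading scheme described above (as Snappy Java does it) can be sketched as below. This is only an illustration of the general technique; the class name and resource path are hypothetical, not actual Chimera or Hadoop code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch of Snappy-Java-style native loading: extract the bundled .so
 *  from the jar if present, otherwise fall back to java.library.path. */
public final class NativeLoader {

    /** resourcePath is e.g. "/native/libexample.so" (hypothetical). */
    public static void load(String resourcePath, String fallbackLibName) {
        try (InputStream in = NativeLoader.class.getResourceAsStream(resourcePath)) {
            if (in != null) {
                // Library is bundled in the jar: copy it to a temp file and load it.
                Path tmp = Files.createTempFile("native-", ".so");
                tmp.toFile().deleteOnExit();
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                System.load(tmp.toAbsolutePath().toString());
                return;
            }
        } catch (IOException e) {
            // Extraction failed; fall through to the normal lookup below.
        }
        // Normal case: resolve via java.library.path (e.g. a local installation).
        System.loadLibrary(fallbackLibName);
    }
}
```

A thin client carrying only this jar would then load the native cipher with no installed libhadoop.so, while a cluster node without the bundled resource still resolves the library the usual way.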
>>
>>
>>ATM wrote:
>>Uma, would you be up for approaching the Apache Commons folks, saying
>>that you'd like to contribute Chimera? I'd recommend saying that
>>Hadoop and Spark are both on board to depend on this.
>>[UMA] Yes, will do that.
>>
>>
>>Kai wrote:
>>Just a question. Does becoming a separate jar/module in Apache Commons
>>mean Chimera can be released separately and in a timely
>>manner, not coupled to other modules' releases in the project? Thanks.
>>
>>[Haifeng] From the Apache Commons project site
>>(https://commons.apache.org/), we see there is already a long list of
>>components in its Apache Commons Proper list. Each component has its
>>own release version and date. To join and become one of that list is the target.
>>
>>Larry wrote:
>>If what we are looking for is some level of autonomy, then it would
>>need to be a module with its own release train - or at least be able to have one.
>>
>>[UMA] Yes, agreed.
>>
>>Kai wrote:
>>So far I see it's mainly about AES-256. I suggest the scope be
>>expanded a little bit, perhaps to a dedicated high-performance encryption
>>library; then we would have quite a lot to contribute to it, like other
>>ciphers, MACs, PRNGs and so on. Then both Hadoop and Spark can benefit
>>from it.
>>
>>[UMA] Yes, once development starts as a separate project, it is free
>>to evolve and provide more improvements to support more customer/user
>>needs for encryption, based on demand.
>>Haifeng, would you add some points here?
>>
>>
>>Regards,
>>Uma
>>
>>On 1/20/16, 4:31 PM, "Andrew Wang" wrote:
>>
>>>Thanks Uma for putting together this proposal. Overall it sounds good to
>>>me,
>>>+1 for these improvements. A few comments/questions:
>>>
>>>* If it becomes part of Apache Commons, could we make Chimera a
>>>separate JAR? We have real difficulties bumping dependency versions
>>>right now, so ideally we don't need to bump our existing Commons
>>>dependencies to use Chimera.
>>>* With this refactoring, do we have confidence that we can get our
>>>desired changes merged and released in a timely fashion? E.g. if we
>>>find another bug like HADOOP-11343, we'll first need to get the fix
>>>into Chimera, have a new Chimera release, then bump Hadoop's Chimera
>>>dependency. This also relates to the previous point; it's easier to
>>>do this dependency bump if Chimera is a separate JAR.
>>>
>>>Best,
>>>Andrew
>>>
>>>On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma
>>>
>>>wrote:
>>>
>>>> Hi Devs,
>>>>
>>>> Some of our Hadoop developers have been working with the Spark community to
>>>>implement shuffle encryption. While implementing that, they
>>>>realized that some or most of the Hadoop encryption code and their
>>>>implementation code would have to be duplicated. This led to the idea of
>>>>creating a separate library, named Chimera
>>>>(https://github.com/intel-hadoop/chimera). It is an optimized
>>>>cryptographic library. It provides a Java API at both the cipher level
>>>>and the Java stream level to help developers implement high-performance
>>>>AES encryption/decryption with minimal code and effort. Chimera
>>>>was originally based on Hadoop crypto code but has been improved and
>>>>generalized a lot to support a wider scope of data encryption
>>>>needs for more components in the community.
>>>>
>>>> So, now the team is thinking of making this library an open source
>>>>project via Apache incubation. The proposal is for Chimera to join
>>>>Apache as an incubating project, or Apache Commons, to facilitate its adoption.
>>>>
>>>> In general this has the following advantages:
>>>> 1. As Chimera embeds the native code in the jar (similar to Snappy Java),
>>>>it solves the current issue in Hadoop that an HDFS client has to
>>>>depend on libhadoop.so if the client needs to read an encryption zone in
>>>>HDFS. This means an HDFS client may have to depend on a Hadoop
>>>>installation on the local machine.
For example, HBase depends on the
>>>>HDFS client jar rather than a Hadoop installation and then has no
>>>>access to libhadoop.so, so HBase cannot use an encryption zone, or it causes an error.
>>>> 2. Apache Spark shuffle and spill encryption could be another
>>>>example where we can use Chimera. We see that the stream
>>>>encryption for Spark shuffle and spill doesn't require a stream
>>>>cipher like AES/CTR, although the code shares the common
>>>>characteristics of a stream-style API.
>>>> We also see the need for an optimized cipher for non-stream-style use
>>>>cases such as network encryption like RPC. These improvements
>>>>can actually be shared by more projects in need.
>>>>
>>>> 3. Simplified code in Hadoop by using a dedicated library, which also
>>>> drives more improvements. For example, currently the Hadoop crypto code API
>>>> is based entirely on AES/CTR, although it has cipher suite configurations.
>>>>
>>>> AES/CTR is for HDFS data encryption at rest, but it doesn't
>>>> need to be AES/CTR for all cases, such as data transfer
>>>> encryption and intermediate file encryption.
>>>>
>>>>
>>>>
>>>> So, we wanted to check with the Hadoop community about this proposal.
>>>>Please
>>>> provide your feedback on it.
>>>>
>>>> Regards,
>>>> Uma
>>>>
>>
>>
>
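[Editor's note] The stream-level AES/CTR pattern the proposal refers to can be illustrated with plain JCE streams. This is only a sketch of the general pattern (the class name is hypothetical), not Chimera's actual API, which adds optimized native implementations on top of the same idea:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Minimal stream-style AES/CTR encrypt/decrypt round trip using the JCE. */
public final class CtrStreamDemo {

    public static byte[] roundTrip(byte[] plaintext, byte[] key, byte[] iv)
            throws Exception {
        // Encrypt through a stream, the way an HDFS client writes
        // into an encryption zone.
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                 new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (CipherOutputStream out = new CipherOutputStream(sink, enc)) {
            out.write(plaintext);
        }
        // Decrypt through a stream with the same key and counter IV.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                 new IvParameterSpec(iv));
        try (CipherInputStream in = new CipherInputStream(
                new ByteArrayInputStream(sink.toByteArray()), dec)) {
            return in.readAllBytes();
        }
    }
}
```

Because CTR turns AES into a seekable keystream, the same cipher-level primitive also serves non-stream uses such as RPC payload encryption, which is the generalization the proposal argues for.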