From: "Zheng, Kai"
To: hdfs-dev@hadoop.apache.org
Subject: RE: Hadoop encryption module as Apache Chimera incubator project
Date: Thu, 21 Jan 2016 07:16:30 +0000

Thanks Uma.

I have a question, by the way. It's not about the Chimera project itself, but about the mentioned advantage 1 and the libhadoop.so installation problem. I have copied the relevant text below for convenience.
>>1. As Chimera embeds the native library in the jar (similar to Snappy-java), it solves the current issue in Hadoop that an HDFS client has to depend on libhadoop.so if it needs to read an encryption zone in HDFS. This means an HDFS client may have to depend on a Hadoop installation on the local machine. For example, HBase depends on the HDFS client jar rather than a Hadoop installation and thus has no access to libhadoop.so, so HBase cannot use an encryption zone without errors.

I believe Haifeng had mentioned this problem in a call when we discussed the erasure coding work, but only now do I understand what the problem is and how Chimera or Snappy-java solves it. It looks like there can be thin clients that don't rely on a Hadoop installation, so no libhadoop.so is available on the client host. The approach mentioned here is to bundle the library file (*.so) into a jar and dynamically extract it at load time. When no library file is contained in the jar, it falls back to the normal case, loading it from an installation. It's smart and nice! My question is: could we consider adopting this approach for the libhadoop.so library itself? It might be worth discussing because we're bundling more and more into the library (we recently put Intel ISA-L support into it), and such things may be desired by exactly such clients. It may also help development, because sometimes unit tests that involve native code fail with errors complaining that libhadoop.so cannot be found.

Thanks.

Regards,
Kai

-----Original Message-----
From: Gangumalla, Uma [mailto:uma.gangumalla@intel.com]
Sent: Thursday, January 21, 2016 11:20 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

Hi All,

Thanks Andrew, ATM, Yi, Kai, Larry. Thanks Haifeng for clarifying the release stuff. Please find my responses below.
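The jar-embedding approach Kai asks about above (as used by Snappy-java) can be sketched roughly as follows. This is a minimal illustration, not Chimera's or snappy-java's actual loader; the class name and resource path here are hypothetical:

```java
// Sketch of Snappy-java-style native loading: first look for a .so bundled
// in the jar's resources, extract it to a temp file and load it; when no
// bundled copy exists, fall back to the normal java.library.path lookup
// (i.e. loading from a local installation).
// NativeLoader and the "/native/..." resource path are hypothetical names.
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class NativeLoader {
    private NativeLoader() {}

    public static void load(String libName) {
        String resource = "/native/lib" + libName + ".so";
        try (InputStream in = NativeLoader.class.getResourceAsStream(resource)) {
            if (in != null) {
                // Bundled copy found: extract to a temp file and load it.
                Path tmp = Files.createTempFile("lib" + libName, ".so");
                tmp.toFile().deleteOnExit();
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                System.load(tmp.toAbsolutePath().toString());
                return;
            }
        } catch (Exception e) {
            // Extraction failed; fall through to the normal lookup below.
        }
        // Normal case: load from an installation via java.library.path.
        System.loadLibrary(libName);
    }
}
```

With this shape, a thin client that ships only the jar still gets the native code, while a full installation keeps working unchanged.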
Andrew wrote:
If it becomes part of Apache Commons, could we make Chimera a separate JAR? We have real difficulties bumping dependency versions right now, so ideally we don't need to bump our existing Commons dependencies to use Chimera.

[UMA] Yes, we plan to make it a separate jar.

Andrew wrote:
With this refactoring, do we have confidence that we can get our desired changes merged and released in a timely fashion? e.g. if we find another bug like HADOOP-11343, we'll first need to get the fix into Chimera, have a new Chimera release, then bump Hadoop's Chimera dependency. This also relates to the previous point; it's easier to do this dependency bump if Chimera is a separate JAR.

[UMA] Yes, and the main target users for this project right now are Hadoop and Spark, so Hadoop requirements would be the priority tasks for it.

ATM wrote:
Uma, would you be up for approaching the Apache Commons folks saying that you'd like to contribute Chimera? I'd recommend saying that Hadoop and Spark are both on board to depend on this.

[UMA] Yes, will do that.

Kai wrote:
Just a question. Does becoming a separate jar/module in Apache Commons mean that Chimera or the module can be released separately and in a timely manner, not coupled with other modules in the project for release? Thanks.

[Haifeng] From the Apache Commons project web site (https://commons.apache.org/), we see there is already a long list of components in its Apache Commons Proper list. Each component has its own release version and date. To join and become one of that list is the target.

Larry wrote:
If what we are looking for is some level of autonomy, then it would need to be a module with its own release train - or at least be able to have one.

[UMA] Yes. Agree.

Kai wrote:
So far I saw it's mainly about AES-256. I suggest the scope be expanded a little bit, perhaps to a dedicated high-performance encryption library; then we would have quite a lot to contribute to it, like other ciphers, MACs, PRNGs and so on.
Then both Hadoop and Spark can benefit from it.

[UMA] Yes. Once development starts as a separate project, it is free to evolve and provide more improvements to support a wider customer/user space for encryption, based on demand. Haifeng, would you add some points here?

Regards,
Uma

On 1/20/16, 4:31 PM, "Andrew Wang" wrote:

>Thanks Uma for putting together this proposal. Overall sounds good to
>me, +1 for these improvements. A few comments/questions:
>
>* If it becomes part of Apache Commons, could we make Chimera a
>separate JAR? We have real difficulties bumping dependency versions
>right now, so ideally we don't need to bump our existing Commons
>dependencies to use Chimera.
>* With this refactoring, do we have confidence that we can get our
>desired changes merged and released in a timely fashion? e.g. if we
>find another bug like HADOOP-11343, we'll first need to get the fix
>into Chimera, have a new Chimera release, then bump Hadoop's Chimera
>dependency. This also relates to the previous point; it's easier to do
>this dependency bump if Chimera is a separate JAR.
>
>Best,
>Andrew
>
>On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma
>wrote:
>
>> Hi Devs,
>>
>> Some of our Hadoop developers are working with the Spark community to
>>implement shuffle encryption. While implementing it, they realized
>>that some/most of the Hadoop encryption code and their implementation
>>code would have to be duplicated. This led to the idea of creating a
>>separate library, named Chimera
>>(https://github.com/intel-hadoop/chimera). It is an optimized
>>cryptographic library. It provides a Java API at both the cipher level
>>and the Java stream level to help developers implement high-performance
>>AES encryption/decryption with minimum code and effort.
Chimera was
>>originally based on Hadoop crypto code but has been improved and
>>generalized a lot to support a wider scope of data encryption needs for
>>more components in the community.
>>
>> So, now the team is thinking of making this library an open source
>>project via Apache incubation. The proposal is for Chimera to join
>>Apache as an incubating project, or Apache Commons, to facilitate its
>>adoption.
>>
>> In general this will bring the following advantages:
>> 1. As Chimera embeds the native library in the jar (similar to
>>Snappy-java), it solves the current issue in Hadoop that an HDFS client
>>has to depend on libhadoop.so if it needs to read an encryption zone in
>>HDFS. This means an HDFS client may have to depend on a Hadoop
>>installation on the local machine. For example, HBase depends on the
>>HDFS client jar rather than a Hadoop installation and thus has no
>>access to libhadoop.so, so HBase cannot use an encryption zone without
>>errors.
>> 2. Apache Spark shuffle and spill encryption could be another example
>>where we can use Chimera. We see that the stream encryption for Spark
>>shuffle and spill doesn't require a stream cipher like AES/CTR,
>>although the code shares the common characteristics of a stream-style
>>API. We also see the need for an optimized cipher for non-stream-style
>>use cases such as network encryption (e.g. RPC). These improvements can
>>actually be shared by more projects in need.
>> 3. Simplified code in Hadoop by using a dedicated library, which also
>>drives more improvements. For example, currently the Hadoop crypto code
>>API is entirely based on AES/CTR, although it has cipher suite
>>configurations. AES/CTR is suitable for HDFS data encryption at rest,
>>but it doesn't necessarily need to be AES/CTR for all cases, such as
>>data transfer encryption and intermediate file encryption.
>>
>> So, we wanted to check with the Hadoop community about this proposal.
>>Please provide your feedback on it.
>>
>> Regards,
>> Uma
>>
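The stream-level AES/CTR encryption discussed throughout this thread can be illustrated with the stock JCE API alone. The sketch below uses only the JDK's javax.crypto classes, not Chimera's actual API, to show the shape of a stream-style encrypt/decrypt round trip (AES-128 is used for brevity; AES-256 needs a 32-byte key and an unlimited-strength policy on older JDKs):

```java
// Round-trips plaintext through AES/CTR cipher streams, the way a
// shuffle/spill writer and reader would wrap their underlying streams.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public final class CtrStreamDemo {
    public static byte[] roundTrip(byte[] plaintext) throws Exception {
        byte[] key = new byte[16];  // AES-128 key; AES-256 would use 32 bytes
        byte[] iv = new byte[16];   // CTR initial counter block
        SecureRandom rng = new SecureRandom();
        rng.nextBytes(key);
        rng.nextBytes(iv);
        SecretKeySpec keySpec = new SecretKeySpec(key, "AES");

        // Encrypt through a stream, as a writer would.
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, keySpec, new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (CipherOutputStream out = new CipherOutputStream(sink, enc)) {
            out.write(plaintext);
        }

        // Decrypt the ciphertext back through a stream, as a reader would.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, keySpec, new IvParameterSpec(iv));
        ByteArrayInputStream src = new ByteArrayInputStream(sink.toByteArray());
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (CipherInputStream in = new CipherInputStream(src, dec)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                restored.write(buf, 0, n);
            }
        }
        return restored.toByteArray();
    }
}
```

Because CTR mode is a stream cipher mode with no padding, ciphertext length equals plaintext length, which is part of why it suits HDFS at-rest encryption with random reads; as point 3 above notes, other use cases need not be tied to it.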