From: Larry McCay
To: "hdfs-dev@hadoop.apache.org"
Subject: Re: Hadoop encryption module as Apache Chimera incubator project
Date: Thu, 21 Jan 2016 02:43:19 +0000
Reply-To: hdfs-dev@hadoop.apache.org
Mailing-List: contact hdfs-dev-help@hadoop.apache.org; run by ezmlm

That’s a good point, Kai.
If what we are looking for is some level of autonomy, then it would need to
be a module with its own release train, or at least be able to have one.

On Jan 20, 2016, at 9:18 PM, Zheng, Kai wrote:

> Just a question. Does becoming a separate jar/module in Apache Commons
> mean that Chimera (or the module) can be released separately and in a
> timely manner, without being coupled to the releases of other modules in
> the project? Thanks.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Aaron T. Myers [mailto:atm@cloudera.com]
> Sent: Thursday, January 21, 2016 9:44 AM
> To: hdfs-dev@hadoop.apache.org
> Subject: Re: Hadoop encryption module as Apache Chimera incubator project
>
> +1 for Hadoop depending upon Chimera, assuming Chimera can get
> hosted/released under some Apache project umbrella. If that's Apache
> Commons (which makes a lot of sense to me) then I'm also a big +1 on
> Andrew's suggestion that we make it a separate module.
>
> Uma, would you be up for approaching the Apache Commons folks saying that
> you'd like to contribute Chimera? I'd recommend saying that Hadoop and
> Spark are both on board to depend on this.
>
> --
> Aaron T. Myers
> Software Engineer, Cloudera
>
> On Wed, Jan 20, 2016 at 4:31 PM, Andrew Wang
> wrote:
>
>> Thanks Uma for putting together this proposal. Overall sounds good to
>> me, +1 for these improvements. A few comments/questions:
>>
>> * If it becomes part of Apache Commons, could we make Chimera a
>> separate JAR? We have real difficulties bumping dependency versions
>> right now, so ideally we don't need to bump our existing Commons
>> dependencies to use Chimera.
>> * With this refactoring, do we have confidence that we can get our
>> desired changes merged and released in a timely fashion? e.g.
>> if we find another bug like HADOOP-11343, we'll first need to get the
>> fix into Chimera, have a new Chimera release, then bump Hadoop's Chimera
>> dependency. This also relates to the previous point: it's easier to do
>> this dependency bump if Chimera is a separate JAR.
>>
>> Best,
>> Andrew
>>
>> On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma
>> <uma.gangumalla@intel.com> wrote:
>>
>>> Hi Devs,
>>>
>>> Some of our Hadoop developers are working with the Spark community to
>>> implement shuffle encryption. While implementing that, they realized
>>> that some/most of the Hadoop encryption code would have to be
>>> duplicated in their implementation. This led to the idea of creating a
>>> separate library, named Chimera
>>> (https://github.com/intel-hadoop/chimera). It is an optimized
>>> cryptographic library. It provides Java APIs at both the cipher level
>>> and the Java stream level to help developers implement
>>> high-performance AES encryption/decryption with minimal code and
>>> effort. Chimera was originally based on the Hadoop crypto code but has
>>> been improved and generalized a lot to support a wider scope of data
>>> encryption needs for more components in the community.
>>>
>>> So the team is now thinking of making this library an open source
>>> project via Apache Incubation. The proposal is for Chimera to join
>>> Apache as an incubating project, or to join Apache Commons, to
>>> facilitate its adoption.
>>>
>>> In general this brings the following advantages:
>>> 1. Because Chimera embeds the native library in the jar (similar to
>>> Snappy Java), it solves the current issue in Hadoop that an HDFS client
>>> has to depend on libhadoop.so if the client needs to read an
>>> encryption zone in HDFS. This means an HDFS client may have to depend
>>> on a Hadoop installation on the local machine.
>>> For example, HBase depends on the HDFS client jar rather than on a
>>> Hadoop installation, and so has no access to libhadoop.so. So HBase
>>> cannot use an encryption zone, or it causes an error.
>>> 2. Apache Spark shuffle and spill encryption could be another example
>>> where we can use Chimera. We see that the stream encryption for Spark
>>> shuffle and spill doesn’t require a stream cipher like AES/CTR,
>>> although the code shares the common characteristics of a stream-style
>>> API. We also see the need for an optimized Cipher for non-stream-style
>>> use cases, such as network encryption (e.g. RPC). These improvements
>>> can be shared by more projects that need them.
>>>
>>> 3. Simplified code in Hadoop through the use of a dedicated library,
>>> which also drives more improvements. For example, the current Hadoop
>>> crypto code API is totally based on AES/CTR although it has cipher
>>> suite configurations.
>>>
>>> AES/CTR is for HDFS data encryption at rest, but it doesn’t
>>> necessarily have to be AES/CTR for all cases, such as data transfer
>>> encryption and intermediate file encryption.
>>>
>>> So, we wanted to check with the Hadoop community about this proposal.
>>> Please provide your feedback on it.
>>>
>>> Regards,
>>> Uma
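
For readers unfamiliar with the "stream level" AES/CTR usage the proposal refers to, the following is a minimal sketch using only the standard JDK javax.crypto classes. It is not Chimera's API and not the Hadoop crypto code; the class name AesCtrStreamSketch and its method names are hypothetical and exist only for illustration. It assumes a 16-byte key and a 16-byte IV supplied by the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration class: standard JDK APIs only, not Chimera's API.
final class AesCtrStreamSketch {

    // Builds an AES/CTR cipher. The IV must never be reused with the same key.
    static Cipher newCipher(int mode, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    // Encrypts by writing the plaintext through a CipherOutputStream.
    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Cipher cipher = newCipher(Cipher.ENCRYPT_MODE, key, iv);
        try (OutputStream out = new CipherOutputStream(sink, cipher)) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    // Decrypts by reading the ciphertext through a CipherInputStream.
    static byte[] decrypt(byte[] key, byte[] iv, byte[] encrypted)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        Cipher cipher = newCipher(Cipher.DECRYPT_MODE, key, iv);
        try (InputStream in =
                new CipherInputStream(new ByteArrayInputStream(encrypted), cipher)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }
}
```

Because CTR turns the block cipher into a keystream, the ciphertext has the same length as the plaintext, which is one reason it suits at-rest data that needs positioned reads. Cases such as data transfer or intermediate file encryption could pass a different transformation string to Cipher.getInstance, which is the kind of flexibility item 3 above asks for.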