From: Chris Douglas
Date: Wed, 3 Feb 2016 01:49:23 -0800
Subject: Re: Hadoop encryption module as Apache Chimera incubator project
To: hdfs-dev@hadoop.apache.org

On Wed, Feb 3, 2016 at 12:48 AM, Gangumalla, Uma wrote:
>> Standing in the point of a shared, fundamental piece of code like this,
>> I do think Apache Commons might be the best direction we can try as the
>> first effort. In this direction, we still need to work with the Apache
>> Commons community on buying in and accepting the proposal.
>
> Makes sense.

Makes sense how?

> For this we should define independent release cycles for the project,
> and it would just be placed under the Hadoop tree if we all conclude
> with this option at the end.

Yes.

> [Chris]
>> If Chimera is not successful as an independent project or stalls,
>> Hadoop and/or Spark and/or $project will have to reabsorb it as
>> maintainers.
>
> I am not so strong on this point. If we assume the project would be
> unsuccessful, it can be unsuccessful (less maintained) even under
> Hadoop. But if other projects depend on this piece, they would get less
> support. Right now we feel this piece of code is very important, and we
> expect it can be successful as an independent project, whether it lives
> outside Hadoop or inside. So I feel this point should not really sway
> the discussion.

Sure; code can idle anywhere, but that wasn't the point I was after. You
propose to extract code from Hadoop, but if Chimera fails, what recourse
do we have among the other projects taking a dependency on it? Splitting
off another project is feasible, but Chimera should be sustainable before
this PMC can divest itself of responsibility for security libraries.
That's a pretty low bar.
Bundling the library with the jar is helpful; I've used that approach
before. It should prefer (updated) libraries from the environment, if so
configured; otherwise it's a pain (or impossible) for ops to patch
security bugs. -C

>> -----Original Message-----
>> From: Colin P. McCabe [mailto:cmccabe@apache.org]
>> Sent: Wednesday, February 3, 2016 4:56 AM
>> To: hdfs-dev@hadoop.apache.org
>> Subject: Re: Hadoop encryption module as Apache Chimera incubator project
>>
>> It's great to see interest in improving this functionality. I think
>> Chimera could be successful as an Apache project. I don't have a strong
>> opinion one way or the other as to whether it belongs as part of Hadoop
>> or separate.
>>
>> I do think there will be some challenges splitting this functionality
>> out into a separate jar, because of the way our CLASSPATH works right
>> now. For example, say Hadoop depends on Chimera 1.2 and Spark depends
>> on Chimera 1.1. Now Spark jobs have two different versions fighting it
>> out on the classpath, similar to the situation with Guava and other
>> libraries. Perhaps if Chimera adopts a policy of strong backwards
>> compatibility we can just always use the latest jar, but it still seems
>> likely that there will be problems. There are various classpath
>> isolation ideas that could help here, but they are big projects in
>> their own right and we don't have a clear timeline for them. If this
>> does end up being a separate jar, we may need to shade it to avoid all
>> these issues.
>>
>> Bundling the JNI glue code in the jar itself is an interesting idea,
>> which we have talked about before for libhadoop.so. It doesn't really
>> have anything to do with the question of TLP vs. non-TLP, of course;
>> we could do that refactoring in Hadoop itself. The really complicated
>> part of bundling JNI code in a jar is that you need to create jars for
>> every element of the cross product of (JVM version, openssl version,
>> operating system).
>> For example, take the RHEL6 build for OpenJDK 7 using openssl 1.0.1e.
>> If you change any one thing (say, OpenJDK 7 to Oracle JDK 8), you
>> might need to rebuild. And using Ubuntu would certainly mean a rebuild.
>> And so forth. This clashes with Maven's philosophy of pulling prebuilt
>> jars from the internet.
>>
>> Kai Zheng's question about whether we would bundle openssl's libraries
>> is a good one. Given the high rate of new vulnerabilities discovered in
>> that library, bundling would require Hadoop users and vendors to update
>> much more frequently than Hadoop is traditionally updated. So we would
>> probably choose not to bundle openssl.
>>
>> best,
>> Colin
>>
>> On Tue, Feb 2, 2016 at 12:29 AM, Chris Douglas wrote:
>>> As a subproject of Hadoop, Chimera could maintain its own cadence.
>>> There's also no reason why it should maintain dependencies on other
>>> parts of Hadoop, if those are separable. How is this solution
>>> inadequate?
>>>
>>> If Chimera is not successful as an independent project or stalls,
>>> Hadoop and/or Spark and/or $project will have to reabsorb it as
>>> maintainers. Projects have high mortality in early life, and a fight
>>> over inheritance/maintenance is something we'd like to avoid. If, on
>>> the other hand, it develops enough of a community that it is obviously
>>> viable, then we can (and should) break it out as a TLP (as we have
>>> before). If other Apache projects take a dependency on Chimera, we're
>>> open to adding them to security@hadoop.
>>>
>>> Unlike Yetus, which was largely rewritten right before it was made
>>> into a TLP, security in Hadoop has a complicated pedigree. If Chimera
>>> eventually becomes a TLP, it seems fair to include those who work on
>>> it while it is a subproject.
>>> Declared upfront, that criterion is fairer than any post hoc
>>> justification, and will lead to a more accurate account of its
>>> community than a subset of the Hadoop PMC/committers that volunteer.
>>> -C
>>>
>>> On Mon, Feb 1, 2016 at 9:29 PM, Chen, Haifeng wrote:
>>>> Thanks to all the folks providing feedback and participating in the
>>>> discussion.
>>>>
>>>> @Owen, do you still have any concerns about going forward in the
>>>> direction of Apache Commons (or other options, e.g. a TLP)?
>>>>
>>>> Thanks,
>>>> Haifeng
>>>>
>>>> -----Original Message-----
>>>> From: Chen, Haifeng [mailto:haifeng.chen@intel.com]
>>>> Sent: Saturday, January 30, 2016 10:52 AM
>>>> To: hdfs-dev@hadoop.apache.org
>>>> Subject: RE: Hadoop encryption module as Apache Chimera incubator
>>>> project
>>>>
>>>>>> I believe encryption is becoming a core part of Hadoop. I think
>>>>>> that moving core components out of Hadoop is bad from a project
>>>>>> management perspective.
>>>>
>>>>> Although it's certainly true that encryption capabilities (in HDFS,
>>>>> YARN, etc.) are becoming core to Hadoop, I don't think that should
>>>>> really influence whether or not the non-Hadoop-specific encryption
>>>>> routines should be part of the Hadoop code base, or part of the code
>>>>> base of another project that Hadoop depends on. If Chimera had
>>>>> existed as a library hosted at the ASF when HDFS encryption was
>>>>> first developed, HDFS probably would have just added it as a
>>>>> dependency and been done with it. I don't think we would've
>>>>> copy/pasted the code for Chimera into the Hadoop code base.
>>>>
>>>> Agree with ATM. I also want to make an additional clarification. I
>>>> agree that the encryption capabilities are becoming core to Hadoop,
>>>> but this effort is about putting common, shared encryption routines,
>>>> such as the crypto stream implementations, into a scope where they
>>>> can be widely shared across the Apache ecosystem. It doesn't move
>>>> Hadoop encryption out of Hadoop (that is not possible).
>>>>
>>>> Agreed that making it a separate, independently released project
>>>> within Hadoop goes a step further than the existing approach and
>>>> solves some issues (such as the libhadoop.so problem). Frankly
>>>> speaking, though, I don't think it is the best option we can try. I
>>>> also expect that an independently released project within Hadoop core
>>>> would complicate the existing Hadoop release process.
>>>>
>>>> Thanks,
>>>> Haifeng
>>>>
>>>> -----Original Message-----
>>>> From: Aaron T. Myers [mailto:atm@cloudera.com]
>>>> Sent: Friday, January 29, 2016 9:51 AM
>>>> To: hdfs-dev@hadoop.apache.org
>>>> Subject: Re: Hadoop encryption module as Apache Chimera incubator
>>>> project
>>>>
>>>> On Wed, Jan 27, 2016 at 11:31 AM, Owen O'Malley wrote:
>>>>
>>>>> I believe encryption is becoming a core part of Hadoop. I think that
>>>>> moving core components out of Hadoop is bad from a project
>>>>> management perspective.
>>>>
>>>> Although it's certainly true that encryption capabilities (in HDFS,
>>>> YARN, etc.) are becoming core to Hadoop, I don't think that should
>>>> really influence whether or not the non-Hadoop-specific encryption
>>>> routines should be part of the Hadoop code base, or part of the code
>>>> base of another project that Hadoop depends on. If Chimera had
>>>> existed as a library hosted at the ASF when HDFS encryption was first
>>>> developed, HDFS probably would have just added it as a dependency and
>>>> been done with it. I don't think we would've copy/pasted the code for
>>>> Chimera into the Hadoop code base.
>>>>
>>>>> To put it another way, a bug in the encryption routines will likely
>>>>> become a security problem that security@hadoop needs to hear about.
>>>>> I don't think adding a separate project in the middle of that
>>>>> communication chain is a good idea. The same applies to data
>>>>> corruption problems, and so on...
>>>>
>>>> Isn't the same true of all the libraries that Hadoop currently
>>>> depends upon? If the commons-httpclient library (or commons-codec, or
>>>> commons-io, or guava, or...) has a security vulnerability, we need to
>>>> know about it so that we can update our dependency to a fixed
>>>> version. This case doesn't seem materially different from that.
>>>>
>>>>> > It may be good to keep it in a generalized place (as in the
>>>>> > discussion, we thought that place could be Apache Commons).
>>>>>
>>>>> Apache Commons is a collection of *Java* projects, so Chimera as a
>>>>> JNI-based library isn't a natural fit.
>>>>
>>>> Could very well be that Apache Commons's charter would preclude
>>>> Chimera. You probably know better than I do about that.
>>>>
>>>>> Furthermore, Apache Commons doesn't have its own security list, so
>>>>> problems will go to the generic security@apache.org.
>>>>
>>>> That seems easy enough to remedy if they wanted to, and besides, I'm
>>>> not sure why that would influence this discussion. In my experience,
>>>> projects that don't have a separate security@project.a.o mailing list
>>>> tend to just handle security issues on their private@project.a.o
>>>> mailing list, which seems fine to me.
>>>>
>>>>> Why do you think that Apache Commons is a better home than Hadoop?
>>>>
>>>> I'm certainly not wedded to Apache Commons; that just seemed like a
>>>> natural place to put it. Could be that a brand new TLP would make
>>>> more sense.
>>>>
>>>> I *do* think that if other non-Hadoop projects want to make use of
>>>> Chimera, which as I understand it is the goal that started this
>>>> thread, then Chimera should exist outside of Hadoop so that:
>>>>
>>>> a) Projects that have nothing to do with Hadoop can depend directly
>>>> on Chimera, which has nothing Hadoop-specific in it.
>>>>
>>>> b) The Hadoop project doesn't have to export/maintain/concern itself
>>>> with yet another publicly consumed interface.
>>>>
>>>> c) Chimera can have its own (presumably much faster) release cadence,
>>>> completely separate from Hadoop's.
>>>>
>>>> --
>>>> Aaron T. Myers
>>>> Software Engineer, Cloudera
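[Editor's illustration] The "bundle the JNI glue in the jar, but prefer an
updated library from the environment" behavior discussed in this thread can
be sketched roughly as below. This is not Chimera's or Hadoop's actual
code: the class name `NativeLoader`, the library name `libchimera`, and the
`/native/<os>-<arch>/` resource layout are all made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

/**
 * Hypothetical loader: try a library from the environment first (so ops can
 * patch security bugs independently), then fall back to a copy bundled in
 * the jar. Only Linux/macOS naming is sketched here.
 */
public final class NativeLoader {

    /** Resource path inside the jar for a platform, e.g. /native/linux-amd64/libchimera.so. */
    static String resourcePath(String osName, String arch) {
        String lower = osName.toLowerCase(Locale.ROOT);
        // Collapse "Mac OS X" to "darwin"; otherwise take the first word, e.g. "Linux" -> "linux".
        String os = lower.startsWith("mac") ? "darwin" : lower.split(" ")[0];
        String ext = os.equals("darwin") ? "dylib" : "so";
        return "/native/" + os + "-" + arch + "/libchimera." + ext;
    }

    /** Prefer the system library; extract the bundled copy only as a fallback. */
    public static void load() throws IOException {
        try {
            // Found on java.library.path: the environment-provided, patchable case.
            System.loadLibrary("chimera");
            return;
        } catch (UnsatisfiedLinkError ignored) {
            // Not on the system; fall through to the copy bundled in the jar.
        }
        String res = resourcePath(System.getProperty("os.name"),
                                  System.getProperty("os.arch"));
        try (InputStream in = NativeLoader.class.getResourceAsStream(res)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("no bundled library at " + res);
            }
            // Copy the bundled library to a temp file so the OS loader can map it.
            Path tmp = Files.createTempFile("libchimera",
                    res.substring(res.lastIndexOf('.')));
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString());
        }
    }
}
```

Note how the resource path bakes in exactly the (OS, architecture) cross
product Colin describes: each supported platform needs its own prebuilt
`.so`/`.dylib` inside the jar, which is why the number of build artifacts
multiplies.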