hadoop-hdfs-dev mailing list archives

From "Chen, Haifeng" <haifeng.c...@intel.com>
Subject RE: Hadoop encryption module as Apache Chimera incubator project
Date Sat, 30 Jan 2016 02:51:58 GMT
>> I believe encryption is becoming a core part of Hadoop. I think that 
>> moving core components out of Hadoop is bad from a project management perspective.

> Although it's certainly true that encryption capabilities (in HDFS, YARN, etc.) are becoming
> core to Hadoop, I don't think that should really influence whether or not the non-Hadoop-specific
> encryption routines should be part of the Hadoop code base, or part of the code base of another
> project that Hadoop depends on. If Chimera had existed as a library hosted at ASF when HDFS
> encryption was first developed, HDFS probably would have just added that as a dependency and
> been done with it. I don't think we would've copy/pasted the code for Chimera into the Hadoop
> code base.

Agree with ATM. I also want to make an additional clarification. I agree that the encryption
capabilities are becoming core to Hadoop, but this effort is about putting common, shared
encryption routines, such as the crypto stream implementations, into a scope where they can be
widely reused across the Apache ecosystem. It doesn't move Hadoop encryption out of Hadoop (that
is not possible).
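
To make "crypto stream implementations" concrete: below is a minimal sketch of the kind of
routine we mean, written against the JDK's javax.crypto classes as a stand-in rather than
Chimera's actual API (Chimera provides the same stream-wrapping idea, backed by JNI/OpenSSL
for hardware-accelerated AES).

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import javax.crypto.Cipher;
    import javax.crypto.CipherInputStream;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class CryptoStreamSketch {
        public static void main(String[] args) throws Exception {
            byte[] key = new byte[16]; // 128-bit AES key; all zeros here for illustration only
            byte[] iv  = new byte[16]; // CTR mode counter/IV

            // Encrypt: wrap the plaintext stream so that reads yield ciphertext.
            Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            ByteArrayOutputStream ciphertext = new ByteArrayOutputStream();
            try (InputStream in = new CipherInputStream(
                    new ByteArrayInputStream("hello, hdfs".getBytes(StandardCharsets.UTF_8)), enc)) {
                int b;
                while ((b = in.read()) != -1) {
                    ciphertext.write(b);
                }
            }

            // Decrypt: the same wrapping with DECRYPT_MODE recovers the plaintext.
            Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            ByteArrayOutputStream recovered = new ByteArrayOutputStream();
            try (InputStream in = new CipherInputStream(
                    new ByteArrayInputStream(ciphertext.toByteArray()), dec)) {
                int b;
                while ((b = in.read()) != -1) {
                    recovered.write(b);
                }
            }
            System.out.println(recovered.toString("UTF-8")); // prints: hello, hdfs
        }
    }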


I agree that making it a separately and independently released project within Hadoop would go a
step further than the existing approach and solve some issues (such as the libhadoop.so problem).
Frankly speaking, though, I don't think it is the best option available to us. I also expect that
an independent release project within Hadoop core would complicate Hadoop's existing release
process.
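
For context, the libhadoop.so problem as I understand it: Hadoop's native library must be
pre-installed and version-matched on every node, while a self-contained library can bundle its
.so inside the jar and extract it at load time. A minimal sketch of that bundling technique
follows (the class and resource name below are hypothetical, not Chimera's actual loader):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    public class EmbeddedNativeLoader {
        // Copies a native library bundled on the classpath to a temp file and
        // loads it, so nothing has to be pre-installed under java.library.path.
        public static void loadFromClasspath(String resource) {
            try (InputStream in = EmbeddedNativeLoader.class.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IllegalStateException("bundled native library not found: " + resource);
                }
                Path tmp = Files.createTempFile("chimera-", ".so");
                tmp.toFile().deleteOnExit();
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                System.load(tmp.toAbsolutePath().toString());
            } catch (IOException e) {
                throw new RuntimeException("failed to extract native library", e);
            }
        }

        public static void main(String[] args) {
            loadFromClasspath("/libchimera.so"); // hypothetical resource name
        }
    }

With this approach, a downstream project only needs the jar on its classpath; no separate
native install step is required.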

Thanks,
Haifeng

-----Original Message-----
From: Aaron T. Myers [mailto:atm@cloudera.com] 
Sent: Friday, January 29, 2016 9:51 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

On Wed, Jan 27, 2016 at 11:31 AM, Owen O'Malley <omalley@apache.org> wrote:

> I believe encryption is becoming a core part of Hadoop. I think that 
> moving core components out of Hadoop is bad from a project management perspective.
>

Although it's certainly true that encryption capabilities (in HDFS, YARN,
etc.) are becoming core to Hadoop, I don't think that should really influence whether or not
the non-Hadoop-specific encryption routines should be part of the Hadoop code base, or part
of the code base of another project that Hadoop depends on. If Chimera had existed as a library
hosted at ASF when HDFS encryption was first developed, HDFS probably would have just added
that as a dependency and been done with it. I don't think we would've copy/pasted the code
for Chimera into the Hadoop code base.


> To put it another way, a bug in the encryption routines will likely 
> become a security problem that security@hadoop needs to hear about.
>
> I don't think
> adding a separate project in the middle of that communication chain is 
> a good idea. The same applies to data corruption problems, and so on...
>

Isn't the same true of all the libraries that Hadoop currently depends upon? If the commons-httpclient
library (or commons-codec, or commons-io, or guava, or...) has a security vulnerability, we
need to know about it so that we can update our dependency to a fixed version. This case doesn't
seem materially different than that.


>
>
> > It may be good to keep it at a generalized place (as in the discussion, we
> > thought that place could be Apache Commons).
>
>
> Apache Commons is a collection of *Java* projects, so Chimera as a 
> JNI-based library isn't a natural fit.
>

Could very well be that Apache Commons's charter would preclude Chimera.
You probably know better than I do about that.


> Furthermore, Apache Commons doesn't
> have its own security list so problems will go to the generic 
> security@apache.org.
>

That seems easy enough to remedy, if they wanted to, and besides I'm not sure why that would
influence this discussion. In my experience projects that don't have a separate security@project.a.o
mailing list tend to just handle security issues on their private@project.a.o mailing list,
which seems fine to me.


>
> Why do you think that Apache Commons is a better home than Hadoop?
>

I'm certainly not at all wedded to Apache Commons, that just seemed like a natural place to
put it to me. Could be that a brand new TLP might make more sense.

I *do* think that if other non-Hadoop projects want to make use of Chimera, which as I understand
it is the goal that started this thread, then Chimera should exist outside of Hadoop so that:

a) Projects that have nothing to do with Hadoop can just depend directly on Chimera, which
has nothing Hadoop-specific in it.

b) The Hadoop project doesn't have to export/maintain/concern itself with yet another publicly-consumed
interface.

c) Chimera can have its own (presumably much faster) release cadence completely separate from
Hadoop.

--
Aaron T. Myers
Software Engineer, Cloudera