hadoop-hdfs-dev mailing list archives

From "Aaron T. Myers" <...@cloudera.com>
Subject Re: Hadoop encryption module as Apache Chimera incubator project
Date Fri, 29 Jan 2016 01:50:59 GMT
On Wed, Jan 27, 2016 at 11:31 AM, Owen O'Malley <omalley@apache.org> wrote:

> I believe encryption is becoming a core part of Hadoop. I think that moving
> core components out of Hadoop is bad from a project management perspective.

Although it's certainly true that encryption capabilities (in HDFS, YARN,
etc.) are becoming core to Hadoop, I don't think that should really
influence whether or not the non-Hadoop-specific encryption routines should
be part of the Hadoop code base, or part of the code base of another
project that Hadoop depends on. If Chimera had existed as a library hosted
at ASF when HDFS encryption was first developed, HDFS probably would have
just added that as a dependency and been done with it. I don't think we
would've copy/pasted the code for Chimera into the Hadoop code base.

> To put it another way, a bug in the encryption routines will likely become
> a security problem that security@hadoop needs to hear about. I don't think
> adding a separate project in the middle of that communication chain is a
> good idea. The same applies to data corruption problems, and so on...

Isn't the same true of all the libraries that Hadoop currently depends
upon? If the commons-httpclient library (or commons-codec, or commons-io,
or guava, or...) has a security vulnerability, we need to know about it so
that we can update our dependency to a fixed version. This case doesn't
seem materially different than that.

> > It may be good to keep at generalized place(As in the
> > discussion, we thought that place could be Apache Commons).
> Apache Commons is a collection of *Java* projects, so Chimera as a
> JNI-based library isn't a natural fit.

Could very well be that Apache Commons's charter would preclude Chimera.
You probably know better than I do about that.

> Furthermore, Apache Commons doesn't
> have its own security list so problems will go to the generic
> security@apache.org.

That seems easy enough to remedy, if they wanted to, and besides I'm not
sure why that would influence this discussion. In my experience projects
that don't have a separate security@project.a.o mailing list tend to just
handle security issues on their private@project.a.o mailing list, which
seems fine to me.

> Why do you think that Apache Commons is a better home than Hadoop?

I'm certainly not at all wedded to Apache Commons; that just seemed to me
like a natural place to put it. Could be that a brand new TLP might make
more sense.

I *do* think that if other non-Hadoop projects want to make use of Chimera,
which as I understand it is the goal which started this thread, then
Chimera should exist outside of Hadoop so that:

a) Projects that have nothing to do with Hadoop can just depend directly on
Chimera, which has nothing Hadoop-specific in there.

b) The Hadoop project doesn't have to export/maintain/concern itself with
yet another publicly-consumed interface.

c) Chimera can have its own (presumably much faster) release cadence
completely separate from Hadoop.

Aaron T. Myers
Software Engineer, Cloudera
