hadoop-general mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject What is "Hadoop?" Was: Defining Hadoop Compatibility -revisiting-
Date Fri, 13 May 2011 14:41:58 GMT
On Tue, May 10, 2011 at 3:29 AM, Steve Loughran <stevel@apache.org> wrote:

> I think we should revisit this issue before people with their own agendas
> define what compatibility with Apache Hadoop is for us

I agree completely. As you point out, this week we've had a flood of
products calling themselves "Hadoop" or "Distribution of Hadoop" that
include only a part of Hadoop. This will dilute Apache's Hadoop trademark
and create consumer confusion.

> -Use of the Hadoop codebase must follow the Apache License
> http://www.apache.org/licenses/LICENSE-2.0
> -plug in components that are dynamically linked to (Filesystems and
> schedulers) don't appear to be derivative works on my reading of this,

Plugins are usually considered independent works. Note that the Apache
license does permit commercial closed-source derivative works. A company
could take Hadoop's code, modify it, and sell a binary release as long as
they meet the conditions of the Apache license.

> Naming
>  -this is something for branding@apache, they will have their opinions.
> The key one is that the name "Apache Hadoop" must get used, and it's
> important to make clear it is a derivative work.
>  -I don't think you can claim to have a Distribution/Fork/Version of Apache
> Hadoop if you swap out big chunks of it for alternate filesystems, MR
> engines, etc. Some description of this is needed
> "Supports the Apache Hadoop MapReduce engine on top of Filesystem XYZ"

The Hadoop name is the primary tool that the project has for minimizing
customer confusion. I think we need to create a very clear definition of
what can be called Hadoop and what cannot. Apache gives the PMCs a fair
amount of latitude in picking the policy for their project name and I think
we need to do so.

Given the large number of so-called Hadoop products that are being released,
I believe that we should require "Hadoop" to mean specifically the Apache
Hadoop releases (possibly with a few critical security patches).

Projects that are derivative works can either be "powered by Apache Hadoop,"
or "based on Apache Hadoop."

What do others think?

-- Owen
