hadoop-general mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Fwd: [VOTE] Shall we adopt the "Defining Hadoop" page
Date Mon, 20 Jun 2011 16:39:26 GMT
Hi Jeff,

First, apologies for removing most of your argument for clarity. Readers can find it in the
general@ archives I am sure.

> Lastly, I'd love to learn more about how other prominent open source
> projects have approached this issue. If you have any knowledge about
> how Linux handled the use of its trademark, please add your
> thoughts to
> http://www.quora.com/What-are-the-rules-for-using-the-Linux-trademark-in-a-product-name.
> Because Apache Hadoop is a kernel technology, similar to Linux, I
> suspect there are many useful lessons to learn. Or at least crazy
> email threads to read.

I would argue the concern about trademark has an additional dimension here, and perhaps a
fairly core additional motivation for protecting the mark, because these are open source projects. The
mention of Linux helps to illustrate it.

The obvious difference between Hadoop and Linux is Linux has a universally recognized clear
hierarchy with a single -- and exceptional, and quickly and forcefully opinionated -- authority
at the top. For Linux, the power to define Linux rests obviously with Linus. Regarding Hadoop,
the power to do anything, including defining what Hadoop is, is diffuse.

For would-be open source participants who want to contribute to the Linux kernel, the canonical
source of the Linux kernel is clearly Linus' tree and you want your contribution to end up
there. He is the authority. Linux will always be defined by Linus until he is gone. (That
is a long term problem for Linux of course.) It is a benevolent dictatorship that perhaps
uniquely works, allowing enough contributors to see the fruits of their labor to sustain it
while simultaneously maintaining a strong identity. 

Hadoop has no equivalent.

Linux, for now at least, can be quite liberal in how the Linux mark is used because of how
its identity as a project, and therefore its ability to attract contributions, is defined.

Hadoop I think needs to be more careful. What triggered this discussion is the arrival of
new players releasing products they call Hadoop but which contain severe changes that the community,
by way of the ASF umbrella we all work under, had nothing to do with designing or developing.
And some of these are being open sourced as a Hadoop. There is no Linus here. Which of these
is _the_ Hadoop? As a would-be contributor, which should I select?

Already we have some issues. In some cases I'd rather contribute to Cloudera sources because
at least I know my contribution to CDH will see a timely release.

Furthermore, I believe the extent to which users see value in ASF Hadoop, and have a clear
definition of what ASF Hadoop is, will be correlated with the extent to which the ASF can
attract enough contributions to Hadoop to sustain innovation against competing technologies.

The open source value proposition "I contribute to Hadoop" impacts the long term survival
of the project. Individuals and organizations are both motivated by this, for various reasons.

   - Andy
