hadoop-general mailing list archives

From Eric Sammer <esam...@cloudera.com>
Subject Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Date Thu, 16 Jun 2011 06:35:52 GMT
On Wed, Jun 15, 2011 at 9:47 PM, Ian Holsman <hadoop@holsman.net> wrote:

> so yes .. even a simple patch makes it derived, because it is different.

...and a "derived work" is fine. There's nothing inherently wrong with the term
derived. I think the question is whether one can call it "Hadoop." Note I'm *not*
saying "Apache Hadoop," just "Hadoop" when the derived work is actually
derived (to any degree, as Craig R pointed out). Apache Hadoop always and
forever means the bits voted on by the PMC - no vendor can claim that - but
there do appear to be plenty of prior examples of "reasonable" use of ASF
(and other OSS organization) project names in clearly derived works. I do
agree there should be a policy and it needs to be universally applied to be
fair to all involved.

Not to kick up the compatibility dust storm again, but people will always
claim crazy stuff that may or may not be true. We should just ignore it. Any
day of the week someone is claiming XYZ compatibility, either explicitly or
implicitly (as in client libraries for Foo Project). For cases where a
vendor makes a claim that isn't true, users will ask, we'll clarify that
Apache makes no guarantees of derived work compatibility and doesn't certify
anything (and specifically does the opposite - *NO* guarantees or

Example uses I think should be fine / acceptable:

YDH (even though it no longer exists, it's a good example) and Y!'s use of
Facebook Hadoop
Hadoop at eBay
Hadoop at LinkedIn
IBM's use of Hadoop
and yes, CDH*

Even if some / all of the above modify at least a single bit (and may
*technically* be derived works) everyone understands what they mean. As for
the confusion, the OSS community has always just said "oh, they patch some
stuff, you should probably ask them" when confronted with vendor modified
versions of upstream projects; I've been involved in many of those upstream
projects, including a Linux distro (downstream). We should always be polite
to downstream users in redirecting them, but I think redirecting them is
fine. It's not confusing to users in my experience (we can make it a FAQ or
something and just point people there), as Red Hat, Novell, Oracle, IBM, and
many other vendors have been happily[1] coexisting with their upstream
counterparts for a long time.

I believe we (the collective Apache Hadoop community including those that
redistribute Hadoop bits in various forms) should focus on producing
regular, quality releases in a cooperative and constructive environment, and
continue to require vendors to provide the proper attribution and license
information. This is in everyone's interest, vendors and direct users alike.

*Disclosure: I work for Cloudera, and I think this should apply to anyone and
everyone, including my employer (with whom I obviously do not clear emails).

[1] OK, maybe not always "happily" but mostly so. You know what I mean.

Thanks to Steve L and others for their hard work on this one.
(Sorry for the long email.)

Eric Sammer
twitter: esammer
data: www.cloudera.com
