hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Date Thu, 16 Jun 2011 11:46:46 GMT
On 16/06/11 07:35, Eric Sammer wrote:
> On Wed, Jun 15, 2011 at 9:47 PM, Ian Holsman<hadoop@holsman.net>  wrote:
>> so yes .. even a simple patch makes it derived, because it is different.
> ...and a "derived work" is fine. Nothing inherently wrong with the term
> derived. I think the question is can one call it "Hadoop?" Note I'm *not*
> saying "Apache Hadoop," just "Hadoop" when the derived work is actually
> derived (to any degree, as Craig R pointed out). Apache Hadoop always and
> forever means the bits voted on by the PMC - no vendor can claim that - but
> there does appear to be plenty of prior examples of "reasonable" use of ASF
> (and other OSS organization) project names in clearly derived works. I do
> agree there should be a policy and it needs to be universally applied to be
> fair to all involved.
> Not to kick up the compatibility dust storm again, but people will always
> claim crazy stuff that may or may not be true. We should just ignore it.

The issue is branding and trademarks: eventually things get downgraded
until they become meaningless. If I code an MR engine in Erlang (I have
one somewhere), can I call it "Hadoop for Erlang"?

> Any
> day of the week someone is claiming XYZ compatible either explicitly or
> implicitly (as in client libraries for Foo Project). For cases where a
> vendor makes a claim that isn't true, users will ask, we'll clarify that
> Apache makes no guarantees of derived work compatibility and doesn't certify
> anything (and specifically does the opposite - *NO* guarantees or
> warranties).

-BigTop could provide that defensible compatibility statement. 
"Automotive Joe's Crankshaft platform passed the Apache BigTop DFS, MR, 
Mahout and HBase test suites"

> Example uses I think should be fine / acceptable:
> YDH (even though it no longer exists, it's a good example) and Y!'s use of
> Hadoop

-creates confusion and encourages the notion that anything is a
distribution of Hadoop, which is the situation that the trademarks
people are trying to crack down on

> Facebook Hadoop

-depends on internal vs external

> Hadoop at eBay
> Hadoop at LinkedIn

details of internal use, as valid as "Hadoop in Steve's house", which, 
given my known network state, is always something to cherish. And while 
I have built my branch up and published it, it's no longer something I 
distribute (though it is in an open SVN repository somewhere). I'm 
working directly with Apache Hadoop 0.20.203 these days.

> IBM's use of Hadoop

not sure about IBM distribution of Apache Hadoop, as I presume it has 
the uncommitted patch to work on IBM JVMs (though were someone to commit 


The BigInsights product is more explicit and, to me, a good example of
terminology. Their own brand, description of the benefits, and details
on what's in there:

"IBM InfoSphere BigInsights Enterprise Edition
For turning complex, internet-scale information into insight, cost 

IBM® InfoSphere™ BigInsights Enterprise Edition enables new solutions 
that turn large, complex volumes of data into insight, cost effectively. 
InfoSphere BigInsights delivers an enterprise-ready big data solution by 
combining Apache Hadoop, including the MapReduce framework and the 
Hadoop Distributed File Systems (HDFS), with unique technologies and 
capabilities from across IBM."

That gives them the flexibility to swap things around in future (switch 
to GPFS, MapR, Brisk) without having to change their branding.

> and yes, CDH*

If you look at the CDH site, it's now "Cloudera's Distribution including
Apache Hadoop". After all, it's Cloudera's data analysis stack including
Apache Hadoop.

> Even if some / all of the above modify at least a single bit (and may
> *technically* be derived works) everyone understands what they mean. As for
> the confusion, the OSS community has always just said "oh, they patch some
> stuff, you should probably ask them" when confronted with vendor modified
> versions of upstream projects; I've been involved in many of those upstream
> projects, including a Linux distro (downstream). We should always be polite
> to downstream users in redirecting them, but I think redirecting them is
> fine. It's not confusing to users in my experience (we can make it a FAQ or
> something and just point people there) as RedHat, Novell, Oracle, IBM, and
> many other vendors have been happily[1] coexisting with their upstream
> counterparts for a long time.

co-existence yes; happiness, not always:


Where Ubuntu is good is that Launchpad is a good entry point for filing
and tracking any Ubuntu-related problem, and helping to push that
upstream, so the local issue can be linked to the source issue, letting
me deal with problems like getting sound to work:

JIRA doesn't do that cross-instance tracking, which is painful for me at 
work, where I do deal with multiple JIRA instances. You can put remote 
URLs in, but they don't get synchronised. I can't say SFOS-780 depends 
on apache.org/MAPREDUCE-279, for example.

> I believe we (the collective Apache Hadoop community including those that
> redistribute Hadoop bits in various forms) should focus on producing
> regular, quality releases in a cooperative and constructive environment, and
> continue to require vendors to provide the proper attribution and license
> information. This is in everyone's interest, vendors and direct users alike.


> *Disclosure: I work for Cloudera and I think this should apply to anyone and
> everyone, including my employer (with whom I obviously do not clear emails.
> :))

I understand; it may ultimately affect my employer too, which is why a
consistent approach matters: then nobody will feel they are being
discriminated against.
