hadoop-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Date Wed, 15 Jun 2011 18:13:27 GMT
+1 to what Eli says.  If nobody is running official Hadoop according to this
definition, but everybody thinks that they are running Hadoop, then this
definition is a bit out of whack.  The source of the dissonance is related
to the fact that releases just don't happen often enough in Hadoop.

In addition, I think that the limitations on usage are too strict.  For
instance, if "QuickBooks for Windows" [1] doesn't cause Microsoft to sue
Intuit, then "Joe's Foo for Apache Hadoop" really shouldn't cause any more

So I would give a (non-binding) -1 to the policy as stated.


On Wed, Jun 15, 2011 at 6:40 PM, Eli Collins <eli@cloudera.com> wrote:

> On Tue, Jun 14, 2011 at 7:45 PM, Owen O'Malley <omalley@apache.org> wrote:
> >
> > On Jun 14, 2011, at 5:48 PM, Eli Collins wrote:
> >
> >> Wrt derivative works, it's not clear from the document, but I think we
> >> should explicitly adopt the policy of HTTPD and Subversion that
> >> backported patches from trunk and security fixes are permitted.
> >
> > Actually, the document is extremely clear that only Apache releases may
> > be called Hadoop.
> >
> > There was a very long thread about why the rapidly expanding
> > Hadoop ecosystem is leading to a lot of customer confusion about the
> > different "versions" of Hadoop. We as the Hadoop project don't have the
> > resources or the necessary compatibility test suite to test compatibility
> > between the different sets of cherry-picked patches. We also don't have time
> > to ensure that all of the thousands of patches applied to 0.20.2 in each of
> > the many (10? 15?) different versions have been committed to trunk.
> > Furthermore, under the Apache license, a company Foo could claim that it is a
> > cherry-picked version of Hadoop without releasing the source code that would
> > enable verification.
> >
> > In summary,
> >  1. Hadoop is very successful.
> >  2. There are many different commercial products that are trying to use
> >     the Hadoop name.
> >  3. We can't check or enforce that the cherry-picked versions are following
> >     the rules.
> >  4. We don't have a TCK like Java does to validate that new versions are
> >     compatible.
> >  5. By far the fairest way to ensure compatibility and fairness between
> >     companies is that only Apache Hadoop releases may be called Hadoop.
> >
> > That said, a package that includes a small number (< 3) of security
> > patches that haven't been released yet doesn't seem unreasonable.
> >
> I've spoken with ops teams at many companies, and I am not aware of
> anyone who runs an official release (with just 2 security patches). By
> this definition, many of the most valuable contributors to Hadoop,
> including Yahoo!, Cloudera, Facebook, etc., are not using Hadoop. Is
> that really the message we want to send? Do we expect the PMC to enforce
> this equally across all parties?
> It's a fact of life that companies and ops teams that support Hadoop
> need to patch the software before the PMC has the time and/or will to vote
> on new releases. This is why HTTPD and Subversion allow this. Putting a
> build of Hadoop that has 4 security patches applied into the same
> category as a product that has entirely reworked the code and not
> gotten it checked into trunk does a major disservice to the people who
> contribute to and invest in the project.
> Thanks,
> Eli
