hadoop-general mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Date Tue, 01 Feb 2011 19:54:08 GMT
On Tue, Feb 1, 2011 at 9:37 AM, Tom White <tom@cloudera.com> wrote:

> HBase moved all its contrib components out of the main tree a few
> months back - can anyone comment how that worked out?
Sure. For each contrib:

ec2: no longer exists, and now has been integrated into Whirr and much
improved. Whirr has made several releases in the time that HBase has made
one. The Whirr contributors know way more about cloud deployment than the
HBase contributors (except where they happen to overlap). Strong net

mdc_replication: pulled into core since it's developed by core committers
and also needs a fair amount of tight integration with core components

stargate: pulled into core - it was only in contrib as a sort of staging
ground - it's really an improved/new version of the "rest" interface we
already had in core.

transactional: moved to github - this has languished a bit on github because
only one person was actively maintaining it. However, it had already been
"languishing" as part of contrib - even though it compiled, it never really
worked very well in HBase trunk. So, moving it to a place where it's
languished has just made it more obvious what was already true - that it
isn't a well supported component (yet). Recently it's been taken back up by
the author of it - if it develops a large user base it can move quickly and
evolve without waiting on our release. Net: probably a wash

So, overall, I'd say it was a good decision. Though we never had the same
number of contribs that Hadoop seems to have sprouted.


> On Tue, Feb 1, 2011 at 1:02 AM, Allen Wittenauer
> <awittenauer@linkedin.com> wrote:
> >
> > On Jan 31, 2011, at 3:23 PM, Todd Lipcon wrote:
> >
> >> On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley <omalley@apache.org> wrote:
> >>
> >>>
> >>> Also note that pushing code out of Hadoop has a high cost. There are at
> >>> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
> >>> confusion for the users. A lot of users never go to the work to figure out
> >>> which fork and branch of hadoop-gpl-compression work with the version of
> >>> Hadoop they installed.
> >>>
> >>>
> >> Indeed it creates confusion, but in my opinion it has been very successful
> >> modulo that confusion.
> >
> >        I'm not sure how the above works with what you wrote below:
> >
> >> In particular, Kevin and I (who each have a repo on github but basically
> >> co-maintain a branch) have done about 8 bugfix releases of LZO in the last
> >> year. The ability to take a bug and turn it around into a release within a
> >> few days has been very beneficial to the users. If it were part of core
> >> Hadoop, people would be forced to live with these blocker bugs for months at
> >> a time between dot releases.
> >
> >        So is the expectation that users would have to follow bread crumbs
> > to the github dumping ground, then try to figure out which repo is the
> > 'better' choice for their usage?   Using LZO as an example, it appears we
> > have a choice of Kevin's, yours, or the master without even taking into
> > consideration any tags. That sounds like a recipe for disaster that's even
> > worse than what we have today.
> >
> >
> >> IMO the more we can take non-core components and move them to separate
> >> release timelines, the better. Yes, it is harder for users, but it also is
> >> easier for them when they hit a bug - they don't have to wait months for a
> >> wholesale upgrade which might contain hundreds of other changes to core
> >> components.
> >
> >        I'd agree except for one thing:  even when users do provide
> > patches to contrib components we ignore them.  How long have those patches
> > for HOD been sitting there in the patch queue?  So of course they wait
> > months/years--because we seemingly ignore anything that isn't important to
> > us.  Unfortunately, that covers a large chunk of contrib. :(
> >
> >
> >

Todd Lipcon
Software Engineer, Cloudera
