hadoop-general mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Date Tue, 01 Feb 2011 19:46:35 GMT
On Tue, Feb 1, 2011 at 1:02 AM, Allen Wittenauer
<awittenauer@linkedin.com> wrote:

>
>
>         So is the expectation that users would have to follow bread crumbs
> to the github dumping ground, then try to figure out which repo is the
> 'better' choice for their usage? Using LZO as an example, it appears we
> have a choice of Kevin's, yours, or the master without even taking into
> consideration any tags. That sounds like a recipe for disaster that's even
> worse than what we have today.
>
>
Kevin's and mine are currently identical
(0e7005136e4160ed4cc157c4ddd7f4f1c6e11ffa)
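Whether two forks are at the same commit is easy to check without cloning either one: `git ls-remote` will report the commit each remote's HEAD points to, and comparing those hashes settles it. A minimal sketch follows; the GitHub fork URLs in the usage comment are assumptions for illustration, not a statement of where the canonical repos live.

```shell
# Print the commit hash that a remote repository's HEAD points to.
head_commit() {
  git ls-remote "$1" HEAD | awk '{print $1}'
}

# Compare the HEADs of two remotes; prints "identical" or "diverged".
compare_heads() {
  if [ "$(head_commit "$1")" = "$(head_commit "$2")" ]; then
    echo identical
  else
    echo diverged
  fi
}

# Example usage (requires network access; fork URLs are hypothetical):
# compare_heads https://github.com/kevinweil/hadoop-lzo \
#               https://github.com/toddlipcon/hadoop-lzo
```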

Not sure who "the master" is -- maybe you're referring to the Google Code
repo? The reason we started working on github over a year ago is that the
bugs we reported (and provided diffs for) in the Google Code project were
ignored. For example:
http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=17

In fact this repo hasn't been updated since Sep '09:
http://code.google.com/p/hadoop-gpl-compression/source/list

Github provided an excellent place to collaborate on the project, make
progress, fix bugs, and provide a better product for the users.

As for "dumping ground," I don't quite follow your point - we develop in the
open, accept pull requests from users, and code review each other's changes.
Since October every commit has either been contributed by or fixes a bug
reported by a user completely outside of the organizations where Kevin and I
work.

I agree that it's a bit of "breadcrumb following" to find the repo, though.
We do at least have a link on the wiki:
http://wiki.apache.org/hadoop/UsingLzoCompression which points to Kevin's
repo.

Perhaps the best solution here is to add a page to the official Hadoop site
(not just the wiki) with links to actively maintained contrib projects?


>
> > IMO the more we can take non-core components and move them to separate
> > release timelines, the better. Yes, it is harder for users, but it also is
> > easier for them when they hit a bug - they don't have to wait months for a
> > wholesale upgrade which might contain hundreds of other changes to core
> > components.
>
>         I'd agree except for one thing:  even when users do provide patches
> to contrib components we ignore them.  How long have those patches for HOD
> been sitting there in the patch queue?  So of course they wait
> months/years--because we seemingly ignore anything that isn't important to
> us.  Unfortunately, that covers a large chunk of contrib. :(
>

True - we ignore them because the core contributors generally have little
clue about the contrib components, and so don't feel qualified to review them. I'll
happily admit that I've never run failmon, index, dynamic-scheduler,
eclipse-plugin, data_join, mumak, or vertica contribs. Wouldn't you rather
these components lived on github so the people who wrote them could update
them as they wished without having to wait on committers who have little to
no clue about how to evaluate the changes?

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
