hadoop-general mailing list archives

From Tom White <...@cloudera.com>
Subject Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Date Tue, 01 Feb 2011 17:37:10 GMT
+1 for the reasons already cited: independent release cycles,
testing/build problems, lack of maintenance, etc. I think we should
strongly discourage new contrib components in favour of Apache Extras
or github, remove inactive contrib components, and also allow
maintainers to move components out if they volunteer to.

HBase moved all its contrib components out of the main tree a few
months back - can anyone comment on how that worked out?

I agree that we should move streaming (MAPREDUCE-602) and the
schedulers to the main codebase. With work like MAPREDUCE-1478 we can
put these components into a library tree so that the libraries can
depend on core, but core doesn't depend on the libraries.
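To make the dependency direction concrete, here is a hypothetical sketch of what a library module's build descriptor could look like (module and artifact names are illustrative only, not the actual MAPREDUCE-1478 layout or the project's current build setup):

```xml
<!-- libs/schedulers/pom.xml -- HYPOTHETICAL sketch; artifact names
     are invented for illustration, not taken from MAPREDUCE-1478. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-schedulers</artifactId>
  <version>0.23.0-SNAPSHOT</version>
  <dependencies>
    <!-- The library tree depends on core... -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-core</artifactId>
      <version>0.23.0-SNAPSHOT</version>
    </dependency>
  </dependencies>
  <!-- ...while core declares no dependency back on this module, so
       core can be built, tested, and released on its own. -->
</project>
```

The key point is the one-way edge: a broken or unmaintained library module can be dropped or released separately without touching core's build.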

Milind: Record IO is in Common (in the main tree, not a contrib
component), and was deprecated in 0.21.0. We could remove it in a
future release.


On Tue, Feb 1, 2011 at 1:02 AM, Allen Wittenauer
<awittenauer@linkedin.com> wrote:
> On Jan 31, 2011, at 3:23 PM, Todd Lipcon wrote:
>> On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley <omalley@apache.org> wrote:
>>> Also note that pushing code out of Hadoop has a high cost. There are at
>>> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
>>> confusion for the users. A lot of users never go to the trouble of figuring
>>> out which fork and branch of hadoop-gpl-compression works with the version
>>> of Hadoop they installed.
>> Indeed it creates confusion, but in my opinion it has been very successful
>> modulo that confusion.
>        I'm not sure how the above works with what you wrote below:
>> In particular, Kevin and I (who each have a repo on github but basically
>> co-maintain a branch) have done about 8 bugfix releases of LZO in the last
>> year. The ability to take a bug and turn it around into a release within a
>> few days has been very beneficial to the users. If it were part of core
>> Hadoop, people would be forced to live with these blocker bugs for months at
>> a time between dot releases.
>        So is the expectation that users would have to follow breadcrumbs to the
> github dumping ground, then try to figure out which repo is the 'better' choice for
> their usage?   Using LZO as an example, it appears we have a choice of Kevin's, yours,
> or the master, without even taking into consideration any tags. That sounds like a
> recipe for disaster that's even worse than what we have today.
>> IMO the more we can take non-core components and move them to separate
>> release timelines, the better. Yes, it is harder for users, but it also is
>> easier for them when they hit a bug - they don't have to wait months for a
>> wholesale upgrade which might contain hundreds of other changes to core
>> components.
>        I'd agree except for one thing:  even when users do provide patches to contrib
> components, we ignore them.  How long have those patches for HOD been sitting there in
> the patch queue?  So of course they wait months/years--because we seemingly ignore
> anything that isn't important to us.  Unfortunately, that covers a large chunk of
> contrib. :(
