hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Date Mon, 31 Jan 2011 11:47:32 GMT
On 31/01/11 03:42, Nigel Daley wrote:
> Folks,
>
> Now that http://apache-extras.org is launched (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches)
> I'd like to start a discussion on moving contrib components out of common, mapreduce, and
> hdfs.
>
> These contrib components complicate the builds, cause test failures that nobody seems
> to care about, have releases that are tied to Hadoop's long release cycles, etc.  Most folks
> I've talked with agree that these contrib components would be better served by being pulled
> out of Hadoop and hosted elsewhere. The new apache-extras code hosting site seems like a natural
> *default* location for migrating these contrib projects.  Perhaps some should graduate from
> contrib to src (ie from contrib to core of the project they're included in).  If folks agree,
> we'll need to come up with a mapping of contrib component to its final destination and file
> a jira.
>
> Here are the contrib components by project (hopefully I didn't miss any).
>
> Common Contrib:
>    failmon
>    hod
>    test
>
>
> MapReduce Contrib:
>    capacity-scheduler -- move to MR core?
>    data_join
>    dynamic-scheduler
>    eclipse-plugin
>    fairscheduler -- move to MR core?
>    gridmix
>    index
>    mrunit
>    mumak
>    raid
>    sqoop
>    streaming -- move to MR core?
>    vaidya
>    vertica
>

+1 for the schedulers in core
+1 for streaming


For the "accessories", they are really separate projects that work 
with Hadoop, but could have separate release schedules:

  - move them to incubation, try and staff them.
  - if they aren't resourced, then that means they are dead code.

I'm -1 on having any support for filesystems other than POSIX and HDFS 
in there, and +0 on S3, but it's used widely enough that it should stay in, 
especially as Amazon do apparently provide some funding for testing.

Because, as Nigel points out, testing is the enemy. If you don't have 
the implementation of the filesystem in question, there is no way to be 
sure that some change works: you can't use it, release it saying "it 
works", or field bug reports.

Testing and releasing filesystem interfaces should be the 
responsibility of the filesystem suppliers, or of whoever wants to develop 
the bridge from the FS to Hadoop.
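
To make the "bridge" concrete: in the current FileSystem framework a 
third-party filesystem plugs in purely through configuration, so it can be 
built, tested and released entirely outside the Hadoop tree. A minimal 
sketch, assuming a hypothetical myfs:// scheme backed by a hypothetical 
com.example.MyFileSystem class shipped in its own jar:

  import java.net.URI;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BridgeWiring {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Hypothetical scheme and bridge class; the jar carrying it just
      // needs to be on the classpath.
      conf.set("fs.myfs.impl", "com.example.MyFileSystem");
      // Hadoop looks up fs.<scheme>.impl and instantiates the bridge
      // for myfs:// URIs.
      FileSystem fs = FileSystem.get(URI.create("myfs://demo/"), conf);
      System.out.println(fs.exists(new Path("/")));
    }
  }

Nothing in Hadoop core has to change for that to work, which is why the 
supplier of the bridge seems the natural party to test and release it.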

This raises another issue I've been thinking about recently: how do 
you define "compatibility"? If, for example, my colleagues and I were to 
stand up and say "our FS is compatible with Apache Hadoop", what does that 
mean?
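
One way to pin that down would be a shared contract test suite that any 
filesystem claiming compatibility has to pass against a live instance. A 
rough JUnit 4 sketch of that kind of check, using only core FileSystem 
calls (the hdfs://localhost:9000/ URI below is just a placeholder for 
whatever store is under test):

  import static org.junit.Assert.*;

  import java.net.URI;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.junit.Test;

  public class BasicFileSystemContractTest {

    // Placeholder: point this at the filesystem claiming compatibility.
    private static final URI TEST_FS = URI.create("hdfs://localhost:9000/");

    @Test
    public void testCreateReadRenameDelete() throws Exception {
      FileSystem fs = FileSystem.get(TEST_FS, new Configuration());
      Path dir = new Path("/compat-test");
      Path file = new Path(dir, "hello.txt");
      byte[] payload = "hello, hadoop".getBytes("UTF-8");

      assertTrue("mkdirs failed", fs.mkdirs(dir));

      // Write through the FileSystem API...
      FSDataOutputStream out = fs.create(file, true);
      out.write(payload);
      out.close();

      // ...and read the same bytes back.
      assertEquals(payload.length, fs.getFileStatus(file).getLen());
      FSDataInputStream in = fs.open(file);
      byte[] readBack = new byte[payload.length];
      in.readFully(0L, readBack);
      in.close();
      assertArrayEquals(payload, readBack);

      // Rename and recursive delete must behave the way HDFS callers expect.
      Path renamed = new Path(dir, "renamed.txt");
      assertTrue("rename failed", fs.rename(file, renamed));
      assertTrue("delete failed", fs.delete(dir, true));
      assertFalse(fs.exists(dir));
    }
  }

"Compatible" could then mean, at a minimum, "passes that suite"; anything 
beyond the basics (consistency, atomicity of rename, permissions) would 
need spelling out separately.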

-Steve
