hadoop-general mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject Re: Proposal: Further Project Split(s)
Date Fri, 01 Apr 2011 18:41:54 GMT
On Fri, Apr 1, 2011 at 08:26, Nigel Daley <ndaley@mac.com> wrote:
> -1+2.  This could potentially allow us to replace Jenkins with Hadoop for our build
> and test infrastructure.  That would be awesome!

Has anyone checked a calendar lately?

> On Apr 1, 2011, at 1:57 AM, Chris Douglas wrote:
>> Experience developing Hadoop has shown that we not only need to
>> partition our projects for more active releases, but we also should
>> explore speculative project splits. For this, a Hadoop.next() project
>> should track the development of a project scheduler that can partition
>> the Hadoop subprojects, possibly running a second version of a
>> subproject in parallel. Downstream subprojects and TLPs automatically
>> accept whichever releases first as a dependency. Implementation should
>> combine ant, ivy, maven, and at least one legacy Hadoop build tool (to
>> be written).
>> Of course, not all of these subprojects will succeed. When one fails
>> (or is too slow with its project reports), the project scheduler will
>> be responsible for respawning it in the Incubator.
>> The project scheduler will, of course, be pluggable. -C
>> On Fri, Apr 1, 2011 at 1:19 AM, Aaron T. Myers <atm@cloudera.com> wrote:
>>> Hello Hadoop Community,
>>> Given the tremendous positive feedback we've all had regarding the HDFS,
>>> MapReduce, and Common project split, I'd like to propose we take the next
>>> step and further separate the existing projects.
>>> I propose we begin by splitting the MapReduce project into separate "Map"
>>> and "Reduce" sub-projects. This will provide us the opportunity to tease out
>>> the complex interdependencies between "map" and "reduce" that exist today,
>>> to encourage us to write more modular and isolated code, which should speed
>>> releases. This will also aid our users who exclusively run map-only or
>>> reduce-only jobs. These are important use-cases, and so should be given high
>>> priority.
>>> Given that these two portions of the existing MapReduce project share a
>>> great deal of code, we will likely need to release these two new projects
>>> concurrently at first, but the eventual goal should certainly be to be able
>>> to release "Map" and "Reduce" independently. This seems intuitive to me,
>>> given the remarkable recent advancements in the academic community regarding
>>> "reduce," while the research coming out of the "map" academics has largely
>>> stagnated of late.
>>> If this proposal is accepted, and it has the success I think it will, then
>>> we should strongly consider splitting the other two projects as well. My gut
>>> instinct is that we should split "HDFS" into "HD" and "FS" sub-projects, and
>>> simply rename the "Common" project to "C'Mon." We can think about the
>>> details of what exactly these project splits mean later.
>>> Please let me know what you think.
>>> Best,
>>> Aaron
