hadoop-general mailing list archives

From Patrick Angeles <patr...@cloudera.com>
Subject Re: Proposal: Further Project Split(s)
Date Fri, 01 Apr 2011 16:13:13 GMT
+1

This will allow Hadoop to better compete with GoDaddy's "Hadoop Killer"
skunkworks project.

On Fri, Apr 1, 2011 at 11:26 AM, Nigel Daley <ndaley@mac.com> wrote:

> -1+2.  This could potentially allow us to replace Jenkins with Hadoop for
> our build and test infrastructure.  That would be awesome!
>
> n.
>
> On Apr 1, 2011, at 1:57 AM, Chris Douglas wrote:
>
> > Experience developing Hadoop has shown that we not only need to
> > partition our projects for more active releases, but we also should
> > explore speculative project splits. For this, a Hadoop.next() project
> > should track the development of a project scheduler that can partition
> > the Hadoop subprojects, possibly running a second version of a
> > subproject in parallel. Downstream subprojects and TLPs automatically
> > accept whichever releases first as a dependency. Implementation should
> > combine ant, ivy, maven, and at least one legacy Hadoop build tool (to
> > be written).
> >
> > Of course, not all of these subprojects will succeed. When one fails
> > (or is too slow with its project reports), the project scheduler will
> > be responsible for respawning it in the Incubator.
> >
> > The project scheduler will, of course, be pluggable. -C
> >
> > On Fri, Apr 1, 2011 at 1:19 AM, Aaron T. Myers <atm@cloudera.com> wrote:
> >> Hello Hadoop Community,
> >>
> > >> Given the tremendous positive feedback we've all had regarding the HDFS,
> > >> MapReduce, and Common project split, I'd like to propose we take the next
> > >> step and further separate the existing projects.
> >>
> > >> I propose we begin by splitting the MapReduce project into separate "Map"
> > >> and "Reduce" sub-projects. This will provide us the opportunity to tease out
> > >> the complex interdependencies between "map" and "reduce" that exist today,
> > >> to encourage us to write more modular and isolated code, which should speed
> > >> releases. This will also aid our users who exclusively run map-only or
> > >> reduce-only jobs. These are important use-cases, and so should be given high
> > >> priority.
> >>
> > >> Given that these two portions of the existing MapReduce project share a
> > >> great deal of code, we will likely need to release these two new projects
> > >> concurrently at first, but the eventual goal should certainly be to be able
> > >> to release "Map" and "Reduce" independently. This seems intuitive to me,
> > >> given the remarkable recent advancements in the academic community regarding
> > >> "reduce," while the research coming out of the "map" academics has largely
> > >> stagnated of late.
> >>
> > >> If this proposal is accepted, and it has the success I think it will, then
> > >> we should strongly consider splitting the other two projects as well. My gut
> > >> instinct is that we should split "HDFS" into "HD" and "FS" sub-projects, and
> > >> simply rename the "Common" project to "C'Mon." We can think about the
> > >> details of what exactly these project splits mean later.
> >>
> >> Please let me know what you think.
> >>
> >> Best,
> >> Aaron
> >>
>
>
