hadoop-general mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Date Wed, 29 Aug 2012 17:22:40 GMT
Hi Bobby,

On Aug 29, 2012, at 8:17 AM, Robert Evans wrote:

> I personally am for splitting up the projects.  I think there is a lot of
> potential that each of the projects could have on their own, and I expect
> to see them evolve in new and interesting ways when the projects are not
> tied directly together.
> But, in order to get there we need to address the issues that made the
> first split attempt fail.  

Sorry I snipped the above, but mainly I just don't buy the argument that
there are a bunch of technical things that *block* splitting the projects.

Today, right now, I could propose a new Incubator project, and call it
BooDoopADoop. I could add 5-7 (or 4) people that I think I would work
well with. I could invite others to join in the Incubator as part of the
initial PPMC list and committer list. We could write in our proposal 
that the existing Hadoop community is technically amazing, but over time
has been mired by a bunch of community issues and we'd like to take
our crack at the source code in a brand new Apache project called
BooDoopADoop.

Then for the code portion of the Incubator proposal, I could say, I will
svn copy all of Hadoop into BooDoopADoop and then start from there.

So, given that I could do that (as could others), I would also have to 
readily be prepared for the community bad-will and general ASF 
bad-will that may cause. It may not cause ASF bad-will, b/c in general
the foundation doesn't care about competing projects or technologies.
It does care about splintering communities and the like though. Moreover,
beyond the Foundation concerns, I would also have to concern myself
with pissing you guys off, and all the downstream organizations and 
companies and individuals that are part of the Hadoop ecosystem 
that may be pissed off about the way we injected code into 
BooDoopADoop. But again, nothing stopping me from doing that.

I'd like to point out in the above scenario, I don't have to worry about
release schedules, or this, that, and the other. Or APIs, or whatever.
I have BooDoopADoop, and so does the new community around it in the
Incubator, and we simply "go". Then, if others upstream, or downstream
find BooDoopADoop useful, they take it, and then incorporate it into 
their project. Perhaps Hadoop HDFS finds our improvements to BooDoopADoop
and its distributed file system better and perhaps we did some Maven magic
and made our jar file better or more attractive to use and it saved Hadoop HDFS
coding, and time and whatever. So Hadoop HDFS integrates it. 

See how this could work?

So, take me out of BooDoopADoop and replace that with the Hadoop
PMC, and the specific subsets of you guys that are actually really distinct
PMC members of distinct communities living within the Hadoop ecosystem.
Sure you want to technically work together on releases, and APIs, and whatever,
but those are *inter-community* issues, more so than *intra-community* across
the Foundation. Sure, it's good to try and coordinate, b/c you guys all have $dayjobs,
and the software you build at those $dayjobs is contributed upstream into the 
ASF, and then others depend on it (and then others downstream of the ASF and 
even downstream of your companies, depend on it, and so on and so forth). However,
as far as the foundation is concerned, communities, and projects (1:1 ideally) 
coordinate releases on an inter-community-level, not intra-*. The intra-* is usually
just icing and way more difficult.

> As part of this we also need to have a clear set of rules about what it
> takes to become a committer or PMC member for the new projects when they
> split off.  I am fine with all committers become PMC members,

+1 me too, and your suggestion below about "if we merge..." is one option
to doing so. But there could be others and discussing them and putting 
them up on a list is probably a good idea.

I would honestly suggest someone(s) taking a stab at the lists of the new
PMC members for the new TLPs and then putting something out there, 
and then -'ing people or adding them, as needed.

And yes, I fully agree, that the PMC lists should not simply be the 
full Hadoop PMC per new TLP -- then we've just replicated the inherent
problem 3x over instead of 1x over :) 

However, I don't know enough of the ins and outs of who should be on those
lists for HDFS, MR and YARN. I bet you guys do though, so someone, step up
and throw something out there for others to shoot down....errr I mean improve! :)

> I fear that just voting and doing an svn copy -m will result in the same
> thing that happened last time.  Someone will want to make a large change.
> This will require making a change to something in common,

See my BooDoopADoop. I don't think that someone in new TLP X wanting
to make a change in their copy of common will matter to TLP Y. It shouldn't.
It *can*, over time, if there is coordination between X and Y, but it doesn't
have to. Get what I mean?

> [...snip...]

> I also want us to think about the timing of this.  Do we really want to do
> this before 2.0 is GA?  Doing this properly is probably going to be a
> several month effort for one or two people, and a concerted effort by
> everyone not to break things while they work.

This is *not* a technical issue :) This is a community issue. It's independent
of the technical issues. This is about how to fix the community issues. 

But yes, if you guys want to release some upcoming version first or whatever,
fine and dandy if the community agrees, but it shouldn't be a gate to fixing
community issues.

This happens in the Incubator all the time. The big question with a project
releasing and then having a graduation VOTE near that release (before or
after) -- do we wait to graduate? I'm always a fan of just moving forward on
graduation b/c it's independent of the technical stuff.

> @Chris,
> I can see your desire to do the split now, and then deal with the fallout
> as we adapt to the changes.  I think that would work assuming that we all
> are completely committed to making the changes necessary. But because we
> are having this discussion at all seems to indicate that we are not all
> completely committed to this, and I also feel that dealing with the
> fallout is going to take a lot longer if we don't try to address some of
> the problems up front.

Dealing with Hadoop technical problems is probably not my forte anymore 
(if it ever was :) ). I'm here as a Foundation member trying to help
with the community problems.

>  Putting on my Yahoo! Hat, I want to avoid as many
> problems and delays as I can, because my customers want a stable release
> of Hadoop with the features that are in 2.0.  The longer it is delayed the
> longer we stay on branch-0.23.  A one quarter delay because of this I am
> sure I can swing, more than that and I will start to get more pressure to
> pull in new features which will probably mean that we then have to fork
> which is something that I really do not want to do.

In the end, forking is what you guys should do :) You should just do it
at Apache. "Fork" the current Hadoop uber project into the actual communities
that actually exist. You can fork directly out as TLPs, or incubate the forks. 
But doing it here would be great :)

> So I am +1 on merging the committer list, and +1 splitting the projects.
> I would encourage us to at least do some planning and legwork up front
> before splitting.  I am even +1 for setting a deadline on which date svn
> -m will happen whether we are ready or not.

Thanks for your thoughts, Bobby. Hope that explains where I am coming
from.


Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
