hadoop-general mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: [VOTE] Maintain a single committer list for the Hadoop project
Date Tue, 28 Aug 2012 23:12:22 GMT
On Aug 23, 2012, at 9:20 PM, Eli Collins wrote:

> Per this thread [1] should we have a single set of committers for the
> entire Hadoop project, ie all subprojects?

I feel like we need to have a wider discussion here.

This discussion started when a diverse set of folks working on YARN for a year and a half
wanted their own identity and an acknowledgement of the fact that they are a distinct community.
In retrospect, I went about convincing the wider Hadoop community about this in the wrong
way. My apologies.

Upon reflection, I think Chris Mattmann has convinced me that we have an even wider issue at
hand and that the right way to a better place, not just for YARN, but for all of Hadoop, is
to expedite the process of graduating Hadoop sub-projects into TLPs. This is a mere reflection
of the fact that Hadoop is not a single community.

Historically there have been at least 2 communities (HDFS, MapReduce) under the Hadoop umbrella;
and there are now 3 (HDFS, MapReduce, YARN).
At least for the last 3 years, if not more, the overwhelming majority of contributors to Hadoop
have focussed exclusively on one of the sub-projects. That is a clear indicator.
This is exactly the thinking behind former sub-projects like HBase, Hive & Pig
graduating, upon the nudge received by the Hadoop PMC from the Board.

The good news is that, in principle, most seem to agree on the need for Hadoop sub-projects
to stand alone and the path to get there. It could lead to several great outcomes such as
ensuring HDFS pays as much attention to HBase as to MapReduce, YARN pays attention to projects
beyond MapReduce etc. by not tying them together.

Rather than sweep this under the carpet, I feel we are better off acknowledging this.

This is very much in keeping with the way the ASF and the Board want to see communities -
small and focussed on a single project.

A meta or umbrella community like Hadoop leads to issues which are well documented and understood
in the ASF, something experienced Apache Members like Chris Mattmann have repeatedly pointed out.

It is also fair, per Chris Douglas, to set a reasonable time frame. After due consideration,
I think doing this before hadoop-2 is declared stable (GA) is the most reasonable option.
It gives us the necessary headroom and will ensure we don't confuse users further by
doing it after the fact, post hadoop-2. Let's discuss the mechanics, timelines etc. further.

Yes, this is hard work and there are several technical challenges. But, the ASF is all about
communities and I'm sure we can solve these technical issues for the better long-term health
of these distinct communities.

