hadoop-general mailing list archives

From Eli Collins <...@cloudera.com>
Subject Re: [VOTE] Maintain a single committer list for the Hadoop project
Date Tue, 28 Aug 2012 23:57:11 GMT
On Tue, Aug 28, 2012 at 4:12 PM, Arun C Murthy <acm@hortonworks.com> wrote:
> On Aug 23, 2012, at 9:20 PM, Eli Collins wrote:
>> Per this thread [1] should we have a single set of committers for the
>> entire Hadoop project, ie all subprojects?
> I feel like we need to have a wider discussion here.
> This discussion started when a diverse set of folks working on YARN for a
> year and a half wanted their own identity and an acknowledgement of the fact
> that they are a distinct community. In retrospect, I went about convincing
> the wider Hadoop community about this in the wrong way. My apologies.
> Upon reflection, I think Chris Mattmann has convinced me that we have an even
> wider issue at hand and that the right way to a better place, not just for
> YARN, but for all of Hadoop, is to expedite the process of graduating Hadoop
> sub-projects into TLPs. This is a mere reflection of the fact that Hadoop is
> not a single community.
> Historically there have been at least 2 communities (HDFS, MapReduce) under
> the Hadoop umbrella; and there are now 3 (HDFS, MapReduce, YARN).
> At least for the last 3 years, if not more, the overwhelming majority of
> contributors to Hadoop have focussed exclusively on one of the sub-projects.
> That is a clear indicator.
> This is exactly the thinking behind former sub-projects like HBase, Hive &
> Pig graduating, upon the nudge received by the Hadoop PMC from the Board.
> The good news is that, in principle, most seem to agree on the need for
> Hadoop sub-projects to stand alone and on the path to get there. Not tying
> them together could lead to several great outcomes, such as ensuring HDFS
> pays as much attention to HBase as to MapReduce, YARN pays attention to
> projects beyond MapReduce, etc.
> Rather than sweep this under the carpet, I feel we are better off acknowledging this.
> This is very much in keeping with the way the ASF and the Board want to see
> communities - small and focussed on a single project.
> A meta or umbrella community like Hadoop leads to issues which are well
> documented and understood in the ASF, something experienced Apache Members
> like Chris Mattmann have repeatedly pointed out.
> It is also fair, per Chris Douglas, to set a reasonable time frame. After due
> consideration, I think doing this before hadoop-2 is declared stable (GA) is
> the most reasonable option. It gives us necessary headroom hereupon and will
> ensure we don't confuse users further by doing it post-fact hadoop-2. Let's
> discuss the mechanics, timelines etc. further.
> Yes, this is hard work and there are several technical challenges. But the
> ASF is all about communities and I'm sure we can solve these technical issues
> for the better long-term health of these distinct communities.
> Thoughts?

I'd start a separate discussion thread or vote about moving some or
all of the sub-projects to TLPs. IMO we should resolve this issue
independently - there's no reason to block this decision on a possible
future direction for the project. For example, if YARN spins out as a
TLP this issue still remains for the rest of the sub-projects, so I
don't want to stall progress on this for the larger, more complex
discussion of whether all projects become TLPs. And if a sub-project
spins out as a TLP that's a great opportunity to figure out the right
set of committers. I.e., the decision here doesn't prevent YARN from
establishing a new committer list if/when it spins out.

