hadoop-general mailing list archives

From Eli Collins <...@cloudera.com>
Subject Re: Large feature development
Date Sun, 02 Sep 2012 19:47:13 GMT
On Sun, Sep 2, 2012 at 7:58 AM, Steve Loughran <steve.loughran@gmail.com> wrote:
> On 1 September 2012 09:20, Todd Lipcon <todd@cloudera.com> wrote:
>> Thanks for starting this thread, Steve. I think your points below are
>> good. I've snipped most of your comment and will reply inline to one
>> bit below:
>> On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran
>> <steve.loughran@gmail.com> wrote:
>> >
>> > How then do we get (a) more dev projects working and integrated by the
>> > current committers, and (b) a process in which people who are not yet
>> > contributors/committers can develop non-trivial changes to the project in a
>> > way that it is done with the knowledge, support and mentorship of the rest
>> > of the community?
> Both HDFS2 and MRv2 are in trunk, therefore I consider them successes.
>> Here's one proposal, making use of git as an easy way to allow
>> non-committers to "commit" code while still tracking development in
>> the usual places:
> This is effectively what people do. I'm less worried about the code side of
> things than the integration and mentoring
>> - Upon anyone's request, we create a new "Version" tag in JIRA.
> -1. There are enough versions. There is a "tag" field in JIRA for precisely
> this purpose
>> - The developers create an umbrella JIRA for the project, and file the
>> individual work items as subtasks (either up front, or as they are
>> developed if using a more iterative model)
> as today
>> - On the umbrella, they add a pointer to a git branch to be used as
>> the staging area for the branch. As they develop each subtask, they
>> can use the JIRA to discuss the development like they would with a
>> normally committed JIRA, but when they feel it is ready to go (not
>> requiring a +1 from any committer) they commit to their git branch
>> instead of the SVN repo.
> some integration w/ jenkins and pull testing would be good here
>> - When the branch is ready to merge, they can call a merge vote, which
>> requires +1 from 3 committers, same as a branch being proposed by an
>> existing committer. A committer would then use git-svn to merge their
>> branch commit-by-commit, or if it is less extensive, simply generate a
>> single big patch to commit into SVN.
>> My thinking is that this would provide a low-friction way for people
>> to collaborate with the community and develop in the open, without
>> having to work closely with any committer to review every individual
>> subtask.
>> Another alternative, if people are reluctant to use git, would be to
>> add a "sandbox/" repository inside our SVN, and hand out commit bit to
>> branches inside there without any PMC vote. Anyone interested in
>> contributing could request a branch in the sandbox, and be granted
>> access as soon as they get an apache SVN account.
> I don't see the technical issues with how the merge is done as the main
> problem.
> The barriers to getting your stuff in are
> 1. getting people to care enough to help develop the feature -mentorship,
> collaborative development.
> 2. getting incremental parts in to avoid the continual
> merge-regression-test hell that you go through if you are trying to keep a
> separate branch alive. It's not the technical aspects of the merge so much
> as the need to run all the hadoop tests and your own test suite, and track
> down whether a failure is a regression in -trunk or something in your code.
> Jun's patch is an example of this situation. We haven't seen the effort he
> and his colleagues have done with merge and test, but I'm confident it's
> been there. What they now have is a "big bang" class of patch which is so
> big that anyone reviewing it would have to spend a couple of weeks going
> through the codebase trying to understand it. Which as we all know means
> two weeks not doing all the things you are committed to doing.
> We know it's there, we know it's current -so how to use this as an exercise
> in something to pull in incrementally?

Jun's patches from HADOOP-8468 (which were developed on a private
github repo) are being pulled into trunk incrementally; there's no
feature branch (which I think would have been a better route, but at
least the current approach has not prevented some progress).

All the recent features I can think of that were developed upstream
first, on feature branches at Apache, have gone well.

