Mailing-List: contact common-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-dev@hadoop.apache.org
MIME-Version: 1.0
In-Reply-To: <A289C5D8-1C98-41C0-BBD3-503D80C724B3@hortonworks.com>
References: 
 <CAHfHakE9mtkGQap50JLsj8tFuEpOxCuva2cwqKdoVchk8xct5w@mail.gmail.com>
	<19E3715B-5663-469B-87A1-153E7B24A5E7@hortonworks.com>
	<CAGB5D2YkX-8a65N2-EyGczDSsp9M1KR575o3ZmGJPYO0XzL28g@mail.gmail.com>
	<CAKKt98SCkfm0nQATxv11m7UggqNbp__1ccqK-UcDB-2eepbeNQ@mail.gmail.com>
	<56332B13.9020904@oss.nttdata.co.jp>
	<CAHfHakGkvEZhTWPc3t10bfFqBkGwpvyfpWNitsrniD665pj+rQ@mail.gmail.com>
	<CA+qbEUNbhQ+Jm726VqZXajkZOOPW0ZqX-0LObb4dBt6++wc_xA@mail.gmail.com>
	<A289C5D8-1C98-41C0-BBD3-503D80C724B3@hortonworks.com>
Date: Sat, 31 Oct 2015 10:58:48 -0700
Message-ID: 
 <CA+qbEUOOczFaU_HcE76g_scO5dR-_eqLKzu5MkiJoY1Q2q5zDA@mail.gmail.com>
Subject: Re: Github integration for Hadoop
From: "Colin P. McCabe" <cmccabe@apache.org>
To: Hadoop Common <common-dev@hadoop.apache.org>
Content-Type: text/plain; charset=UTF-8

Thanks for your responses here.  It sounds like the proposal here is
for doing code reviews on GH, but still doing commits in our existing
way.  Since it wasn't spelled out in the initial proposal, I
interpreted it as doing both reviews and commits on GH, like Spark
does-- which I think is problematic for all the reasons we've
discussed here (the fact that GH introduces merge commits, the
possibility of bypassing jira, duplicate pull requests with no search
features to dedup them, etc.).  Nobody has really come up with a
solution for the problems caused by __committing__ through GH that
scales to our size of community.

If there is a general consensus that __code reviews__ through GH would
be helpful, I will change my -1 to a +0 for that.  But let's make sure
that we are not __committing__ through GH.  I view this as kind of an
experiment to see how much easier things are this way, so I will try
to keep an open mind.

In parallel with this experiment, I also think we should set up a
gerrit instance that supports code reviews and precommit testing.  As
I said, Cloudera uses gerrit internally and we are very happy with it.
It is nicer than GH because we can set up our own precommit hooks.
For example, we can reject gerrit change requests that don't have a
jira number associated with them.  Gerrit change requests can be
created entirely from the command line as well.  Gerrit is open
source, and doesn't create merge commits for everything if you commit
through it.
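
To make the jira-number check concrete, here is a minimal sketch of the
kind of test a precommit hook could run on a commit message.  This is
not Gerrit's hook API, just the check itself in plain Python.  The
function names, the list of project keys, and the choice to look only
at the subject line are assumptions for illustration.

```python
import re

# Assumed JIRA project keys; adjust to the projects actually tracked.
JIRA_PROJECTS = ("HADOOP", "HDFS", "YARN", "MAPREDUCE")
JIRA_ID = re.compile(r"\b(?:%s)-\d+\b" % "|".join(JIRA_PROJECTS))


def has_jira_id(commit_message):
    """Return True if the subject (first) line names a JIRA issue."""
    lines = commit_message.strip().splitlines()
    return bool(lines) and JIRA_ID.search(lines[0]) is not None


def check(commit_message):
    """Return (accepted, reason), as a hook script might report it."""
    if has_jira_id(commit_message):
        return True, "ok"
    return False, "rejected: subject line has no JIRA number"
```

A hook script would call check() on the incoming change's commit
message and refuse the change when the first element is False.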

I think we can support multiple solutions in parallel and let people
gravitate to the most convenient one, as long as we keep our project
history accessible on JIRA and the mailing lists.  Also, as Andrew
commented, let's make sure we are not setting up duplicate bug
trackers or mailing lists on GH-- one of each of those is enough :)

Colin

On Sat, Oct 31, 2015 at 4:40 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>
>> On 30 Oct 2015, at 17:15, Colin P. McCabe <cmccabe@apache.org> wrote:
>>
>> I think the Spark guys eventually built some kind of UI on top of
>> github to help them search through pull requests.  We would probably
>> also need something like this.
>
> https://spark-prs.appspot.com/users
>
>
> They do have to impose a naming scheme on those patches to help identify the area. You can just watch a JIRA and wait for a pull-req to arrive.
>
>>
>> Spark uses github partially because it started as a github project, so
>> everyone was familiar with that.  I haven't seen an answer to Andrew's
>> question about what the value add is here for Hadoop to move to a new
>> system.  I have seen a few comments about a better review UI and
>> one-click patch submission, is that the main goal?
>
> I do think it is good for a very fast cycle time on reviews, though that depends, of course, on reviewers being willing to put in the time (credit to your colleagues here, Colin).
>
>