Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Reply-To: dev@hbase.apache.org
In-Reply-To: 
 <CAB5sDNKmTAzOPHjSe_jkaOGVC5a9o6Pmas6VVqRTEVmbeXCK0A@mail.gmail.com>
References: 
 <CAAha9a0E3s7BaFNY5msr6a7AA-v2tOpYfD6zxU5y1oBseE-VJg@mail.gmail.com>
 <CAKYwJ9z=3j18zBVQHmtnddp0fUHj04dD2q9DRshwSovBXkgLoA@mail.gmail.com>
 <CAB5sDNKmTAzOPHjSe_jkaOGVC5a9o6Pmas6VVqRTEVmbeXCK0A@mail.gmail.com>
From: Todd Lipcon <todd@cloudera.com>
Date: Wed, 5 Sep 2012 16:48:22 -0700
Message-ID: 
 <CADY20s4N7kAk7dORDkE6y_oFiArsbc0mW+MmG6vttQjgoJ6vbw@mail.gmail.com>
Subject: Re: Thoughts about large feature dev branches
To: dev@hbase.apache.org

Hope to have time to write up some more thoughts later, but some
interesting reading is this document from Linux on how to contribute
to that project:
https://github.com/mirrors/linux-2.6/blob/master/Documentation/SubmittingPatches

Worth looking at other projects' guidelines to form our own if we're
thinking of going this route.

-Todd

On Wed, Sep 5, 2012 at 4:43 PM, Jesse Yates <jesse.k.yates@gmail.com> wrote:
> On Wed, Sep 5, 2012 at 3:58 PM, Elliott Clark <eclark@stumbleupon.com> wrote:
>
>> +1 on git, either on github or closer to the linux model with real
>> distributed repos.
>>
>> - I've been using it for just about all of my development and it works
>> pretty nicely.  I push everything to github as I'm working.  Then I
>> squash commits and create a diff to post on jira.
>>
>
> I do the same, just locally. Solid model.
>
>
>> - I would suggest that since HBase's code base moves so rapidly, a
>> rebased branch should probably be a requirement before merging.
>> Otherwise the merge will get pretty interesting for very long-lived
>> branches.
>>
>
> IIRC, when Todd was working on some large stuff for HDFS, he was doing this
> in a feature branch every few days. It seriously helps when things are
> actually finished and need to be rolled back in.
>
> Using github to keep a constantly rebased version (every few days) would be
> a reasonable, super-low-friction way of solving the problem for
> non-committers. Further, for big changes, it would ensure that if the
> people go away, we aren't left with a bunch of dangling branches in SVN.
> The problem here is also establishing the 'master' branch on github, though
> that can be settled on a case-by-case basis with the people involved.
>
>>
>> On Wed, Sep 5, 2012 at 11:38 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
>> > This has been brought up in the past but we are here again.
>> >
>> > We have a few large features that are hanging out and having a hard time
>> > because trunk changes underneath them, and in some cases because they are
>> > being worked on by folks without a commit bit (ex: snapshots w/ Jesse and
>> > Matteo, and some others potentially in the pipeline -- major assignment
>>
>
> I'm generally opposed to doing feature branches for a variety of reasons
> (left-behind functionality, difficulty rolling back in, difficulty of
> testing, etc.) and further don't really feel it's necessary for the
> snapshot code, given that the code doesn't touch all that much of the
> current codebase.
>
> A lot of the pain with it right now is that the code has been broken into 5
> patches, making it hard to build a version of HBase that has snapshots 'in
> its current form'. This gets even worse as I'm planning on doing a bit more
> refactoring into a couple more patches to help make it more digestible
> (e.g. see the latest patch for 3PC, https://reviews.apache.org/r/6592/, which
> pulls out a lot of the coordination functionality). This helps with
> reviews, etc., but makes it a bit of a pain for people who want to do
> advanced testing on the feature. It's hard to justify doing a lot of that
> work, though: if the code is changing a lot, then testing doesn't make much
> sense.
>
> In terms of how the work is breaking down, Matteo is doing restore on top
> of the snapshot-taking that I'm working on, so his part clearly depends on
> taking snapshots. However, the filesystem layout hasn't changed at all in
> nearly two months, meaning the work can proceed more or less
> independently.
>
>
>> > manager changes with Jimmy and possibly me,
>>
>
> This is a lot more high-touch with the codebase, making a branch (either in
> sandbox or otherwise) more feasible.
>
>
>> > HBASE-4120, HBASE-2600,
>> > removing root)
>>
>
> Salesforce is planning on tackling at least the latter two in the next few
> months, so this is something that we need to figure out :)
>
>
>> >
>> > Though I wasn't around yet, it seems like this is what we did for
>> > coprocs/security, probably for the 0.90 master.
>> >
>> > http://search-hadoop.com/m/byzZYZMktx1/hbase+windows&subj=Re+Proposed+feature+branch+for+HBase+security
>> >
>> > Were the folks working on those features committers at the time? What do
>> > we do for contributions from folks who aren't committers yet?
>> >
>> > This was proposed over on hadoop-general by Todd -- what do you all think
>> > about doing something like this for the major changes?  (Github seems
>> > easiest, svn seems "more official").
>> >
>> > Here's one proposal, making use of git as an easy way to allow
>> > non-committers to "commit" code while still tracking development in
>> > the usual places:
>> > - Upon anyone's request, we create a new "Version" tag in JIRA.
>> > - The developers create an umbrella JIRA for the project, and file the
>> > individual work items as subtasks (either up front, or as they are
>> > developed if using a more iterative model)
>> > - On the umbrella, they add a pointer to a git branch to be used as
>> > the staging area for the branch. As they develop each subtask, they
>> > can use the JIRA to discuss the development like they would with a
>> > normally committed JIRA, but when they feel it is ready to go (not
>> > requiring a +1 from any committer) they commit to their git branch
>> > instead of the SVN repo.
>> > - When the branch is ready to merge, they can call a merge vote, which
>> > requires +1 from 3 committers, same as a branch being proposed by an
>> > existing committer. A committer would then use git-svn to merge their
>> > branch commit-by-commit, or if it is less extensive, simply generate a
>> > single big patch to commit into SVN.
>>
>
> Overall, this seems reasonable. I can imagine the work to merge back in
> being a huge pain. It would be great to see if we can break down these big
> changes into smaller patches and roll them in one at a time, both for ease
> on a single committer and to help ensure the code quality of each
> sub-piece; it's easier to enforce good testing on smaller pieces, and it
> helps with code reuse.
>
> My comments above obviously contradict this a little bit - it's a huge pain
> to work on the end functionality when the sub-pieces that you are building
> on shift due to code reviews. In the end it leads to a better foundation,
> but it can be a headache to keep everything in sync.
>
> The latter goes away a bit if we have a single branch with the majority of
> the code and then progressive commits to fix things, but that first massive
> code drop is still terrible to review (pot calling the kettle black here).
>
> TL;DR: prefer smaller, independently useful patches that build to the bigger
> change. It may not be possible for some features, but it should make the
> final change easier to review, roll in, and merge, while being more
> generally useful.
>
>
>>
>> > Another alternative, if people are reluctant to use git, would be to
>> > add a "sandbox/" repository inside our SVN, and hand out commit bits to
>> > branches inside there without any PMC vote. Anyone interested in
>> > contributing could request a branch in the sandbox, and be granted
>> > access as soon as they get an apache SVN account.
>> >
>>
>
> This seems a little excessive. It would be nice for the more 'official'
> status it confers, but it seems to create more friction than it's worth
> (IMO).
>
>
> TL;DR: github with 'official' branches per umbrella JIRA seems a
> low-friction way to do feature branches without the possibility of cruft in
> the main repository. We should really be sure that we need a branch, though,
> and still favor smaller patches along the same branch for generally
> useful features.
>
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com


-- 
Todd Lipcon
Software Engineer, Cloudera