hbase-dev mailing list archives

From Jesse Yates <jesse.k.ya...@gmail.com>
Subject Re: Thoughts about large feature dev branches
Date Wed, 05 Sep 2012 23:43:22 GMT
On Wed, Sep 5, 2012 at 3:58 PM, Elliott Clark <eclark@stumbleupon.com> wrote:

> +1 on git, either on github or closer to the linux model with real
> distributed repos.
> - I've been using it for just about all of my development and it works
> pretty nicely.  I push everything to github as I'm working.  Then I
> squash commits and create a diff to post on jira.

I do the same, just locally. Solid model.
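
For anyone who hasn't tried it, here is a rough sketch of the squash-and-diff
step. It assumes a local branch cut from a `trunk` branch that mirrors svn;
the branch names and the helper name are made up for illustration, and the
function only builds the git command lines rather than running them.

```python
def squash_and_diff_commands(feature, base="trunk"):
    """Build git commands that collapse `feature` into a single commit on a
    scratch branch and print one diff suitable for attaching to a JIRA.

    All branch names here are hypothetical placeholders.
    """
    scratch = feature + "-squashed"
    return [
        # Start a scratch branch at the base so the feature branch is untouched.
        ["git", "checkout", "-b", scratch, base],
        # Stage the combined changes of the feature branch without committing.
        ["git", "merge", "--squash", feature],
        # Record them as one commit.
        ["git", "commit", "-m", "Squashed " + feature],
        # Print the patch; redirect stdout to a file to post it on the JIRA.
        ["git", "diff", "--no-prefix", base, scratch],
    ]


if __name__ == "__main__":
    for cmd in squash_and_diff_commands("snapshots"):
        print(" ".join(cmd))
```

Working on a scratch branch keeps the original commit history around in case
a reviewer wants to see how the change evolved.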

> - I would suggest that since hbase's code base moves so rapidly, a
> rebased branch should probably be a requirement before merging.
> Otherwise the merge will get pretty interesting for very long lived
> branches.

IIRC when Todd was working on some large stuff for HDFS he was rebasing his
feature branch every few days. That seriously helps when things are actually
finished and it's time to roll the branch back in.

Using github to keep a constantly rebased version (every few days) would be
a reasonable, super-low friction way of solving the problem for
non-committers. Further, for big changes, it would ensure that if the
people go away we aren't left with a bunch of dangling branches in the svn.
Problem here is also establishing the 'master' branch in github, though
that can be established on a case-by-case basis with the people involved.
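
As a sketch of what "constantly rebased" could look like in practice, the
helper below lists the commands for one refresh of a feature branch. It
assumes `origin` is the git mirror of svn trunk and `github` is the shared
remote for the feature; both remote names, the branch names, and the helper
itself are assumptions for illustration only. It builds the command lines
without executing them.

```python
def rebase_refresh_commands(feature, upstream="origin/trunk", mirror="github"):
    """Build git commands that replay `feature` on top of the latest trunk
    and republish it. Remote and branch names are hypothetical.
    """
    return [
        # Pick up the latest trunk commits from the svn mirror.
        ["git", "fetch", "origin"],
        ["git", "checkout", feature],
        # Replay the feature commits on top of the updated trunk.
        ["git", "rebase", upstream],
        # Rebasing rewrites history, so the published branch must be forced.
        ["git", "push", "--force", mirror, feature],
    ]


if __name__ == "__main__":
    for cmd in rebase_refresh_commands("snapshots"):
        print(" ".join(cmd))
```

The forced push is the part that needs agreement among the people involved:
anyone tracking the branch has to reset to the new history after each
refresh.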

> On Wed, Sep 5, 2012 at 11:38 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > This has been brought up in the past but we are here again.
> >
> > We have a few large features that are hanging out and having a hard time
> > because trunk changes underneath it and in some cases because they are
> > being worked by folks without a commit bit.   (ex: snapshots w/ Jesse and
> > Matteo, and have some other potentially in the pipeline -- major assignment

I'm generally opposed to doing feature branches for a variety of reasons
(left behind functionality, hard to roll back in, difficulty of testing,
etc) and further don't feel it's really necessary for the snapshot
code given that the code doesn't touch all that much of the current

A lot of the pain with it right now is that the code has been broken into 5
patches, making it hard to build a version of HBase that has snapshots 'in
its current form'. This gets even worse as I'm planning on doing a bit more
refactoring into a couple more patches to help make it more digestible
(e.g. see latest patch for 3PC https://reviews.apache.org/r/6592/ which
pulls out a lot of the coordination functionality). This helps with
reviews, etc, but makes it a bit of a pain for people who want to do
advanced testing on the feature - hard to justify doing a lot of that work
though as if the code is changing a lot, then testing doesn't make much

In terms of how the work is breaking down, with Matteo doing restore on top
of the taking that I'm working on, his part clearly depends on the taking
of snapshots. However, the filesystem layout hasn't changed at all in
nearly the last two months, meaning the work can proceed pretty much
independently (more or less).

> > manager changes with Jimmy and possibly me,

This is a lot more high-touch with the codebase, making a branch (either in
sandbox or otherwise) more feasible.

>  HBASE-4120, HBASE-2600,
> > removing root)

Salesforce is planning on tackling at least the latter two in the next few
months, so this is something that we need to figure out :)

>  >
> > Though I wasn't around yet, it seems like this is what we did for
> > coprocs/security, probably for the 0.90 master.
> >
> http://search-hadoop.com/m/byzZYZMktx1/hbase+windows&subj=Re+Proposed+feature+branch+for+HBase+security
> >
> > Were the folks working on those features committers at the time?  What do
> > we do for contributions from folks who aren't committers yet?
> >
> > This was proposed over on hadoop-general by Todd -- what do you all think
> > about doing something like this for the major changes?  (Github seems
> > easiest, svn seems "more official").
> >
> > Here's one proposal, making use of git as an easy way to allow
> > non-committers to "commit" code while still tracking development in
> > the usual places:
> > - Upon anyone's request, we create a new "Version" tag in JIRA.
> > - The developers create an umbrella JIRA for the project, and file the
> > individual work items as subtasks (either up front, or as they are
> > developed if using a more iterative model)
> > - On the umbrella, they add a pointer to a git branch to be used as
> > the staging area for the branch. As they develop each subtask, they
> > can use the JIRA to discuss the development like they would with a
> > normally committed JIRA, but when they feel it is ready to go (not
> > requiring a +1 from any committer) they commit to their git branch
> > instead of the SVN repo.
> > - When the branch is ready to merge, they can call a merge vote, which
> > requires +1 from 3 committers, same as a branch being proposed by an
> > existing committer. A committer would then use git-svn to merge their
> > branch commit-by-commit, or if it is less extensive, simply generate a
> > single big patch to commit into SVN.

Overall, this seems reasonable. I can imagine the work to merge back in
being a huge pain. It would be great to see if we can break down these big
changes into smaller patches and roll them in one at a time. That is both
easier on a single committer and helps ensure the code quality of each
sub-piece; it's easier to enforce good testing on smaller pieces and it helps
with code reuse.

My comments above obviously contradict this a little bit - it's a huge pain
to work on the end functionality when the sub-pieces that you are building
on shift due to code reviews. In the end it leads to a better foundation,
but can be a headache to keep everything in sync.

The latter goes away a bit if we have a single branch with the majority of
the code, then progressive commits to fix things, but that first massive code
drop is still terrible to review (pot calling the kettle black here).

TL;DR prefer smaller, independently useful patches that build to the bigger
change. It may not be possible for some features, but should make it
easier to review, roll in, and in the end merge the final change while
being more generally useful.

> > Another alternative, if people are reluctant to use git, would be to
> > add a "sandbox/" repository inside our SVN, and hand out commit bit to
> > branches inside there without any PMC vote. Anyone interested in
> > contributing could request a branch in the sandbox, and be granted
> > access as soon as they get an apache SVN account.
> >

This seems a little excessive. It would be nice for the more 'official'
status this confers, but seems to create more friction than it's worth.

TL;DR github with 'official' branches per umbrella JIRA seems a
low-friction way to do feature branches without the possibility of cruft in
the main repository. We should really be sure that we need a branch though,
and still favor smaller patches along the same branch for generally
useful features.

Jesse Yates
