hadoop-hdfs-dev mailing list archives

From Anu Engineer <aengin...@hortonworks.com>
Subject Re: [DISCUSS] Increased use of feature branches
Date Mon, 13 Jun 2016 19:41:44 GMT
Hi Colin,

>Even if everyone used branches for all development, person X might merge
>their branch before person Y, forcing person Y to do a rebase or merge. 
>It is not the presence or absence of branches that causes the need to
>merge or rebase, but the presence or absence of "churn."

You are technically right on this. The issue arises when a
branch developer gets caught in a commit, revert, let-us-commit-again,
oh-it-is-not-fixed-completely, let-us-revert-the-revert cycle.

I was hoping that branches would be exposed to less of this if everyone
had private branches and got some time to test and bake the feature,
instead of committing directly to trunk and then testing.

Once again, I agree with your point that in a perfect world, merges should
be about the churn, but trunk is often treated as a development branch,
so my point is that it gets unnecessary churn. I really appreciate the
thought in this thread, that is, let us be more responsible about how we treat trunk.

>I thought the feature branch merge voting period had been shortened to 5
>days rather than 7?  We should probably spell this out on

Thanks for the link. Right now it says 7 days, which is why I assumed it is 7.
Would you be kind enough to point me to a thread that says it is 5 days for a merge vote?

I did a Google search, but was not able to find a thread like that. Thanks in advance.


On 6/13/16, 11:51 AM, "Colin McCabe" <cmccabe@apache.org> wrote:

>On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote:
>> > On 10 Jun 2016, at 20:37, Anu Engineer <aengineer@hortonworks.com> wrote:
>> > 
>> > I actively work on two branches (Diskbalancer and ozone) and I agree with most of what Sangjin said.
>> > There is an overhead in working with branches; there are both technical costs and administrative issues
>> > which discourage developers from using branches.
>> > 
>> > I think the biggest issue with branch based development is the fact that other developers do not use a branch.
>> > If a small feature appears as a series of commits to “datanode.java”, the branch based developer ends up rebasing
>> > and paying this price of rebasing many times. If everyone followed a model of branch + pull request, other branches
>> > would not have to deal with continuous rebasing to trunk commits. If we are moving to a branch based
>Even if everyone used branches for all development, person X might merge
>their branch before person Y, forcing person Y to do a rebase or merge. 
>It is not the presence or absence of branches that causes the need to
>merge or rebase, but the presence or absence of "churn."
>We try to minimize "churn" in many ways.  For example, we discourage
>people from making trivial whitespace changes to parts of the code
>they're not modifying in their patch.  Or doing things like letting
>their editor change the line ending of files from LF to CR/LF.  However,
>in the final analysis, churn will always exist because development
>> > development, we should probably move to that model for most development to avoid this tax on people who
>> > actually end up working in the branches.
>> > 
>> > I do have a question in my mind though: What is being proposed is that we move active development to branches
>> > if the feature is small or incomplete, however keep the trunk open for check-ins. One of the biggest reasons why we
>> > check in to trunk and not to branch-2 is because it is a change that will break backward compatibility. So do we
>> > have an expectation of backward compatibility through the 3.0-alpha series? (I personally vote No, since 3.0 is experimental
>> > at this stage.) But if we decide to support some sort of backward compatibility, then willy-nilly committing to trunk
>> > and still maintaining the expectation that we can release Alphas from 3.0 does not look possible.
>> > 
>> > And then comes the question, once 3.0 becomes official, where do we check-in a change, if that would break something?
>> > So this will lead us back to trunk being the unstable – 3.0 being the new
>I'm not sure I really understand the goal of the "trunk-incompat"
>proposal.  Like Karthik asked earlier in this thread, isn't it really
>just a rename of the existing trunk branch?
>It sounds like the policy is going to be exactly the same as now:
>incompatible stuff in trunk/trunk-incompat/whatever, 3.x compatible
>changes in the 3.x line, 2.x compatible changes in the 2.x line, etc.
>I think we should just create branch-3 and follow the same policy we
>followed with branch-2 and branch-1.  Switching around the names doesn't
>really change the policy, and it creates confusion since it's
>inconsistent with what we did earlier.
>I think one of the big frustrations with trunk is that features sat
>there a while without being released because they weren't compatible
>with branch-2-- the shell script rewrite, for example.  However, this
>reflects a fundamental tradeoff-- either incompatible features can't be
>developed at all in the lifetime of Hadoop 3.x, or we will need
>somewhere to put them.  The trunk-incompat proposal is like saying that
>you've solved the prison overcrowding problem by renaming all prisons to
>"correctional facilities."
>> > 
>> > One more point: If we are moving to use a branch always – then we are looking at a model similar to using a git + pull
>> > request model. If that is so, would it make sense to modify the rules to make these branches easier to merge?
>> > Say for example, if all commits in a branch have followed review and checking policy – just like trunk – and commits
>> > have been made only after a sign-off from a committer, would it be possible to merge with a 3-day voting period
>> > instead of 7, or treat it just like today’s commit to trunk – but with 2 people signing off?
>I thought the feature branch merge voting period had been shortened to 5
>days rather than 7?  We should probably spell this out on
>https://hadoop.apache.org/bylaws.html .  Like I said above, I don't
>believe that *all* development should be on feature branches, just
>biggish stuff that is likely to be controversial and/or disruptive.  The
>suggestion I made earlier is that if 3 people ask you for a branch, you
>should definitely strongly consider a branch.
>I do think we should shorten the voting period for adding new branch
>committers... making it 3 or 4 days would be fine.  After all, the work
>of branch committers is reviewed during the merge in any case.
>> > 
>> > What I am suggesting is reducing the administrative overheads of using a branch to encourage use of branching.
>> > Right now it feels like Apache’s process encourages committing directly to trunk rather than to a branch.
>> > 
>> > Thanks
>> > Anu
>> It's a per project process. In slider, we've used a git flow: all work
>> goes in a feature branch, then merge in with a merge point. This gives a
>> better history of workflow, as an individual body of work is an ordered
>> sequence of operations, independent of everything else. This makes cherry
>> picking a sequence easier, it even makes unrolling a series of changes
>> easier: until the entire set of changes is committed, there is nothing to
>> back out.
>> 1. There's the rebase/merge problem: coping with conflicting change.
>> Rebasing helps, but makes team dev complex. And, if there are big
>> conflict changes, it's often easier to take the current diff with trunk
>> branch and reapply it than try to rebase a sequence of operations. You
>> don't always need to rebase though; an FB can repeatedly merge in trunk,
>> for a history which may not be self contained, but does isolate the
>> feature dev from everyone else's work.
>> 2. Changes don't get exposed more broadly until the feature is in. That
>> may reduce review, but for those of us who work on downstream code it
>> means: nothing breaks until the complete feature is in. You may not
>> realise it, but those of us who do compile downstream things (slider,
>> spark) against even branch-2 always fear discovering what's just broken
>> at the API level alone. And that's "the stable branch". I haven't dared
>> build against trunk for a while.
>> 3. It's a real PITA trying to do development which spans >1 feature
>> branch. Even today it's tricky with code spanning >1 patch (HADOOP-13207
>> and HADOOP-13208 this weekend). There I'm working in one branch and
>> generating two separate patches. That's hard to do in a single feature
>> branch.
>> 4. The rules for feature branch merge. If I get a patch into trunk, it's
>> in the codebase. If I get it into a feature branch, there's the risk the
>> entire feature branch doesn't get in. Fix: for short lived feature
>> branches, we have an RTC policy strict enough we can say "if a feature
>> branch commit is in, it's considered good enough, even if a few more
>> successor commits are required before the whole sequence of commits is
>> considered stable".
>> 5. If you do lots of incremental patches (as feature branches encourage),
>> the patch history gets very noisy. Maybe here the patches can be rolled
>> up for the final commit. This is how Spark works.
>> 6. Jenkins doesn't test feature branches today. Can yetus do this if I
>> give a name of any branch? If so, for a feature branch of > 1w we could
>> just fork the trunk jenkins builds too, but have it only email the
>> committers.
>> 7. That final merge process needs to be rigorous from the regression
>> testing perspective. The last commit on a feature branch should be the
>> one to
>> Feature branches need to be short lived to cope with change well. And if
>> you are doing fundamental changes (e.g core APIs), there is some
>> incentive to get that common feature in, while you still get the full
>> implementation stable in a feature branch. But: you'd better be
>> confident that the stuff in trunk isn't going to break. Nobody gets to
>> break the main build —or at least not for longer than it takes for the
>> merge to be reverted.
>> I think maybe we should try doing very-short-lived feature branches, with
>> a simple policy:
>> -self contained patch which delivers a complete feature/fix: single
>> patch. These are things where it means
>> -something which is an intermediate step to delivering something: part of
>> a feature branch. A branch where the process for committing patches is as
>> rigorous as for trunk —so there's no ambiguity about *whether* a
>> feature is merged in, only *when*
