hadoop-hdfs-dev mailing list archives

From "Gangumalla, Uma" <uma.ganguma...@intel.com>
Subject Re: [DISCUSS] Increased use of feature branches
Date Mon, 13 Jun 2016 20:08:04 GMT

On 6/13/16, 12:41 PM, "Anu Engineer" <aengineer@hortonworks.com> wrote:

>Hi Colin,
>>Even if everyone used branches for all development, person X might merge
>>their branch before person Y, forcing person Y to do a rebase or merge.
>>It is not the presence or absence of branches that causes the need to
>>merge or rebase, but the presence or absence of "churn."
>You are perfectly right on this technically. The issue is when a
>branch developer gets caught in Commit, Revert, let-us-commit-again,
>oh-it-is-not-fixed-completely, let-us-revert-the-revert cycle.
>I was hoping that branches will be exposed to less of this if everyone
>had private branches and got some time to test and bake the feature
>instead of just directly committing to trunk and then testing.
>Once again, I agree with your point that in a perfect world, merges should
>be about the churn, but trunk is often treated as a development branch,
>so my point is that it gets unnecessary churn. I really appreciate the
>thought in the thread - that is - let us be more responsible about how we
>treat trunk.
>> I thought the feature branch merge voting period had been shortened to 5
>>days rather than 7?  We should probably spell this out on
>Thanks for the link, right now it says 7 days. That is why I assumed it
>is 7. 
>Would you be kind enough to point me to a thread that says it is 5 days
>for a merge Vote? 
>I did a google search, but was not able to find a thread like that.
>Thanks in advance.
As I remember, the 5-day voting period applied to releases. I am not sure
whether we discussed the branch merge voting period at that time.
Here is the link: 
>On 6/13/16, 11:51 AM, "Colin McCabe" <cmccabe@apache.org> wrote:
>>On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote:
>>> > On 10 Jun 2016, at 20:37, Anu Engineer <aengineer@hortonworks.com>
>>> > 
>>> > I actively work on two branches (Diskbalancer and ozone) and I agree
>>>with most of what Sangjin said.
>>> > There is an overhead in working with branches, there are both
>>>technical costs and administrative issues
>>> > which discourages developers from using branches.
>>> > 
>>> > I think the biggest issue with branch based development is the fact
>>>that other developers do not use a branch.
>>> > If a small feature appears as a series of commits to
>>>"datanode.java", the branch based developer ends up rebasing
>>> > and paying this price of rebasing many times. If everyone followed a
>>>model of branch + Pull request, other branches
>>> > would not have to deal with continuous rebasing to trunk commits. If
>>>we are moving to a branch based
>>Even if everyone used branches for all development, person X might merge
>>their branch before person Y, forcing person Y to do a rebase or merge.
>>It is not the presence or absence of branches that causes the need to
>>merge or rebase, but the presence or absence of "churn."
>>We try to minimize "churn" in many ways.  For example, we discourage
>>people from making trivial whitespace changes to parts of the code
>>they're not modifying in their patch.  Or doing things like letting
>>their editor change the line ending of files from LF to CR/LF.  However,
>>in the final analysis, churn will always exist because development
>>> > development, we should probably move to that model for most
>>>development to avoid this tax on people who
>>> > actually end up working in the branches.
>>> > 
>>> > I do have a question in my mind though: What is being proposed is
>>>that we move active development to branches
>>> > if the feature is small or incomplete, however keep the trunk open
>>>for check-ins. One of the biggest reasons why we
>>> > check-in into trunk and not to branch-2 is because it is a change
>>>that will break backward compatibility. So do we
>>> > have an expectation of backward compatibility thru the 3.0-alpha
>>>series (I personally vote No, since 3.0 is experimental
>>> > at this stage), but if we decide to support some sort of
>>>backward compatibility then willy-nilly committing to trunk
>>> > and still maintaining the expectation we can release Alphas from 3.0
>>>does not look possible.
>>> > 
>>> > And then comes the question, once 3.0 becomes official, where do we
>>>check-in a change,  if that would break something?
>>> > so this will lead us back to trunk being the unstable – 3.0 being
>>>the new “branch-2”.
>>I'm not sure I really understand the goal of the "trunk-incompat"
>>proposal.  Like Karthik asked earlier in this thread, isn't it really
>>just a rename of the existing trunk branch?
>>It sounds like the policy is going to be exactly the same as now:
>>incompatible stuff in trunk/trunk-incompat/whatever, 3.x compatible
>>changes in the 3.x line, 2.x compatible changes in the 2.x line, etc.
>>I think we should just create branch-3 and follow the same policy we
>>followed with branch-2 and branch-1.  Switching around the names doesn't
>>really change the policy, and it creates confusion since it's
>>inconsistent with what we did earlier.
>>I think one of the big frustrations with trunk is that features sat
>>there a while without being released because they weren't compatible
>>with branch-2-- the shell script rewrite, for example.  However, this
>>reflects a fundamental tradeoff-- either incompatible features can't be
>>developed at all in the lifetime of Hadoop 3.x, or we will need
>>somewhere to put them.  The trunk-incompat proposal is like saying that
>>you've solved the prison overcrowding problem by renaming all prisons to
>>"correctional facilities."
>>> > 
>>> > One more point: If we are moving to use a branch always – then we
>>>are looking at a model similar to using a git + pull
>>> > request model. If that is so would it make sense to modify the rules
>>>to make these branches easier to merge?
>>> > Say for example, if all commits in a branch have followed review and
>>>checking policy – just like trunk and commits
>>> > have been made only after a sign off from a committer, would it be
>>>possible to merge with a 3-day voting period
>>> > instead of 7, or treat it just like today’s commit to trunk –
>>>but with 2 people signing-off?
>>I thought the feature branch merge voting period had been shortened to 5
>>days rather than 7?  We should probably spell this out on
>>https://hadoop.apache.org/bylaws.html .  Like I said above, I don't
>>believe that *all* development should be on feature branches, just
>>biggish stuff that is likely to be controversial and/or disruptive.  The
>>suggestion I made earlier is that if 3 people ask you for a branch, you
>>should definitely strongly consider a branch.
>>I do think we should shorten the voting period for adding new branch
>>committers... making it 3 or 4 days would be fine.  After all, the work
>>of branch committers is reviewed during the merge in any case.
>>> > 
>>> > What I am suggesting is reducing the administrative overheads of
>>>using a branch to encourage use of branching.
>>> > Right now it feels like Apache’s process encourages committing
>>>directly to trunk rather than to a branch.
>>> > 
>>> > Thanks
>>> > Anu
>>> It's a per project process. In slider, we've used a git flow: all work
>>> goes in a feature branch, then merge in with a merge point. This gives
>>> better history of workflow, as an individual body of work is an ordered
>>> sequence of operations, independent of everything else. This makes
>>> picking a sequence easier, it even makes unrolling a series of changes
>>> easier: until the entire set of changes is committed, there is nothing
>>> to back out.
>>> 1. there's the rebase/merge problem: coping with conflicting change.
>>> Rebasing helps, but makes team dev complex. And, if there are big
>>> conflict changes, it's often easier to take the current diff with trunk
>>> branch and reapply it than try to rebase a sequence of operations. You
>>> don't always need to rebase though; an FB can repeatedly merge in
>>> for a history which may not be self contained, but does isolate the
>>> feature dev from everyone else's work.
>>> 2. Changes don't get exposed more broadly until the feature is in. That
>>> may reduce review, but for those of us who work on downstream code it
>>> means: nothing breaks until the complete feature is in. You may not
>>> realise it, but those of us who do compile downstream things (slider,
>>> spark) against even branch-2 always fear discovering what's just broken
>>> at the API level alone. And that's "the stable branch". I haven't dared
>>> build against trunk for a while.
>>> 3. It's a real PITA trying to do development which spans >1 feature
>>> branch. Even today it's tricky with code spanning >1 patch
>>> and HADOOP-13208 this weekend). There I'm working in one branch and
>>> generating two separate patches. That's hard to do in a single feature
>>> branch.
>>> 4. The rules for feature branch merge. If I get a patch into trunk,
>>> in the codebase. If I get it into a feature branch, there's the risk
>>> entire feature branch doesn't get in. Fix: for short lived feature
>>> branches, we have an RTC policy strict enough we can say "if a feature
>>> branch commit is in, it's considered good enough, even if a few more
>>> successor commits are required before the whole sequence of commits is
>>> considered stable."
>>> 5. If you do lots of incremental patches (as feature branches
>>> the patch history gets very noisy. Maybe here the patches can be rolled
>>> up for the final commit. This is how Spark works.
>>> 6. Jenkins doesn't test feature branches today. Can yetus do this if I
>>> give a name of any branch? If so, for a feature branch of > 1w we could
>>> just fork the trunk jenkins builds too, but have it only email the
>>> committers.
>>> 7. That final merge process needs to be rigorous from the regression
>>> testing perspective. The last commit on a feature branch should be the
>>> one to
>>> Feature branches need to be short lived to cope with change well. And
>>> if you are doing fundamental changes (e.g. core APIs), there is some
>>> incentive to get that common feature in, while you still get the full
>>> implementation stable in a feature branch. But: you'd better be
>>> confident that the stuff in trunk isn't going to break. Nobody gets to
>>> break the main build —or at least not for longer than it takes for
>>> merge to be reverted.
>>> I think maybe we should try doing very-short-lived feature branches,
>>> a simple policy:
>>> -self contained patch which delivers a complete feature/fix: single
>>> patch. These are things where it means
>>> -something which is an intermediate step to delivering something: part
>>> a feature branch. A branch where the process for committing patches is
>>> rigorous as for trunk —so there's no ambiguity about *whether* a
>>> feature is merged in, only *when*
>>To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
>>For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
