hadoop-hdfs-dev mailing list archives

From Colin McCabe <cmcc...@apache.org>
Subject Re: [DISCUSS] Increased use of feature branches
Date Mon, 13 Jun 2016 18:51:49 GMT
On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote:
> > On 10 Jun 2016, at 20:37, Anu Engineer <aengineer@hortonworks.com> wrote:
> > 
> > I actively work on two branches (Diskbalancer and ozone) and I agree with
> > most of what Sangjin said.
> > There is an overhead in working with branches: there are both technical
> > costs and administrative issues which discourage developers from using
> > branches.
> > 
> > I think the biggest issue with branch-based development is the fact that
> > other developers do not use a branch.
> > If a small feature appears as a series of commits to “datanode.java”, the
> > branch-based developer ends up rebasing and paying this price of rebasing
> > many times. If everyone followed a model of branch + pull request, other
> > branches would not have to deal with continuous rebasing onto trunk
> > commits. If we are moving to a branch-based

Even if everyone used branches for all development, person X might merge
their branch before person Y, forcing person Y to do a rebase or merge. 
It is not the presence or absence of branches that causes the need to
merge or rebase, but the presence or absence of "churn."

We try to minimize "churn" in many ways.  For example, we discourage
people from making trivial whitespace changes to parts of the code
they're not modifying in their patch, or from letting their editor
change the line endings of files from LF to CR/LF.  However,
in the final analysis, churn will always exist because development

> > development, we should probably move to that model for most development
> > to avoid this tax on people who actually end up working in the branches.
> > 
> > I do have a question in my mind though: What is being proposed is that we
> > move active development to branches if the feature is small or incomplete,
> > but keep the trunk open for check-ins. One of the biggest reasons why we
> > check in to trunk and not to branch-2 is that the change will break
> > backward compatibility. So do we have an expectation of backward
> > compatibility through the 3.0-alpha series? (I personally vote No, since
> > 3.0 is experimental at this stage.) But if we decide to support some sort
> > of backward compatibility, then willy-nilly committing to trunk and still
> > maintaining the expectation we can release Alphas from 3.0 does not look
> > 
> > And then comes the question, once 3.0 becomes official: where do we check
> > in a change if that would break something? So this will lead us back to
> > trunk being the unstable branch and 3.0 being the new “branch-2”.

I'm not sure I really understand the goal of the "trunk-incompat"
proposal.  Like Karthik asked earlier in this thread, isn't it really
just a rename of the existing trunk branch?
It sounds like the policy is going to be exactly the same as now:
incompatible stuff in trunk/trunk-incompat/whatever, 3.x compatible
changes in the 3.x line, 2.x compatible changes in the 2.x line, etc.

I think we should just create branch-3 and follow the same policy we
followed with branch-2 and branch-1.  Switching around the names doesn't
really change the policy, and it creates confusion since it's
inconsistent with what we did earlier.

I think one of the big frustrations with trunk is that features sat
there a while without being released because they weren't compatible
with branch-2-- the shell script rewrite, for example.  However, this
reflects a fundamental tradeoff-- either incompatible features can't be
developed at all in the lifetime of Hadoop 3.x, or we will need
somewhere to put them.  The trunk-incompat proposal is like saying that
you've solved the prison overcrowding problem by renaming all prisons to
"correctional facilities."

> > 
> > One more point: If we are moving to use a branch always, then we are
> > looking at a model similar to a git + pull request model. If that is so,
> > would it make sense to modify the rules to make these branches easier to
> > merge?
> > Say, for example, if all commits in a branch have followed the review and
> > checking policy, just like trunk, and commits have been made only after a
> > sign-off from a committer, would it be possible to merge with a 3-day
> > voting period instead of 7, or treat it just like today’s commit to trunk
> > but with 2 people signing off?

I thought the feature branch merge voting period had been shortened to 5
days rather than 7?  We should probably spell this out on
https://hadoop.apache.org/bylaws.html .  Like I said above, I don't
believe that *all* development should be on feature branches, just
biggish stuff that is likely to be controversial and/or disruptive.  The
suggestion I made earlier is that if 3 people ask you for a branch, you
should definitely strongly consider a branch.

I do think we should shorten the voting period for adding new branch
committers... making it 3 or 4 days would be fine.  After all, the work
of branch committers is reviewed during the merge in any case.


> > 
> > What I am suggesting is reducing the administrative overheads of using a
> > branch to encourage use of branching.
> > Right now it feels like Apache’s process encourages committing directly to
> > trunk rather than a branch.
> > 
> > Thanks
> > Anu
> It's a per project process. In slider, we've used a git flow: all work
> goes in a feature branch, then merge in with a merge point. This gives a
> better history of workflow, as an individual body of work is an ordered
> sequence of operations, independent of everything else. This makes cherry
> picking a sequence easier, it even makes unrolling a series of changes
> easier: until the entire set of changes is committed, there is nothing to
> back out.
> 1. there's the rebase/merge problem: coping with conflicting change.
> Rebasing helps, but makes team dev complex. And, if there are big
> conflict changes, it's often easier to take the current diff with trunk
> branch and reapply it than try to rebase a sequence of operations. You
> don't always need to rebase though; an FB can repeatedly merge in trunk,
> for a history which may not be self contained, but does isolate the
> feature dev from everyone else's work.
> 2. Changes don't get exposed more broadly until the feature is in. That
> may reduce review, but for those of us who work on downstream code it
> means: nothing breaks until the complete feature is in. You may not
> realise it, but those of us who do compile downstream things (slider,
> spark) against even branch-2 always fear discovering what's just broken
> at the API level alone. And that's "the stable branch". I haven't dared
> build against trunk for a while.
> 3. It's a real PITA trying to do development which spans >1 feature
> branch. Even today it's tricky with code spanning >1 patch (HADOOP-13207
> and HADOOP-13208 this weekend). There I'm working in one branch and
> generating two separate patches. That's hard to do in a single feature
> branch.
> 4. The rules for feature branch merge. If I get a patch into trunk, it's
> in the codebase. If I get it into a feature branch, there's the risk the
> entire feature branch doesn't get in. Fix: for short lived feature
> branches, we have an RTC policy strict enough that we can say "if a feature
> branch commit is in, it's considered good enough, even if a few more
> successor commits are required before the whole sequence of commits is
> considered stable."
> 5. If you do lots of incremental patches (as feature branches encourage),
> the patch history gets very noisy. Maybe here the patches can be rolled
> up for the final commit. This is how Spark works.
> 6. Jenkins doesn't test feature branches today. Can yetus do this if I
> give a name of any branch? If so, for a feature branch of > 1w we could
> just fork the trunk jenkins builds too, but have it only email the
> committers.
> 7. That final merge process needs to be rigorous from the regression
> testing perspective. The last commit on a feature branch should be the
> one to
> Feature branches need to be short lived to cope with change well. And if
> you are doing fundamental changes (e.g core APIs), there is some
> incentive to get that common feature in, while you still get the full
> implementation stable in a feature branch. But: you'd better be
> confident that the stuff in trunk isn't going to break. Nobody gets to
> break the main build —or at least not for longer than it takes for the
> merge to be reverted.
> I think maybe we should try doing very-short-lived feature branches, with
> a simple policy:
> -self contained patch which delivers a complete feature/fix: single
> patch. These are things where it means
> -something which is an intermediate step to delivering something: part of
> a feature branch. A branch where the process for committing patches is as
> rigorous as for trunk —so there's no ambiguity about *whether* a
> feature is merged in, only *when*
