mahout-dev mailing list archives

From Pat Ferrel <>
Subject Re: Proposal for changing Mahout's Git branching rules
Date Wed, 21 Jun 2017 21:17:45 GMT
Since merges are done by committers, it’s easy to retarget a contributor’s PRs, but committers
would PR against develop. Some projects, like PredictionIO, make develop the default branch
on GitHub so it's the one contributors get by default.

In fact this is the primary difference: master is left stable and ignored until a release
or bug fix is needed before the next release.

We already have various branches and now that we can clean them up without involving Infra,
the rest of your question is resolved by the originator of the change just like today.

I see the key benefits as:
1) as I’ve already overstated, master is stable
2) we have a documented process that is IMO a “best practice”. Even if we stick with the
process of today we need to document it as release artifacts and branches proliferate.

On Jun 21, 2017, at 2:06 PM, Dmitriy Lyubimov <> wrote:

so people need to make sure their PR merges to develop instead of master?
Do they need to PR against the develop branch, and if not, who is responsible
for resolving the conflicts that arise from diffing and merging into
different targets?

On Tue, Jun 20, 2017 at 10:09 AM, Pat Ferrel <> wrote:

> As I said I was sure there would be Jenkins issues but they must be small
> since it’s just renaming of target branches. Releases are still made from
> master so I don’t see the issue there at all. Only intermediate CI tasks
> are triggered on other branches. But they would have to be in your examples
> too so I don’t see the benefit of using an ad hoc method in terms of CI.
> We’ve used this method for years with Apache PredictionIO with minimal CI
> issues.
> No, the process below is not equivalent; treating master as develop removes
> the primary (in my mind) benefit. In git flow the master is always stable
> and the reflection of the last primary/core/default release with only
> critical inter-release fixes. If someone wants to work with stable
> up-to-date source, where do they go with the current process? I would claim
> that there actually may be no place to find such a thing except by tracking
> down some working commit number. It would depend on what stage the project
> is in. In git flow there is never a question: master is always stable. Git
> flow also accounts for all the process exceptions and complexities you
> mention below but in a standardized way that is documented so anyone can
> read the rules and follow them. We/Mahout doesn’t even have to write them,
> they can just be referenced.
> But we are re-arguing something I thought was already voted on and that is
> another issue. If we need to re-debate this let’s make it stick one way or
> the other.
> I really appreciate you being release master and the thought and work
> you’ve put into this and if we decide to stick with it, fine. But it should
> be a project decision that release masters follow, not up to each release
> master. We are now embarking on a much more complex release than before
> with multiple combinations of dependencies for binaries and so multiple
> artifacts. We need to make the effort to tame the complexity somehow or it
> will just multiply.
> Given the short nature of the current point release I’d even suggest that
> we target putting our decision into practice after the release, which is a
> better time to make a change if we are to do so.
> On Jun 19, 2017, at 9:04 PM, Trevor Grant <>
> wrote:
> First issue, one does not simply just start using a develop branch.  CI
> only triggers off the 'main' branch, which is master by default.  If we
> move to the way you propose, then we need to file a ticket with INFRA I
> believe.  That can be done, but it's not like we just start doing it one
> day.
> The current method is, when we cut a release- we make a new branch of that
> release. Master is treated like dev. If you want the latest stable, you
> would check out branch-0.13.0.  This is the way most major projects
> (citing Spark, Flink, Zeppelin), including Mahout up to version 0.10.x,
> worked.  To your point, there being a lack of a recent stable- that's fair,
> but partly that's because no one created branches with the release for
> 0.10.? - 0.12.2.
> For all intents and purposes, we are (now once again) following what you
> propose, the only difference is we are treating master as dev, and
> "branch-0.13.0" as master (e.g. last stable).  Larger features go on their
> own branch until they are ready to merge- e.g. ATM there is just one
> feature branch CUDA.  That was the big takeaway from this discussion last
> time- there needed to be feature branches, as opposed to everyone running
> around either working off WIP PRs or half-baked merges, etc.  To that end-
> "website" was a feature branch, and iirc there has been one other feature
> branch that has merged in the last couple of months but I forget what it
> was at the moment.
> Trevor Grant
> Data Scientist
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> On Mon, Jun 19, 2017 at 8:02 PM, Pat Ferrel <> wrote:
>> Perhaps there is a misunderstanding about where a release comes from:
>> master. So any release tools we have should work fine. It’s just that
>> until you are ready to pull the trigger, development is in develop or more
>> strictly a “getting a release ready” branch called a release branch. This
>> sounds like a lot of branches but in practice it’s trivial to merge and
>> purge. Everything stays clean and rapid fire last minute fixes are isolated
>> to the release branch before going into master.
>> The original reason I brought this up is that our Git tools now allow
>> committers to delete old cruft-laden branches that are created and
>> ephemeral with this method.
>> On Jun 19, 2017, at 5:52 PM, Pat Ferrel <> wrote:
>> I just heard we are not using git flow (the process not the tool), we are
>> checking in unclean (untested in any significant way) changes to master?
>> What is the develop branch used for?
>> The master is unstable almost all the time with the old method, in fact
>> there is *no stable bundle of source ever* without git flow. With git flow
>> you can peel off a bug fix and merge with master and users can pull it
>> expecting that everything else is stable and like the last build. This has
>> bitten me with Mahout in the past as I’m sure it has for everyone. This
>> doesn’t fix that but it does limit the pain to committers.
>> If we aren’t going to use it, fine but let’s not agree to it then do
>> something else. If it’s a matter of timing ok, I understood from Andrew’s
>> mail below there was no timing issue but I expect there will be Jenkins or
>> Travis issues to iron out.
>> For reference: <> I have never
>> heard of someone who has tried it that didn’t like it but it takes a leap
>> of faith unless you have git in your bones.
>> On Apr 22, 2017, at 10:42 AM, Andrew Musselman <> wrote:
>> Okay develop it is; I'll cut a develop branch from master right now.
>> As we go, if people forget and push to master, we can merge those changes
>> into develop.
>> In addition, I'm making a 'website' branch for all work on the new version
>> of the site.
>> On Sat, Apr 22, 2017 at 10:36 AM, Pat Ferrel <>
>> wrote:
>>> There are tools to implement git-flow that I haven’t used and may have
>>> some standardization built in but I think “develop” is typical and safe.
>>> On Apr 22, 2017, at 10:33 AM, Andrew Musselman <> wrote:
>>> Cool, I'll make a new dev branch now.
>>> Dev, develop, any preference?
>>> On Sat, Apr 22, 2017 at 10:30 AM, Pat Ferrel <>
>>> wrote:
>>>> It hasn't been often but I’ve been bitten by it and had to ask users of a
>>>> dependent project to check out a specific commit, nasty.
>>>> The main effect would be on automation efforts that are currently WIP.
>>>> On Apr 22, 2017, at 10:25 AM, Andrew Musselman <> wrote:
>>>> I've worked in shops where that was the standard flow, in hg or git, and it
>>>> worked great. I'm in favor of it especially as we add contributors and make
>>>> it easier for people to submit new work.
>>>> Have we had that many times when master got messed up? I don't recall more
>>>> than a few, but in any case the master/dev branch approach is solid.
>>>> On Sat, Apr 22, 2017 at 10:06 AM, Pat Ferrel <>
>>>> wrote:
>>>>> I’ve been introduced to what is now being called git-flow, which at its
>>>>> simplest is just a branching strategy with several key benefits. The most
>>>>> important part of it is that the master branch is rock solid all the time
>>>>> because we use the “develop” branch for integrating Jiras, PRs, features,
>>>>> etc. Any “rock solid” bit can be cherry-picked and put into master as
>>>>> hot-fixes that fix a release but still require a source build.
>>>>> Key features of git-flow:
>>>>> The master becomes stable and can be relied on to be stable. It is
>>>>> generally equal to the last release with only stable or required
>>>>> exceptions.
>>>>> Develop is where all the integration and potentially risky work happens.
>>>>> It is where most PRs are targeted.
>>>>> A release causes develop to be merged with master and so it maintains the
>>>>> stability of master.
>>>>> The benefits of git-flow are more numerous but also seem scary because the
>>>>> explanation can be complex. I’ve switched all my projects and Apache
>>>>> PredictionIO is where I was introduced to this, and it is actually quite
>>>>> easy to manage and collaborate with this model. We just need to take the
>>>>> plunge by creating a persistent branch in the Apache git repo called
>>>>> “develop”. From then on all commits will go to “develop” and all PRs should
>>>>> be created against it. Just after a release is a good time for this.
>>>>> <>
>>>>> What say you all?
