mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joern Kottmann <kottm...@gmail.com>
Subject Re: Apache MXNet build failures are mostly valid - verify before merge
Date Fri, 29 Sep 2017 09:20:50 GMT
It also makes sense to block too old PRs from merging, because the
test results are outdated and the build might fail after it gets
merged.

Jörn

On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zhasheng@amazon.com> wrote:
> +1 on protected branch.
>
> Best regards,
> -sz
>
> On 9/28/17, 11:48 AM, "Kumar, Gautam" <gauta@amazon.com> wrote:
>
>     Hi Guys,
>
>      Let’s focus on specific issue here.
>
>     Marking the master branch protected which involves “Only merge if checks has passed,
and yes it will run the complete build”.
>
>     We can’t afford to degrade the quality and keep debugging the build failure forever.
If it’s slow down the development at the cost of quality I will vote for the quality.
>     We can work on improving the infrastructure to improve the overall speed.  If you
have any specific concerns on availability of Jenkins please point out.
>
>     -Gautam
>
>
>     On 9/28/17, 11:38 AM, "Chris Olivier" <cjolivier01@gmail.com> wrote:
>
>         -1000 on that. :)
>
>         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <mnnaveen@gmail.com> wrote:
>
>         > PR->Sanity test/Linux build/test->reviewer/committer approves the
>         > change->Comment "Build Now" (Or trigger on at least one approval from
a
>         > committer other than author)->*Full build-*>*passes build*->Enable
Merge
>         >
>         > Let us take this particular topic to a separate thread or discuss offline
>         > if further clarification is needed.
>         >
>         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <cjolivier01@gmail.com>
>         > wrote:
>         >
>         > > I understand the proposal.  How to trigger a build in that case?
>         > >
>         > >
>         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <madan.jampani@gmail.com>
>         > > wrote:
>         > >
>         > > > Chris,
>         > > > I don't think Naveen is suggesting that a merge happen without
full
>         > > > verification i.e. all tests across all platforms pass.
>         > > > If a PR has some back and forth and results in multiple revisions
>         > (which
>         > > is
>         > > > arguably more common than a random unit test failing), we simply
delay
>         > > full
>         > > > verification until the owner/reviewer have settled on a mutually
>         > > acceptable
>         > > > state.
>         > > >
>         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <cjolivier01@gmail.com
>         > >
>         > > > wrote:
>         > > >
>         > > > > -1 for running only partial tests.  Most failing unit tests
that get
>         > > > > through fail only for certain platforms/configurations. 
I personally
>         > > > > prefer to be assured the build and test is good before merge.
 Most
>         > PR
>         > > > > merges aren't in a huge hurry.
>         > > > >
>         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <mnnaveen@gmail.com>
>         > > > wrote:
>         > > > >
>         > > > > > +1 to make it protected. Here is what I am thinking
for PR builds
>         > > > > > on a PR run Sanity Tests + build/test one platform->committer
>         > reviews
>         > > > the
>         > > > > > code and issues "Build Now", a full build is run->Github
checks
>         > that
>         > > > the
>         > > > > > full build checks succeed before it can be merged.
>         > > > > >
>         > > > > > I agree with Madan that PR should be approved by one
another
>         > > committer.
>         > > > > >
>         > > > > >
>         > > > > >
>         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
>         > > > madan.jampani@gmail.com>
>         > > > > > wrote:
>         > > > > >
>         > > > > > > +1
>         > > > > > >
>         > > > > > > At a minimum I'd like to see the following two
happen:
>         > > > > > > - Option to merge is disabled until all required
checks pass.
>         > > > > > > - Code is reviewed and given +1 by at least one
other committer
>         > (no
>         > > > > self
>         > > > > > > review).
>         > > > > > >
>         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <gautamnitc@gmail.com>
>         > > > wrote:
>         > > > > > >
>         > > > > > > > Hi Chris,
>         > > > > > > >
>         > > > > > > >   Here <https://help.github.com/articles/about-protected-
>         > > branches/
>         > > > >
>         > > > > is
>         > > > > > > > user
>         > > > > > > > document on semantics of protected branch.
>         > > > > > > > In short when a branch is protected following
applies to that
>         > > > branch.
>         > > > > > > >
>         > > > > > > >    - Can't be force pushed
>         > > > > > > >    - Can't be deleted
>         > > > > > > >    - Can't have changes merged into it until
required status
>         > > checks
>         > > > > > > >    <https://help.github.com/articles/about-required-
>         > > status-checks>
>         > > > > > pass
>         > > > > > > >    - Can't have changes merged into it until
required reviews
>         > are
>         > > > > > > approved
>         > > > > > > >    <https://help.github.com/articles/approving-a-pull-
>         > > > > > > > request-with-required-reviews>
>         > > > > > > >    - Can't be edited or have files uploaded
to it from the web
>         > > > > > > >    - Can't have changes merged into it until
changes to files
>         > > that
>         > > > > > > > have a designated
>         > > > > > > >    code owner <https://help.github.com/
>         > > articles/about-codeowners/>
>         > > > > > have
>         > > > > > > >    been approved by that owner
>         > > > > > > >
>         > > > > > > >  I am sure many of us might not want to have
all these but we
>         > can
>         > > > > > debate
>         > > > > > > on
>         > > > > > > > it. My main motive was to "*Can't have changes
merged into it
>         > > until
>         > > > > > > > required status checks pass*"
>         > > > > > > >
>         > > > > > > >
>         > > > > > > > -Gautam
>         > > > > > > >
>         > > > > > > >
>         > > > > > > >
>         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier
<
>         > > > > cjolivier01@gmail.com
>         > > > > > >
>         > > > > > > > wrote:
>         > > > > > > >
>         > > > > > > > > What does that mean? "Protected"? Protected
from what?
>         > > > > > > > >
>         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam
<
>         > gautamnitc@gmail.com>
>         > > > > > wrote:
>         > > > > > > > >
>         > > > > > > > > > Hi Chris,
>         > > > > > > > > >
>         > > > > > > > > >    I mean make "master branch protected"
of  MXNet.
>         > > > > > > > > >
>         > > > > > > > > > -Gautam
>         > > > > > > > > >
>         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM,
Chris Olivier <
>         > > > > > > cjolivier01@gmail.com
>         > > > > > > > >
>         > > > > > > > > > wrote:
>         > > > > > > > > >
>         > > > > > > > > > > What does this mean? "Mx-net
branch protected"?
>         > > > > > > > > > >
>         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59
PM Tsuyoshi OZAWA <
>         > > > > > > > > ozawa.tsuyoshi@gmail.com
>         > > > > > > > > > >
>         > > > > > > > > > > wrote:
>         > > > > > > > > > >
>         > > > > > > > > > > > +1,
>         > > > > > > > > > > >
>         > > > > > > > > > > > While I'm checking the
recent build failures, and I
>         > think
>         > > > the
>         > > > > > > > > decision
>         > > > > > > > > > > > of making the mx-net branch
protected is necessary for
>         > > > stable
>         > > > > > > > > > > > building.
>         > > > > > > > > > > > Thanks Kumar for resuming
important discussion.
>         > > > > > > > > > > >
>         > > > > > > > > > > > Best regards
>         > > > > > > > > > > > - Tsuyoshi
>         > > > > > > > > > > >
>         > > > > > > > > > > > On Thu, Sep 28, 2017 at
12:56 PM, Kumar, Gautam <
>         > > > > > > gauta@amazon.com>
>         > > > > > > > > > > wrote:
>         > > > > > > > > > > > > Reviving the discussion.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > At this point of
time we have couple of stable builds
>         > > > > > > > > > > > >
>         > > > > > > > > > > >
>         > https://builds.apache.org/view/Incubator%20Projects/job/
>         > > > > > > > > > > incubator-mxnet/job/master/448/
>         > > > > > > > > > > > >
>         > > > > > > > > > > >
>         > https://builds.apache.org/view/Incubator%20Projects/job/
>         > > > > > > > > > > incubator-mxnet/job/master/449/
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > Should we have a
quick discussion or polling on
>         > making
>         > > > the
>         > > > > > > mx-net
>         > > > > > > > > > > branch
>         > > > > > > > > > > > protected? If you still
think we shouldn’t make it
>         > > > protected
>         > > > > > > please
>         > > > > > > > > > > provide
>         > > > > > > > > > > > a reason to support your
claim.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > Few of us have concern
over Jenkin’s stability. If I
>         > > look
>         > > > > two
>         > > > > > > > weeks
>         > > > > > > > > > > > back, after upgrading
Linux slave to g2.8x and new
>         > > windows
>         > > > > AMI,
>         > > > > > > we
>         > > > > > > > > have
>         > > > > > > > > > > not
>         > > > > > > > > > > > seen any case where instance
died due to high memory
>         > > usage
>         > > > or
>         > > > > > any
>         > > > > > > > > > process
>         > > > > > > > > > > > got killed due to high
cpu usage or any other issue
>         > with
>         > > > > > windows
>         > > > > > > > > > slaves.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > Going forward we
are also planning that if we add any
>         > > new
>         > > > > > slave
>         > > > > > > > we
>         > > > > > > > > > will
>         > > > > > > > > > > > not enable the main load
immediately, but rather will
>         > do
>         > > > > ‘test
>         > > > > > > > build’
>         > > > > > > > > > to
>         > > > > > > > > > > > make sure that new slaves
are not causing any
>         > > > infrastructure
>         > > > > > > issue
>         > > > > > > > > and
>         > > > > > > > > > > > capable to perform as
good as existing slaves.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > -Gautam
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > On 8/31/17, 5:27
PM, "Lupesko, Hagay" <
>         > > lupesko@gmail.com
>         > > > >
>         > > > > > > wrote:
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >     @madan looking
into some failures – you’re right…
>         > > > > there’s
>         > > > > > > > > > multiple
>         > > > > > > > > > > > issues going on, some
of them intermittent, and we want
>         > > to
>         > > > be
>         > > > > > > able
>         > > > > > > > to
>         > > > > > > > > > > merge
>         > > > > > > > > > > > fixes in.
>         > > > > > > > > > > > >     Agreed that we
can wait with setting up protected
>         > > > mode
>         > > > > > > until
>         > > > > > > > > > build
>         > > > > > > > > > > > stabilizes.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >     On 8/31/17, 11:41,
"Madan Jampani" <
>         > > > > > > madan.jampani@gmail.com>
>         > > > > > > > > > > wrote:
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >         @hagay: we
agree on the end state. I'm not
>         > too
>         > > > > > > particular
>         > > > > > > > > > about
>         > > > > > > > > > > > how we get
>         > > > > > > > > > > > >         there. If
you think enabling it now and fixes
>         > > > > > > regression
>         > > > > > > > > > later
>         > > > > > > > > > > > is doable,
>         > > > > > > > > > > > >         I'm fine
with. I see a bit of a chicken and
>         > egg
>         > > > > > > problem.
>         > > > > > > > We
>         > > > > > > > > > > need
>         > > > > > > > > > > > to get
>         > > > > > > > > > > > >         some fixes
in even when the status checks are
>         > > > > > failing.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >         On Thu, Aug
31, 2017 at 11:25 AM, Lupesko,
>         > > Hagay
>         > > > <
>         > > > > > > > > > > > lupesko@gmail.com>
wrote:
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >         > @madan
– re: getting to a stable CI first:
>         > > > > > > > > > > > >         > I’m
concerned that by not enabling
>         > protected
>         > > > > branch
>         > > > > > > > mode
>         > > > > > > > > > > ASAP,
>         > > > > > > > > > > > we’re just
>         > > > > > > > > > > > >         > taking
in more regressions, which makes a
>         > > > stable
>         > > > > > > build
>         > > > > > > > a
>         > > > > > > > > > > > moving target for
>         > > > > > > > > > > > >         > us…
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         > On 8/31/17,
10:49, "Zha, Sheng" <
>         > > > > > zhasheng@amazon.com
>         > > > > > > >
>         > > > > > > > > > wrote:
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >    
Just one thing: please don’t disable
>         > more
>         > > > > tests
>         > > > > > > or
>         > > > > > > > > just
>         > > > > > > > > > > > raise the
>         > > > > > > > > > > > >         > tolerance
thresholds.
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >    
Best regards,
>         > > > > > > > > > > > >         >    
-sz
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >    
On 8/31/17, 10:45 AM, "Madan Jampani" <
>         > > > > > > > > > > > madan.jampani@gmail.com>
wrote:
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >    
    +1
>         > > > > > > > > > > > >         >    
    Before we can turn protected mode I
>         > > > feel
>         > > > > we
>         > > > > > > > > should
>         > > > > > > > > > > > first get to a
>         > > > > > > > > > > > >         > stable
CI
>         > > > > > > > > > > > >         >    
    pipeline.
>         > > > > > > > > > > > >         >    
    Sandeep is chasing down known
>         > > breaking
>         > > > > > > issues.
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >    
    On Thu, Aug 31, 2017 at 10:27 AM,
>         > > Hagay
>         > > > > > > > Lupesko <
>         > > > > > > > > > > > lupesko@gmail.com>
>         > > > > > > > > > > > >         > wrote:
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >    
    > Build stability is a major issue,
>         > > > > builds
>         > > > > > > have
>         > > > > > > > > > been
>         > > > > > > > > > > > failing left
>         > > > > > > > > > > > >         > and
right
>         > > > > > > > > > > > >         >    
    > over the last week. Some of it is
>         > > due
>         > > > > to
>         > > > > > > > > Jenkins
>         > > > > > > > > > > > slave issues,
>         > > > > > > > > > > > >         > but
some are
>         > > > > > > > > > > > >         >    
    > real regressions.
>         > > > > > > > > > > > >         >    
    > We need to be more strict in the
>         > > code
>         > > > > > we're
>         > > > > > > > > > > > committing.
>         > > > > > > > > > > > >         >    
    >
>         > > > > > > > > > > > >         >    
    > I propose we configure our master
>         > > to
>         > > > > be a
>         > > > > > > > > > protected
>         > > > > > > > > > > > branch (
>         > > > > > > > > > > > >         >    
    >
>         > > > > > > > > > > > https://help.github.com/articles/about-protected-
>         > > branches/
>         > > > ).
>         > > > > > > > > > > > >         >    
    >
>         > > > > > > > > > > > >         >    
    > Thoughts?
>         > > > > > > > > > > > >         >    
    >
>         > > > > > > > > > > > >         >    
    > On 2017-08-28 22:41, sandeep
>         > > > > > krishnamurthy
>         > > > > > > <
>         > > > > > > > > > > > s...@gmail.com>
>         > > > > > > > > > > > >         > wrote:
>         > > > > > > > > > > > >         >    
    > > Hello Committers and
>         > > Contributors,>
>         > > > > > > > > > > > >         >    
    > >
>         > > > > > > > > > > > >         >    
    > > Due to unstable build
>         > pipelines,
>         > > > from
>         > > > > > > past
>         > > > > > > > 1
>         > > > > > > > > > > week,
>         > > > > > > > > > > > PRs are
>         > > > > > > > > > > > >         > being
merged>
>         > > > > > > > > > > > >         >    
    > > after CR ignoring PR build
>         > > status.
>         > > > > > Build
>         > > > > > > > > > pipeline
>         > > > > > > > > > > > is much more
>         > > > > > > > > > > > >         > stable
>         > > > > > > > > > > > >         >    
    > than>
>         > > > > > > > > > > > >         >    
    > > last week and most of the build
>         > > > > > failures
>         > > > > > > > you
>         > > > > > > > > > see
>         > > > > > > > > > > > from now on,
>         > > > > > > > > > > > >         > are
likely
>         > > > > > > > > > > > >         >    
    > to>
>         > > > > > > > > > > > >         >    
    > > be a valid failure and hence,
>         > it
>         > > is
>         > > > > > > > > recommended
>         > > > > > > > > > > to
>         > > > > > > > > > > > wait for PR
>         > > > > > > > > > > > >         > builds,
>         > > > > > > > > > > > >         >    
    > see>
>         > > > > > > > > > > > >         >    
    > > the root cause of any build
>         > > > failures
>         > > > > > > before
>         > > > > > > > > > > > proceeding with
>         > > > > > > > > > > > >         > merges.>
>         > > > > > > > > > > > >         >    
    > >
>         > > > > > > > > > > > >         >    
    > > At this point of time, there
>         > are
>         > > 2
>         > > > > > > > > intermittent
>         > > > > > > > > > > > issue yet to
>         > > > > > > > > > > > >         > be fixed
->
>         > > > > > > > > > > > >         >    
    > > * Network error leading to
>         > GitHub
>         > > > > > > requests
>         > > > > > > > > > > > throwing 404>
>         > > > > > > > > > > > >         >    
    > > * A conflict in artifacts
>         > > generated
>         > > > > > > between
>         > > > > > > > > > > > branches/PR -
>         > > > > > > > > > > > >         > Cause
unknown
>         > > > > > > > > > > > >         >    
    > yet.>
>         > > > > > > > > > > > >         >    
    > > These issues will be fixed
>         > soon.>
>         > > > > > > > > > > > >         >    
    > >
>         > > > > > > > > > > > >         >    
    > >
>         > > > > > > > > > > > >         >    
    > > -- >
>         > > > > > > > > > > > >         >    
    > > Sandeep Krishnamurthy>
>         > > > > > > > > > > > >         >    
    > >
>         > > > > > > > > > > > >         >    
    >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >
>         > > > > > > > > > > >
>         > > > > > > > > > > >
>         > > > > > > > > > > >
>         > > > > > > > > > > > --
>         > > > > > > > > > > > - Tsuyoshi
>         > > > > > > > > > > >
>         > > > > > > > > > >
>         > > > > > > > > >
>         > > > > > > > > >
>         > > > > > > > > >
>         > > > > > > > > > --
>         > > > > > > > > > Best Regards,
>         > > > > > > > > > Gautam Kumar
>         > > > > > > > > >
>         > > > > > > > >
>         > > > > > > >
>         > > > > > > >
>         > > > > > > >
>         > > > > > > > --
>         > > > > > > > Best Regards,
>         > > > > > > > Gautam Kumar
>         > > > > > > >
>         > > > > > >
>         > > > > >
>         > > > >
>         > > >
>         > >
>         >
>
>
>
>

Mime
View raw message