mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Olivier <cjolivie...@gmail.com>
Subject Re: Apache MXNet build failures are mostly valid - verify before merge
Date Thu, 28 Sep 2017 18:37:34 GMT
-1000 on that. :)

On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <mnnaveen@gmail.com> wrote:

> PR->Sanity test/Linux build/test->reviewer/committer approves the
> change->Comment "Build Now" (Or trigger on at least one approval from a
> committer other than author)->*Full build-*>*passes build*->Enable Merge
>
> Let us take this particular topic to a separate thread or discuss offline
> if further clarification is needed.
>
> On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <cjolivier01@gmail.com>
> wrote:
>
> > I understand the proposal.  How to trigger a build in that case?
> >
> >
> > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <madan.jampani@gmail.com>
> > wrote:
> >
> > > Chris,
> > > I don't think Naveen is suggesting that a merge happen without full
> > > verification i.e. all tests across all platforms pass.
> > > If a PR has some back and forth and results in multiple revisions
> (which
> > is
> > > arguably more common than a random unit test failing), we simply delay
> > full
> > > verification until the owner/reviewer have settled on a mutually
> > acceptable
> > > state.
> > >
> > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <cjolivier01@gmail.com
> >
> > > wrote:
> > >
> > > > -1 for running only partial tests.  Most failing unit tests that get
> > > > through fail only for certain platforms/configurations.  I personally
> > > > prefer to be assured the build and test is good before merge.  Most
> PR
> > > > merges aren't in a huge hurry.
> > > >
> > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <mnnaveen@gmail.com>
> > > wrote:
> > > >
> > > > > +1 to make it protected. Here is what I am thinking for PR builds
> > > > > on a PR run Sanity Tests + build/test one platform->committer
> reviews
> > > the
> > > > > code and issues "Build Now", a full build is run->Github checks
> that
> > > the
> > > > > full build checks succeed before it can be merged.
> > > > >
> > > > > I agree with Madan that PR should be approved by one another
> > committer.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
> > > madan.jampani@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > At a minimum I'd like to see the following two happen:
> > > > > > - Option to merge is disabled until all required checks pass.
> > > > > > - Code is reviewed and given +1 by at least one other committer
> (no
> > > > self
> > > > > > review).
> > > > > >
> > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <gautamnitc@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Chris,
> > > > > > >
> > > > > > >   Here <https://help.github.com/articles/about-protected-
> > branches/
> > > >
> > > > is
> > > > > > > user
> > > > > > > document on semantics of protected branch.
> > > > > > > In short when a branch is protected following applies to
that
> > > branch.
> > > > > > >
> > > > > > >    - Can't be force pushed
> > > > > > >    - Can't be deleted
> > > > > > >    - Can't have changes merged into it until required status
> > checks
> > > > > > >    <https://help.github.com/articles/about-required-
> > status-checks>
> > > > > pass
> > > > > > >    - Can't have changes merged into it until required reviews
> are
> > > > > > approved
> > > > > > >    <https://help.github.com/articles/approving-a-pull-
> > > > > > > request-with-required-reviews>
> > > > > > >    - Can't be edited or have files uploaded to it from
the web
> > > > > > >    - Can't have changes merged into it until changes to
files
> > that
> > > > > > > have a designated
> > > > > > >    code owner <https://help.github.com/
> > articles/about-codeowners/>
> > > > > have
> > > > > > >    been approved by that owner
> > > > > > >
> > > > > > >  I am sure many of us might not want to have all these
but we
> can
> > > > > debate
> > > > > > on
> > > > > > > it. My main motive was to "*Can't have changes merged into
it
> > until
> > > > > > > required status checks pass*"
> > > > > > >
> > > > > > >
> > > > > > > -Gautam
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
> > > > cjolivier01@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > What does that mean? "Protected"? Protected from what?
> > > > > > > >
> > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
> gautamnitc@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Chris,
> > > > > > > > >
> > > > > > > > >    I mean make "master branch protected" of 
MXNet.
> > > > > > > > >
> > > > > > > > > -Gautam
> > > > > > > > >
> > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier
<
> > > > > > cjolivier01@gmail.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > What does this mean? "Mx-net branch protected"?
> > > > > > > > > >
> > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi
OZAWA <
> > > > > > > > ozawa.tsuyoshi@gmail.com
> > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1,
> > > > > > > > > > >
> > > > > > > > > > > While I'm checking the recent build
failures, and I
> think
> > > the
> > > > > > > > decision
> > > > > > > > > > > of making the mx-net branch protected
is necessary for
> > > stable
> > > > > > > > > > > building.
> > > > > > > > > > > Thanks Kumar for resuming important
discussion.
> > > > > > > > > > >
> > > > > > > > > > > Best regards
> > > > > > > > > > > - Tsuyoshi
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar,
Gautam <
> > > > > > gauta@amazon.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > > Reviving the discussion.
> > > > > > > > > > > >
> > > > > > > > > > > > At this point of time we have
couple of stable builds
> > > > > > > > > > > >
> > > > > > > > > > >
> https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > > > > > incubator-mxnet/job/master/448/
> > > > > > > > > > > >
> > > > > > > > > > >
> https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > > > > > incubator-mxnet/job/master/449/
> > > > > > > > > > > >
> > > > > > > > > > > > Should we have a quick discussion
or polling on
> making
> > > the
> > > > > > mx-net
> > > > > > > > > > branch
> > > > > > > > > > > protected? If you still think we shouldn’t
make it
> > > protected
> > > > > > please
> > > > > > > > > > provide
> > > > > > > > > > > a reason to support your claim.
> > > > > > > > > > > >
> > > > > > > > > > > > Few of us have concern over Jenkin’s
stability. If I
> > look
> > > > two
> > > > > > > weeks
> > > > > > > > > > > back, after upgrading Linux slave to
g2.8x and new
> > windows
> > > > AMI,
> > > > > > we
> > > > > > > > have
> > > > > > > > > > not
> > > > > > > > > > > seen any case where instance died due
to high memory
> > usage
> > > or
> > > > > any
> > > > > > > > > process
> > > > > > > > > > > got killed due to high cpu usage or
any other issue
> with
> > > > > windows
> > > > > > > > > slaves.
> > > > > > > > > > > >
> > > > > > > > > > > > Going forward we are also planning
that if we add any
> > new
> > > > > slave
> > > > > > > we
> > > > > > > > > will
> > > > > > > > > > > not enable the main load immediately,
but rather will
> do
> > > > ‘test
> > > > > > > build’
> > > > > > > > > to
> > > > > > > > > > > make sure that new slaves are not causing
any
> > > infrastructure
> > > > > > issue
> > > > > > > > and
> > > > > > > > > > > capable to perform as good as existing
slaves.
> > > > > > > > > > > >
> > > > > > > > > > > > -Gautam
> > > > > > > > > > > >
> > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko,
Hagay" <
> > lupesko@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >     @madan looking into some failures
– you’re right…
> > > > there’s
> > > > > > > > > multiple
> > > > > > > > > > > issues going on, some of them intermittent,
and we want
> > to
> > > be
> > > > > > able
> > > > > > > to
> > > > > > > > > > merge
> > > > > > > > > > > fixes in.
> > > > > > > > > > > >     Agreed that we can wait with
setting up protected
> > > mode
> > > > > > until
> > > > > > > > > build
> > > > > > > > > > > stabilizes.
> > > > > > > > > > > >
> > > > > > > > > > > >     On 8/31/17, 11:41, "Madan
Jampani" <
> > > > > > madan.jampani@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >         @hagay: we agree on the
end state. I'm not
> too
> > > > > > particular
> > > > > > > > > about
> > > > > > > > > > > how we get
> > > > > > > > > > > >         there. If you think enabling
it now and fixes
> > > > > > regression
> > > > > > > > > later
> > > > > > > > > > > is doable,
> > > > > > > > > > > >         I'm fine with. I see a
bit of a chicken and
> egg
> > > > > > problem.
> > > > > > > We
> > > > > > > > > > need
> > > > > > > > > > > to get
> > > > > > > > > > > >         some fixes in even when
the status checks are
> > > > > failing.
> > > > > > > > > > > >
> > > > > > > > > > > >         On Thu, Aug 31, 2017 at
11:25 AM, Lupesko,
> > Hagay
> > > <
> > > > > > > > > > > lupesko@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >         > @madan – re: getting
to a stable CI first:
> > > > > > > > > > > >         > I’m concerned that
by not enabling
> protected
> > > > branch
> > > > > > > mode
> > > > > > > > > > ASAP,
> > > > > > > > > > > we’re just
> > > > > > > > > > > >         > taking in more regressions,
which makes a
> > > stable
> > > > > > build
> > > > > > > a
> > > > > > > > > > > moving target for
> > > > > > > > > > > >         > us…
> > > > > > > > > > > >         >
> > > > > > > > > > > >         > On 8/31/17, 10:49,
"Zha, Sheng" <
> > > > > zhasheng@amazon.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >     Just one thing:
please don’t disable
> more
> > > > tests
> > > > > > or
> > > > > > > > just
> > > > > > > > > > > raise the
> > > > > > > > > > > >         > tolerance thresholds.
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >     Best regards,
> > > > > > > > > > > >         >     -sz
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >     On 8/31/17, 10:45
AM, "Madan Jampani" <
> > > > > > > > > > > madan.jampani@gmail.com> wrote:
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >         +1
> > > > > > > > > > > >         >         Before we
can turn protected mode I
> > > feel
> > > > we
> > > > > > > > should
> > > > > > > > > > > first get to a
> > > > > > > > > > > >         > stable CI
> > > > > > > > > > > >         >         pipeline.
> > > > > > > > > > > >         >         Sandeep is
chasing down known
> > breaking
> > > > > > issues.
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >         On Thu, Aug
31, 2017 at 10:27 AM,
> > Hagay
> > > > > > > Lupesko <
> > > > > > > > > > > lupesko@gmail.com>
> > > > > > > > > > > >         > wrote:
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >         > Build
stability is a major issue,
> > > > builds
> > > > > > have
> > > > > > > > > been
> > > > > > > > > > > failing left
> > > > > > > > > > > >         > and right
> > > > > > > > > > > >         >         > over
the last week. Some of it is
> > due
> > > > to
> > > > > > > > Jenkins
> > > > > > > > > > > slave issues,
> > > > > > > > > > > >         > but some are
> > > > > > > > > > > >         >         > real
regressions.
> > > > > > > > > > > >         >         > We need
to be more strict in the
> > code
> > > > > we're
> > > > > > > > > > > committing.
> > > > > > > > > > > >         >         >
> > > > > > > > > > > >         >         > I propose
we configure our master
> > to
> > > > be a
> > > > > > > > > protected
> > > > > > > > > > > branch (
> > > > > > > > > > > >         >         >
> > > > > > > > > > > https://help.github.com/articles/about-protected-
> > branches/
> > > ).
> > > > > > > > > > > >         >         >
> > > > > > > > > > > >         >         > Thoughts?
> > > > > > > > > > > >         >         >
> > > > > > > > > > > >         >         > On 2017-08-28
22:41, sandeep
> > > > > krishnamurthy
> > > > > > <
> > > > > > > > > > > s...@gmail.com>
> > > > > > > > > > > >         > wrote:
> > > > > > > > > > > >         >         > >
Hello Committers and
> > Contributors,>
> > > > > > > > > > > >         >         > >
> > > > > > > > > > > >         >         > >
Due to unstable build
> pipelines,
> > > from
> > > > > > past
> > > > > > > 1
> > > > > > > > > > week,
> > > > > > > > > > > PRs are
> > > > > > > > > > > >         > being merged>
> > > > > > > > > > > >         >         > >
after CR ignoring PR build
> > status.
> > > > > Build
> > > > > > > > > pipeline
> > > > > > > > > > > is much more
> > > > > > > > > > > >         > stable
> > > > > > > > > > > >         >         > than>
> > > > > > > > > > > >         >         > >
last week and most of the build
> > > > > failures
> > > > > > > you
> > > > > > > > > see
> > > > > > > > > > > from now on,
> > > > > > > > > > > >         > are likely
> > > > > > > > > > > >         >         > to>
> > > > > > > > > > > >         >         > >
be a valid failure and hence,
> it
> > is
> > > > > > > > recommended
> > > > > > > > > > to
> > > > > > > > > > > wait for PR
> > > > > > > > > > > >         > builds,
> > > > > > > > > > > >         >         > see>
> > > > > > > > > > > >         >         > >
the root cause of any build
> > > failures
> > > > > > before
> > > > > > > > > > > proceeding with
> > > > > > > > > > > >         > merges.>
> > > > > > > > > > > >         >         > >
> > > > > > > > > > > >         >         > >
At this point of time, there
> are
> > 2
> > > > > > > > intermittent
> > > > > > > > > > > issue yet to
> > > > > > > > > > > >         > be fixed ->
> > > > > > > > > > > >         >         > >
* Network error leading to
> GitHub
> > > > > > requests
> > > > > > > > > > > throwing 404>
> > > > > > > > > > > >         >         > >
* A conflict in artifacts
> > generated
> > > > > > between
> > > > > > > > > > > branches/PR -
> > > > > > > > > > > >         > Cause unknown
> > > > > > > > > > > >         >         > yet.>
> > > > > > > > > > > >         >         > >
These issues will be fixed
> soon.>
> > > > > > > > > > > >         >         > >
> > > > > > > > > > > >         >         > >
> > > > > > > > > > > >         >         > >
-- >
> > > > > > > > > > > >         >         > >
Sandeep Krishnamurthy>
> > > > > > > > > > > >         >         > >
> > > > > > > > > > > >         >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > - Tsuyoshi
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards,
> > > > > > > > > Gautam Kumar
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best Regards,
> > > > > > > Gautam Kumar
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message