mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tsuyoshi OZAWA <ozawa.tsuyo...@gmail.com>
Subject Re: Apache MXNet build failures are mostly valid - verify before merge
Date Thu, 28 Sep 2017 04:58:59 GMT
+1,

While I'm checking the recent build failures, and I think the decision
of making the mx-net branch protected is necessary for stable
building.
Thanks Kumar for resuming important discussion.

Best regards
- Tsuyoshi

On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <gauta@amazon.com> wrote:
> Reviving the discussion.
>
> At this point of time we have couple of stable builds
> https://builds.apache.org/view/Incubator%20Projects/job/incubator-mxnet/job/master/448/
> https://builds.apache.org/view/Incubator%20Projects/job/incubator-mxnet/job/master/449/
>
> Should we have a quick discussion or polling on making the mx-net branch protected? If
you still think we shouldn’t make it protected please provide a reason to support your claim.
>
> Few of us have concern over Jenkin’s stability. If I look two weeks back, after upgrading
Linux slave to g2.8x and new windows AMI, we have not seen any case where instance died due
to high memory usage or any process got killed due to high cpu usage or any other issue with
windows slaves.
>
> Going forward we are also planning that if we add any new slave we will not enable the
main load immediately, but rather will do ‘test build’ to make sure that new slaves are
not causing any infrastructure issue and capable to perform as good as existing slaves.
>
> -Gautam
>
> On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lupesko@gmail.com> wrote:
>
>     @madan looking into some failures – you’re right… there’s multiple issues
going on, some of them intermittent, and we want to be able to merge fixes in.
>     Agreed that we can wait with setting up protected mode until build stabilizes.
>
>     On 8/31/17, 11:41, "Madan Jampani" <madan.jampani@gmail.com> wrote:
>
>         @hagay: we agree on the end state. I'm not too particular about how we get
>         there. If you think enabling it now and fixes regression later is doable,
>         I'm fine with. I see a bit of a chicken and egg problem. We need to get
>         some fixes in even when the status checks are failing.
>
>         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <lupesko@gmail.com> wrote:
>
>         > @madan – re: getting to a stable CI first:
>         > I’m concerned that by not enabling protected branch mode ASAP, we’re
just
>         > taking in more regressions, which makes a stable build a moving target for
>         > us…
>         >
>         > On 8/31/17, 10:49, "Zha, Sheng" <zhasheng@amazon.com> wrote:
>         >
>         >     Just one thing: please don’t disable more tests or just raise the
>         > tolerance thresholds.
>         >
>         >     Best regards,
>         >     -sz
>         >
>         >     On 8/31/17, 10:45 AM, "Madan Jampani" <madan.jampani@gmail.com>
wrote:
>         >
>         >         +1
>         >         Before we can turn protected mode I feel we should first get to
a
>         > stable CI
>         >         pipeline.
>         >         Sandeep is chasing down known breaking issues.
>         >
>         >
>         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay Lupesko <lupesko@gmail.com>
>         > wrote:
>         >
>         >         > Build stability is a major issue, builds have been failing
left
>         > and right
>         >         > over the last week. Some of it is due to Jenkins slave issues,
>         > but some are
>         >         > real regressions.
>         >         > We need to be more strict in the code we're committing.
>         >         >
>         >         > I propose we configure our master to be a protected branch
(
>         >         > https://help.github.com/articles/about-protected-branches/).
>         >         >
>         >         > Thoughts?
>         >         >
>         >         > On 2017-08-28 22:41, sandeep krishnamurthy <s...@gmail.com>
>         > wrote:
>         >         > > Hello Committers and Contributors,>
>         >         > >
>         >         > > Due to unstable build pipelines, from past 1 week, PRs
are
>         > being merged>
>         >         > > after CR ignoring PR build status. Build pipeline is much
more
>         > stable
>         >         > than>
>         >         > > last week and most of the build failures you see from
now on,
>         > are likely
>         >         > to>
>         >         > > be a valid failure and hence, it is recommended to wait
for PR
>         > builds,
>         >         > see>
>         >         > > the root cause of any build failures before proceeding
with
>         > merges.>
>         >         > >
>         >         > > At this point of time, there are 2 intermittent issue
yet to
>         > be fixed ->
>         >         > > * Network error leading to GitHub requests throwing 404>
>         >         > > * A conflict in artifacts generated between branches/PR
-
>         > Cause unknown
>         >         > yet.>
>         >         > > These issues will be fixed soon.>
>         >         > >
>         >         > >
>         >         > > -- >
>         >         > > Sandeep Krishnamurthy>
>         >         > >
>         >         >
>         >
>         >
>         >
>         >
>         >
>         >
>
>
>
>
>



-- 
- Tsuyoshi

Mime
View raw message