mxnet-dev mailing list archives

From Naveen Swamy <mnnav...@gmail.com>
Subject Re: CI impaired
Date Fri, 30 Nov 2018 19:24:30 GMT
Hi Marco/Gavin,

Thanks for the clarification. I was not aware that it had been tested in a
separate test environment (this is what I was suggesting, so that the
changes are made in a more controlled manner). The last time the change was
made, many PRs were left dangling and developers had to retrigger them; I
triggered them at least 5 times before they succeeded today.

Appreciate all the hard work to make CI better.

-Naveen

On Fri, Nov 30, 2018 at 8:50 AM Gavin M. Bell <gavin.max.bell@gmail.com>
wrote:

> Hey Folks,
>
> Marco has been running this change in dev, with flying colors, for some
> time. This is not an experiment but a rollout that was announced.  We also
> decided to make this change after the release cut to limit the blast radius
> for any critical obligations to the community.  Marco is accountable for
> this work and will address any issues that may occur as he has been put
> on-call.  We have, to our best ability, mitigated as much risk as possible
> and now it is time to pull the trigger.  The community will enjoy a bit
> more visibility and clarity into the test process which will be
> advantageous, as well as allowing us to extend our infrastructure in a way
> that affords us more flexibility.
>
> No pending PRs will be impacted.
>
> Thank you for your support as we evolve this system to better serve the
> community.
>
> -Gavin
>
> On Fri, Nov 30, 2018 at 5:23 PM Marco de Abreu
> <marco.g.abreu@googlemail.com.invalid> wrote:
>
> > Hello Naveen, this is not an experiment. Everything has been tested in our
> > test system and is considered working 100%. This is not a test but actually
> > the move into production - the merge into master happened a week ago. We
> > now just have to put all PRs into the catalogue, which means that all PRs
> > have to be analyzed with the new pipelines - the only thing that will be
> > noticeable is that the CI is under higher load.
> >
> > The pending PRs will not be impacted. The existing pipeline is still
> > running in parallel and everything will behave as before.
> >
> > -Marco
> >
> > On Fri, Nov 30, 2018 at 4:41 PM Naveen Swamy <mnnaveen@gmail.com> wrote:
> >
> > > Marco, run your experiments on a branch - set up, test it well and then
> > > bring it to the master.
> > >
> > > > On Nov 30, 2018, at 6:53 AM, Marco de Abreu <
> > > marco.g.abreu@googlemail.com.INVALID> wrote:
> > > >
> > > > Hello,
> > > >
> > > > > I'm now moving forward with #1. I will try to get to #3 as soon as possible
> > > > > to reduce parallel jobs in our CI. You might notice some unfinished jobs. I
> > > > > will let you know as soon as this process has been completed. Until then,
> > > > > please bear with me since we have hundreds of jobs to run in order to
> > > > > validate all PRs.
> > > >
> > > > Best regards,
> > > > Marco
> > > >
> > > > On Fri, Nov 30, 2018 at 1:36 AM Marco de Abreu <
> > > marco.g.abreu@googlemail.com>
> > > > wrote:
> > > >
> > > >> Hello,
> > > >>
> > > > >> since the release branch has now been cut, I would like to move forward
> > > > >> with the CI improvements for the master branch. This would include the
> > > > >> following actions:
> > > > >> 1. Re-enable the new Jenkins job
> > > > >> 2. Request Apache Infra to move the protected branch check from the main
> > > > >> pipeline to our new ones
> > > > >> 3. Merge https://github.com/apache/incubator-mxnet/pull/13474 - this
> > > > >> finalizes the deprecation process
> > > > >>
> > > > >> If nobody objects, I would like to start with #1 soon. Mentors, could you
> > > > >> please assist in creating the Apache Infra ticket? I would then take it from
> > > > >> there and talk to Infra.
> > > >>
> > > >> Best regards,
> > > >> Marco
> > > >>
> > > >> On Mon, Nov 26, 2018 at 2:47 AM kellen sunderland <
> > > >> kellen.sunderland@gmail.com> wrote:
> > > >>
> > > >>> Sorry, [1] meant to reference
> > > >>> https://issues.jenkins-ci.org/browse/JENKINS-37984 .
> > > >>>
> > > >>> On Sun, Nov 25, 2018 at 5:41 PM kellen sunderland <
> > > >>> kellen.sunderland@gmail.com> wrote:
> > > >>>
> > > > >>>> Marco and I ran into another urgent issue over the weekend that was
> > > > >>>> causing builds to fail.  This issue was unrelated to any feature
> > > > >>>> development work, or other CI fixes applied recently, but it did
> > > > >>>> require quite a bit of work from Marco (and a little from me) to fix.
> > > > >>>>
> > > > >>>> We spent enough time on the problem that it caused us to take a step
> > > > >>>> back and consider how we could both fix issues in CI and support the
> > > > >>>> 1.4 release with the least impact possible on MXNet devs.  Marco had
> > > > >>>> planned to make a significant change to the CI to fix a long-standing
> > > > >>>> Jenkins error [1], but we feel that most developers would prioritize
> > > > >>>> having a stable build environment for the next few weeks over having
> > > > >>>> this fix in place.
> > > > >>>>
> > > > >>>> To properly introduce a new CI system the intent was to do a gradual
> > > > >>>> blue/green roll out of the fix.  To manage this rollout would have
> > > > >>>> taken operational effort and double compute load as we run systems in
> > > > >>>> parallel.  This risks outages due to scaling limits, and we’d rather
> > > > >>>> make this change during a period of low-developer activity, i.e.
> > > > >>>> shortly after the 1.4 release.
> > > > >>>>
> > > > >>>> This means that from now until the 1.4 release, in order to reduce
> > > > >>>> complexity MXNet developers should only see a single Jenkins
> > > > >>>> verification check, and a single Travis check.
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
>
>
> --
> Sincerely,
> Gavin M. Bell
>
>  "Never mistake a clear view for a short distance."
>               -Paul Saffo
>
