mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sheng Zha <zhash...@apache.org>
Subject Re: [CI] Staggered build pipelines enabled
Date Tue, 16 Jun 2020 19:35:10 GMT
Hi Marco,

> We've always had the policy of not enforcing protected feature branches and this pipeline
is causing human errors.

There are several problems with this statement:
1. This is a release branch as defined in our release process. It's not a feature branch.
2. There was no explicit decision or an agreed upon policy of not protecting the release branch.
To me it doesn't make sense to only protect the development branch while allowing force push
and revision of history on release branches, and this is what needs to be fixed.
3. I made a conscious decision when merging it given the CI sanity check at the time, given
that there were changes from Chai to CI pipelines which may not take effect. While a better
approach would have been to test this separately first, I see such issue as a normal part
of development and I wouldn't be too concerned given that we use proper version control.

Regarding the benefit of having the sanity check step, you can easily measure its impact by
looking at the recent PR build history of the sanity step [1], which I believe you do have
access to. Out of the recent 563 builds in PR validation, 137 builds have broken sanity check,
or 24.3%. Given the amount of reduction in wasted compute on already failed validation pipelines,
to me waiting for two docker cache builds in rare occasions is justified.

Since the increase in PR validation time is due to the docker build time, you can help by
improving the docker cache of the CI. While a couple of people are already investing time
in improving the status quo in this regard, you are more than welcome to step up and participate
on improving it.

-sz

[1] http://jenkins.mxnet-ci.amazon-ml.com/job/mxnet-validation/job/sanity/view/change-requests/builds

On 2020/06/15 23:19:51, Marco de Abreu <marcoabreu@apache.org> wrote: 
> Hello,
> 
> I'd like to revisit this decision and review whether the expected benefit
> (cost reduction) was achieved and how the overall PR validation duration
> has changed. Could you guys share some information on this matter?
> 
> Just today, we've had two incidents which were caused by this change:
> 
> 1. This PR was merged prematurely because the follow up pipelines didn't
> run (whether it's a timing issue or something else I don't know). We've
> always had the policy of not enforcing protected feature branches and this
> pipeline is causing human errors.
> https://github.com/apache/incubator-mxnet/pull/18560
> 
> 2. The sanity check, which got into the critical path here, starts running
> into timeouts. I'm aware that this is in case of cache misses, but none the
> less does that heavily increase the waiting duration since developers now
> have to wait for two entire cache build cycles instead of just one:
> https://github.com/apache/incubator-mxnet/pull/18568
> 
> I understand that money has to be conserved, but I still stand by my
> opinion that this was the wrong move and development speed was sacrificed.
> 
> If there are no other compelling arguments, I'd prefer if the previous
> state of parallel pipelines could be restored.
> 
> Best regards
> Marco
> 
> sandeep krishnamurthy <sandeep.krishna98@gmail.com> schrieb am Di., 28.
> Apr. 2020, 07:55:
> 
> > Thanks a lot Joe for your contributions. Thank you Marco, Chai and Leo for
> > helping this.
> > Especially given that you had seen around 57% build failing in sanity
> > check, this should be very helpful to provide faster feedback for PR
> > authors on sanity issues plus save a lot of unnecessary builds.
> >
> > Best,
> > Sandeep
> >
> > On Mon, 27 Apr 2020, 10:20 pm Joe Evans, <joseph.evans@gmail.com> wrote:
> >
> > > Hi dev community,
> > >
> > >
> > > We have made the changes to the mxnet CI system to incorporate the
> > > staggered build pipelines. With this change, when a new PR is created or
> > an
> > > existing PR is updated, the status checks will only show
> > > “ci/jenkins/mxnet-validation/sanity” build job at first. Once this build
> > > completes successfully (avg. run time is about 10min), the remaining CI
> > > build jobs will appear and function as previously.
> > >
> > >
> > > Please let me know if you experience any issues with this change.
> > >
> > >
> > > Thanks!
> > >
> > > Joe
> > >
> > >
> > > References:
> > >
> > >
> > > https://github.com/apache/incubator-mxnet/issues/17802
> > >
> >
> 

Mime
View raw message