mxnet-dev mailing list archives

From Naveen Swamy <mnnav...@gmail.com>
Subject Re: CI impaired
Date Fri, 30 Nov 2018 15:41:20 GMT
Marco, run your experiments on a branch: set it up, test it well, and then bring it to master.


> On Nov 30, 2018, at 6:53 AM, Marco de Abreu <marco.g.abreu@googlemail.com.INVALID>
wrote:
> 
> Hello,
> 
> I'm now moving forward with #1. I will try to get to #3 as soon as possible
> to reduce parallel jobs in our CI. You might notice some unfinished jobs. I
> will let you know as soon as this process has been completed. Until then,
> please bear with me since we have hundreds of jobs to run in order to
> validate all PRs.
> 
> Best regards,
> Marco
> 
> On Fri, Nov 30, 2018 at 1:36 AM Marco de Abreu <marco.g.abreu@googlemail.com>
> wrote:
> 
>> Hello,
>> 
>> since the release branch has now been cut, I would like to move forward
>> with the CI improvements for the master branch. This would include the
>> following actions:
>> 1. Re-enable the new Jenkins job
>> 2. Request Apache Infra to move the protected branch check from the main
>> pipeline to our new ones
>> 3. Merge https://github.com/apache/incubator-mxnet/pull/13474 - this
>> finalizes the deprecation process
>> 
>> If nobody objects, I would like to start with #1 soon. Mentors, could you
>> please help create the Apache Infra ticket? I would then take it from
>> there and talk to Infra.
>> 
>> Best regards,
>> Marco
>> 
>> On Mon, Nov 26, 2018 at 2:47 AM kellen sunderland <
>> kellen.sunderland@gmail.com> wrote:
>> 
>>> Sorry, [1] meant to reference
>>> https://issues.jenkins-ci.org/browse/JENKINS-37984 .
>>> 
>>> On Sun, Nov 25, 2018 at 5:41 PM kellen sunderland <
>>> kellen.sunderland@gmail.com> wrote:
>>> 
>>>> Marco and I ran into another urgent issue over the weekend that was
>>>> causing builds to fail.  This issue was unrelated to any feature
>>>> development work, or other CI fixes applied recently, but it did require
>>>> quite a bit of work from Marco (and a little from me) to fix.
>>>> 
>>>> We spent enough time on the problem that it caused us to take a step
>>>> back and consider how we could both fix issues in CI and support the
>>>> 1.4 release with the least impact possible on MXNet devs.  Marco had
>>>> planned to make a significant change to the CI to fix a long-standing
>>>> Jenkins error [1], but we feel that most developers would prioritize
>>>> having a stable build environment for the next few weeks over having
>>>> this fix in place.
>>>> 
>>>> To properly introduce a new CI system the intent was to do a gradual
>>>> blue/green rollout of the fix.  To manage this rollout would have taken
>>>> operational effort and double compute load as we run systems in
>>>> parallel.  This risks outages due to scaling limits, and we’d rather
>>>> make this change during a period of low developer activity, i.e.
>>>> shortly after the 1.4 release.
>>>> 
>>>> This means that from now until the 1.4 release, in order to reduce
>>>> complexity MXNet developers should only see a single Jenkins
>>>> verification check, and a single Travis check.
>>>> 
>>>> 
>>> 
>> 
