mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marco de Abreu <marco.g.ab...@googlemail.com>
Subject Re: Protected master needs to be turned off
Date Mon, 20 Nov 2017 22:36:21 GMT
Small update regarding the status of the new CI: We've been able to get
ubuntu builds and tests up and running.

I’ve received requests to provide exact times for all important stages of
the CI. As the first runs have been executed successfully, I’ll provide you
with some data. This will give you a small guidance where our bottlenecks
are located and in which field we can have improvements. All jobs of the
same stage are being executed in parallel.



In the following you’ll see the durations of all Ubuntu-Builds and -Tests.
They are all (CPU- and GPU-tests) executed on a G3.8xlarge with 32 vCPUs.
These times only cover the actual core execution time of the task
*without* stashing,
cleaning etc.



Build:

   - Amalgamation: 3m40s
   - Amalgamation MIN: 3m49s
   - *CPU: Openblas: 8m38s*
   - *GPU CUDA7.5+cuDNN5: 11m55s*
   - *GPU: MKLML: 9m42s*





Unit Test:

   - Perl CPU: 17m36s
   - Perl GPU: 8m31s
   - *Python2: CPU: 1h10m10s*
   - Python2: GPU: 14m10s
   - Python2: MKLML-CPU: 10m25s + 1m43s = 12m08s
   - Python2: MKLML-GPU: 14m19s
   - *Python3: CPU: 1h8m26s*
   - Python3: GPU: 14m58s
   - Python3: MKLML-CPU: 10m31s (seems like this job is not running the
   Python2-Train-Tests)
   - Python3: MKLML-GPU: 13m37s
   - *R: CPU: 20m6s + 9s + 9m58s = 30m13s*
   - *R: GPU: 21m50s + 10s + 1m17s = 23m17s*
   - Scala: CPU: 2m18s + 3m57s = 6m15s



Integration Test:

   - Caffe GPU: 7m42s
   - Python GPU: 1m47s
   - Cpp-package GPU: 59s


The next step is to get Windows up and running and improve execution time
on CPU-tasks.

-Marco

On Mon, Nov 20, 2017 at 9:18 PM, YiZhi Liu <javelinjs@gmail.com> wrote:

> +1
>
> 2017-11-20 11:47 GMT-08:00 Tianqi Chen <tqchen@cs.washington.edu>:
> > +1 until new CI is implemented.
> >
> > Tianqi
> >
> > On Mon, Nov 20, 2017 at 11:11 AM, Eric Xie <jxie@apache.org> wrote:
> >
> >> A lot of people seems to be confused, so let's clarify the separation of
> >> roles/responsibilities:
> >>
> >> 1. The committers that merge code are responsible for code quality and
> >> tests passing on master.
> >>
> >> 2. The CI / infra maintainers are responsible for keeping CI running
> >> properly and honestly reporting bugs.
> >> If test fails because Jenkins is faulty or slow, you need to fix
> Jenkins.
> >> If test fails because there is a bug in the code, then you did a good
> job
> >> catching bugs. It is not your responsibility to fix the bug. You merely
> >> should report it to committers.
> >>
> >> Thanks,
> >> Junyuan Xie
> >>
> >>
> >> On 2017-11-19 16:07, Gautam <gautamnitc@gmail.com> wrote:
> >> > -1
> >> >
> >> > Please see inline.
> >> >
> >> > On Nov 19, 2017 12:51 PM, "Eric Xie" <jxie@apache.org> wrote:
> >> >
> >> > Hi all,
> >> > I'm starting this thread to vote on turning off protected master. The
> >> > reasons are:
> >> >
> >> > 1. Since we turned on protected master pending PRs has grown from 40
> to
> >> 80.
> >> > It is severely slowing down development.
> >> >
> >> >
> >> >      Turning protection off, will give you ability to merge the code
> >> which
> >> > has build failure. How and more importantly who will be best judge to
> >> > figure out what's is wrong ? If CI is a problem then we have people
> from
> >> > infra team, who are sort of 'on call" day and night trying to fix.
> >> > Including me trying to fix the slave at 2 am in night. I have seen
> >> > commiters sending PRs and if it fails without even looking at the
> >> > reason,which could be just a temporary error, they reach out to infra
> >> team
> >> > immediately. So I don't agree that we should turn off the protected
> >> branch
> >> > just for the sake of speed. We should care more on quality than speed.
> >> >
> >> >
> >> > 2. Committers, not CI, are ultimately responsible for the code they
> >> merge.
> >> > You should only override the CI when you are very confident that CI is
> >> the
> >> > problem, not your code. If it turns out you are wrong, you should fix
> it
> >> > ASAP. This is the bare minimum requirement for all committers: BE
> >> > RESPONSIBLE.
> >> >
> >> > How do we make sure that this in deed happens?
> >> >
> >> > I'm aware of the argument for using protected master: It make sure
> that
> >>
> >>
> >>
> >> > master is stable.
> >> >
> >> > Well, master will be most stable if we stop adding any commits to it.
> But
> >> > that's not what we want is it?
> >> >
> >> > No we are not saying don't add anything in master, we are just saying
> >> > please don't add bad code to the master. And yes I have seen bad code
> has
> >> > been merged to the master when protected branch was not enabled.
> >> >
> >> > Protected master hardly adds any stability. The faulty tests that
> breaks
> >> > master at random got merged into master because they happened to
> succeed
> >> > once.
> >> >
> >> > That's not true, it filter out one of the important aspect that at
> least
> >> > when code was merged it completed the whole cycle of build and test.
> Sure
> >> > flaky test we can track down.
> >> >
> >> > Thanks,
> >> > Junyuan Xie
> >> >
> >>
>
>
>
> --
> Yizhi Liu
> DMLC member
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message