mxnet-dev mailing list archives

From Marco de Abreu <marco.g.ab...@googlemail.com>
Subject [LAZY VOTE][RESULT] Upgrade CI to CUDA 9.1 with CuDNN 7.0
Date Tue, 20 Mar 2018 22:20:24 GMT
Hello,

the results of this vote are as follows:

+1:
Jun
Anirudh
Hao
Marco

0:
Chris

-1:
Naveen (veto recalled as of
https://lists.apache.org/thread.html/242db72a0c96349ef6e0ff1d3b1fe0dc7f7a9082532724c3293666c5@%3Cdev.mxnet.apache.org%3E
)

Under the constraint that we will use CUDA 8 on Windows and CUDA 9.1 on
UNIX slaves and work on integration tests for CUDA 8 in the long term, this
vote counts as PASSED.
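
For anyone who wants to double-check which CUDA runtime and cuDNN version a
particular slave or local machine actually provides, here is a minimal Python
sketch using the standard CUDA/cuDNN C APIs through ctypes (assumption: the
library names below match a typical Linux install with libcudart.so and
libcudnn.so on the loader path; nothing here is MXNet-specific):

# Query the installed CUDA runtime and cuDNN versions via their C APIs.
import ctypes

def cuda_runtime_version():
    cudart = ctypes.CDLL("libcudart.so")  # assumption: CUDA runtime on the loader path
    version = ctypes.c_int()
    status = cudart.cudaRuntimeGetVersion(ctypes.byref(version))
    if status != 0:
        raise RuntimeError("cudaRuntimeGetVersion failed with status %d" % status)
    # Encoded as major * 1000 + minor * 10, e.g. 9010 -> "9.1"
    return "%d.%d" % (version.value // 1000, (version.value % 1000) // 10)

def cudnn_version():
    cudnn = ctypes.CDLL("libcudnn.so")  # assumption: cuDNN on the loader path
    cudnn.cudnnGetVersion.restype = ctypes.c_size_t
    # Encoded as major * 1000 + minor * 100 + patch, e.g. 7003 -> 7.0.3
    return cudnn.cudnnGetVersion()

print("CUDA runtime:", cuda_runtime_version())
print("cuDNN:", cudnn_version())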

The PR for this change is available at
https://github.com/apache/incubator-mxnet/pull/10108. I have developed and
tested the new slaves in our test environment and everything looks
promising so far. The plan is as follows:

   1. Get https://github.com/apache/incubator-mxnet/pull/10108 approved to
   allow self-merge – CI can’t pass until slaves have been upgraded.
   2. Replace all existing slaves with new upgraded slaves.
   3. Retrigger https://github.com/apache/incubator-mxnet/pull/10108 to
   merge necessary changes into master.

IMPORTANT: The migration will happen tomorrow, so please expect some delay
in job execution; the CI website itself will be unaffected. Ideally, no jobs
should fail, but if they do, please feel free to retrigger them by pushing
an empty commit. If any errors appear after the upgrade, don't hesitate to
contact me!
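
For reference, retriggering with an empty commit just means pushing a commit
that changes no files, so the CI sees a new revision to build. A minimal
sketch (the remote and branch names are placeholders, and wrapping the two
git commands in Python is purely illustrative):

# Retrigger CI for a PR branch by pushing an empty commit.
import subprocess

def retrigger_ci(remote="origin", branch="my-pr-branch"):  # placeholder names
    # --allow-empty creates a commit with no file changes; the new revision
    # is enough for the CI to start a fresh build of the PR.
    subprocess.run(["git", "commit", "--allow-empty", "-m", "Retrigger CI"], check=True)
    subprocess.run(["git", "push", remote, branch], check=True)

if __name__ == "__main__":
    retrigger_ci()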

Best regards,
Marco

On Tue, Mar 20, 2018 at 1:39 AM, Naveen Swamy <mnnaveen@gmail.com> wrote:

> Yes, for short-term.
>
> On Monday, March 19, 2018, Chris Olivier <cjolivier01@apache.org> wrote:
>
> > In the short term, Naveen, are you ok with Linux running CUDA 9 and
> > Windows CUDA 8 in order to get CUDA version coverage?
> >
> > On 2018/03/16 21:09:09, Marco de Abreu <marco.g.abreu@googlemail.com>
> > wrote:
> > > Thanks for your input. How would you propose to proceed in terms of a
> > > timeline in case this vote succeeds? I don't really have time to work
> > > on a nightly setup right now. Would anybody in the community be able
> > > to help me out here, or shall we wait with the migration until a
> > > nightly setup for CUDA 8 is up?
> > >
> > > -Marco
> > >
> > > On Fri, Mar 16, 2018 at 9:55 PM, Bhavin Thaker <bhavinthaker@gmail.com>
> > > wrote:
> > >
> > > > +1 to the suggestion of testing CUDA8 in a few nightly instances and
> > > > using CUDA9 for most instances in CI.
> > > >
> > > > Bhavin Thaker.
> > > >
> > > > On Fri, Mar 16, 2018 at 12:37 PM Naveen Swamy <mnnaveen@gmail.com>
> > > > wrote:
> > > >
> > > > > I think it's best to add support for CUDA 9.0 while retaining
> > > > > existing support for CUDA 8; code might regress when you remove it
> > > > > and create more work to add CUDA 8 support back.
> > > > >
> > > > > On Fri, Mar 16, 2018 at 9:29 AM, Marco de Abreu <
> > > > > marco.g.abreu@googlemail.com> wrote:
> > > > >
> > > > > > Yeah, sorry Chris, mixed up the names.
> > > > > >
> > > > > > @Naveen: Would you be fine with doing the switch now and adding
> > > > > > integration tests later, or is this a hard constraint for you?
> > > > > >
> > > > > > On Wed, Mar 14, 2018 at 6:39 PM, Chris Olivier <cjolivier01@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Isn't the Titan V the Volta and not the Tesla?
> > > > > > >
> > > > > > > On Wed, Mar 14, 2018 at 10:36 AM, Naveen Swamy <mnnaveen@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Marco,
> > > > > > > > My -1 vote is for dropping support for CUDA 8, not for adding
> > > > > > > > CUDA 9. CUDA 9.0 support for MXNet was added on Oct 30, 2017;
> > > > > > > > I think that not all users might have switched to CUDA 9.0 yet.
> > > > > > > >
> > > > > > > > Look at the earlier discussion on the same topic:
> > > > > > > > https://lists.apache.org/thread.html/27b84e4fc0e0728f2e4ad8b6827d7f996635021a5a4d47b5d3f4dbfb@%3Cdev.mxnet.apache.org%3E
> > > > > > > >
> > > > > > > > On Wed, Mar 14, 2018 at 10:14 AM, Marco de Abreu <
> > > > > > > > marco.g.abreu@googlemail.com> wrote:
> > > > > > > >
> > > > > > > > > Right, the code changes would not be validated against CUDA
> > > > > > > > > 8.0 as part of the PR process.
> > > > > > > > >
> > > > > > > > > I don't have any numbers, but it's pretty unlikely that
> > > > > > > > > anybody is still using CUDA 8.0. According to
> > > > > > > > > https://en.wikipedia.org/wiki/CUDA#GPUs_supported, the
> > > > > > > > > devices which are not supported by CUDA 9 belong to the
> > > > > > > > > Fermi architecture, which was released in April 2010. These
> > > > > > > > > GPUs are way too old, so I think we're safe with not
> > > > > > > > > covering them specifically - this does not mean we're
> > > > > > > > > entirely deprecating them.
> > > > > > > > >
> > > > > > > > > One thing to note here is that we're not testing CUDA 9 as
> > > > > > > > > of now. Considering that the Tesla architecture (Titan V,
> > > > > > > > > V100) requires at least CUDA 9 and those are probably the
> > > > > > > > > most widely used GPUs for Deep Learning, we'd probably be
> > > > > > > > > covering a wider user base in comparison to CUDA 8 if we
> > > > > > > > > make that switch.
> > > > > > > > >
> > > > > > > > > -Marco
> > > > > > > > >
> > > > > > > > > On Wed, Mar 14, 2018 at 5:59 PM, Naveen Swamy <mnnaveen@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Does this mean that MXNet users who use CUDA 8.0 will not
> > > > > > > > > > be supported (since you are going to stop testing CUDA
> > > > > > > > > > 8.0)? I suggest we at least have nightly tests for CUDA
> > > > > > > > > > 8.0.
> > > > > > > > > >
> > > > > > > > > > Do you have a sense of how many users are using CUDA
> > > > > > > > > > 8.0/9.0?
> > > > > > > > > >
> > > > > > > > > > -1
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 14, 2018 at 9:50 AM, Chris Olivier <cjolivier01@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +0
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 14, 2018 at 9:45 AM, Jin, Hao <hjjn@amazon.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1
> > > > > > > > > > > >
> > > > > > > > > > > > On 3/14/18, 9:04 AM, "Anirudh" <anirudh2290@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >     +1
> > > > > > > > > > > >
> > > > > > > > > > > >     On Mar 14, 2018 8:56 AM, "Wu, Jun" <jwum@amazon.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >     > +1
> > > > > > > > > > > >     >
> > > > > > > > > > > >     > On 3/14/18, 8:52 AM, "Marco de Abreu" <marco.g.abreu@googlemail.com>
> > > > > > > > > > > >     > wrote:
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >     Hello,
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >     this is a vote to upgrade our CI environment
> > > > > > > > > > > >     >     from the current CUDA 8.0 with CuDNN 5.0 to
> > > > > > > > > > > >     >     CUDA 9.1 with CuDNN 7.0. The reason is that
> > > > > > > > > > > >     >     NVCC under CUDA 8 does not support the Volta
> > > > > > > > > > > >     >     GPUs used in AWS P3 instances, thus limiting
> > > > > > > > > > > >     >     our test capabilities. More details are
> > > > > > > > > > > >     >     available at [1].
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >     In order to introduce support for
> > > > > > > > > > > >     >     Quantization [1], I'd like to perform a
> > > > > > > > > > > >     >     system-wide upgrade. This should have no
> > > > > > > > > > > >     >     negative impact on our users but rather
> > > > > > > > > > > >     >     makes sure that we're actually testing with
> > > > > > > > > > > >     >     the latest versions. The PR is available at
> > > > > > > > > > > >     >     [3].
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >     This means that we would stop verifying CUDA
> > > > > > > > > > > >     >     8 and CuDNN 5.0 as part of our PR process.
> > > > > > > > > > > >     >     At a later point in time, this could be
> > > > > > > > > > > >     >     picked up as a candidate for an integration
> > > > > > > > > > > >     >     test as part of the nightly suite.
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >     This is a lazy vote, ending on 17th of
> > > > > > > > > > > >     >     March, 2018 at 17:00 (UTC+1).
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >     Best regards,
> > > > > > > > > > > >     >     Marco
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >     [1]: https://issues.apache.org/jira/browse/MXNET-99
> > > > > > > > > > > >     >     [2]: https://github.com/apache/incubator-mxnet/pull/9552
> > > > > > > > > > > >     >     [3]: https://github.com/apache/incubator-mxnet/pull/10108
