mxnet-dev mailing list archives

From Marco de Abreu <marco.g.ab...@googlemail.com>
Subject Re: [LAZY VOTE][RESULT] Upgrade CI to CUDA 9.1 with CuDNN 7.0
Date Thu, 17 May 2018 13:57:41 GMT
Hello Haibin,

I'd love to see CUDA 8 back in CI, but we're currently lacking people to do
this properly (besides just copy&pasting the job). Since we agreed on only
supporting the last 2 CUDA major versions, we don't have to verify CUDA 7.

The way forward is to have things like these in the nightly test cycle. At
the moment, we don't have the manpower to maintain and improve that suite,
so we'll have to wait until we have more people or somebody is willing to
take this on themselves. I'd be happy to support volunteers here!

Best regards,
Marco

On Thu, May 17, 2018 at 7:56 AM, Haibin Lin <haibin.lin.aws@gmail.com>
wrote:

> Is there a plan for adding those CUDA 8 tests back to CI? What about CUDA
> 7?
>
> There were a few build problems in the past few weeks due to lack of CI
> coverage:
> - https://github.com/apache/incubator-mxnet/pull/10710 was found during
> 1.2 rc voting
> - https://github.com/apache/incubator-mxnet/issues/10981 was reported by
> a user with CUDA 7
>
> Having these covered in CI will help catch the issues early. I don't recall
> if we decided to drop CUDA 7 support for MXNet.
>
> Best,
> Haibin
>
> On Wed, Mar 21, 2018 at 6:32 AM, Marco de Abreu <
> marco.g.abreu@googlemail.com> wrote:
>
> > Hello,
> >
> > the migration has just been completed and we're now running our UNIX based
> > slaves on CUDA 9.1 with CuDNN 7. The commit is available at
> > https://github.com/apache/incubator-mxnet/commit/b0a6760efa141aeca87b03ecf34dae924bd1af46.
> >
> > No jobs have been interrupted by this migration. If you encounter any
> > errors, please reach out to me.
> >
> > Best regards,
> > Marco
> >
> > On Tue, Mar 20, 2018 at 11:20 PM, Marco de Abreu <
> > marco.g.abreu@googlemail.com> wrote:
> >
> > > Hello,
> > >
> > > the results of this vote are as follows:
> > >
> > > +1:
> > > Jun
> > > Anirudh
> > > Hao
> > > Marco
> > >
> > > 0:
> > > Chris
> > >
> > > -1:
> > > Naveen (veto recalled as of
> > > https://lists.apache.org/thread.html/242db72a0c96349ef6e0ff1d3b1fe0dc7f7a9082532724c3293666c5@%3Cdev.mxnet.apache.org%3E)
> > >
> > > Under the constraint that we will use CUDA 8 on Windows and CUDA 9.1 on
> > > UNIX slaves and work on integration tests for CUDA 8 in the long term,
> > > this vote counts as PASSED.
> > >
> > > The PR for this change is available at
> > > https://github.com/apache/incubator-mxnet/pull/10108. I have developed and
> > > tested the new slaves in our test environment and everything looks
> > > promising so far. The plan is as follows:
> > >
> > >    1. Get https://github.com/apache/incubator-mxnet/pull/10108 approved
> > >    to allow self-merge - CI can't pass until slaves have been upgraded.
> > >    2. Replace all existing slaves with new upgraded slaves.
> > >    3. Retrigger https://github.com/apache/incubator-mxnet/pull/10108 to
> > >    merge necessary changes into master.
> > >
> > > IMPORTANT: The migration will happen tomorrow, so please expect some delay
> > > in job execution - the CI website will be unaffected. Ideally, no jobs
> > > should fail - in case they do, please feel free to retrigger them by using
> > > an empty commit. In case of any errors appearing after the upgrade, don't
> > > hesitate to contact me!
> > >
> > > Best regards,
> > > Marco
> > >
> > >
> > > On Tue, Mar 20, 2018 at 1:39 AM, Naveen Swamy <mnnaveen@gmail.com> wrote:
> > >
> > >> Yes, for short-term.
> > >>
> > >> On Monday, March 19, 2018, Chris Olivier <cjolivier01@apache.org> wrote:
> > >>
> > >> > In the short term, Naveen, are you ok with Linux running CUDA 9 and
> > >> > Windows CUDA 8 in order to get CUDA version coverage?
> > >> >
> > >> > On 2018/03/16 21:09:09, Marco de Abreu <marco.g.abreu@googlemail.com>
> > >> > wrote:
> > >> > > Thanks for your input. How would you propose to proceed in terms of a
> > >> > > timeline in case this vote succeeds? I don't really have time to work
> > >> > > on a nightly setup right now. Would anybody in the community be able
> > >> > > to help me out here or shall we wait with the migration until a
> > >> > > nightly setup for CUDA 8 is up?
> > >> > >
> > >> > > -Marco
> > >> > >
> > >> > > On Fri, Mar 16, 2018 at 9:55 PM, Bhavin Thaker <bhavinthaker@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > +1 to the suggestion of testing CUDA8 in a few nightly instances and
> > >> > > > using CUDA9 for most instances in CI.
> > >> > > >
> > >> > > > Bhavin Thaker.
> > >> > > >
> > >> > > > On Fri, Mar 16, 2018 at 12:37 PM Naveen Swamy <mnnaveen@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > I think it's best to add support for CUDA 9.0 while retaining
> > >> > > > > existing support for CUDA 8; code might regress when you remove it
> > >> > > > > and create more work to add CUDA 8 support back.
> > >> > > > >
> > >> > > > > On Fri, Mar 16, 2018 at 9:29 AM, Marco de Abreu <
> > >> > > > > marco.g.abreu@googlemail.com> wrote:
> > >> > > > >
> > >> > > > > > Yeah, sorry Chris, mixed up the names.
> > >> > > > > >
> > >> > > > > > @Naveen: Would you be fine with doing the switch now and adding
> > >> > > > > > integration tests later or is this a hard constraint for you?
> > >> > > > > >
> > >> > > > > > On Wed, Mar 14, 2018 at 6:39 PM, Chris Olivier <cjolivier01@gmail.com>
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Isn't the Titan V the Volta and not the Tesla?
> > >> > > > > > >
> > >> > > > > > > On Wed, Mar 14, 2018 at 10:36 AM, Naveen Swamy <mnnaveen@gmail.com>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Marco,
> > >> > > > > > > > My -1 vote is for dropping support to CUDA 8 and not for adding
> > >> > > > > > > > CUDA 9. CUDA 9.0 support for MXNet was added Oct'30-2017, I think
> > >> > > > > > > > that all users might not have switched to CUDA 9.0
> > >> > > > > > > >
> > >> > > > > > > > Look at the earlier discussion on the same topic
> > >> > > > > > > >
> > >> > > > > > > > https://lists.apache.org/thread.html/27b84e4fc0e0728f2e4ad8b6827d7f996635021a5a4d47b5d3f4dbfb@%3Cdev.mxnet.apache.org%3E
> > >> > > > > > > >
> > >> > > > > > > > On Wed, Mar 14, 2018 at 10:14 AM, Marco de Abreu <
> > >> > > > > > > > marco.g.abreu@googlemail.com> wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Right, the code changes would not be validated against CUDA 8.0
> > >> > > > > > > > > as part of the PR process.
> > >> > > > > > > > >
> > >> > > > > > > > > I don't have any numbers, but it's pretty unlikely that anybody
> > >> > > > > > > > > is still using CUDA 8.0. According to
> > >> > > > > > > > > https://en.wikipedia.org/wiki/CUDA#GPUs_supported, the devices
> > >> > > > > > > > > which are not being supported by CUDA 9 are under the Fermi
> > >> > > > > > > > > architecture which was released in April 2010. These GPUs are
> > >> > > > > > > > > way too old, so I think we're safe with not covering them
> > >> > > > > > > > > specifically - this does not mean we're entirely deprecating
> > >> > > > > > > > > them.
> > >> > > > > > > > >
> > >> > > > > > > > > One thing to note here is that we're not testing CUDA 9 as of
> > >> > > > > > > > > now. Considering that the Tesla architecture (Titan V, V100)
> > >> > > > > > > > > requires at least CUDA 9 and those are probably the most widely
> > >> > > > > > > > > used GPUs for Deep Learning, we'd probably be covering a wider
> > >> > > > > > > > > user-base in comparison to CUDA 8 if we make that switch.
> > >> > > > > > > > >
> > >> > > > > > > > > -Marco
> > >> > > > > > > > >
> > >> > > > > > > > > On Wed, Mar 14, 2018 at 5:59 PM, Naveen Swamy <mnnaveen@gmail.com>
> > >> > > > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Does this mean that MXNet users who use CUDA 8.0 will not be
> > >> > > > > > > > > > supported (since you are stopping to test CUDA 8.0)? I suggest
> > >> > > > > > > > > > we at least have nightly tests for CUDA 8.0.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Do you have a sense of how many users are using CUDA 8.0/9.0?
> > >> > > > > > > > > >
> > >> > > > > > > > > > -1
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Wed, Mar 14, 2018 at 9:50 AM, Chris Olivier <cjolivier01@gmail.com>
> > >> > > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > +0
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > On Wed, Mar 14, 2018 at 9:45 AM, Jin, Hao <hjjn@amazon.com> wrote:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > > +1
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > On 3/14/18, 9:04 AM, "Anirudh" <anirudh2290@gmail.com> wrote:
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >     +1
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >     On Mar 14, 2018 8:56 AM, "Wu, Jun" <jwum@amazon.com> wrote:
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >     > +1
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     > On 3/14/18, 8:52 AM, "Marco de Abreu" <marco.g.abreu@googlemail.com>
> > >> > > > > > > > > > > >     > wrote:
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     Hello,
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     this is a vote to upgrade our CI environment from the current
> > >> > > > > > > > > > > >     >     CUDA 8.0 with CuDNN 5.0 to CUDA 9.1 with CuDNN 7.0. Reason being
> > >> > > > > > > > > > > >     >     that NVCC under CUDA 8 does not support the Volta GPUs used in AWS
> > >> > > > > > > > > > > >     >     P3 instances and thus limiting our test capabilities. More details
> > >> > > > > > > > > > > >     >     are available at [1].
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     In order to introduce support for Quantization [1], I'd like to
> > >> > > > > > > > > > > >     >     perform a system-wide upgrade. This should have no negative impact
> > >> > > > > > > > > > > >     >     in our users but rather makes sure that we're actually testing
> > >> > > > > > > > > > > >     >     with the latest versions. The PR is available at [3].
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     This means that we would stop verifying CUDA 8 and CuDNN 5.0 as
> > >> > > > > > > > > > > >     >     part of our PR process. At a later point in time, this could be
> > >> > > > > > > > > > > >     >     picked up as a candidate for an integration test as part of the
> > >> > > > > > > > > > > >     >     nightly suite.
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     This is a lazy vote, ending on 17th of March, 2018 at 17:00
> > >> > > > > > > > > > > >     >     (UTC +1).
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     Best regards,
> > >> > > > > > > > > > > >     >     Marco
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     [1]: https://issues.apache.org/jira/browse/MXNET-99
> > >> > > > > > > > > > > >     >     [2]: https://github.com/apache/incubator-mxnet/pull/9552
> > >> > > > > > > > > > > >     >     [3]: https://github.com/apache/incubator-mxnet/pull/10108
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
