mxnet-dev mailing list archives

From Dick Carter <dickjc...@apache.org>
Subject Re: CUDA / CUDNN support revisited
Date Mon, 03 Jun 2019 23:20:58 GMT
Actually, I tried to say that support *doesn't necessarily* include N-1.  I'm proposing that
the supported versions are those that 1) are covered by CI and 2) have been available in a
usable form long enough for a semi-motivated user to have transitioned to them.  That might
mean only N (e.g. per my proposal, only cuDNN v7).

Regarding precedent for N / N-1: when a new CUDA version comes out, users will transition
to it at their own pace, thereby creating an N / N-1 support situation for some period.
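
To make the "only N" idea concrete, here is a rough sketch of a single up-front version
check that could stand in for scattered '#if CUDNN_MAJOR' branches.  The names
kMinCudnnMajor and IsSupportedCudnnMajor are hypothetical (not existing MXNet or cuDNN
symbols), and the fallback define exists only so the snippet compiles without cuDNN
installed; in a real build CUDNN_MAJOR would come from <cudnn.h>.

```cpp
// Sketch only, under the assumptions stated above.
// In a real build CUDNN_MAJOR is defined by <cudnn.h>; this fallback is a
// stand-in so the example compiles on its own.
#ifndef CUDNN_MAJOR
#define CUDNN_MAJOR 7
#endif

// Hypothetical: the oldest cuDNN major version still covered by CI.
constexpr int kMinCudnnMajor = 7;

// Hypothetical helper: true if the given cuDNN major version is supported.
constexpr bool IsSupportedCudnnMajor(int major) {
  return major >= kMinCudnnMajor;
}

// One compile-time check with a clear message, instead of per-feature
// '#if CUDNN_MAJOR' branches for versions that CI no longer tests.
static_assert(IsSupportedCudnnMajor(CUDNN_MAJOR),
              "This build requires cuDNN v7 or newer.");
```

With something like this, a build against cuDNN v6 would stop immediately with an explicit
message rather than fail somewhere deep in an operator.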


On 2019/06/03 22:43:20, Pedro Larroy <pedro.larroy.lists@gmail.com> wrote: 
> Your proposal of having support for N and N-1 makes a lot of sense to
> me. Are there use cases for supporting older CUDA versions?
> 
> 
> Thanks.
> 
> On Mon, Jun 3, 2019 at 3:06 PM Dick Carter <dickjc123@apache.org> wrote:
> >
> > I'd like to revisit the discussion of: https://lists.apache.org/thread.html/27b84e4fc0e0728f2e4ad8b6827d7f996635021a5a4d47b5d3f4dbfb@%3Cdev.mxnet.apache.org%3E
now that a year has passed.
> >
> > My motivation is:
> >
> > 1.  There's a lot of hard-to-read  '#if CUDNN_MAJOR' code referencing cuDNN versions
back as far as v4(!?).  We need to clean this out before it hampers our ability to nimbly
move the codebase forward.
> >
> > 2.  There seems to be a difference of opinion on whether we should be supporting
version 'N-1' (e.g. cuDNN6).  Our current MXNet 1.5 candidate does not compile against cuDNN
v6, so this should either be fixed or be stated up front to the user community.  The breaking
PR was https://github.com/apache/incubator-mxnet/pull/14476.
> >
> > Having read the prior discussion, my take on it is:
> >
> > - Users should be given an ample time period (1 year?) to move to a new CUDA/cuDNN
version once it becomes 'usable.'
> >
> > - We should not claim to support a given version if it is no longer part of the
MXNet CI.  Users should be warned of an impending drop of this 'testing support.'
> >
> > So these statements do not necessarily promise 'N-1' support.  I could see the CI transitioning
from CUDA9-only -> CUDA9&10 -> CUDA10-only.  Some period before CUDA9
is dropped from CI, the user community is warned.  After that time, CUDA10 might be the only
version tested by CI, and hence the only version supported (until the next CUDA version comes
around).
> >
> > Let me propose as a 'strawman' that we claim to support CUDA versions 9 and 10, with
cuDNN version 7 only.  Those versions have been out for over 1.5 years.  So no CUDA 8 or cuDNN
v6 support: they are over 1.5 years old, with no coverage by our CI.
> >
> >     -Dick
> 
