mxnet-dev mailing list archives

From Pedro Larroy
Subject cuda CUDNN auto tune, optimal parameters of cuda kernels
Date Wed, 24 Jan 2018 16:39:46 GMT

We have identified that CUDA cuDNN autotune produces a significant
spike in RAM usage when finding the best convolution algorithm.

As far as we understand, this happens inside the cuDNN library. But on
platforms like the TX1, where we only have 4 GB, this is problematic as
the spike is close to 4 GB.

Autotune can be disabled with an environment variable, but on these
platforms it might be interesting to save these kinds of parameters once
instead of searching for them on every run. Otherwise, with autotune
simply disabled, you are probably doing convolutions with slower kernels.
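
To make the "save once" idea concrete, here is a minimal sketch in
Python. MXNET_CUDNN_AUTOTUNE_DEFAULT is the existing MXNet variable
that controls autotune; everything else (AlgoCache, conv_key, the JSON
layout, the file name) is hypothetical and not an existing MXNet or
cuDNN feature. Feeding a stored choice back into cuDNN would need
support inside MXNet and is not shown.

```python
import json
import os

# Existing MXNet switch: "0" disables cuDNN autotune. Set it early,
# before any convolution operator is created.
os.environ.setdefault("MXNET_CUDNN_AUTOTUNE_DEFAULT", "0")


def conv_key(device, dtype, data_shape, num_filter, kernel, stride, pad):
    """Build a string key describing one convolution configuration."""
    dims = (data_shape, kernel, stride, pad)
    parts = [device, dtype, str(num_filter)]
    parts += ["x".join(str(v) for v in d) for d in dims]
    return "|".join(parts)


class AlgoCache:
    """Hypothetical on-disk cache: convolution config -> chosen algorithms."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)

    def lookup(self, key):
        """Return the stored choice, or None if this config was never tuned."""
        return self.entries.get(key)

    def store(self, key, algos):
        """Record a choice and write the file atomically."""
        self.entries[key] = algos
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.entries, f, indent=2, sort_keys=True)
        os.replace(tmp, self.path)
```

The cache would be filled once, on a run (or a machine) that can
afford the autotune spike, and only read on the device afterwards. A
real version would probably also need the cuDNN version in the key,
since the best algorithm can change between releases.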

The second topic I wanted to bring up is, would it be a good idea to
have configurable kernel launch parameters to optimize SM resource

Either via a compile-time approach based on the target arch:

Or based on a runtime profile.
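
A rough sketch of the two options, in Python for brevity. The table
values, the function names and the launch callable are all
hypothetical; the numbers are illustrative and not measured. The
first function mimics a per-architecture default fixed at build time,
the second picks the fastest candidate by timing it.

```python
import time

# Hypothetical per-architecture defaults (threads per block).
# The values are placeholders, not tuned numbers.
LAUNCH_DEFAULTS = {
    "sm_53": 128,  # TX1 compute capability
    "sm_70": 256,
}
FALLBACK_BLOCK_SIZE = 256


def static_block_size(arch):
    """Static approach: look up a fixed value for the target arch."""
    return LAUNCH_DEFAULTS.get(arch, FALLBACK_BLOCK_SIZE)


def profiled_block_size(launch, candidates, repeats=3,
                        clock=time.perf_counter):
    """Runtime approach: time each candidate and keep the fastest.

    launch(block_size) must run the kernel and wait for it to finish;
    kernel launches are asynchronous, so without a device
    synchronization the timing would only measure the launch call.
    """
    if not candidates:
        raise ValueError("need at least one candidate block size")
    best, best_time = None, float("inf")
    for block_size in candidates:
        launch(block_size)  # warm-up run, not timed
        start = clock()
        for _ in range(repeats):
            launch(block_size)
        elapsed = (clock() - start) / repeats
        if elapsed < best_time:
            best, best_time = block_size, elapsed
    return best
```

The profiled result could be saved per device in the same way as the
convolution algorithm choice, so that the search only runs once.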

Any thoughts on these topics?

