mxnet-dev mailing list archives

From Naveen Swamy <>
Subject Re: Should MXNet 1.3 contain a buggy version of nn.Embedding backward by default?
Date Tue, 24 Jul 2018 04:12:18 GMT
If it is buggy, why does it matter whether it is performant or not? I don't
see the rationale for making the correct version only opt-in.

On Mon, Jul 23, 2018 at 6:47 PM, Leonard Lausen <> wrote:

> Currently the default kernel of nn.Embedding backward is known to be
> buggy on P3 instances or when using CUDA 9.2 (the issue also occurs on
> other instances with earlier versions of CUDA, but less often).
> There is currently an opt-in for using a bug-free kernel, but it is not
> the default. However, the bug-free kernel is used by default for shapes
> smaller than 16384.
>
> Should MXNet ship a more efficient but buggy kernel in v1.3, or use a
> correct but less efficient kernel by default? As MXNet v1.3 is likely to
> be used a lot with CUDA 9.2, I believe the default behavior should be
> changed to use the bug-free but less efficient kernel. Correctness and
> providing a good user experience should be No. 1 here (?). Users who
> want a faster but buggy backward kernel can then still opt in to it.
> Note that this only affects the backward pass.
>
> Hao did related work on improving the take operator, which also fixes
> the issue, but he found it to be only "slightly faster" than the
> bug-free kernel that is currently opt-in, while it caused CI failures
> on Windows.
>
> In my experience, there is no speed difference between the current buggy
> kernel and the opt-in bug-free kernel, but the GPU utilization of the
> latter is 100% compared to 60% for the former (benchmark script:
> issuecomment-405808567 )
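
As a reading aid, the default behavior described in the thread can be sketched as a small selection rule, next to the alternative argued for above (bug-free kernel by default, fast kernel only on explicit opt-in). This is a hypothetical sketch, not MXNet's code: the function names, flag names and SAFE_KERNEL_THRESHOLD constant are invented, and the message does not say which dimension is compared against 16384 or how the existing opt-in is spelled.

```python
# Hypothetical sketch of the Embedding-backward kernel choice discussed in
# the thread. All names here are illustrative; they are NOT MXNet identifiers.

# "the bug-free kernel is used by default for shapes smaller than 16384"
SAFE_KERNEL_THRESHOLD = 16384


def choose_backward_kernel_current(size: int, use_safe_kernel: bool = False) -> str:
    """Current default as described: fast kernel unless opted out or small.

    size: the shape value compared against the threshold (which dimension
          the real implementation checks is an assumption).
    use_safe_kernel: stands in for the existing opt-in to the bug-free kernel.
    """
    if use_safe_kernel or size < SAFE_KERNEL_THRESHOLD:
        return "bug-free"
    return "fast"


def choose_backward_kernel_proposed(size: int, use_fast_kernel: bool = False) -> str:
    """Proposed default: bug-free unless the user explicitly asks for speed.

    size is accepted only for symmetry; the proposal in the thread does not
    say whether the small-shape rule would still apply.
    """
    return "fast" if use_fast_kernel else "bug-free"


if __name__ == "__main__":
    for n in (1000, 20000):
        print(n, choose_backward_kernel_current(n), choose_backward_kernel_proposed(n))
```

The point of the second function is simply that the opt-in flips direction: correctness becomes the default and speed becomes the explicit choice.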

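Since only the backward pass is affected, it may help to recall what that pass has to compute. The sketch below is a minimal serial reference assuming a plain list-of-rows layout; it is not MXNet's implementation or API, and embedding_backward is a hypothetical name. Each output-gradient row is added into the weight-gradient row selected by its index, and rows whose index repeats are summed. Parallel GPU kernels must coordinate those repeated writes, so a slow serial reference like this is one way to check a kernel's result. The thread itself does not say what causes the bug.

```python
from typing import List, Sequence


def embedding_backward(indices: Sequence[int],
                       grad_out: Sequence[Sequence[float]],
                       vocab_size: int) -> List[List[float]]:
    """Serial reference for the gradient of an embedding lookup.

    indices: the token ids looked up in the forward pass.
    grad_out: one gradient row per looked-up index.
    vocab_size: number of rows in the embedding weight.
    """
    dim = len(grad_out[0]) if grad_out else 0
    # Rows never looked up receive a zero gradient.
    grad_weight = [[0.0] * dim for _ in range(vocab_size)]
    for idx, g in zip(indices, grad_out):
        row = grad_weight[idx]
        # Repeated indices accumulate (scatter-add), never overwrite.
        for j, v in enumerate(g):
            row[j] += v
    return grad_weight


if __name__ == "__main__":
    print(embedding_backward([1, 0, 1], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], 3))
```

Here index 1 appears twice, so its gradient row is the sum of both output-gradient rows.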