mxnet-dev mailing list archives

From reminisce <>
Subject Re: [apache/incubator-mxnet] [RFC][mxnet 2.0][item 10.1] MXNet Imperative Op Invocation Overhead (#17097)
Date Sun, 22 Dec 2019 05:21:26 GMT
@ptrendx Yes, there is an ongoing effort to profile the engine code flow with VTune. We hope the
exercise can pinpoint the hotspots that contribute most to the latency. Within the pure C++
part, the time splits further, also at around 50% vs. 50%, between setup code (shape/type
inference, memory allocation, dependency setup) and op scheduling.

For the "fast path" data structures, I'm summarizing the items as follows (including the ones
suggested by @sxjscience):

- `tuple` and `list`, since they are interchangeable in NumPy semantics for representing shapes
and axes.
- `str`, because einsum takes a string parameter and the op can be used heavily in transformer models.
- `py_slice`, `Ellipsis`, `None` for basic indexing. We can go one step further by moving
the whole indexing dispatch logic to the backend.
- NumPy scalars.
- `mx.context.Context`. One call of `mx.cpu()` can take as long as 600 ns using ctypes. One
idea is to do it the pybind way, by creating a Python binding for the backend `Context`.
- `np.dtype`. Similar to `Context`.
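
To make the list above concrete, here is a minimal sketch of what a type-keyed "fast path" could look like on the Python side. It is not MXNet code: `ArgKind`, `pack_arg` and the tagged-tuple layout are hypothetical names invented for illustration, and the sketch assumes that a single dict lookup on `type(value)` is the kind of cheap dispatch being aimed for. NumPy scalars, `Context` and `np.dtype` are left out to keep it standard-library only.

```python
from enum import IntEnum


class ArgKind(IntEnum):
    """Hypothetical type tags that a backend could switch on."""
    NONE = 0
    INT = 1
    FLOAT = 2
    STR = 3
    INT_TUPLE = 4
    SLICE = 5
    ELLIPSIS = 6


def _pack_sequence(value):
    # tuple and list map to the same tag, mirroring NumPy's treatment
    # of shapes and axes.
    return ArgKind.INT_TUPLE, tuple(int(v) for v in value)


def _pack_slice(value):
    # Keep start/stop/step as-is; None means "not specified".
    return ArgKind.SLICE, (value.start, value.stop, value.step)


# One dict lookup on the exact type replaces a chain of isinstance checks.
_FAST_PATH = {
    type(None): lambda v: (ArgKind.NONE, None),
    type(Ellipsis): lambda v: (ArgKind.ELLIPSIS, None),
    bool: lambda v: (ArgKind.INT, int(v)),
    int: lambda v: (ArgKind.INT, v),
    float: lambda v: (ArgKind.FLOAT, v),
    str: lambda v: (ArgKind.STR, v),
    tuple: _pack_sequence,
    list: _pack_sequence,
    slice: _pack_slice,
}


def pack_arg(value):
    """Convert a Python argument into a (tag, payload) pair, or raise TypeError."""
    try:
        converter = _FAST_PATH[type(value)]
    except KeyError:
        raise TypeError(f"no fast path for {type(value).__name__}") from None
    return converter(value)
```

With this sketch, `pack_arg((2, 3))` and `pack_arg([2, 3])` produce the same pair, and an einsum subscript such as `"ij,jk->ik"` passes through as a tagged string. Types without an entry raise `TypeError`, which is where a slower general path would take over.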
