mxnet-dev mailing list archives

From Leonard Lausen <>
Subject [apache/incubator-mxnet] [RFC] Deferred compute in imperative interface to unify imperative and symbolic interface (#16376)
Date Sat, 05 Oct 2019 00:03:08 GMT
A new **deferred computation** (DC) argument to the imperative MXNet APIs is
proposed. If enabled, memory allocation and computation are deferred as long as
possible. Users can export the computational graph recorded during deferred
computation, which enables hybridization support.

Arrays for which DC is enabled are called **lazy**. Other arrays are called
**normal**. Inplace operations on lazy arrays are unsupported.

Storage allocation and computation for lazy arrays are deferred until their
results are required, either by conversion to NumPy or by use as input to an
operator that creates a normal array. Accessing attributes such as `shape` can
also trigger computation if the attribute can't be inferred.
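
To illustrate the lazy/normal distinction, here is a minimal pure-Python sketch. It is not MXNet code: the `LazyArray` class, its `is_deferred` property and its `tolist` method are hypothetical names chosen for this example, and plain lists stand in for array storage. It only shows that operators record work and that reading the values triggers it.

```python
import operator


class LazyArray:
    """Toy stand-in for a lazy array (hypothetical, not MXNet code)."""

    def __init__(self, values=None, op=None, inputs=()):
        self._values = values    # list of numbers, or None while deferred
        self._op = op            # elementwise operator to apply later
        self._inputs = inputs    # operands: LazyArray instances or scalars

    @property
    def is_deferred(self):
        return self._values is None

    def __add__(self, other):
        return LazyArray(op=operator.add, inputs=(self, other))

    def __mul__(self, other):
        return LazyArray(op=operator.mul, inputs=(self, other))

    def tolist(self):
        """Reading the values triggers the deferred computation."""
        if self._values is None:
            lhs, rhs = self._inputs
            left = lhs.tolist()
            right = rhs.tolist() if isinstance(rhs, LazyArray) else [rhs] * len(left)
            self._values = [self._op(a, b) for a, b in zip(left, right)]
            self._inputs = ()    # release references to the inputs
        return self._values


x = LazyArray(values=[0, 1, 2, 3])
y = (x + 5) * (x + 5)            # only records the operations
deferred_before = y.is_deferred
result = y.tolist()              # triggers the computation
deferred_after = y.is_deferred
```

Releasing the input references after computation mirrors the role this proposal assigns to `DCInfo::Clear`.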

## C API

### Deferred Compute (DC) Mode

`MXImperativeDeferredInvokeEx`, an “alias” of `MXImperativeInvokeEx`, is
introduced. It creates lazy arrays from the operator and its (normal or lazy)
input arrays.

``` c
/*!
 * \brief invoke a nnvm op and imperative function creating lazy ndarray
 * \param creator the op
 * \param num_inputs number of input NDArrays
 * \param inputs input NDArrays
 * \param num_outputs number of output NDArrays
 * \param outputs output NDArrays
 * \param num_params number of keyword parameters
 * \param param_keys keys for keyword parameters
 * \param param_vals values for keyword parameters
 * \param out_stypes output ndarrays' stypes
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL int MXImperativeDeferredInvokeEx(AtomicSymbolCreator creator,
                                           int num_inputs,
                                           NDArrayHandle *inputs,
                                           int *num_outputs,
                                           NDArrayHandle **outputs,
                                           int num_params,
                                           const char **param_keys,
                                           const char **param_vals,
                                           const int **out_stypes);
```

### Checks and explicit trigger

``` c
/*!
 * \brief Check if array's computation is deferred.
 * \param handles ndarray handles to be checked
 * \param num_handles number of ndarray handles to be checked
 * \param status pointer to array of num_handles integers to hold the result.
 */
MXNET_DLL int MXNDArrayGetIsDeferredCompute(NDArrayHandle *handles,
                                            int num_handles,
                                            int *status);

/*!
 * \brief Trigger deferred computation.
 * \param handles ndarray handles to trigger computation of.
 * \param num_handles number of ndarray handles to trigger computation of
 *
 * Deferred computation of input arrays for specified handles is triggered if
 * required. Arrays that are already computed are ignored.
 */
MXNET_DLL int MXNDArrayTriggerDeferredCompute(NDArrayHandle *handles,
                                              int num_handles);
```

### Exporting to symbol

The computational graph recorded in deferred computation mode can be exported to
symbol. Users must specify all inputs and outputs, to define the part of the
graph they are interested in exporting.

It is an error if any output depends on an input that is not specified, or
otherwise cannot be computed from the specified inputs. Likewise, providing an
input that is not connected to any output is an error.
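
The two error conditions can be sketched on a toy graph in pure Python. This is an illustration under assumptions, not the proposed implementation: the dict-based graph representation and the `check_export` helper are hypothetical.

```python
def check_export(graph, inputs, outputs):
    """Validate inputs/outputs against a recorded graph.

    graph maps each computed array name to the names of its operands;
    names absent from graph are leaves (hypothetical representation).
    """
    inputs = set(inputs)
    reached = set()      # specified inputs reached from some output

    def visit(name):
        if name in inputs:
            reached.add(name)
            return
        if name not in graph:
            raise ValueError(f"output depends on unspecified input {name!r}")
        for operand in graph[name]:
            visit(operand)

    for out in outputs:
        visit(out)
    unused = inputs - reached
    if unused:
        raise ValueError(f"inputs not connected to any output: {sorted(unused)}")


# y = (x + 5) * (x + 5) and z = x ** 2; constants are omitted for brevity
graph = {"t": ["x"], "y": ["t", "t"], "z": ["x"]}
check_export(graph, inputs=["x"], outputs=["y", "z"])    # valid, no exception
```

Calling `check_export(graph, [], ["y"])` raises because `y` depends on the unspecified `x`, and `check_export(graph, ["x", "w"], ["y"])` raises because `w` is connected to no output.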

``` C
/*!
 * \brief Extract the graph constructed during deferred computation mode as a
 * Symbol.
 * \param input_handles ndarray handles of inputs
 * \param output_handles ndarray handles of outputs
 * \param input_names names associated with the inputs of the returned Symbol
 * \param output_names names associated with the outputs of the returned Symbol
 * \param out grouped output symbol handle
 *
 * Construct a Symbol for the subgraph of the deferred computation graph
 * spanning from the input_handles to the output_handles. Requires that
 * input_handles and output_handles are connected in the tracked computational
 * graph. The input_handles are required to have been used as arguments to an
 * operator that is part of the tracked subgraph. All inputs of the
 * computational graph must be specified.
 */
MXNET_DLL int MXNDArrayGetDeferredComputeSymbol(NDArrayHandle *input_handles,
                                                NDArrayHandle *output_handles,
                                                const char** input_names,
                                                const char** output_names,
                                                int num_inputs,
                                                int num_outputs,
                                                SymbolHandle *out);
```

**Basic Python usage example**
Example without Gluon.

``` python
x = ...  # input array; its constructor call is truncated in the source (", 10))")
with deferred_compute():
    y = (x + 5) * (x + 5)
    z = x**2
s = export(inputs={'x': x}, outputs={'y': y, 'z': z})
assert s.list_inputs() == ['x']
assert s.list_outputs() == ['y', 'z']
```

## Implementation (C++)

### `NDArray`

``` C++
class NDArray {
  // ...

  /*!
   * \brief constructs a new dynamic NDArray
   * \param shape the shape of array
   * \param ctx context of NDArray
   * \param delay_alloc whether to delay the allocation (True for DC mode)
   * \param dtype data type of this ndarray
   */
  NDArray(const mxnet::TShape &shape, Context ctx,
          bool delay_alloc = false, int dtype = mshadow::default_type_flag)
      : ptr_(std::make_shared<Chunk>(shape, ctx, delay_alloc, dtype)),
        autograd_entry_(nullptr) {
  }

  /*!
   * \brief Block until all the pending write operations with respect
   *    to current NDArray are finished, and read can be performed.
   * If this is an array with deferred computation, computation is triggered.
   */
  inline void WaitToRead() const;
  /*!
   * \brief Block until all the pending read/write operations with respect
   *    to current NDArray are finished, and write can be performed.
   * If this is an array with deferred computation, computation is triggered.
   */
  inline void WaitToWrite() const;

  // ...

  /*! \brief node entry for autograd */
  nnvm::NodeEntry autograd_entry_;  // renamed from entry_
  /*! \brief node entry for deferred computation tracking */
  nnvm::NodeEntry deferredcompute_entry_;
  /*!
   * \brief Perform deferred computation.
   * Applicable if current array is associated with deferredcompute_entry_ and
   * DCInfo. If so, compute this and all dependent NDArrays.
   * Triggered automatically if needed by WaitToRead
   */
  void DeferredCompute() const;

  // ...
};
```

### `DCInfo`

``` C++
/*! \brief DCInfo stores NDArrays required to perform the deferred computation
 *  of its owning NDArray.
 *
 *  Once deferred computation is completed, DCInfo::Clear should be executed
 *  to release references to input data.
 */
class DCInfo {
 public:
  DCInfo(std::vector<NDArray> inputs) : inputs_(inputs) {}

  static DCInfo& Get(const nnvm::NodePtr& node) {
    return dmlc::get<DCInfo>(node->info);
  }

  static void Clear(const nnvm::NodePtr& node) {
    if (node == nullptr || node->info.empty()) return;
    DCInfo& info = Get(node);
    // ...
  }

  static DCInfo& Create(const nnvm::NodePtr& node) {
    // ...
    return Get(node);
  }

 private:
  std::vector<NDArray> inputs_;
};
```

### Execution

**Trigger execution from within `NDArray`**

`NDArray::WaitToRead` and `NDArray::WaitToWrite` are extended to trigger
execution, calling `NDArray::TriggerDeferredCompute`. `TriggerDeferredCompute`
is a no-op if no `DCInfo` is associated with the current array, i.e. if it is
already computed.

**Explicit C API to trigger execution**
Users can also manually trigger the computation of specified arrays.

``` C
MXNET_DLL int MXNDArrayTriggerDeferredCompute(NDArrayHandle *handles,
                                              int num_handles);
```

Operations on the graph are pushed to the engine for asynchronous execution via `RunGraph`.

### FAQ

**How about Autograd, `NDArray.autograd_entry_` and `AGInfo`?**
Autograd inside deferred computation (DC) mode can be supported.

Relation of Autograd and DC: while autograd’s `RecordOp` provides recording
functionality similar to deferred computation, the autograd graph is not the
same as a computational graph. `NDArray::Detach()` detaches a node from the
autograd graph by deleting `NDArray.entry_` (renamed to `autograd_entry_` in
this proposal), yet the `NodeEntry` is still required to reconstruct the
computational history of how this NDArray came to be.

**Are reqs like `kInPlace` supported?**
No. For now only `kWriteTo` is supported in DC mode.

The plan is to replace inplace operations with `kWriteTo` operations, writing to
a new (lazy) array. The framework should be smart enough to decide when to reuse
memory and when not. It shouldn’t be required for users to specify that they
want an inplace operation.

**How is the context attribute handled, specifically context changes?**

Cross-device copy must be represented as an operator (`CrossDeviceCopyOp`),
which requires special handling in the graph executor.

**How is incomplete shape information handled?**
The `shape` property triggers computation if the shape is accessed and can't be inferred completely.
Users can access `static_shape` if they want to avoid triggering computation.

## Python (Gluon)

Based on DC, hybridization in Gluon is simplified:

Instead of implementing `def hybrid_forward(self, F, x, ...)` in `HybridBlock`,
users can opt to implement `def forward(self, x, ...)` in `HybridBlock`.

Hybridization based on DC works by the HybridBlock performing the following
steps (if it is not called by a parent block being hybridized):

- keeping a reference to the input arrays and a reference to the parameter
  arrays to pass them to `MXNDArrayGetDeferredComputeSymbol`;
- enabling deferred compute mode;
- running `forward`;
- exporting to symbol, creating a CachedOp, and running the CachedOp.

An (internal) global context variable tracks whether hybridization is ongoing.
If it is set to False and a Block that is to be hybridized is called, the global
context variable is set to True and the Block goes through all 4 steps outlined
above; finally, the context variable is set back to False after the *export to
Symbol* step is finished.

**Usage example**

``` python
from mxnet.gluon import nn


class Net(nn.HybridBlock):
    def forward(self, x, *args):
        ...
```

**Hybridizing `gluon.Block`s?**

DC could be used to support hybridizing `Block` if all logic can be traced. A
separate effort may add logic to detect these cases and add hybridization
support based on DC. For now we rely on the user to signify hybridization
support by subclassing `HybridBlock`.

### Parameter Shape Inference

For a HybridBlock making use of DC for hybridization, we request users to
implement `HybridBlock.infer_shape` to infer the parameters' shapes given the
inputs.

Currently, if `HybridBlock.infer_shape` is not implemented, backward shape
inference is used to infer the shape of parameters. However, backward shape
inference is not supported in all cases (cf. #14253),
and relying on it for parameter shape inference is brittle. Thus, for consistency
and simplicity, we require an `infer_shape` method implementation when using
hybridization based on DC.
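
As a rough illustration of what such a method does, here is a pure-Python sketch. It assumes that `infer_shape` receives the input shape as a tuple and that 0 marks an unknown dimension; both are assumptions made for this example, as the proposal does not fix the signature, and `DenseSketch` is a hypothetical stand-in for a `HybridBlock`.

```python
class DenseSketch:
    """Toy dense layer; stands in for a HybridBlock (hypothetical)."""

    def __init__(self, units):
        self.units = units
        # 0 marks a dimension that is unknown until the first input arrives.
        self.weight_shape = (units, 0)

    def infer_shape(self, x_shape):
        """Complete the parameter shapes from the input shape (forward only)."""
        in_units = x_shape[-1]
        self.weight_shape = (self.units, in_units)


layer = DenseSketch(units=16)
layer.infer_shape((8, 10))   # weight_shape becomes (16, 10)
```

The shape is derived from the input alone, so no backward shape inference through the operators of `forward` is needed.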
