Heermosi edited a comment on issue #16651: I'm sorry I've triggered an error in mxnet source
code, how can I debug it? It seems like a check failure on custom operators, how can I find
more details?
URL: https://github.com/apache/incubator-mxnet/issues/16651#issuecomment-548300434
> > Can you tell how this happened?
>
> It may be an issue with how storage type is assigned after invoking the declare_backward_dependency callback (https://github.com/apache/incubator-mxnet/blob/master/src/operator/custom/custom.cc#L474). I haven't had the time recently to dig deeper into this.
OK, now that this computer has enough memory, the problem has come up again.
I commented out every declare_backward_dependency that had contents, and it made no difference.
I then deleted all of those declarations, and that did not help either.
Can you tell me how to debug this?
By the way, the exception is the same as before. It looks like this:
> Traceback (most recent call last):
> File "experiments/fpn/fpn_end2end_train_test_RoITransformer.py", line 21, in <module>
> train_end2end_rotbox_RoITransformer.main()
> File "experiments/fpn/../../fpn/train_end2end_rotbox_RoITransformer.py", line 188, in main
> config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
> File "experiments/fpn/../../fpn/train_end2end_rotbox_RoITransformer.py", line 181, in train_net
> arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
> File "experiments/fpn/../../fpn/core/module.py", line 989, in fit
> self.update_metric(eval_metric, data_batch.label)
> File "experiments/fpn/../../fpn/core/module.py", line 1081, in update_metric
> self._curr_module.update_metric(eval_metric, labels)
> File "experiments/fpn/../../fpn/core/module.py", line 672, in update_metric
> self._exec_group.update_metric(eval_metric, labels)
> File "experiments/fpn/../../fpn/core/DataParallelExecutorGroup.py", line 481, in update_metric
> eval_metric.update(labels, texec.outputs)
> File "/usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/metric.py", line 364, in update
> metric.update(labels, preds)
> File "experiments/fpn/../../fpn/core/metric.py", line 53, in update
> pred_label = mx.ndarray.argmax_channel(pred).asnumpy().astype('int32')
> File "/usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/ndarray/ndarray.py", line 2506, in asnumpy
> ctypes.c_size_t(data.size)))
> File "/usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/base.py", line 254, in check_call
> raise MXNetError(py_str(_LIB.MXGetLastError()))
> mxnet.base.MXNetError: [18:28:21] src/operator/custom/custom.cc:417: Check failed: reinterpret_cast<CustomOpFBFunc>(params.info->callbacks[kCustomOpBackward])( ptrs.size(), const_cast<void**>(ptrs.data()), const_cast<int*>(tags.data()), reinterpret_cast<const int*>(req.data()), static_cast<int>(ctx.is_train), params.info->contexts[kCustomOpBackward]):
> Stack trace:
> [bt] (0) /usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x43) [0x7f019fb3d133]
> [bt] (1) /usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/libmxnet.so(+0x16d265f) [0x7f01a01f065f]
> [bt] (2) /usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/libmxnet.so(+0x16db4f9) [0x7f01a01f94f9]
> [bt] (3) /usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<mxnet::op::custom::CustomOperator::SetNumThreads(int)::{lambda()#1}> > >::_M_run()+0xde) [0x7f01a020005e]
> [bt] (4) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd66f) [0x7f022ca3166f]
> [bt] (5) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f02335316db]
> [bt] (6) /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f0232ab588f]
>
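One possible way to get more detail: the failed check at custom.cc:417 is on the return value of the kCustomOpBackward callback, so the failure may originate in the Python backward method of a custom operator rather than in the C++ code. The sketch below is a plain-Python decorator that logs the full traceback of any exception before re-raising it. It does not use any MXNet API; the names log_exceptions and HypotheticalOp are made up for illustration, and the class only stands in for a user-defined custom operator. Whether the root cause here really is a Python exception is an assumption.

```python
import functools
import logging
import traceback

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("custom_op_debug")


def log_exceptions(method):
    """Log the full Python traceback of any exception, then re-raise it."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except Exception:
            log.error("Exception in %s:\n%s",
                      method.__name__, traceback.format_exc())
            raise
    return wrapper


class HypotheticalOp(object):
    # Hypothetical stand-in for a user-defined custom operator class.
    @log_exceptions
    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        # Deliberate out-of-range index, to show what the wrapper reports.
        return in_grad[len(in_data)]
```

Applying the same decorator to the forward and backward methods of each custom operator in the training script should show which operator fails and on which line, if the assumption above holds.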
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services