mxnet-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-mxnet] Heermosi edited a comment on issue #16651: I'm sorry I've triggered an error in mxnet source code, how can I debug it? It seems like a check failure on custom operators, how can I find more details?
Date Fri, 01 Nov 2019 05:51:07 GMT
URL: https://github.com/apache/incubator-mxnet/issues/16651#issuecomment-548300434
 
 
   > > Can you tell how this happened?
   > 
   > It may be an issue with how storage type is assigned after invoking declare_backward_dependency callback (https://github.com/apache/incubator-mxnet/blob/master/src/operator/custom/custom.cc#L474). I haven't had the time recently to dig deeper into this.
   
   OK, now that this computer has enough memory, the problem has appeared again.
   I first commented out every declare_backward_dependency that had a body, and it did not help.
   I then deleted all such declarations, and that did not help either.
   Can you tell me how to debug it?
   The exception is the same as before. It looks like this:
   
   > Traceback (most recent call last):
   >   File "experiments/fpn/fpn_end2end_train_test_RoITransformer.py", line 21, in <module>
   >     train_end2end_rotbox_RoITransformer.main()
   >   File "experiments/fpn/../../fpn/train_end2end_rotbox_RoITransformer.py", line 188, in main
   >     config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
   >   File "experiments/fpn/../../fpn/train_end2end_rotbox_RoITransformer.py", line 181, in train_net
   >     arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
   >   File "experiments/fpn/../../fpn/core/module.py", line 989, in fit
   >     self.update_metric(eval_metric, data_batch.label)
   >   File "experiments/fpn/../../fpn/core/module.py", line 1081, in update_metric
   >     self._curr_module.update_metric(eval_metric, labels)
   >   File "experiments/fpn/../../fpn/core/module.py", line 672, in update_metric
   >     self._exec_group.update_metric(eval_metric, labels)
   >   File "experiments/fpn/../../fpn/core/DataParallelExecutorGroup.py", line 481, in update_metric
   >     eval_metric.update(labels, texec.outputs)
   >   File "/usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/metric.py", line 364, in update
   >     metric.update(labels, preds)
   >   File "experiments/fpn/../../fpn/core/metric.py", line 53, in update
   >     pred_label = mx.ndarray.argmax_channel(pred).asnumpy().astype('int32')
   >   File "/usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/ndarray/ndarray.py", line 2506, in asnumpy
   >     ctypes.c_size_t(data.size)))
   >   File "/usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/base.py", line 254, in check_call
   >     raise MXNetError(py_str(_LIB.MXGetLastError()))
   > mxnet.base.MXNetError: [18:28:21] src/operator/custom/custom.cc:417: Check failed: reinterpret_cast<CustomOpFBFunc>(params.info->callbacks[kCustomOpBackward])( ptrs.size(), const_cast<void**>(ptrs.data()), const_cast<int*>(tags.data()), reinterpret_cast<const int*>(req.data()), static_cast<int>(ctx.is_train), params.info->contexts[kCustomOpBackward]):
   > Stack trace:
   >   [bt] (0) /usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x43) [0x7f019fb3d133]
   >   [bt] (1) /usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/libmxnet.so(+0x16d265f) [0x7f01a01f065f]
   >   [bt] (2) /usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/libmxnet.so(+0x16db4f9) [0x7f01a01f94f9]
   >   [bt] (3) /usr/local/lib/python2.7/site-packages/mxnet-1.6.0-py2.7.egg/mxnet/libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<mxnet::op::custom::CustomOperator::SetNumThreads(int)::{lambda()#1}> > >::_M_run()+0xde) [0x7f01a020005e]
   >   [bt] (4) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd66f) [0x7f022ca3166f]
   >   [bt] (5) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f0232ab588f]
   >   [bt] (6) /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f0232ab588f]
   > 
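
On the debugging question: the failed check at custom.cc:417 wraps the call to the registered kCustomOpBackward callback, so it fires when that callback reports failure, and the Python traceback above only shows where the asynchronous error surfaced (the later asnumpy() call). A possible first step, sketched below under assumptions, is to run with the documented MXNET_ENGINE_TYPE=NaiveEngine setting so that work executes synchronously, and to log any Python exception raised inside the custom operator's backward method. The decorator `log_exceptions` and the class `DemoOp` are hypothetical names for illustration; `DemoOp` only stands in for a real mx.operator.CustomOp subclass, and whether a Python exception is the actual cause in this issue is not confirmed.

```python
import functools
import logging
import os
import traceback

# Assumption: this runs before `import mxnet`, so the engine picks it up.
# NaiveEngine executes operations synchronously, which tends to make the
# error surface at the failing operator instead of at a later asnumpy().
os.environ.setdefault("MXNET_ENGINE_TYPE", "NaiveEngine")

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("custom_op_debug")


def log_exceptions(method):
    """Hypothetical helper: log the full traceback of any exception, then re-raise."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except Exception:
            log.error("Exception in %s:\n%s", method.__name__, traceback.format_exc())
            raise
    return wrapper


class DemoOp(object):
    """Hypothetical stand-in for a mx.operator.CustomOp subclass."""

    @log_exceptions
    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        # Simulated bug inside a custom backward pass.
        raise ValueError("shape mismatch in backward")
```

Applying the same decorator to the real operator's forward and backward methods, and checking stdout and stderr of the training run for a Python traceback printed just before the MXNetError, may show which custom operator is failing.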
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
