mxnet-dev mailing list archives

From Chris Olivier <cjolivie...@gmail.com>
Subject Re: segmentation fault in master using mkdlnn
Date Thu, 03 May 2018 06:34:09 GMT
You can try Intel Inspector, which is like an enhanced version of Valgrind
with a GUI and whatnot.

On Wed, May 2, 2018 at 9:42 PM Da Zheng <zhengda1936@gmail.com> wrote:

> Valgrind doesn't work with Python. Also, Valgrind doesn't support some
> CPU instructions used by MXNet (I think some instructions related to the
> random number generator).
>
>
> On Wed, May 2, 2018 at 8:59 PM, Bhavin Thaker <bhavinthaker@gmail.com> wrote:
> > Have you tried running with Valgrind to get some clues on the root cause?
> >
> > Bhavin Thaker.
> >
> > On Wed, May 2, 2018 at 8:55 PM Da Zheng <zhengda1936@gmail.com> wrote:
> >
> >> It might also be possible that this isn't an MKLDNN bug.
> >> I just saw a similar memory error in a build without MKLDNN.
> >>
> >>
> >> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10783/1/pipeline
> >>
> >> Best,
> >> Da
> >>
> >> On Wed, May 2, 2018 at 2:14 PM, Zheng, Da <dzzhen@amazon.com> wrote:
> >> > There might be a race condition that causes the memory error.
> >> > It might be caused by this PR:
> >> > https://github.com/apache/incubator-mxnet/pull/10706/files
> >> > This PR removes MKLDNN memory from NDArray.
> >> > However, I don't know why this causes a memory error. If someone is using
> >> > the memory, it should still hold the memory through a shared pointer.
> >> > But I do see memory errors increase after this PR was merged.
> >> >
> >> > Best,
> >> > Da
> >> >
> >> > On 5/2/18, 12:26 PM, "Pedro Larroy" <pedro.larroy.lists@gmail.com> wrote:
> >> >
> >> >     I couldn't reproduce locally with:
> >> >
> >> >     ci/build.py -p ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_mkldnn && \
> >> >     ci/build.py --platform ubuntu_cpu /work/runtime_functions.sh unittest_ubuntu_python2_cpu
> >> >
> >> >
> >> >     On Wed, May 2, 2018 at 8:50 PM, Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
> >> >
> >> >     > Hi
> >> >     >
> >> >     > Seems master is not running anymore; there's a segmentation fault using
> >> >     > MKLDNN-CPU.
> >> >     >
> >> >     >
> >> >     > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/801/pipeline/662
> >> >     >
> >> >     >
> >> >     > I see my PRs failing with a similar error.
> >> >     >
> >> >     > Pedro
> >> >     >
> >> >
> >> >
> >>
>
