mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Olivier <cjolivie...@gmail.com>
Subject Re: Reduce 99% of your memory leaks with this simple trick!
Date Fri, 12 Jan 2018 03:35:57 GMT
I think there's tons of books on "best practices" already, so I wouldn't
want to trouble you :)

Are we running coverity static analysis?  It catches those kinds of things.

On Thu, Jan 11, 2018 at 7:04 PM, Bhavin Thaker <bhavinthaker@gmail.com>
wrote:

> Would it make sense to have a developer best practices section on the
> Apache wiki where such guidance can be documented for future reference?
>
> Bhavin Thaker.
>
> On Thu, Jan 11, 2018 at 9:56 AM Anirudh <anirudh2290@gmail.com> wrote:
>
> > Hi,
> >
> >
> > I have been thinking about exception handling specifically inside spawned
> > threads.
> >
> > As Tianqi mentioned, there is already a mechanism with LOG(FATAL) or
> CHECK
> > for exception handling inside main
> >
> > Thread. For exception handling inside spawned threads I see two places:
> > iterators and operators.
> >
> >
> >
> > For iterators, we can use exception_ptr to transport the exceptions from
> > child thread to the main thread.
> >
> > This can be implemented in the threadediter class template. Since
> > PrefetchingIter is used by most iterators in MXNet,
> >
> > and this uses threadediter, we should be able to cover most of our use
> > cases.
> >
> >
> >
> > For operators, I was thinking that we can transport the exception down
> the
> > dependency path.
> >
> > For example, when an exception is caught inside ExecuteOprBlock for a
> > single operator,
> >
> > We store the exception_ptr in the operator. We then propagate the
> > exception_ptr down to all the vars that the
> >
> > Operator writes to. Similarly, if an operator’s read vars has
> exception_ptr
> > attached to it, we propagate it down to the operator itself.
> >
> >
> >
> > We can then check if the var has an associated exception_ptr in
> > wait_to_read.
> >
> > One problem I see with the approach is that even if an operator fails we
> > may need to run subsequent operators. One way to avoid this
> >
> > Would be an onstart callback, which would mark the operator to not
> execute
> > if any of the read vars have an exception_ptr attached to it.
> >
> >
> >
> > Anirudh
> >
> > On Thu, Jan 11, 2018 at 9:02 AM, Tianqi Chen <tqchen@cs.washington.edu>
> > wrote:
> >
> > > I am all for RAII when possible in most of the code. The only reason
> some
> > > of the raw ptr occur in dmlc codebase was legacy-issue, and that can be
> > > resolved by wrapping returning ptr via unique_ptr or shared_ptr. One
> > > notable property of RAII is exception safe, which makes the code handle
> > > resource correctly when it throws in the middle. There are cases where
> > > memory allocation needs to be explicitly handled(e.g. GPU memory
> > > management) and reused where we need to do explicit management when
> > needed.
> > >
> > >
> > > As for exception handling, we do have a mechanism for handling
> > exceptions.
> > > when you do LOG(FATAL) or CHECK is caught at the C API boundary, which
> > > translates to return code  -1 and an error is thrown on the python
> side.
> > > Throwing exception from another thread is a more tricky thing, which
> > > involves catching them in the engine, and usually, the state is not
> > correct
> > > in such case. But most of the cases when we need exception handling are
> > the
> > > simple case of opening a file and use CHECK should suffice.
> > >
> > > A better approach might be defining a new macro for errors intended to
> > > throw to a user and handled correctly, something like DMLC_EXPECT. But
> I
> > > find it might be a burden to put developer to distinguish what should
> be
> > a
> > > user error and a normal check, so we just use CHECK for now
> > >
> > > Tianqi
> > >
> > > On Thu, Jan 11, 2018 at 3:09 AM, Pedro Larroy <
> > > pedro.larroy.lists@gmail.com>
> > > wrote:
> > >
> > > > Hi
> > > >
> > > > I would like to encourage contributors to use RAII idioms in C++
> > > > whenever possible to avoid resource leaks.
> > > >
> > > > RAII is an ugly acronym that stands for Resource Acquisition Is
> > > > Initialization, which basically means that you should almost never
> use
> > > > explicit new and delete operators and instead use std::make_shared,
> > > > std::make_unique and std::vector<uint8_t>  and .data() for raw
> > > > buffers. Also always allocating OS resources in constructors
> releasing
> > > > them in destructors such as file descriptors.
> > > >
> > > > Asides from forgetting to call delete on an allocation, explicit
> > > > deletes are bad because an exception thrown in the middle prevents
> > > > delete from running entirely.
> > > >
> > > > This helps a lot writing correct, secure and exception safe code
> > > > without memory leaks.
> > > >
> > > > Another problem that I think is worth a discussion, is how to handle
> > > > exceptions and errors. Right now, I don't think there's a good way to
> > > > throw an exception in some functions without crashing the python
> > > > interpreter. I think we should come with a smart way to propagate
> > > > exceptions from the library up to the user runtime (python, scala...)
> > > >
> > > > As an example of what I'm talking about is this suspicious code that
> I
> > > > saw in a PR, which has several bugs in a few lines of code related to
> > > > what I'm discussing in this thread, crashing Python when trying to
> > > > open a file that doesn't exist. (How to propagate an exception in
> this
> > > > case?)
> > > >
> > > > https://github.com/apache/incubator-mxnet/pull/9370/files
> > > >
> > > > Please excuse the clickbait subject, just trying to grab your
> > > > attention in a humorous way now that the weekend is approaching.
> > > >
> > > > Pedro.
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message