mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bhavin Thaker <bhavintha...@gmail.com>
Subject Re: Reduce 99% of your memory leaks with this simple trick!
Date Fri, 12 Jan 2018 03:04:18 GMT
Would it make sense to have a developer best practices section on the
Apache wiki where such guidance can be documented for future reference?

Bhavin Thaker.

On Thu, Jan 11, 2018 at 9:56 AM Anirudh <anirudh2290@gmail.com> wrote:

> Hi,
>
>
> I have been thinking about exception handling specifically inside spawned
> threads.
>
> As Tianqi mentioned, there is already a mechanism with LOG(FATAL) or CHECK
> for exception handling inside main
>
> Thread. For exception handling inside spawned threads I see two places:
> iterators and operators.
>
>
>
> For iterators, we can use exception_ptr to transport the exceptions from
> child thread to the main thread.
>
> This can be implemented in the threadediter class template. Since
> PrefetchingIter is used by most iterators in MXNet,
>
> and this uses threadediter, we should be able to cover most of our use
> cases.
>
>
>
> For operators, I was thinking that we can transport the exception down the
> dependency path.
>
> For example, when an exception is caught inside ExecuteOprBlock for a
> single operator,
>
> We store the exception_ptr in the operator. We then propagate the
> exception_ptr down to all the vars that the
>
> Operator writes to. Similarly, if an operator’s read vars has exception_ptr
> attached to it, we propagate it down to the operator itself.
>
>
>
> We can then check if the var has an associated exception_ptr in
> wait_to_read.
>
> One problem I see with the approach is that even if an operator fails we
> may need to run subsequent operators. One way to avoid this
>
> Would be an onstart callback, which would mark the operator to not execute
> if any of the read vars have an exception_ptr attached to it.
>
>
>
> Anirudh
>
> On Thu, Jan 11, 2018 at 9:02 AM, Tianqi Chen <tqchen@cs.washington.edu>
> wrote:
>
> > I am all for RAII when possible in most of the code. The only reason some
> > of the raw ptr occur in dmlc codebase was legacy-issue, and that can be
> > resolved by wrapping returning ptr via unique_ptr or shared_ptr. One
> > notable property of RAII is exception safe, which makes the code handle
> > resource correctly when it throws in the middle. There are cases where
> > memory allocation needs to be explicitly handled(e.g. GPU memory
> > management) and reused where we need to do explicit management when
> needed.
> >
> >
> > As for exception handling, we do have a mechanism for handling
> exceptions.
> > when you do LOG(FATAL) or CHECK is caught at the C API boundary, which
> > translates to return code  -1 and an error is thrown on the python side.
> > Throwing exception from another thread is a more tricky thing, which
> > involves catching them in the engine, and usually, the state is not
> correct
> > in such case. But most of the cases when we need exception handling are
> the
> > simple case of opening a file and use CHECK should suffice.
> >
> > A better approach might be defining a new macro for errors intended to
> > throw to a user and handled correctly, something like DMLC_EXPECT. But I
> > find it might be a burden to put developer to distinguish what should be
> a
> > user error and a normal check, so we just use CHECK for now
> >
> > Tianqi
> >
> > On Thu, Jan 11, 2018 at 3:09 AM, Pedro Larroy <
> > pedro.larroy.lists@gmail.com>
> > wrote:
> >
> > > Hi
> > >
> > > I would like to encourage contributors to use RAII idioms in C++
> > > whenever possible to avoid resource leaks.
> > >
> > > RAII is an ugly acronym that stands for Resource Acquisition Is
> > > Initialization, which basically means that you should almost never use
> > > explicit new and delete operators and instead use std::make_shared,
> > > std::make_unique and std::vector<uint8_t>  and .data() for raw
> > > buffers. Also always allocating OS resources in constructors releasing
> > > them in destructors such as file descriptors.
> > >
> > > Asides from forgetting to call delete on an allocation, explicit
> > > deletes are bad because an exception thrown in the middle prevents
> > > delete from running entirely.
> > >
> > > This helps a lot writing correct, secure and exception safe code
> > > without memory leaks.
> > >
> > > Another problem that I think is worth a discussion, is how to handle
> > > exceptions and errors. Right now, I don't think there's a good way to
> > > throw an exception in some functions without crashing the python
> > > interpreter. I think we should come with a smart way to propagate
> > > exceptions from the library up to the user runtime (python, scala...)
> > >
> > > As an example of what I'm talking about is this suspicious code that I
> > > saw in a PR, which has several bugs in a few lines of code related to
> > > what I'm discussing in this thread, crashing Python when trying to
> > > open a file that doesn't exist. (How to propagate an exception in this
> > > case?)
> > >
> > > https://github.com/apache/incubator-mxnet/pull/9370/files
> > >
> > > Please excuse the clickbait subject, just trying to grab your
> > > attention in a humorous way now that the weekend is approaching.
> > >
> > > Pedro.
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message