mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anirudh <anirudh2...@gmail.com>
Subject Re: Reduce 99% of your memory leaks with this simple trick!
Date Thu, 11 Jan 2018 17:56:02 GMT
Hi,


I have been thinking about exception handling specifically inside spawned
threads.

As Tianqi mentioned, there is already a mechanism with LOG(FATAL) or CHECK
for exception handling inside main

Thread. For exception handling inside spawned threads I see two places:
iterators and operators.



For iterators, we can use exception_ptr to transport the exceptions from
child thread to the main thread.

This can be implemented in the threadediter class template. Since
PrefetchingIter is used by most iterators in MXNet,

and this uses threadediter, we should be able to cover most of our use
cases.



For operators, I was thinking that we can transport the exception down the
dependency path.

For example, when an exception is caught inside ExecuteOprBlock for a
single operator,

We store the exception_ptr in the operator. We then propagate the
exception_ptr down to all the vars that the

Operator writes to. Similarly, if an operator’s read vars has exception_ptr
attached to it, we propagate it down to the operator itself.



We can then check if the var has an associated exception_ptr in
wait_to_read.

One problem I see with the approach is that even if an operator fails we
may need to run subsequent operators. One way to avoid this

Would be an onstart callback, which would mark the operator to not execute
if any of the read vars have an exception_ptr attached to it.



Anirudh

On Thu, Jan 11, 2018 at 9:02 AM, Tianqi Chen <tqchen@cs.washington.edu>
wrote:

> I am all for RAII when possible in most of the code. The only reason some
> of the raw ptr occur in dmlc codebase was legacy-issue, and that can be
> resolved by wrapping returning ptr via unique_ptr or shared_ptr. One
> notable property of RAII is exception safe, which makes the code handle
> resource correctly when it throws in the middle. There are cases where
> memory allocation needs to be explicitly handled(e.g. GPU memory
> management) and reused where we need to do explicit management when needed.
>
>
> As for exception handling, we do have a mechanism for handling exceptions.
> when you do LOG(FATAL) or CHECK is caught at the C API boundary, which
> translates to return code  -1 and an error is thrown on the python side.
> Throwing exception from another thread is a more tricky thing, which
> involves catching them in the engine, and usually, the state is not correct
> in such case. But most of the cases when we need exception handling are the
> simple case of opening a file and use CHECK should suffice.
>
> A better approach might be defining a new macro for errors intended to
> throw to a user and handled correctly, something like DMLC_EXPECT. But I
> find it might be a burden to put developer to distinguish what should be a
> user error and a normal check, so we just use CHECK for now
>
> Tianqi
>
> On Thu, Jan 11, 2018 at 3:09 AM, Pedro Larroy <
> pedro.larroy.lists@gmail.com>
> wrote:
>
> > Hi
> >
> > I would like to encourage contributors to use RAII idioms in C++
> > whenever possible to avoid resource leaks.
> >
> > RAII is an ugly acronym that stands for Resource Acquisition Is
> > Initialization, which basically means that you should almost never use
> > explicit new and delete operators and instead use std::make_shared,
> > std::make_unique and std::vector<uint8_t>  and .data() for raw
> > buffers. Also always allocating OS resources in constructors releasing
> > them in destructors such as file descriptors.
> >
> > Asides from forgetting to call delete on an allocation, explicit
> > deletes are bad because an exception thrown in the middle prevents
> > delete from running entirely.
> >
> > This helps a lot writing correct, secure and exception safe code
> > without memory leaks.
> >
> > Another problem that I think is worth a discussion, is how to handle
> > exceptions and errors. Right now, I don't think there's a good way to
> > throw an exception in some functions without crashing the python
> > interpreter. I think we should come with a smart way to propagate
> > exceptions from the library up to the user runtime (python, scala...)
> >
> > As an example of what I'm talking about is this suspicious code that I
> > saw in a PR, which has several bugs in a few lines of code related to
> > what I'm discussing in this thread, crashing Python when trying to
> > open a file that doesn't exist. (How to propagate an exception in this
> > case?)
> >
> > https://github.com/apache/incubator-mxnet/pull/9370/files
> >
> > Please excuse the clickbait subject, just trying to grab your
> > attention in a humorous way now that the weekend is approaching.
> >
> > Pedro.
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message