mxnet-dev mailing list archives

From TongKe Xue <tk...@tkxue.org>
Subject Re: How does mxnet efficiently free GPU memory ?
Date Thu, 15 Oct 2020 19:33:01 GMT
Hi Qing,

  I think I am misunderstanding something very basic. Perhaps we can work
through this example:

At
https://mxnet.apache.org/versions/1.7/api/java/docs/api/#org.apache.mxnet.javaapi.NDArray
we see a function of signature:

def add(other: NDArray): NDArray

Suppose we have (x: NDArray), (y: NDArray), (z: NDArray), all with
compatible dimensions and GPU-backed. Furthermore, suppose we do:

out = x * 2.0 + (y * 3.0) + z

My intuition is that this generates temporary values t1, t2, t3 where:

t1 = x * 2.0
t2 = y * 3.0
t3 = t1 + t2
out = t3 + z

However, I am not manually calling dispose on any of t1, t2, t3. Is this
resulting in a memory leak?
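To make the question concrete, here is a minimal Java sketch under stated assumptions. `DeviceArray` and `Temporaries` are hypothetical stand-ins (a CPU array plus a live-buffer counter), not MXNet's NDArray. Each operation allocates a new result the way a GPU tensor op would. The chained version never closes `t1`, `t2`, `t3`, while the try-with-resources version closes them and leaves only `out` for the caller to release.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a GPU-backed array; NOT the MXNet NDArray API.
final class DeviceArray implements AutoCloseable {
    // Counts buffers that were allocated but not yet closed.
    static final AtomicInteger LIVE = new AtomicInteger();

    private final double[] data;
    private boolean closed = false;

    DeviceArray(double[] data) {
        this.data = data.clone();
        LIVE.incrementAndGet();
    }

    // Every op allocates a fresh result buffer, like a tensor op would.
    DeviceArray mul(double s) {
        double[] r = new double[data.length];
        for (int i = 0; i < data.length; i++) r[i] = data[i] * s;
        return new DeviceArray(r);
    }

    DeviceArray add(DeviceArray o) {
        double[] r = new double[data.length];
        for (int i = 0; i < data.length; i++) r[i] = data[i] + o.data[i];
        return new DeviceArray(r);
    }

    double get(int i) { return data[i]; }

    @Override
    public void close() {
        if (!closed) {  // idempotent: a second close() is a no-op
            closed = true;
            LIVE.decrementAndGet();
        }
    }
}

public class Temporaries {
    // out = x * 2.0 + (y * 3.0) + z, chained: t1, t2, t3 are never closed.
    static DeviceArray chained(DeviceArray x, DeviceArray y, DeviceArray z) {
        return x.mul(2.0).add(y.mul(3.0)).add(z);
    }

    // Same expression with every temporary closed when the try block exits.
    static DeviceArray explicit(DeviceArray x, DeviceArray y, DeviceArray z) {
        try (DeviceArray t1 = x.mul(2.0);
             DeviceArray t2 = y.mul(3.0);
             DeviceArray t3 = t1.add(t2)) {
            return t3.add(z);  // only `out` survives; the caller must close it
        }
    }

    public static void main(String[] args) {
        DeviceArray x = new DeviceArray(new double[]{1, 2});
        DeviceArray y = new DeviceArray(new double[]{3, 4});
        DeviceArray z = new DeviceArray(new double[]{5, 6});

        int base = DeviceArray.LIVE.get();
        DeviceArray a = chained(x, y, z);
        // 4 extra live buffers: t1, t2, t3 and out
        System.out.println("chained: " + (DeviceArray.LIVE.get() - base));
        a.close();

        int before = DeviceArray.LIVE.get();
        DeviceArray b = explicit(x, y, z);
        // 1 extra live buffer: out
        System.out.println("explicit: " + (DeviceArray.LIVE.get() - before));
        b.close();
    }
}
```

With a real GPU-backed array, whether the chained form actually leaks depends on whether the binding frees memory when the JVM wrapper is collected; the counter here only models the case where nothing frees it.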

--TongKe


On Thu, Oct 15, 2020 at 9:25 AM Qing Lan <lanking520@live.com> wrote:

> Hi Tongke,
>
> GPU memory usage can grow very large and easily exceed the GPU memory
> limit, so it requires more frequent GC to solve the issue.
>
> The MXNet Java API designs NDArray to be AutoCloseable, which allows its
> memory to be freed as soon as you are done using it.
>
> MXNet (the C API) has a reference-counting system underneath, but it
> cannot track a JVM object that holds a piece of that memory. You have to
> close the JVM object itself, which tells the Engine that the reference is
> no longer used so that this piece of memory can be cleaned up. So the
> answer is yes: you need to manage GPU-backed NDArrays manually when you
> use them.
>
> Thanks,
> Qing
>
> ________________________________
> From: TongKe Xue <tkxue@tkxue.org>
> Sent: Thursday, October 15, 2020 8:29
> To: dev@mxnet.apache.org <dev@mxnet.apache.org>
> Subject: How does mxnet efficiently free GPU memory ?
>
> Hi,
>
>   In my very limited understanding:
>
>   * GPU memory is often a bottleneck to training DL
>   * Java, not being RAII / refcounted, does not have predictable
> destructors
>   * overloading math ops + auto diff often creates transient GPU tensors
> that should later be freed
>
>   Question: does mxnet have any automatic tracking of "this JVM object (1)
> is no longer reachable and (2) holds a GPU tensor, so we should free it" ?
>
> Thanks,
> --TongKe
>
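The original question asks for automatic freeing of JVM objects that become unreachable. Plain Java (9+) offers such a hook in `java.lang.ref.Cleaner`. The sketch below pairs it with `AutoCloseable` so that `close()` frees memory deterministically and the `Cleaner` acts only as a safety net. `NativeHandle`, its `address` field, and the `FREED` counter are hypothetical, and this is not necessarily how MXNet's JVM bindings are implemented.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handle to native (e.g. GPU) memory; NOT the MXNet API.
final class NativeHandle implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();
    // Counts native buffers that have been released.
    static final AtomicInteger FREED = new AtomicInteger();

    // Cleanup state must not reference the NativeHandle itself,
    // otherwise the handle could never become unreachable.
    private static final class Release implements Runnable {
        private final long address;  // hypothetical native pointer
        Release(long address) { this.address = address; }

        @Override
        public void run() {
            // A real binding would call into the native library to free `address` here.
            FREED.incrementAndGet();
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeHandle(long address) {
        this.cleanable = CLEANER.register(this, new Release(address));
    }

    // Deterministic path: free now. Cleanable.clean() runs the action at most
    // once, so a later GC-triggered cleanup becomes a no-op.
    @Override
    public void close() {
        cleanable.clean();
    }
}

public class CleanerDemo {
    public static void main(String[] args) {
        try (NativeHandle h = new NativeHandle(0x1000L)) {
            // ... use the buffer ...
        }
        System.out.println("freed after close: " + NativeHandle.FREED.get());  // 1

        new NativeHandle(0x2000L);  // dropped without close()
        // Freed only if and when the GC finds it unreachable; timing is not guaranteed.
        System.out.println("freed so far: " + NativeHandle.FREED.get());
    }
}
```

The safety net alone is a poor fit for GPU memory: the garbage collector is driven by JVM heap pressure, and a wrapper object is tiny on the heap even when it pins a large device buffer, so unreachable wrappers can linger while the GPU runs out of memory. That is consistent with the advice above to close GPU-backed NDArrays explicitly.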
