pdfbox-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tilman Hausherr (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-4396) Memory leak due to soft reference caching
Date Thu, 06 Dec 2018 07:38:00 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711074#comment-16711074
] 

Tilman Hausherr commented on PDFBOX-4396:
-----------------------------------------

The resource cache is not to be shared across documents. The key is COSObject, i.e. an indirect
object number. In a PDF file you see these as "10 0 R", and the objects as "10 0 obj".

I don't know what internal comment you mean (you didn't quote it), but there is a weakness
somewhere that scratch file buffers are not closed properly and this is done in finalization.
There is a JIRA issue on this, e.g. PDFBOX-3388 and PDFBOX-3359.

If your problem gets solved by calling gc yourself then it means java is to blame because
it should do a gc by itself when memory is too low to allocate new objects.

If you can reproduce a scenario that eats up available memory then please share the PDF and
the code.

> Memory leak due to soft reference caching
> -----------------------------------------
>
>                 Key: PDFBOX-4396
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4396
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.12
>         Environment: JDK10; G1
>            Reporter: Ben Manes
>            Priority: Major
>         Attachments: #2 - memory leak 2.png, #2 - memory leak.png, memory leak 2.png,
memory leak.png
>
>
> In a heap dump, it appears that DefaultResourceCache is retaining 5.3 GB of memory due
to buffered images (via PDImageXObject). I suspect that G1 is not collecting soft references
across all regions before it out-of-memory errors.
> In PDFBOX-4389, I discovered very slow PDDocument#load times due to a JDK10 I/O bug.
Previously I was loading the document to render each page, but this took 1.5 minutes. To work
around that bug I reused the document instance across pages. This seems to have fail because
the pages were cached and not cleared by the GC.
> The DefaultResourceCache does not prune its cache entries when the soft references are
collected. Like WeakHashMap, it should use a ReferenceQueue, poll it on every access, and
prune accordingly.
> Thankfully PDDocument#setResourceCache exists. For now I am going to reset the cache
to a new instance after a page has been rendered. The entries should no longer be reachable
and be GC'd more aggressively. If that doesn't work, I'll either replace the cache (e.g. with
Caffeine) or disable it by setting the instance to null.
> I think the desired fix is to prune the DefaultResourceCache and, ideally, reconsider
usage of soft references (as they tend to be poor in practice). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org


Mime
View raw message