pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: PDFMergerUtility causes an OutOfMemory exception when merging a large number of single page PDF documents
Date Mon, 19 Sep 2016 15:42:15 GMT
On 19.09.2016 at 17:11, Hamann, Daniel wrote:
> Hi,
>
> Apache PDFBox 1.8.1 PDFMergerUtility causes an OutOfMemory exception
> when merging a large number of single page PDF documents:

That version is several years old...

> Here is the stack trace:
>
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:3210)
>         at java.util.Arrays.copyOf(Arrays.java:3181)
>         at java.util.ArrayList.grow(ArrayList.java:261)
>         at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
>         at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
>         at java.util.ArrayList.add(ArrayList.java:458)
>         at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:217)
>         at org.apache.pdfbox.pdmodel.PDPageNode.getKids(PDPageNode.java:174)
>         at org.apache.pdfbox.pdmodel.PDDocument.addPage(PDDocument.java:278)
>         at org.apache.pdfbox.util.PDFMergerUtility.appendDocument(PDFMergerUtility.java:528)
>         at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:242)
>         at org.apache.pdfbox.util.PDFMergerUtility.mergeDocumentsNonSeq(PDFMergerUtility.java:211)
>
> Even when using mergeDocumentsNonSeq(), which temporarily stores data
> read from the source PDF documents in an external file, memory is
> consumed while appending the source documents to the resulting merged
> PDF.
>
> My questions are:
>
> 1. Is memory consumed because appendDocument() reads PDF document
>    information from temporary file back to memory...
>
> 2. ...or is memory consumed because data structures are built up in
>    memory just to hold references to PDF document information in
>    temporary file (...which in turn is only read during streaming merged
>    document to file)?
>
> 3. Can I expect version 2.0.3 to handle merging of PDFs differently?
>
> I checked the code of PDFMergerUtility in version 2.0.3 and I am aware
> of the new "MemoryUsageSetting" method parameter. As far as I understand
> the method PDFMergerUtility.appendDocument(), there is no significant
> difference between version 1.8.10 and version 2.0.3.

The difference is under the hood: memory management was changed
between 1.8 and 2.0, so I'd suggest you just try it.

Tilman
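
For anyone trying the 2.0 route, a minimal sketch of a merge against the
PDFBox 2.0.x API might look like the following. The class name MergeSketch
and the helper merge() are made up for illustration; PDFMergerUtility,
addSource(), setDestinationFileName(), mergeDocuments(MemoryUsageSetting)
and MemoryUsageSetting.setupTempFileOnly() come from PDFBox 2.0.x. Whether
this avoids the OutOfMemoryError for a given set of inputs is not
established in this thread, so treat it as a starting point for testing.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

// Hypothetical helper class, not part of PDFBox.
final class MergeSketch {

    /** Merges the given PDFs, in list order, into destination (PDFBox 2.0.x). */
    static void merge(List<File> sources, File destination) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        for (File source : sources) {
            merger.addSource(source); // queue each source document
        }
        merger.setDestinationFileName(destination.getPath());
        // Ask PDFBox to buffer in temporary files instead of main memory.
        merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: MergeSketch <out.pdf> <in1.pdf> [<in2.pdf> ...]");
            return;
        }
        List<File> sources = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            sources.add(new File(args[i]));
        }
        merge(sources, new File(args[0]));
    }
}
```

MemoryUsageSetting.setupMixed(maxMainMemoryBytes) is the alternative if a
bounded amount of heap buffering is acceptable before spilling to disk.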


> Reading the code of PDFMergerUtility, merging PDF documents seems to be
> an extremely "expensive" process. I wonder if there really isn't a way
> to do this using less memory...
>
> Any answer would be greatly appreciated!
>
> Thanks a lot,
>
> Daniel

