pdfbox-users mailing list archives

From Thorsten Schöning <tschoen...@am-soft.de>
Subject Large memory footprint and long processing time for one page PDF
Date Wed, 08 Nov 2017 11:38:18 GMT
Hi all,

I'm seeing a strange printing behaviour using Apache PDFBox and a PDF
containing only one page. When printing a completely different PDF
containing a lot more pages and text I don't see that behaviour.

The problem with that one particular PDF is that I'm not allowed to
share it publicly, so I would like to know 1. whether you think this
is a problem worth looking at and 2. whether someone is able to
receive my PDF and handle it reasonably privately, as has been
suggested for other bugs already[1]. I don't need an NDA or anything
like that; the file should just be deleted once it's most likely not
needed anymore. The content is not even particularly sensitive.

The problem is that printing the file using PDFBox 2.0.3 results in
the Java process consuming around 3 GB of memory, with a processing
time of around 55 seconds. Using the newest PDFBox 2.0.8 instead,
memory consumption drops a bit to around 2.7 GB and processing time
to around 35 seconds. When printing other PDFs with e.g. 10 pages of
text, processing time is around 3 seconds and the memory footprint is
about 215 MB.
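
A minimal sketch of how such numbers could be collected from inside
the JVM, using only the standard java.lang.management API. The class
name FootprintProbe and the method measure are hypothetical, and the
workload in main is a stand-in allocation, not the actual print call.
Note that this reports Java heap usage, which is not the same as the
process memory shown by the operating system.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: times a task and reports peak heap usage.
public class FootprintProbe {

    public static final class Result {
        public final long millis;
        public final long peakHeapBytes;

        Result(long millis, long peakHeapBytes) {
            this.millis = millis;
            this.peakHeapBytes = peakHeapBytes;
        }
    }

    public static Result measure(Runnable task) {
        // Reset the per-pool peak counters so earlier allocations don't count.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                pool.resetPeakUsage();
            }
        }

        long start = System.nanoTime();
        task.run();
        long millis = (System.nanoTime() - start) / 1_000_000L;

        // Sum of per-pool peaks: an upper bound on the simultaneous peak,
        // because the pools need not peak at the same moment.
        long peak = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage usage = pool.getPeakUsage();
                if (usage != null) {
                    peak += usage.getUsed();
                }
            }
        }
        return new Result(millis, peak);
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): replace with the code that
        // loads and prints the PDF to measure the real case.
        Result r = measure(() -> {
            byte[] block = new byte[64 * 1024 * 1024];
            block[block.length - 1] = 1;
        });
        System.out.printf("elapsed: %d ms, peak heap (upper bound): %d MB%n",
                r.millis, r.peakHeapBytes / (1024 * 1024));
    }
}
```

Wrapping the print call this way would at least show whether the
memory is held on the Java heap or elsewhere in the process.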

Printing the problematic PDF with other applications like PDFPrint[2]
causes no problem at all, even if that app is configured to render an
image to print as well. Processing time is around 2 seconds, memory
footprint is maybe 60 MB. So in the end, I simply find the numbers for
PDFBox and that particular PDF unexpectedly high.

The PDF is created automatically from an RTF template in a process
in which some app adds pieces of information to the RTF template file
and converts it to PDF using some arbitrary PDF printer in Windows.
The printing application is MS Word 2010 or similar, which shouldn't
matter much. The PDF looks and opens OK in Adobe Reader, SumatraPDF
and other viewers, and can be printed from there manually without the
high numbers PDFBox is giving.

The command line used to print is the following:

> java -jar "C:\Users\[...]\pdfprint.jar" PrintPDF -silentPrint "C:\Users\[...]\0001-print5B7A1242.pdf"

I don't think that the problem is related to the version of Java used,
because I already noticed that behaviour almost a year ago with a
different Java version:

> C:\Users\[...]>java -version
> java version "1.8.0_152"
> Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
> Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)

So, is there any interest in having a more detailed look at the PDF?
Or should I file a bug instead?


[1]: https://issues.apache.org/jira/browse/PDFBOX-3729?focusedCommentId=15945755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15945755
[2]: http://www.verypdf.com/app/pdf-print-cmd/

Best regards,

Thorsten Schöning

Thorsten Schöning       E-Mail: Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone.............05151-  9468- 55
Fax...............05151-  9468- 88
Mobile.............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow
