pdfbox-users mailing list archives

From "Stahle, Patrick" <patrick.sta...@te.com>
Subject RE: Strange performance problem with certain PDF files
Date Mon, 21 Mar 2016 19:58:23 GMT
Hi John / Tillman,

I have narrowed it down to how PDDocument.save() is called: the problem occurs when I pass
a FileOutputStream and does not occur when I pass a Java File instead. We have only been able
to reproduce it with some larger PDF files, and only in certain environments. On my Linux
virtual machine I have not been able to reproduce it at all; on Windows and on our Solaris
server (3PAR drive cluster) I can. I have some simple sample code that reproduces the problem,
but I don't think I can send you the two PDF files I have at hand: one is a 3D PDF of ours
(TE Classified) and the other, ironically, is the iText v1 manual in PDF form. The difference
is drastic: on Windows, saving the 3D PDF takes about 3 seconds with the Java File class versus
29 seconds with the FileOutputStream. The iText manual is not as bad, at 2 versus 20 seconds.

Anyway, we have a workaround: we changed our code to pass a Java File to PDFBox instead of
a FileOutputStream. If I can find a suitable PDF that reproduces the problem, I will send it
your way.
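
For anyone who wants to compare the two call shapes on their own machine, here is a minimal
timing sketch. It uses only the JDK and a stand-in byte array, not a real PDF. The names
SaveTiming, SaveStrategy and timeMillis are hypothetical; with PDFBox, the two lambdas would
call document.save(file) and document.save(outputStream) on the same loaded document. It is
not the sample code mentioned above, and it does not claim to reproduce the slowdown.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

class SaveTiming {

    /** Hypothetical abstraction: one way of writing a document to a target file. */
    @FunctionalInterface
    interface SaveStrategy {
        void save(File target) throws IOException;
    }

    /** Runs the strategy once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(SaveStrategy strategy, File target) throws IOException {
        long start = System.nanoTime();
        strategy.save(target);
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payload (assumption); with PDFBox this would be a loaded PDDocument.
        byte[] payload = new byte[4 * 1024 * 1024];

        File viaFile = File.createTempFile("save-file", ".bin");
        File viaStream = File.createTempFile("save-stream", ".bin");

        // Strategy A: hand over the File (analogous to document.save(file)).
        long fileMs = timeMillis(t -> Files.write(t.toPath(), payload), viaFile);

        // Strategy B: caller-owned FileOutputStream (analogous to document.save(outputStream)).
        long streamMs = timeMillis(t -> {
            try (OutputStream out = new FileOutputStream(t)) {
                out.write(payload);
            }
        }, viaStream);

        System.out.println("File: " + fileMs + " ms, FileOutputStream: " + streamMs + " ms");

        viaFile.delete();
        viaStream.delete();
    }
}
```

Running both strategies against the same destination drive (local disk, shared drive, server
volume) keeps the environment fixed, so the call shape is the only variable that changes.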


-----Original Message-----
From: John Hewson [mailto:john@jahewson.com] 
Sent: Friday, March 18, 2016 4:45 PM
To: users@pdfbox.apache.org
Subject: Re: Strange performance problem with certain PDF files

> On 18 Mar 2016, at 12:01, Stahle, Patrick <patrick.stahle@te.com> wrote:
> Hi all,
> I am running into a lot of strange performance issues with certain PDF files.
> Background info:
> The strange thing is that I can't reproduce this consistently across environments, although
> on any one environment the behavior for a given PDF seems consistent. I do most of my
> development inside a VirtualBox virtual machine running Fedora. The PDF files I am having
> problems with never show performance issues when run from my virtual machine's local drive,
> but if I use a VirtualBox shared drive as the source / destination for the PDF, I see the
> problem. A co-worker working in a pure Windows environment also experiences the performance
> problem, and we are seeing the same issue on our dev Solaris servers. The performance range
> can be quite drastic: on my local environment, one of our 3D PDFs (12 MB) can be opened,
> stamped with some text, encrypted, and saved in around 8 seconds. Doing the same job against
> a VirtualBox shared drive or on our Solaris server takes minutes. On my co-worker's Windows
> environment it takes around 30 seconds. We have really only reproduced this consistently
> with the 12 MB 3D PDF. I have a much smaller PDF (non-3D, converted from MS Office) that
> shows a similar performance issue, but the times range from 200 ms locally to 8 seconds.

You need to isolate the problem; you’ve got too many variables to make any sense of it all.
Get a reproducible problem on one non-virtualised JVM first.

— John

> The one thing the two files have in common is that I see a lot of the following messages
> on the console.
> Output from the 12 MB 3D PDF file:
> :
> :
> 1787 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser  - parsed=COSObject{13166,
> These messages seem to be produced during PDDocument.open, and from what I can tell I get
> 13,166 of them for this example PDF.
> The slowness does not happen until the following line:
> document.save(outputPDFStream);
> With other PDFs, including some quite large ones, I do not see this performance issue or
> those log messages.
> I know this is not much to go on. I am working on isolating this down to something more
> concrete and reproducible, but I thought I would send this out to see whether anyone has
> ideas or has seen similar issues. Suggestions?
> Thanks,
> Patrick
