pdfbox-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Limit PDF size
Date Tue, 01 Aug 2017 20:09:26 GMT


On 8/1/17 3:22 PM, Tilman Hausherr wrote:
> The only thing that comes close to what you want is to create your 
> PDDocument with MemoryUsageSetting.setupMixed(...) as parameter.

So that we can buffer to disk if the in-memory representation gets too
big? That sounds like a good approach, and probably the most useful to m
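A minimal sketch of the mixed setup, assuming PDFBox 2.0.x (current at the time of this thread): buffered stream data up to the given number of bytes stays on the heap, and anything beyond that is spilled to a scratch file in the chosen directory. The class and method names are hypothetical; MemoryUsageSetting.setupMixed(long), setTempDir(File) and the PDDocument(MemoryUsageSetting) constructor are the 2.0.x APIs being illustrated.

```java
import java.io.File;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper class, not part of PDFBox.
final class MixedMemoryDocuments {

    private MixedMemoryDocuments() {
    }

    // Keeps up to maxMainMemoryBytes of buffered stream data in main memory;
    // PDFBox writes anything beyond that to a scratch file in tempDir.
    static PDDocument create(long maxMainMemoryBytes, File tempDir) {
        MemoryUsageSetting setting = MemoryUsageSetting
                .setupMixed(maxMainMemoryBytes)
                .setTempDir(tempDir);
        return new PDDocument(setting);
    }
}
```

The caller owns the returned document and should close it, for example with try-with-resources, since PDDocument is Closeable.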

It also appears that I can set a maximum in-memory limit like this:

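import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;

// Buffer in main memory only, with a maximum of 1 MiB
MemoryUsageSetting mus = MemoryUsageSetting.setupMainMemoryOnly(1 * 1024 * 1024);
PDDocument doc = new PDDocument(mus);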

... and then this should enforce a 1MiB size limit, no? I think that's
all I want... there shouldn't be any reason for me to have to touch
the disk: my files are really quite small. I just don't want something
to go wrong with my client code and inadvertently go into an infinite
loop adding "Hello World" to the document over and over until I have
50k pages in the PDF and an OOME on my hands.
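A memory setting and a hard page cap can complement each other against that kind of runaway loop. As far as I understand PDFBox 2.0.x, the MemoryUsageSetting limit applies to the scratch buffers that hold stream data (page content, images, font data) rather than to every Java object in the document tree, and running past it in main-memory-only mode surfaces as an IOException during the write. A page cap is simpler to reason about. The following PageBudget class is a hypothetical helper, not a PDFBox API, and the limit value is entirely up to the caller.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical helper class, not part of PDFBox: refuses to grow a document
// past a fixed number of pages, so a runaway loop fails fast with an
// exception instead of growing the document until the JVM runs out of memory.
final class PageBudget {

    private final int maxPages;

    PageBudget(int maxPages) {
        if (maxPages < 1) {
            throw new IllegalArgumentException("maxPages must be at least 1");
        }
        this.maxPages = maxPages;
    }

    // Adds and returns a new blank page, or throws if the cap is reached.
    PDPage addPage(PDDocument doc) {
        int current = doc.getNumberOfPages();
        if (current >= maxPages) {
            throw new IllegalStateException("Refusing to add page " + (current + 1)
                    + ": limit is " + maxPages + " pages");
        }
        PDPage page = new PDPage();
        doc.addPage(page);
        return page;
    }
}
```

Routing every page creation through one such object keeps the check in a single place; the thrown IllegalStateException can then be mapped to whatever error the service returns.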

> What you should do is take care not to have anything duplicated. So if
> you have a company logo on every page, create your object
> only once. Same for fonts.
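A minimal sketch of the "create it once" advice for images, assuming PDFBox 2.0.x: the PDImageXObject is created a single time and each page's single content stream draws that same object, so the encoded image should be stored once and referenced from every page. LosslessFactory.createFromImage is used so the sketch needs no image file; PDImageXObject.createFromFile(path, doc) would play the same role for a logo on disk. The class name and drawing position are illustrative.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Hypothetical helper class, not part of PDFBox.
final class SharedLogoPages {

    private SharedLogoPages() {
    }

    // Builds pageCount pages that all draw the same logo XObject.
    static PDDocument build(BufferedImage logoImage, int pageCount) throws IOException {
        PDDocument doc = new PDDocument();
        // Created once: every page below references this one image object.
        PDImageXObject logo = LosslessFactory.createFromImage(doc, logoImage);
        for (int i = 0; i < pageCount; i++) {
            PDPage page = new PDPage();
            doc.addPage(page);
            // One content stream per page; the rest of the page content
            // would be written through this same stream.
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.drawImage(logo, 20, 20, logoImage.getWidth(), logoImage.getHeight());
            }
        }
        return doc;
    }
}
```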

We have something like:

private PDFont _theFont; // loaded once and reused

contentStream.setFont(_theFont, 12);
contentStream.showText("Hello, world");

Many, many times. The Font object reference stays the same, so I'm
guessing that's okay and the font is stored once and referenced many
times, right?
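That should be the case as long as the same PDFont instance is reused: each page's resources then point at the one font object. A minimal sketch of that pattern, assuming PDFBox 2.0.x, where the font is passed in rather than created per page and all text on a page goes through one content stream. PDType1Font.HELVETICA (a standard 14 font, not embedded) keeps the test self-contained; an embedded font from PDType0Font.load(doc, file) would likewise be loaded once per document. The class name, font size and leading are illustrative.

```java
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;

// Hypothetical helper class, not part of PDFBox.
final class SharedFontPages {

    private SharedFontPages() {
    }

    // Adds pageCount pages; each is written through a single content stream
    // that uses the same PDFont instance for every line of text.
    static void addTextPages(PDDocument doc, PDFont font, int pageCount, List<String> lines)
            throws IOException {
        for (int i = 0; i < pageCount; i++) {
            PDPage page = new PDPage();
            doc.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.beginText();
                cs.setFont(font, 12);       // same font object on every page
                cs.setLeading(14.5f);       // line spacing used by newLine()
                cs.newLineAtOffset(72, 720);
                for (String line : lines) {
                    cs.showText(line);
                    cs.newLine();
                }
                cs.endText();
            }
        }
    }
}
```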

> And try to have only one content stream per page. (We recently had
> a guy who had a huge number of content streams and wondered why his
> PDF was so big).
Check: we have only one PDPageContentStream per page.
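One way to double-check that on a generated document, assuming PDFBox 2.0.x, is to count the streams behind each page with PDPage.getContentStreams(). A page's /Contents can be a single stream or an array of streams, and opening a PDPageContentStream in AppendMode.APPEND on a page that already has content adds another one. The helper class is hypothetical.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;

// Hypothetical helper class, not part of PDFBox.
final class ContentStreamCheck {

    private ContentStreamCheck() {
    }

    // Number of content streams behind a page (/Contents may be one stream or an array).
    static int countContentStreams(PDPage page) throws IOException {
        int count = 0;
        Iterator<PDStream> streams = page.getContentStreams();
        while (streams.hasNext()) {
            streams.next();
            count++;
        }
        return count;
    }

    // Index of the first page with more than one content stream, or -1 if none.
    static int firstPageWithMultipleStreams(PDDocument doc) throws IOException {
        int index = 0;
        for (PDPage page : doc.getPages()) {
            if (countContentStreams(page) > 1) {
                return index;
            }
            index++;
        }
        return -1;
    }
}
```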

We have a single logo on the first page and nothing repeated.

Our PDFs are almost 100% plain text with lots of whitespace (which
doesn't count, I know). When base64-encoded, they are typically only a
few KB in size.

I'm mostly operating from a position of borderline unhealthy paranoia,
but I'd rather have a bit of code added to ensure that I don't have to
get paged in the middle of the night to restart a service that has
suffered an OOME.

Thanks for the pointers.

-chris

> On 01.08.2017 at 20:04, Christopher Schultz wrote:
> All,
> We use PDFBox on a server that must handle many transactions with
> (somewhat) limited memory. I'd like to limit the amount of memory
> used to generate our PDFs, which we then serialize to a byte array,
> base64-encode, etc. for ultimate delivery to some endpoint.
>
> I can obviously limit the number of bytes produced by using a
> size-limited OutputStream passed into
> PDDocument.save(OutputStream), but I'm wondering if PDFBox has any
> facilities within it to limit the size of the object tree in memory
> (or estimate its size, so that we can stop operations when it reaches a
> certain size) so that we don't end up with a multi-GB object tree
> that then fails to serialize to byte[] because it is too big.
>
> We are building our PDF documents from scratch, starting with the
> page definitions, fonts, etc., then adding titles, paragraphs of
> text, etc. It's all fairly straightforward, and we have full
> control over the whole process up to and including the call to
> PDDocument.save(OutputStream).
>
> We are manually constructing our pages as well, so I suppose we
> could simply limit the number of pages, but I'm more concerned
> about the amount of memory used than about the number of pages.
>
> Is there anything in PDFBox that can help us with this? We can
> always count, e.g., the number of bytes/characters we have written to
> the PDF, but that seems less important than what is going on inside
> the PDF structure itself.
>
> -chris
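The size-limited OutputStream mentioned in the original message takes only a few lines of plain Java. A minimal sketch, assuming the goal is simply to make PDDocument.save(OutputStream) abort once the serialized PDF exceeds a byte budget; LimitedOutputStream is a hypothetical name, not a PDFBox or JDK class.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper (not part of PDFBox or the JDK): throws an IOException
// instead of writing once more than maxBytes would have been written.
final class LimitedOutputStream extends FilterOutputStream {

    private final long maxBytes;
    private long written;

    LimitedOutputStream(OutputStream out, long maxBytes) {
        super(out);
        this.maxBytes = maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        ensureCapacity(1);
        out.write(b);
        written++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureCapacity(len);
        out.write(b, off, len); // bypass FilterOutputStream's byte-by-byte loop
        written += len;
    }

    long bytesWritten() {
        return written;
    }

    // Rejects the whole write if it would push the total past the limit.
    private void ensureCapacity(long len) throws IOException {
        if (written + len > maxBytes) {
            throw new IOException("Output limit of " + maxBytes + " bytes exceeded");
        }
    }
}
```

With PDFBox, something like doc.save(new LimitedOutputStream(buffer, 64 * 1024)) followed by Base64.getEncoder().encodeToString(buffer.toByteArray()) would then fail with an IOException instead of returning an oversized result (the 64 KiB figure is only an example). As the message says, this caps the serialized output, not the in-memory object tree.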


To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
