pdfbox-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Limit PDF size
Date Tue, 01 Aug 2017 20:09:26 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Tilman,

On 8/1/17 3:22 PM, Tilman Hausherr wrote:
> The only thing that comes close to what you want is to create your 
> PDDocument with MemoryUsageSetting.setupMixed(...) as parameter.

So that we can buffer to disk if the in-memory representation gets too
big? That sounds like a good approach, and probably the most useful to
me.

It also appears that I can set a maximum in-memory limit like this:

MemoryUsageSetting mus =
    MemoryUsageSetting.setupMainMemoryOnly(1 * 1024 * 1024); // 1 MiB
PDDocument doc = new PDDocument(mus);

... and then this should enforce a 1MiB size limit, no? I think that's
all I want... there shouldn't be any reason for me to have to touch
the disk: my files are really quite small. I just don't want something
to go wrong with my client code and inadvertently go into an infinite
loop adding "Hello World" to the document over and over until I have
50k pages in the PDF and an OOME on my hands.
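
Whatever the memory setting ends up enforcing, the runaway-loop case can
also be caught in client code with a plain counter. The following is only
a minimal sketch: the class name GenerationBudget and its methods are
hypothetical, nothing here is part of PDFBox, and the cap is an arbitrary
number chosen by the caller.

```java
/**
 * Hypothetical client-side guard (not a PDFBox class): counts pages as
 * they are added and fails fast once a caller-chosen cap is reached.
 */
class GenerationBudget {
    private final int maxPages;
    private int pages;

    GenerationBudget(int maxPages) {
        if (maxPages < 1) {
            throw new IllegalArgumentException("maxPages must be >= 1");
        }
        this.maxPages = maxPages;
    }

    /** Call once before each page is added to the document. */
    void pageAdded() {
        if (pages >= maxPages) {
            throw new IllegalStateException(
                "Refusing to add page " + (pages + 1)
                + ": budget is " + maxPages + " pages");
        }
        pages++;
    }

    /** Number of pages accepted so far. */
    int pages() {
        return pages;
    }
}
```

The idea would be to call pageAdded() immediately before each new page is
created, so a loop that never terminates hits an exception after a bounded
number of pages instead of exhausting the heap. This bounds the number of
pages, not the bytes in memory, so it is a coarse proxy at best.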

> What you should do is take care not to have anything duplicated. So
> if you have a company logo on every page, create your object only
> once. Same for fonts.

We have something like:

private PDFont _theFont;

...
// setFont takes the PDFont plus a size; fontSize is a placeholder name
contentStream.setFont(_theFont, fontSize);
contentStream.newLineAtOffset(x, y);
contentStream.showText("Hello, world");
...


Many, many times. The font object reference stays the same, so I'm
guessing that's okay and the font is embedded once and referenced many
times, right?


> And try to have only one content stream per page. (We recently had
> a guy who had a huge number of content streams and wondered why his
> PDF was so big).
Check: we have only one PDPageContentStream per page.

We have a single logo on the first page and nothing repeated.

Our PDFs are almost 100% plain-text with lots of whitespace (which
doesn't count, I know). When base64 encoded, they are typically only a
few KB in size.

I'm mostly operating from a position of borderline unhealthy paranoia,
but I'd rather have a bit of code added to ensure that I don't have to
get paged in the middle of the night to restart a service that has
suffered an OOME.
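
For the output side, the size-limited OutputStream mentioned in my
original question could look roughly like the sketch below. The class
name SizeLimitedOutputStream is hypothetical and only java.io is used; an
instance would be handed to PDDocument.save(OutputStream). Note that this
caps the serialized bytes only and does nothing about the size of the
object tree already held in memory.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical helper: passes bytes through to the wrapped stream and
 * throws an IOException as soon as a write would exceed the limit.
 */
class SizeLimitedOutputStream extends FilterOutputStream {
    private final long limit;
    private long written;

    SizeLimitedOutputStream(OutputStream out, long limit) {
        super(out);
        this.limit = limit;
    }

    /** Checks the budget before any bytes reach the wrapped stream. */
    private void reserve(long n) throws IOException {
        if (written + n > limit) {
            throw new IOException(
                "Output would exceed limit of " + limit + " bytes");
        }
        written += n;
    }

    @Override
    public void write(int b) throws IOException {
        reserve(1);
        out.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        reserve(len);
        // Delegate the whole range at once instead of byte by byte.
        out.write(buf, off, len);
    }

    /** Number of bytes accepted so far. */
    long bytesWritten() {
        return written;
    }
}
```

Wrapping a ByteArrayOutputStream in this and catching the IOException
around save() would turn an oversized document into an ordinary error
path rather than an OOME while building the byte array.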

Thanks for the pointers.

- -chris

> On 01.08.2017 at 20:04, Christopher Schultz wrote:
> All,
> 
> We use PDFBox on a server that must handle many transactions with 
> (somewhat) limited memory. I'd like to limit the amount of memory
> used to generate our PDFs, which we then serialize to byte-array, 
> base64-encode, etc. for ultimate delivery to some endpoint.
> 
> I can obviously limit the number of bytes produced by using a 
> size-limited OutputStream passed-into
> PDDocument.save(OutputStream), but I'm wondering if PDFBox has any
> facilities within it to limit the size of the object-tree in memory
> (or estimate its size, and we can stop operations when it reaches a
> certain size) so that we don't end up with a multi-GB object-tree
> that then fails to serialize to byte[] because it is too big.
> 
> We are building our PDF documents from scratch, starting with the
> page definitions, fonts, etc. then adding titles, paragraphs of
> text, etc. It's all fairly straightforward, and we have full
> control over the whole process up to and including the call to 
> PDDocument.save(OutputStream).
> 
> We are manually constructing our pages as well, so I suppose we
> could simply limit the number of pages, but I'm more concerned
> about the size of the memory used and not the number of pages.
> 
> Is there anything in PDFBox that can help us with this? We can
> always count e.g. the number of bytes/characters we have written to
> the PDF, but that seems less important than what is going on inside
> of the PDF structure itself.
> 
> -chris
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJZgN/2AAoJEBzwKT+lPKRYlLUQAK/eAna/kwigraXZ/ghwfB+U
qe36r5yqUc9TMmCa7cunJuLJxMAnH6UnbNzNJm4IChMXmtLk++uF9YMKpPN0irQr
RxAaNlUbNpnyJqXR/W/7ZTVo4gP2l7JYQqARcSLjxuROLqALF1jp8BoXMw0Zz8L4
rfEub/dVk3EIBvg+ithGeqzzb67yoPEbCP9LVsXoxyvrTER1mB28BmmSZsw2hVD5
HLKzmu3e4XLXdi+MKBfJfF0Y+S4/7/yq+4f0KBq/AD7VlNeUwOv6j0kiVkT5Tdv/
tJGtheC1M6dXVLqQD7/G/q37/kdgCeG12yTbpw8FUMbfn4yHrtd8Fqmxz6au8qpm
Fu0xhGy1SobxiGXgpFCNED0fdGz0f56TYFPb8KgtAveHZuoPlDcyq9WdDThRl/zn
Oxs1ytkFf4W0RbdNcR/wtQLxVUVbPUuNE5gFKqNf282H7fj5q/I3cyCmafUnecz0
bjcHfCS4EpciYnfJT1OihRGDGBXSHZfwXEqFva8hyQ5cRLWuyqsz8Ii2DaiLoe4g
Y8pP3/dWNV5SgtQxrgVAScry10G06ybIoYj9rXz/QW6a30Hj4Dt2bFrr/n/FS1L9
G3qtsg41hXRMXT5Oly0WzgYv+fwfNCO3pJ4MB7dpuNHcTsi1Jp/capK7oA5aKqEn
bo9GBaEOciUoVYbP1vb+
=F6Jq
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

