pdfbox-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Limit PDF size
Date Wed, 02 Aug 2017 13:42:51 GMT


On 8/2/17 2:28 AM, Tilman Hausherr wrote:
> On 02.08.2017 at 01:17, Christopher Schultz wrote: Tilman,
> On 8/1/17 4:42 PM, Tilman Hausherr wrote:
>>>> On 01.08.2017 at 22:09, Christopher Schultz wrote: Tilman,
>>>> On 8/1/17 3:22 PM, Tilman Hausherr wrote:
>>>>>>> The only thing that comes close to what you want is to
>>>>>>> create your PDDocument with
>>>>>>> MemoryUsageSetting.setupMixed(...) as parameter.
>>>> So that we can buffer to disk if the in-memory representation
>>>> gets too big? That sounds like a good approach, and probably
>>>> the most useful to me.
>>>> It also appears that I can set a maximum in-memory limit
>>>> like this:
>>>> MemoryUsageSetting mus =
>>>> MemoryUsageSetting.setupMainMemoryOnly(1 * 1024 * 1024);
>>>> PDDocument doc = new PDDocument(mus);
>>>>> Yes. Although this would mean you'd get an exception if you
>>>>> use more. That's why I recommend the mixed one. You could
>>>>> use the memory limit for stress tests, i.e. create the
>>>>> "worst" possible file and see what you need.
> I think I'm okay with an exception in these cases. As I said, our
> PDFs only end up being a few kiB in size, so I've put a 1MiB cap on
> the memory-only memory usage strategy for the time being.
> I'm curious about what's being constrained, here... does PDFBox 
> estimate its current memory-usage of various PD* objects in memory
> and push to disk when that's exceeded, or does it just limit the
> amount of memory that gets used when serializing out to a stream?
>> There is no estimate... it writes in the dedicated space and if
>> it is full, it's either exception (if memory only) or writing to
>> disk cache.
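
The behaviour described above (write into a dedicated space; when it is full, either fail or continue on disk) can be pictured with a small stand-alone Java sketch. This is only a model under stated assumptions, not PDFBox code: the class CappedScratch, its factory methods memoryOnly and mixed, and the choice of unchecked exceptions are all hypothetical names picked for illustration, and PDFBox's real scratch handling may differ in detail.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical model of a capped scratch buffer; NOT PDFBox's implementation. */
class CappedScratch implements AutoCloseable {
    private final long maxMemoryBytes;
    private final boolean spillToDisk;
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path tempFile;      // created lazily on the first spill
    private OutputStream disk;  // open stream onto tempFile, or null
    private long diskBytes;

    private CappedScratch(long maxMemoryBytes, boolean spillToDisk) {
        this.maxMemoryBytes = maxMemoryBytes;
        this.spillToDisk = spillToDisk;
    }

    /** "Memory only": writing past the cap fails. */
    static CappedScratch memoryOnly(long maxMemoryBytes) {
        return new CappedScratch(maxMemoryBytes, false);
    }

    /** "Mixed": writing past the cap continues in a temp file. */
    static CappedScratch mixed(long maxMemoryBytes) {
        return new CappedScratch(maxMemoryBytes, true);
    }

    void write(byte[] data) {
        // No size estimate: just count the bytes actually written so far.
        int room = (int) Math.min(data.length, maxMemoryBytes - memory.size());
        if (room < data.length && !spillToDisk) {
            throw new IllegalStateException(
                "memory cap of " + maxMemoryBytes + " bytes exceeded");
        }
        memory.write(data, 0, room);
        if (room < data.length) {
            try {
                if (disk == null) {
                    tempFile = Files.createTempFile("scratch", ".tmp");
                    disk = Files.newOutputStream(tempFile);
                }
                disk.write(data, room, data.length - room);
                diskBytes += data.length - room;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    long bytesInMemory() { return memory.size(); }

    long bytesOnDisk() { return diskBytes; }

    @Override
    public void close() {
        try {
            if (disk != null) {
                disk.close();
                Files.deleteIfExists(tempFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 4-byte cap, CappedScratch.memoryOnly(4) accepts 3 bytes and then rejects 2 more, while CappedScratch.mixed(4) given 10 bytes keeps 4 in memory and sends the remaining 6 to the temp file. Which PDFBox objects actually feed such a buffer is exactly the open question in this thread.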

I get that, but I want to understand exactly what things are "counted".

>> Yes... it's mostly images, fonts and page content streams.

So, if I write an image to the PDDocument, that "counts" towards the
memory/disk limits? What about plain text? Or the
PDPage/PDPageContentStreams? If I write 1000 pages of plain-text to
the PDDocument object, will that "fill up" the limited-memory I have
configured? Or does that memory limit only count when e.g. serializing
to a "compiled" PDF file (or whatever the right terminology is)?

>> [Using built-in fonts] is even better, because it doesn't use any
>>  additional space (and is faster too). Your application is a
>> very simple one :-)
Yes, we are just taking some raw information and exporting it as a
PDF. We wanted something simple AND we wanted to have file sizes as
small as possible.

I have a related question about fonts, but I'll ask that in a separate thread.

>> You really should worry about other things... choose one or
>> many: climate change, russian hackers, terrorism, rising interest
>> rates, traffic jams, heavy rain flooding your basement, people
>> who don't wash their hands, whatever :-)

Who says I'm not a Russian hacker/taxi driver/hedge fund manager who
pours truckloads of water on people's houses without warning?

-chris