pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Open in ReadOnly very large file.
Date Thu, 21 Mar 2013 08:49:35 GMT
Hi Pierre,

If you load from an input stream, a temporary file will be created. Try loading from a java.io.File
or passing the filename instead. Also, you do not have to provide a scratch file, but without one
your memory consumption will be much higher.

The NonSequentialPDFParser also supports the system property org.apache.pdfbox.pdfparser.nonSequentialPDFParser.parseMinimal.
When it is set to 'true', object references in the catalog are not followed. That might help (I have
never used it myself, I only looked it up in the sources). It depends on your use case.
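
In case it helps, here is a minimal sketch of setting that property from code using only the JDK. The class and method names (ParseMinimalConfig, enableParseMinimal) are made up for illustration. I am assuming the property has to be set before the document is loaded, since the parser presumably reads it when it is created; I have not verified what effect it has on a file of this size.

```java
public class ParseMinimalConfig {
    // Property name as quoted in the message above.
    static final String PARSE_MINIMAL_KEY =
        "org.apache.pdfbox.pdfparser.nonSequentialPDFParser.parseMinimal";

    // Hypothetical helper: call this before loading the document
    // (assumption: the parser reads the property when it is created).
    static void enableParseMinimal() {
        System.setProperty(PARSE_MINIMAL_KEY, "true");
    }

    public static void main(String[] args) {
        enableParseMinimal();
        // Print the value back to confirm it is visible to the JVM.
        System.out.println(PARSE_MINIMAL_KEY + "=" + System.getProperty(PARSE_MINIMAL_KEY));
    }
}
```

The same property can also be passed on the command line with the JVM's standard -D option (-Dorg.apache.pdfbox.pdfparser.nonSequentialPDFParser.parseMinimal=true), which avoids touching the code at all.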

What are you trying to do with the file? Which information are you looking for?

Maruan Sahyoun

On 21.03.2013 at 09:41, Pierre Huttin <pierre@huttin.com> wrote:

> Hello,
> I'm trying to work on a very large PDF file (21 GB), and I want to extract some pages. The
> problem is that when I load the file into a PDDocument, it creates a scratch file of about the
> same size as the file, and yesterday evening it took 3h30 just to load the file.
> PDDocument.loadNonSeq (method)
> Is it possible to open the file in "Read Only" and "Read All from disk" mode? I don't
> really understand why I need to load the complete file into a scratch file just for reading.
> Thanks for your answers/comments/ideas on how to solve this.
> Pierre Huttin
