pdfbox-users mailing list archives

From: Maruan Sahyoun <sahy...@fileaffairs.de>
Subject: Re: Java heap space issue while reading larger size pdf document.
Date: Fri, 25 Jan 2013 07:59:00 GMT

Hi Manoj,

The size alone is not the cause of the issue. In a recent project we handled PDFs larger
than the one you describe.

1. Can you test with the non-sequential parser, i.e. PDDocument.loadNonSeq(…), and confirm
whether it causes the same issue?
2. Can you upload a sample PDF that lets us reproduce the issue? Without one it will
be very difficult to say why this is happening. You can attach it to the issue you created.
3. Of course you can keep raising the heap size until it works, but I don't think this is
a good approach.
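
If you do experiment with heap settings, it helps to confirm what limit the JVM is actually running with before drawing conclusions. The following is only a minimal sketch using the standard Runtime API; the class name HeapCheck and the helper exceedsMaxHeap are made up for illustration and are not part of PDFBox.

```java
import java.io.File;

// Hypothetical diagnostic helper, not part of PDFBox.
class HeapCheck {

    // True when the file is larger than the maximum heap the JVM may use.
    static boolean exceedsMaxHeap(long fileSizeBytes, long maxHeapBytes) {
        return fileSizeBytes > maxHeapBytes;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java HeapCheck <file>");
            return;
        }
        File pdf = new File(args[0]);
        // maxMemory() reflects the effective -Xmx limit (or the JVM default).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("file: %d MB, max heap: %d MB%n",
                pdf.length() >> 20, maxHeap >> 20);
        if (exceedsMaxHeap(pdf.length(), maxHeap)) {
            System.out.println("File is larger than the heap; any step that "
                    + "buffers large parts of it in memory is unlikely to fit.");
        }
    }
}
```

Running it with and without an explicit -Xmx value (for example java -Xmx1g HeapCheck big.pdf) shows whether the setting is being picked up at all.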

In addition, it would be good if you could describe what you want to achieve with the
PDF. Maybe there are ways of doing so without parsing the complete file.
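
To illustrate what "without parsing the complete file" can mean in general: a PDF normally ends with a startxref keyword followed by the byte offset of the cross-reference section, so that offset can be found by reading only the tail of the file. The sketch below uses plain java.io and no PDFBox classes; the class name StartXrefProbe and the 1024-byte tail size are assumptions chosen for the example, and it is not how PDFBox itself locates the table.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
class StartXrefProbe {

    // Returns the offset written after the last "startxref" keyword,
    // or -1 if no such keyword (or no number) is found in the tail.
    static long findStartXref(File pdf) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(pdf, "r")) {
            long len = raf.length();
            // Assumption: the trailer keyword sits within the last 1024 bytes.
            int tailSize = (int) Math.min(1024, len);
            byte[] tail = new byte[tailSize];
            raf.seek(len - tailSize);
            raf.readFully(tail);

            // ISO-8859-1 maps each byte to exactly one char.
            String text = new String(tail, StandardCharsets.ISO_8859_1);
            int idx = text.lastIndexOf("startxref");
            if (idx < 0) {
                return -1;
            }
            int i = idx + "startxref".length();
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
                i++;
            }
            return i == start ? -1 : Long.parseLong(text.substring(start, i));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StartXrefProbe <file.pdf>");
            return;
        }
        System.out.println("startxref offset: " + findStartXref(new File(args[0])));
    }
}
```

Only the last kilobyte is read, so memory use stays constant regardless of the file size. Whether an approach like this helps depends on what you actually need from the document.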

With kind regards

Maruan Sahyoun

On 25.01.2013 at 07:56, Manoj Patel <patelmanojb@hotmail.com> wrote:

> Hi,
> I am facing an issue while reading a large PDF document, which is around 700 MB. I am
> using the latest build and it gives the error below:
> Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
>    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:243)
>    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
>    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
>    at imageData.ReadLargeFile.main(ReadLargeFile.java:13)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>    at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:59)
>    at org.apache.pdfbox.cos.COSStream.createFilteredStream(COSStream.java:415)
>    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:452)
>    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
>    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
>    ... 3 more
> If I use loadNonSeq to load the document, it gives this error:
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>    at java.lang.String.substring(String.java:1940)
>    at java.lang.String.subSequence(String.java:1973)
>    at java.util.regex.Pattern.split(Pattern.java:1002)
>    at java.lang.String.split(String.java:2293)
>    at java.lang.String.split(String.java:2335)
>    at org.apache.pdfbox.pdfparser.PDFParser.parseXrefTable(PDFParser.java:725)
>    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:296)
>    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:617)
>    at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1124)
>    at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1107)
>    at imageData.ReadLargeFile.main(ReadLargeFile.java:13)
> Thanks
