pdfbox-users mailing list archives

From Brent Pathakis <bpatha...@utah.gov>
Subject Problem loading large pdf files
Date Wed, 30 Oct 2013 14:48:45 GMT
Hi,

  I'm trying to use PDFBox to load a large PDF document (>1 GB):
[CODE]
File inputPdf = new File("c:\\some.pdf");
PDFTextStripper stop = new PDFTextStripper();

// Open the file as a stream and load it; "true" asks the parser to skip corrupt objects
FileInputStream fis = new FileInputStream(inputPdf);
PDDocument pd = PDDocument.load(fis, true);
[/CODE]

  This code works fine for smaller PDFs, but with larger ones I get:

  org.apache.pdfbox.exceptions.WrappedIOException
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:245)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1192)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1159)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1130)
at PDFRedact.main(PDFRedact.java:19)
Caused by: java.lang.IndexOutOfBoundsException: Index: 15625, Size: 15625
at java.util.ArrayList.RangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
at java.io.BufferedOutputStream.flush(Unknown Source)
at java.io.FilterOutputStream.close(Unknown Source)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:610)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:568)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188)
... 4 more


   Any ideas or help would be appreciated.

*Brent Pathakis*
801 536 0041
