pdfbox-users mailing list archives

From Brent Pathakis <bpatha...@utah.gov>
Subject Re: Problem loading large pdf files
Date Wed, 30 Oct 2013 18:30:46 GMT
Disregard the last message.

I was using RandomAccess and RandomAccessFile from java.io.* instead of the org.apache.pdfbox.io classes.
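
For anyone hitting the same ClassCastException, here is a minimal sketch of why the cast cannot work. It uses only the JDK: PdfRandomAccess is a hypothetical stand-in for org.apache.pdfbox.io.RandomAccess (it is not the real PDFBox interface), and the class and method names are made up for illustration. The point is that java.io.RandomAccessFile does not implement the PDFBox interface, so neither an assignment nor a cast can succeed, even though the simple class names look alike.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile; // the JDK class, NOT the PDFBox class of the same simple name
import java.io.UncheckedIOException;

public class ScratchFileTypeCheck {

    /** Hypothetical stand-in for org.apache.pdfbox.io.RandomAccess (assumption, for illustration only). */
    interface PdfRandomAccess {
    }

    /** Returns true only if the object implements the stand-in interface. */
    static boolean usableAsScratch(Object candidate) {
        return candidate instanceof PdfRandomAccess;
    }

    /** Opens a JDK RandomAccessFile on a temp file and reports whether it implements the interface. */
    static boolean jdkFileFits() {
        try {
            File tmpFile = File.createTempFile("scratch", ".tmp");
            tmpFile.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(tmpFile, "rw")) {
                return usableAsScratch(raf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The JDK class is unrelated to the interface, so this reports false.
        System.out.println("java.io.RandomAccessFile usable as scratch file: " + jdkFileFits());
    }
}
```

In the real code, the fix suggested by the error message is to import org.apache.pdfbox.io.RandomAccessFile (alongside org.apache.pdfbox.io.RandomAccess) rather than java.io.RandomAccessFile, assuming a PDFBox 1.x release where that class is available.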

*Brent Pathakis*
801 536 0041


On Wed, Oct 30, 2013 at 12:02 PM, Brent Pathakis <bpathakis@utah.gov> wrote:

> I tried this:
>
> RandomAccess scratchFile = null;
>
> if (tmpFile.exists()) {
>     tmpFile.delete();
>     scratchFile = new RandomAccessFile(tmpFile, "rw");
> } else {
>     scratchFile = new RandomAccessFile(tmpFile, "rw");
> }
>
> But eclipse tells me:
>
> Type mismatch: cannot convert from RandomAccessFile to RandomAccess
>
> If I try this:
>
> if (tmpFile.exists()) {
>     tmpFile.delete();
>     RandomAccess scratchFile = (RandomAccess) new RandomAccessFile(tmpFile, "rw");
> } else {
>     RandomAccess scratchFile = (RandomAccess) new RandomAccessFile(tmpFile, "rw");
> }
>
>
> Then I get this error at run time:
>
> java.lang.ClassCastException: java.io.RandomAccessFile cannot be cast to org.apache.pdfbox.io.RandomAccess
>     at PDFRedact.main(PDFRedact.java:34)
>
>
> *Brent Pathakis*
> 801 536 0041
>
>
> On Wed, Oct 30, 2013 at 10:55 AM, Gilad Denneboom <
> gilad.denneboom@gmail.com> wrote:
>
>> I used this code on one occasion:
>>
>>         String tmpFilePath = System.getProperty("java.io.tmpdir") + File.separator + "scratch.tmp";
>>         File tmpFile = new File(tmpFilePath);
>>         if (tmpFile.exists())
>>             tmpFile.delete();
>>         RandomAccess scratchFile = new RandomAccessFile(tmpFile, "rw");
>>
>>         PDDocument doc = PDDocument.load( filePath, scratchFile );
>>
>>
>>
>> On Wed, Oct 30, 2013 at 5:31 PM, Brent Pathakis <bpathakis@utah.gov>
>> wrote:
>>
>> > Thanks. Do you have an example of code using the scratch file?
>> > On Oct 30, 2013 9:30 AM, "Gilad Denneboom" <gilad.denneboom@gmail.com>
>> > wrote:
>> >
>> > > Try using a scratch file in the load method of PDDocument.
>> > >
>> > >
>> > > On Wed, Oct 30, 2013 at 3:48 PM, Brent Pathakis <bpathakis@utah.gov>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > >   I'm trying to use PDFBox to load a large PDF document (>1 GB):
>> > > >
>> > > >    File inputPdf = new File("c:\\some.pdf");
>> > > >    PDFTextStripper stop = new PDFTextStripper();
>> > > >
>> > > >    FileInputStream fis = new FileInputStream(inputPdf);
>> > > >    pd = PDDocument.load(fis, true);
>> > > >
>> > > >   This code works fine for smaller PDFs, but on larger ones I'm getting:
>> > > >
>> > > >   org.apache.pdfbox.exceptions.WrappedIOException
>> > > > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:245)
>> > > > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1192)
>> > > > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1159)
>> > > > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1130)
>> > > > at PDFRedact.main(PDFRedact.java:19)
>> > > > Caused by: java.lang.IndexOutOfBoundsException: Index: 15625, Size: 15625
>> > > > at java.util.ArrayList.RangeCheck(Unknown Source)
>> > > > at java.util.ArrayList.get(Unknown Source)
>> > > > at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
>> > > > at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
>> > > > at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
>> > > > at java.io.BufferedOutputStream.flush(Unknown Source)
>> > > > at java.io.FilterOutputStream.close(Unknown Source)
>> > > > at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:610)
>> > > > at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:568)
>> > > > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188)
>> > > > ... 4 more
>> > > >
>> > > >
>> > > >    Any ideas or help would be appreciated.
>> > > >
>> > > > *Brent Pathakis*
>> > > > 801 536 0041
>> > > >
>> > >
>> >
>>
>
>
