pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: PDFParser Error Caused by: org.apache.pdfbox.exceptions.WrappedIOException
Date Sat, 07 Mar 2015 13:21:31 GMT
The best approach would be to test whether newer versions of PDFBox
(1.8.9 and 2.0) can handle that file:

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.9-SNAPSHOT/
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/2.0.0-SNAPSHOT/

Download the jar files, and for each one:

     - run java -jar <jarfile> ExtractText <yourfile>
     - see what happens
     - report the result here
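
If it helps, the steps above can be scripted so that every downloaded jar is run against the same file. This is only a minimal sketch: the class name TryPdfBoxJars and its helper methods are hypothetical (they are not part of PDFBox), and it assumes that "java" is on the PATH and that the jar paths passed on the command line are the snapshot jars you downloaded. It simply starts the same command line as above once per jar.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: runs ExtractText once per jar.
public class TryPdfBoxJars {

    // Builds the same command line as in the steps above:
    // java -jar <jarfile> ExtractText <yourfile>
    static List<String> buildCommand(String jarFile, String pdfFile) {
        return Arrays.asList("java", "-jar", jarFile, "ExtractText", pdfFile);
    }

    // Runs one jar against the PDF and returns the exit code of the child JVM.
    static int runExtractText(String jarFile, String pdfFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(jarFile, pdfFile));
        pb.inheritIO(); // let any stack trace appear directly on the console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println(
                "usage: java TryPdfBoxJars <yourfile> <jarfile> [<jarfile> ...]");
            return;
        }
        String pdfFile = args[0];
        // Try each jar in turn so the results can be compared per version.
        for (int i = 1; i < args.length; i++) {
            int exit = runExtractText(args[i], pdfFile);
            System.out.println(args[i] + " -> exit code " + exit);
        }
    }
}
```

The exit code alone may not tell the whole story, so the console output (including any stack trace) for each jar is what is worth reporting back.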

Your paste indicates a problem in RandomAccessBuffer.java.

Tilman

On 06.03.2015 at 21:05, Ganesh.Yadav@sungard.com wrote:
> Hello,
> I am getting a PDFParser error, caused by: org.apache.pdfbox.exceptions.WrappedIOException
> The complete stack trace is at the following link.
> ( http://apaste.info/DRD )
>
> I am trying to import a 4 GB PDF into Solr using Tika. I was able to import files up to 500 MB.
>
>
> Please suggest if there is any workaround.
>
> Thanks
> G
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>

