pdfbox-users mailing list archives

From "Andreas Lehmkühler" <andr...@lehmi.de>
Subject Fwd: Re: Exception :org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
Date Thu, 15 Mar 2012 20:22:12 GMT
Forgot to cc users@


---------- Original Message ----------
From: "Andreas Lehmkühler" <andreas@lehmi.de>
To: Cool The Breezer <techcool.kumar@yahoo.com>
Date: 15 March 2012 at 21:12
Subject: Re: Exception :org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream

Hi,



Cool The Breezer <techcool.kumar@yahoo.com> wrote on 15 March 2012 at 07:38:

> Hello Group,
> I recently downloaded PDFBox 1.6.0. I am using it to parse
> PDF files from URLs in a multi-threaded environment (max. 4 threads). It works fine
> for ~200 files and then displays the following exception:
> org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
> I am using PDFBox on Mac OS X Lion, with the following code:
>
> URL url = new URL( filePath );
> URLConnection urlConn = url.openConnection();
> InputStream inStream = urlConn.getInputStream();
> PDFParser pdfParser = new PDFParser(inStream);
> pdfParser.parse();
> document = new PDDocument(pdfParser.getDocument());
> PDFTextStripper stripper = new PDFTextStripper();
> String str = stripper.getText(document);
>
> inStream.close();
> output.close();
> document.close();


There may be a couple of different reasons for that. The version you are using
swallows the original exception.

- one of your PDFs may be corrupt; check whether the exception always occurs
when processing the same document
- you may have run into an issue that was resolved in the current trunk [1]
- an OutOfMemoryError
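To follow up on the first point (is it always the same document?), one option is to record which source each failure belongs to, then re-run only the failing ones on a single thread. If a document fails again on its own, corruption is the likely cause; if it only fails under load, memory or concurrency becomes more likely. This is a minimal sketch, not PDFBox API: `FailureTracker` and `DocumentProcessor` are hypothetical names, and the processor is assumed to wrap the PDFParser/PDFTextStripper code quoted above. Because 1.6.0 logs the FlateFilter message rather than rethrowing it, the processor would have to signal that case itself (for example by throwing). Other exceptions, including OutOfMemoryError, are caught directly. Worker threads are also renamed to the source while it is processed, which only helps if your logging configuration prints thread names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FailureTracker {

    /** Hypothetical callback; would wrap the PDFParser/PDFTextStripper code for one URL. */
    public interface DocumentProcessor {
        String process(String source) throws Exception;
    }

    /** Processes all sources on a fixed-size pool; returns source -> failure description. */
    public static Map<String, String> run(List<String> sources, DocumentProcessor processor, int threads) {
        Map<String, String> failures = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String source : sources) {
            pool.execute(() -> {
                Thread current = Thread.currentThread();
                String originalName = current.getName();
                // Tag the worker so log lines (if the log format shows thread names)
                // can be matched to the document being processed.
                current.setName("pdf:" + source);
                try {
                    processor.process(source);
                } catch (Throwable t) {
                    // Throwable also covers OutOfMemoryError, one of the suspected causes.
                    failures.put(source, t.getClass().getSimpleName() + ": " + t.getMessage());
                } finally {
                    current.setName(originalName);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failures;
    }

    /** Re-runs each failed source alone on the calling thread; true means it failed again. */
    public static Map<String, Boolean> retrySerially(Iterable<String> failed, DocumentProcessor processor) {
        Map<String, Boolean> stillFailing = new LinkedHashMap<>();
        for (String source : failed) {
            try {
                processor.process(source);
                stillFailing.put(source, false);
            } catch (Exception e) {
                stillFailing.put(source, true);
            }
        }
        return stillFailing;
    }
}
```

Typical use: call `run(urls, processor, 4)` to match the 4-thread setup, then pass the keys of the returned map to `retrySerially` to see which documents fail consistently.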


>
> In addition to the above error, I am getting the ERROR
> org.apache.pdfbox.pdmodel.font.PDCIDFont - Error: Could not parse predefined
> CMAP file for 'Adobe--UCS2', but that does not stop the parser from extracting
> text, so I am ignoring it. Please suggest any workaround.
>
> regards,
> RB

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-1232