pdfbox-users mailing list archives

From Cool The Breezer <techcool.ku...@yahoo.com>
Subject Exception :org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
Date Thu, 15 Mar 2012 06:38:52 GMT
Hello Group,
I recently downloaded PDFBox 1.6.0. I am using it to parse PDF files from URLs in a
multi-threaded environment (at most 4 threads). It works fine for about 200 files and then
displays the following exception:
org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
I am using PDFBox on Mac OS X Lion, with the following code:

URL url = new URL( filePath );
URLConnection urlConn = url.openConnection();
InputStream inStream = urlConn.getInputStream();
PDFParser pdfParser = new PDFParser(inStream);
pdfParser.parse();
document = new PDDocument(pdfParser.getDocument());
PDFTextStripper stripper = new PDFTextStripper();
String str = stripper.getText(document);

inStream.close(); 
output.close();
document.close();

In addition to the above error, I am also getting
ERROR org.apache.pdfbox.pdmodel.font.PDCIDFont - Error: Could not parse predefined CMAP file for 'Adobe--UCS2'
but that does not stop the parser from extracting text, so I am ignoring it. Please suggest
any workaround.
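
To help narrow this down, here is a minimal sketch of how the download step could be separated from the parsing step. It rests on an unconfirmed assumption: that the "corrupt stream" message might come from a network stream that was read only partially. The class name StreamBuffer and its methods are hypothetical helpers written for illustration. They are not part of PDFBox and use only the standard Java library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Hypothetical helper, not part of PDFBox: copies a stream into memory
// so that download problems surface before any parsing starts.
final class StreamBuffer {

    private StreamBuffer() {
    }

    // Reads the stream until end-of-stream and returns all bytes read.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Downloads the resource at the given URL and always closes the stream,
    // even if reading fails part-way through.
    static byte[] fetch(URL url) throws IOException {
        InputStream in = url.openConnection().getInputStream();
        try {
            return readFully(in);
        } finally {
            in.close();
        }
    }
}
```

The returned bytes could then be wrapped in a java.io.ByteArrayInputStream and passed to PDFParser in place of inStream in the code above, with each thread working on its own array. If the message still appears for a fully buffered file, the download step is probably not the cause.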

regards,
RB