pdfbox-users mailing list archives

From Daniel Gibby <dgi...@edirectpublishing.com>
Subject IOException should be something more specific?
Date Tue, 01 Jul 2014 18:20:25 GMT
Using Tika 1.5 (the latest release, which uses PDFBox), I'm seeing the 
following IOException when parsing certain PDFs.

java.io.IOException: Error: Header doesn't contain versioninfo
    at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:335)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:177)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
...

Should this be something more specific than just an IOException, so that 
Tika can know whether to just let it bubble up as an IOException, or 
encapsulate it into a TikaException?

I don't know enough about the PDFBox project to know whether it ever 
throws anything besides IOExceptions. Perhaps there could be a 
PDFParseException or something like that for known parse failures such 
as the one above. But if PDFBox only ever throws IOExceptions in such 
known situations, then Tika could rely on that and wrap any 
IOException from PDFBox into a TikaException.
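
To make the first option concrete, here is a rough sketch of what I have in 
mind. None of these names exist in PDFBox or Tika: PdfParseException, 
WrappedParseException (a stand-in for TikaException) and HeaderCheck are all 
made up for illustration, and the header check is simplified to "starts with 
%PDF-". The idea is only that a subclass of IOException would let a caller 
tell "malformed document" apart from a real I/O failure.

```java
import java.io.IOException;

/** Hypothetical: a parse-specific subclass of IOException (not in PDFBox). */
class PdfParseException extends IOException {
    PdfParseException(String message) {
        super(message);
    }
}

/** Hypothetical stand-in for TikaException, to keep the sketch self-contained. */
class WrappedParseException extends Exception {
    WrappedParseException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class HeaderCheck {
    /** Simplified library side: reject input whose first line lacks the PDF marker. */
    static void parseHeader(String firstLine) throws IOException {
        if (firstLine == null || !firstLine.startsWith("%PDF-")) {
            throw new PdfParseException("Error: Header doesn't contain versioninfo");
        }
    }

    /** Simplified caller side: wrap parse failures, let other IOExceptions bubble up. */
    static void parse(String firstLine) throws IOException, WrappedParseException {
        try {
            parseHeader(firstLine);
        } catch (PdfParseException e) {
            // Known "bad document" case: encapsulate it.
            throw new WrappedParseException("Malformed PDF", e);
        }
        // Any other IOException (e.g. a read error) propagates unchanged.
    }
}
```

Because PdfParseException would still be an IOException, existing callers 
that catch IOException would keep working as before.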

What do you think?

Thanks,
Daniel Gibby
