pdfbox-users mailing list archives

From James Green <james.mk.gr...@gmail.com>
Subject Re: IOException should be something more specific?
Date Fri, 11 Jul 2014 08:08:28 GMT
This raises an interesting question, and one that applies to software in
general. I actually think PDFBox has it right - something more specific
might sound correct, but to whom is it useful? Exceptions in my
experience tend to bubble straight up to the user (perhaps logged to file,
with an "oops" shown to the user). The user in this case needs to be told
there's something wrong with the file, and the error itself says what.

Would a PDFParseException give your software some new behaviour?



On 1 July 2014 19:20, Daniel Gibby <dgibby@edirectpublishing.com> wrote:

> Using Tika 1.5 (the latest release, which uses PDFBox), I'm seeing the
> following IOException when parsing certain PDFs.
>
> java.io.IOException: Error: Header doesn't contain versioninfo
>    at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:335)
>    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:177)
>    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
>    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
>    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
> ...
>
> Should this be something more specific than just an IOException, so that
> Tika can know whether to just let it bubble up as an IOException, or
> encapsulate it into a TikaException?
>
> I don't know enough about the PDFBox project to know if there are ever any
> exceptions besides IOExceptions thrown. Perhaps there could be a
> PDFParseException or something like that when you run into known
> situations. But if IOExceptions only ever happen when you run into known
> situations, then Tika could just know that is the case and wrap any
> IOException from PDFBox into a TikaException.
>
> What do you think?
>
> Thanks,
> Daniel Gibby
>
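
The trade-off discussed in this thread can be sketched roughly as follows.
This is only an illustration under stated assumptions: PdfParseException is
a hypothetical subclass of IOException (the thread does not confirm that
PDFBox has such a class), DocumentException is a hypothetical stand-in for a
wrapper such as TikaException, and parseHeader is a toy check, not PDFBox
code. With a subclass, a caller can wrap only the "bad file" case and let
genuine I/O failures bubble up unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical: a parse-specific subclass of IOException, as proposed in the thread.
class PdfParseException extends IOException {
    PdfParseException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for a caller-side wrapper such as TikaException.
class DocumentException extends Exception {
    DocumentException(String message, Throwable cause) {
        super(message, cause);
    }
}

class ExceptionSketch {
    // Toy header check standing in for a real parser; not PDFBox code.
    static void parseHeader(InputStream in) throws IOException {
        byte[] head = new byte[5];
        // A failing stream throws a plain IOException here.
        int n = in.readNBytes(head, 0, head.length);
        String magic = new String(head, 0, n, StandardCharsets.US_ASCII);
        if (!magic.equals("%PDF-")) {
            throw new PdfParseException("Error: Header doesn't contain versioninfo");
        }
    }

    // Wraps only malformed-content failures; genuine I/O failures propagate unchanged.
    static String load(InputStream in) throws IOException, DocumentException {
        try {
            parseHeader(in);
            return "ok";
        } catch (PdfParseException e) {
            throw new DocumentException("Malformed PDF: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bad = "not a pdf".getBytes(StandardCharsets.US_ASCII);
        try {
            load(new ByteArrayInputStream(bad));
        } catch (DocumentException e) {
            System.out.println("wrapped: " + e.getMessage());
        }
    }
}
```

Because the hypothetical subclass still extends IOException, existing callers
that catch IOException would keep working. If every IOException from the
parser is known to mean "bad file", the caller could instead catch
IOException directly and wrap it, with no new exception type needed.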
