pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Handling PDFs with missing version in header
Date Thu, 10 Oct 2013 10:23:53 GMT
Hi Chris,

Put in 1.4 and you are fine. In fact, PDFBox doesn't take the version number into account.
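
A minimal sketch of that workaround, for illustration only. It is not PDFBox code: PdfHeaderPatch and withDefaultVersion are hypothetical names, it uses only java.io, and it assumes the "%PDF-" marker starts at the very first byte of the stream. If no digit follows the marker, it splices in a default version before the parser ever sees the header.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of PDFBox): repairs a header such as "%PDF-\r\n".
final class PdfHeaderPatch {
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns a stream whose header carries defaultVersion if the original has none. */
    static InputStream withDefaultVersion(InputStream in, String defaultVersion) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, MARKER.length + 1);
        byte[] head = new byte[MARKER.length + 1];

        // Read the marker plus one byte, tolerating short reads.
        int n = 0;
        while (n < head.length) {
            int r = pb.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        // Assumption: the marker sits at offset 0; anything else is passed through untouched.
        boolean hasMarker = n >= MARKER.length;
        for (int i = 0; hasMarker && i < MARKER.length; i++) {
            hasMarker = head[i] == MARKER[i];
        }
        boolean hasVersion = n > MARKER.length
                && head[MARKER.length] >= '0' && head[MARKER.length] <= '9';

        if (!hasMarker || hasVersion) {
            pb.unread(head, 0, n); // nothing to fix: restore the bytes we consumed
            return pb;
        }

        // Restore the byte that followed the marker (e.g. '\r'), if there was one.
        if (n > MARKER.length) {
            pb.unread(head, MARKER.length, n - MARKER.length);
        }
        byte[] patched = ("%PDF-" + defaultVersion).getBytes(StandardCharsets.US_ASCII);
        return new SequenceInputStream(new ByteArrayInputStream(patched), pb);
    }
}
```

The returned stream could then be handed to PDDocument.load(InputStream) in place of the original, e.g. PdfHeaderPatch.withDefaultVersion(rawStream, "1.4"), where rawStream stands for whatever stream you already have.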

BR

Maruan Sahyoun

Am 10.10.2013 um 10:55 schrieb Chris Bamford <cbamford@mimecast.com>:

> Hi there,
> 
> I am attempting text extraction with PDFBox 1.8.2.
> 
> For reasons I cannot explain, I am sometimes sent PDFs with no version number in the header, e.g.
> 
> %PDF-\r\n
> 
> instead of, say
> 
> %PDF-1.7\r\n
> 
> (I have checked, the version number does not appear in the next couple of lines, either.)
> 
> This causes PDFParser.parseHeader() to die as it attempts to perform a negative substring offset calculation.  My question is:
> if I could detect this situation and default it to a really low version (%PDF-1.0 ?), would it be safe - or would other things break later on?
> 
> Thanks for any help.
> 
> - Chris
> 
> 

