pdfbox-users mailing list archives

From Wouter De Borger <wouter.debor...@inmanta.com>
Subject Re: Make PDFBox fail on bad pdf
Date Fri, 31 Mar 2017 11:16:47 GMT
Thanks a lot, that looks like the clean solution!

For Type 0 fonts, no TextPosition is created, but I can live with that.
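Below is a minimal sketch of the check Tilman describes in the quoted reply. It assumes PDFBox 2.0.x, where `showGlyph` still takes a `unicode` parameter (3.0 changed that signature). Instead of copying `LegacyPDFStreamEngine`, it subclasses `PDFTextStripper` and asks each font for its own Unicode mapping before the stripper coerces or skips the glyph. As a result, glyphs from Type 0 fonts should be counted even though no TextPosition is created for them. The class name `UnicodeCheckingStripper`, the `extractOrFail` helper and its `maxRatio` threshold are hypothetical.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

/**
 * Hypothetical helper (sketch, PDFBox 2.0.x): counts glyphs whose font offers
 * no Unicode mapping, so the caller can reject a PDF instead of getting garbage text.
 */
public class UnicodeCheckingStripper extends PDFTextStripper {

    private int glyphCount = 0;
    private int unmappedCount = 0;

    public UnicodeCheckingStripper() throws IOException {
        super();
    }

    // PDFBox 2.0.x signature; PDFBox 3.0 removed the "unicode" parameter.
    @Override
    protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code,
                             String unicode, Vector displacement) throws IOException {
        glyphCount++;
        // Ask the font directly; null means it has no Unicode mapping for this code.
        // The stripper also consults an additional glyph list, so this check may be
        // slightly stricter than what text extraction actually recovers.
        if (font.toUnicode(code) == null) {
            unmappedCount++;
        }
        // Let the normal extraction continue (it coerces or skips unmapped glyphs).
        super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
    }

    public int getGlyphCount() {
        return glyphCount;
    }

    public int getUnmappedCount() {
        return unmappedCount;
    }

    /** Extracts text, throwing if more than maxRatio of all glyphs are unmapped. */
    public static String extractOrFail(PDDocument doc, double maxRatio) throws IOException {
        UnicodeCheckingStripper stripper = new UnicodeCheckingStripper();
        String text = stripper.getText(doc);
        int total = stripper.getGlyphCount();
        int unmapped = stripper.getUnmappedCount();
        if (total > 0 && (double) unmapped / total > maxRatio) {
            throw new IOException("Unreliable text: " + unmapped + " of " + total
                    + " glyphs have no Unicode mapping");
        }
        return text;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            System.out.println(extractOrFail(doc, 0.0));
        }
    }
}
```

Run it as `java UnicodeCheckingStripper file.pdf` with PDFBox 2.0.x on the classpath. A threshold of 0.0 rejects a document with even one unmapped glyph, which may be too strict for files that mix in symbol fonts.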

Thanks,
Wouter

On Thu, Mar 30, 2017 at 6:51 PM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> The problem is that some files do this as an obfuscation technique.
>
> What might be detected is fonts that don't support Unicode extraction. See
> "if (unicode == null)" in LegacyPDFStreamEngine. Make your own copy of it
> (or extend it) and check for TextPosition objects whose unicode is null.
> (See the PrintTextLocations example in the source code download for how to
> get TextPosition objects.)
>
> Tilman


-- 
Wouter De Borger, PhD
Co-founder Inmanta
www.inmanta.com
Email: wouter.deborger@inmanta.com
