pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Make PDFBox fail on bad pdf
Date Thu, 30 Mar 2017 16:51:28 GMT
The problem is that some files do this deliberately, as an obfuscation technique.

What you might be able to detect is fonts that have no Unicode mapping for 
text extraction. See the check "if (unicode == null)" in 
LegacyPDFStreamEngine. Write your own engine or extend that class, and 
check for TextPosition objects whose unicode is null. (See the 
PrintTextLocations example in the source code download for how to get 
TextPosition objects.)
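
A minimal sketch of that idea, assuming PDFBox 2.0.x (the showGlyph signature differs in 3.x). The class name UnmappedGlyphDetector and its counter are hypothetical, not part of PDFBox. Instead of inspecting TextPosition objects afterwards, it overrides showGlyph in a PDFTextStripper subclass and counts glyphs for which font.toUnicode(code) returns null, because the legacy engine may substitute or skip such glyphs before a TextPosition reaches your code. Note that the engine itself consults an extra glyph list first, so this check may report somewhat more glyphs than the engine's own "unicode == null" branch sees.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

// Hypothetical helper, written against the PDFBox 2.0.x API.
public class UnmappedGlyphDetector extends PDFTextStripper {

    // Number of glyphs whose font offered no Unicode mapping.
    private int unmappedGlyphs = 0;

    public UnmappedGlyphDetector() throws IOException {
        super();
    }

    @Override
    protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code,
                             String unicode, Vector displacement) throws IOException {
        // toUnicode returns null when the font has no mapping for this
        // character code (no usable ToUnicode CMap, encoding, or glyph name).
        if (font.toUnicode(code) == null) {
            unmappedGlyphs++;
        }
        // Let the normal text extraction continue unchanged.
        super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
    }

    public int getUnmappedGlyphs() {
        return unmappedGlyphs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: UnmappedGlyphDetector <file.pdf>");
            return;
        }
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            UnmappedGlyphDetector detector = new UnmappedGlyphDetector();
            detector.getText(doc); // walks all pages and triggers showGlyph
            if (detector.getUnmappedGlyphs() > 0) {
                // The "fail on bad pdf" policy is the caller's choice.
                throw new IOException(detector.getUnmappedGlyphs()
                        + " glyph(s) without a Unicode mapping");
            }
            System.out.println("No unmapped glyphs found");
        }
    }
}
```

Whether a nonzero count should really fail the file is a policy decision: some legitimate PDFs contain a few unmapped glyphs (symbol or decorative fonts, for example), so a threshold relative to the total glyph count may work better than a hard failure.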


