pdfbox-users mailing list archives

From Ronald Bergmann | DTAD AG <rbergm...@dtad.de>
Subject PDFTextStripper strips copyright sign
Date Tue, 09 Apr 2019 12:55:04 GMT

This is my first time ever writing to a mailing list, so please excuse
me if my contribution does not meet the usual standards.

Apache PDFBox seems to strip copyright signs when extracting text from
PDFs, and I wonder why. When I open the PDF in any reader and copy the
text, I get the copyright sign. With PDFBox I get a whitespace character
instead.

PDFTextStripper stripper = new PDFTextStripper();
String contents = stripper.getText(doc);
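To see exactly what comes back in place of the sign (a plain space U+0020, a non-breaking space U+00A0, or a character that only looks like whitespace in the console), one could print the code points of the extracted string. This is a minimal, stdlib-only sketch: the class name and the helpers `codePoints` and `containsCopyright` are hypothetical, not PDFBox API, and the sample string merely stands in for the result of `stripper.getText(doc)`.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {

    // Hypothetical helper (not PDFBox API): returns every code point in the
    // text as "U+XXXX", so spaces and look-alike characters become distinguishable.
    static List<String> codePoints(String text) {
        List<String> result = new ArrayList<>();
        text.codePoints().forEach(cp -> result.add(String.format("U+%04X", cp)));
        return result;
    }

    // Hypothetical helper: true if the text contains the copyright sign U+00A9.
    static boolean containsCopyright(String text) {
        return text.indexOf('\u00A9') >= 0;
    }

    public static void main(String[] args) {
        // Stand-in for stripper.getText(doc); replace with the real extracted text.
        String contents = "Copyright \u00A9 2019";
        System.out.println(codePoints(contents));
        System.out.println("contains U+00A9: " + containsCopyright(contents));
    }
}
```

If the dump shows U+0020 where the sign should be, the substitution happened during extraction rather than when printing the result.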

I am using PDFBox 2.0.14 on JDK 1.8.

Is there a trick to get the copyright sign? Is this a bug, or is it
impossible to retrieve because it is some specially drawn glyph?

Thanks in advance!

