pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: PDFTextStripper strips copyright sign
Date Tue, 09 Apr 2019 15:34:39 GMT
Hi,

On 09.04.19 at 14:55, Ronald Bergmann | DTAD AG wrote:
> Hello,
> 
> this is my first time ever to email to a mailing list so please excuse me if my 
> contribution does not match any standards.
> 
> Apache PDFBox seems to strip copyright signs when parsing PDFs to text and I 
> wonder why. When I open the PDF with any reader and copy the text I receive the 
> copyright sign. With PDFBox I get a white space character.
> 
> PDFTextStripper stripper = new PDFTextStripper();
> String contents = stripper.getText(doc);
> 
> I use PDFBox 2.0.14 on JDK 1.8.
> 
> Is there any trick to get the copyright sign? Is it a bug, or is it not 
> possible to retrieve it because it's some magically drawn glyph?
Please upload the PDF in question to a file-sharing service or something 
similar. Attachments are not allowed on this list. Without the document it'll 
be hard to guess what's wrong.
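
Until the document is available, it may help to check which character actually 
ends up in the extracted text, for example a plain space (U+0020), a no-break 
space (U+00A0), or something else. The following is only a minimal sketch using 
plain JDK calls. The class name CodePointDump and the method describe are 
made up for illustration and are not part of PDFBox.

```java
// Hypothetical helper, not part of PDFBox: lists the Unicode code points of a string.
final class CodePointDump {

    private CodePointDump() {
    }

    // Returns the code points of text as space-separated "U+XXXX" tokens.
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            // Advance by one or two chars, depending on whether cp is a surrogate pair.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In practice the argument would be (part of) the string returned by
        // stripper.getText(doc); a literal is used here to keep the sketch standalone.
        System.out.println(describe("\u00A9 2019"));
    }
}
```

Passing the text around the expected copyright sign to describe shows exactly 
what the stripper produced at that position, which is useful to include when 
reporting the problem.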

> Thanks in advance!
> 
> -- 

Andreas

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

