pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Question on text extraction
Date Mon, 06 Nov 2017 07:09:36 GMT
On 06.11.2017 at 07:04, Jesse James Joson wrote:
> Hi,
> I have encountered an issue with text extraction using PDFBox
> 2.0.7. When I open the PDF file in Acrobat I can see the content, and it can be
> selected and searched. However, the character "-" is not read correctly:
> when the file is processed by PDFBox, it returns "?" in place of the hyphen.
> Thank you

Somewhat answered here:


Another useful read to see how tricky this is:


For a specific answer, please link to the PDF. But if Adobe can't 
extract it, then it's unlikely PDFBox can.
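
Without the PDF this is only a guess, but one thing that can be checked locally is whether the extracted string really contains a question mark (U+003F) or a non-ASCII hyphen that the output encoding later turns into "?". Below is a minimal sketch in plain Java. The class and method names are made up for illustration, and the sample string is hypothetical: it stands in for whatever PDFTextStripper.getText() returns for the file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not part of PDFBox.
class HyphenDiagnostics {

    /** Lists every character that is a literal '?' or lies outside printable ASCII. */
    public static List<String> describeSuspects(String extracted) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < extracted.length(); ) {
            int cp = extracted.codePointAt(i);
            if (cp == '?' || cp > 0x7E) {
                out.add(String.format("index %d: U+%04X %s", i, cp, Character.getName(cp)));
            }
            i += Character.charCount(cp);
        }
        return out;
    }

    /** Shows what the text looks like after being encoded with the given charset. */
    public static String roundTrip(String text, Charset cs) {
        // getBytes(Charset) substitutes the charset's replacement byte for unmappable characters.
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Hypothetical sample containing U+2010 (HYPHEN) instead of the ASCII hyphen-minus.
        String sample = "co\u2010operate";
        describeSuspects(sample).forEach(System.out::println);
        System.out.println(roundTrip(sample, StandardCharsets.ISO_8859_1));
    }
}
```

If the first listing shows something like U+2010 and not U+003F, the extraction itself is probably fine and the "?" comes from writing the text in an encoding that cannot represent that character. If it shows U+003F, the problem is more likely in the font's encoding information inside the PDF.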

