pdfbox-users mailing list archives

From Augusto Ribeiro Silva <...@unsilo.com>
Subject Weird spacing in words
Date Tue, 31 May 2016 13:22:17 GMT
Hi all,

I am using the PDFBox Java library to read the content of some PDFs, and it seems to insert some weird
(hyphen-like) spacing. I get the same result using the PDFBox-App command line:

The es tab lish ment of an in te grated Part ner Re la tion ship Man age ment (PRM) sys tem
can po ten tially ad dress sev eral as pets

I also tried extracting text from the same PDF with the pdftotext command-line utility, and it extracts
the text correctly:
The establishment of an integrated Partner Relationship Management (PRM) system can potentially
address several aspects 

Does anybody know why PDFBox behaves this way, or have any tips for fixing it? I am
using Tika, but as I understand it, Tika uses PDFBox for PDF processing underneath.
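One plausible explanation, offered as an assumption rather than a diagnosis of this particular PDF: the breaks fall on syllable boundaries, which suggests the producing software drew each syllable as a separate positioned text run with a small gap between runs. Text extractors decide where words end by comparing such gaps against a threshold, and PDFBox's `PDFTextStripper` derives its threshold from the font's space width and average character width, scaled by tolerances exposed as `setSpacingTolerance(float)` and `setAverageCharTolerance(float)`. The sketch below is a hypothetical, simplified model of that kind of gap heuristic, not PDFBox's actual code; the class `GapHeuristic`, the `Run` record and all positions and widths are invented for illustration.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of gap-based word splitting during text
 * extraction. This is NOT PDFBox's implementation; it only shows why small
 * positioning gaps between glyph runs can be turned into spaces.
 */
public class GapHeuristic {

    /** A run of glyphs drawn at one position: its text, start x and end x (in points). */
    record Run(String text, double startX, double endX) {}

    /** Joins runs, inserting a space whenever the gap before a run exceeds tolerance * spaceWidth. */
    static String join(List<Run> runs, double spaceWidth, double tolerance) {
        StringBuilder out = new StringBuilder();
        double threshold = spaceWidth * tolerance;
        Run prev = null;
        for (Run run : runs) {
            // Gap between the end of the previous run and the start of this one.
            if (prev != null && run.startX() - prev.endX() > threshold) {
                out.append(' ');
            }
            out.append(run.text());
            prev = run;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Invented positions: syllables separated by small 1.2 pt gaps,
        // words separated by a real 3.0 pt gap; space width assumed to be 2.5 pt.
        List<Run> runs = List.of(
                new Run("es", 0.0, 10.0),
                new Run("tab", 11.2, 25.0),
                new Run("lish", 26.2, 44.0),
                new Run("ment", 45.2, 68.0),
                new Run("of", 71.0, 80.0));
        // Low tolerance (threshold 0.75 pt): every gap becomes a space.
        System.out.println(join(runs, 2.5, 0.3)); // es tab lish ment of
        // Higher tolerance (threshold 2.0 pt): only the real word gap becomes a space.
        System.out.println(join(runs, 2.5, 0.8)); // establishment of
    }
}
```

If this model matches what is happening, raising the spacing and average-character tolerances on a `PDFTextStripper` before calling `getText` might suppress the extra spaces, at the cost of possibly merging genuinely separate words; whether it helps for this document is untested.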

Best regards, 