pdfbox-users mailing list archives

From "Hesham Gneady" <heshamgne...@gmail.com>
Subject Wrong space parsed pdf
Date Thu, 25 Jan 2018 14:20:06 GMT
Hello,

When extracting text from a PDF with PDFBox v2.0.8, many spaces are dropped,
so adjacent words come out merged together. A 1-page sample PDF that
reproduces the problem is available here:

https://www.dropbox.com/s/9i9ofl3tje4iy3k/wrong_space_parsed_sample.pdf?dl=1

The extracted text contains wrongly merged words such as:

aboutmidnight, andbefore, CountyDonegal, ...

I tried PDFTextStripper.setAverageCharTolerance(...) to adjust the space
sensitivity, but it made no difference to the output.
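
To make clear what I mean by "space sensitivity", here is a minimal sketch of the general idea as I understand it: a text extractor has only glyph positions to work with, and it inserts a space when the horizontal gap between two neighbouring glyphs exceeds a tolerance multiplied by the average glyph width. This is a simplified toy model, not PDFBox's actual implementation. The class GapSpacingSketch, the method join, and the sample coordinates are all hypothetical and exist only for illustration.

```java
import java.util.Arrays;

/** Toy model of gap-based space insertion. NOT PDFBox's actual algorithm. */
final class GapSpacingSketch {

    /**
     * Rebuilds one line of text from glyphs placed on a single baseline.
     * xs[i] is the left edge of glyph i and widths[i] is its width, in the same units.
     * A space is inserted when the gap to the previous glyph is larger than
     * tolerance * (average glyph width). All names here are hypothetical.
     */
    static String join(String[] glyphs, double[] xs, double[] widths, double tolerance) {
        if (glyphs.length == 0) {
            return "";
        }
        double avgWidth = Arrays.stream(widths).average().orElse(0.0);
        double threshold = tolerance * avgWidth;

        StringBuilder out = new StringBuilder(glyphs[0]);
        for (int i = 1; i < glyphs.length; i++) {
            // Distance between the right edge of the previous glyph and the left edge of this one.
            double gap = xs[i] - (xs[i - 1] + widths[i - 1]);
            if (gap > threshold) {
                out.append(' ');
            }
            out.append(glyphs[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up layout: "ab" and "cd" are separated by a gap of 2 units.
        String[] glyphs = {"a", "b", "c", "d"};
        double[] xs = {0, 5, 12, 17};
        double[] widths = {5, 5, 5, 5};

        System.out.println(join(glyphs, xs, widths, 0.3)); // prints "ab cd"
        System.out.println(join(glyphs, xs, widths, 0.5)); // prints "abcd"
    }
}
```

In this model a tolerance that is too high for a narrow word gap merges the words, which looks like the symptom above. Whether that is really what happens with the sample PDF is only my guess.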

 

Any idea why this happens and how to fix it?

Best regards,

Hesham

 

 


