pdfbox-users mailing list archives

From "Hesham G." <heshamgne...@gmail.com>
Subject Re: Wrong space parsed pdf
Date Tue, 25 Mar 2014 19:41:25 GMT
Tilman,

I haven't tested it with that version yet, but I might try it.

Best regards,
Hesham


------------------------------------------------------------------------
Included message:

Hi,

Does this also happen with the current version? (1.8.4)

Tilman

On 25.03.2014 13:53, Hesham G. wrote:
> Hello,
>
> When reading a PDF with PDFBox 1.7.1, many spaces are ignored, so
> words are merged together in the extracted text. You can test with a
> 1-page sample PDF from here:
> http://www.4shared.com/office/yqJGUZn2ce/wrong_space_parsed_sample.html
>
> Examples of incorrectly merged words:
> aboutmidnight, andbefore, CountyDonegal, ...
>
> I have tried using PDFTextStripper.setAverageCharTolerance(...) to
> control the space sensitivity, but it didn’t make any difference.
>
> Any idea why this happens and how to fix it?
>
>
> Best regards,
> Hesham
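
For readers unfamiliar with why a tolerance setting can affect word merging, here is a minimal sketch of a gap-based spacing heuristic. It is a simplified illustration only and is not PDFBox's actual code: the class GapSpacing, the Glyph type, and the join method are hypothetical names, and the real PDFTextStripper logic is more involved. The idea is that a PDF often stores no space characters at all, so an extractor has to guess word breaks from the horizontal gap between glyphs. If the gap is below a threshold derived from the average glyph width, no space is emitted and two words run together.

```java
import java.util.List;

// Hypothetical illustration of gap-based space detection (not PDFBox code).
class GapSpacing {

    /** One glyph on a text line: its character, left x position, and width. */
    static final class Glyph {
        final char ch;
        final float x;
        final float width;

        Glyph(char ch, float x, float width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Joins glyphs into a string, inserting a space whenever the gap to the
     * previous glyph exceeds (average width of the glyphs seen so far) * tolerance.
     */
    static String join(List<Glyph> glyphs, float tolerance) {
        StringBuilder out = new StringBuilder();
        float totalWidth = 0f;
        for (int i = 0; i < glyphs.size(); i++) {
            Glyph g = glyphs.get(i);
            if (i > 0) {
                Glyph prev = glyphs.get(i - 1);
                float gap = g.x - (prev.x + prev.width);
                float avgWidth = totalWidth / i;
                if (gap > avgWidth * tolerance) {
                    out.append(' ');
                }
            }
            out.append(g.ch);
            totalWidth += g.width;
        }
        return out.toString();
    }
}
```

In this sketch, four glyphs of width 5 with a gap of 2 between "at" and "no" come out as "at no" with a tolerance of 0.3, and as "atno" with a tolerance of 0.5. With narrow gaps in the source PDF, a lower tolerance may be needed before any spaces appear. PDFTextStripper also has a separate setSpacingTolerance(...) setter that may be worth trying alongside setAverageCharTolerance(...).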

