pdfbox-users mailing list archives

From "Amir H. Jadidinejad" <amir.jad...@yahoo.com.INVALID>
Subject How to manage semi-space characters in PDFTextStripper?
Date Tue, 05 Aug 2014 09:27:00 GMT
In some right-to-left languages, compound words are separated by a "semi-space" (please take
a look at the Unicode space characters). When the input document contains such words, PDFTextStripper
ignores the semi-space character and concatenates the words.
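A minimal diagnostic sketch, assuming the semi-space is U+200C ZERO WIDTH NON-JOINER (the character Persian uses inside compound words). It checks whether that character survives extraction. In practice the input would be the string returned by `PDFTextStripper.getText(document)`; here the loss is only simulated with `String.replace`, and `semiSpaceOffsets` and `describeCodePoints` are hypothetical helper names, not PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;

public class SemiSpaceCheck {
    // Assumption: the "semi-space" is U+200C ZERO WIDTH NON-JOINER.
    static final char ZWNJ = 0x200C;

    // Hypothetical helper: UTF-16 offsets at which the semi-space occurs.
    static List<Integer> semiSpaceOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ZWNJ) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    // Hypothetical helper: lists every code point as U+XXXX so that
    // invisible characters in extracted text become visible.
    static String describeCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Persian compound word written with a semi-space after the prefix.
        String expected = "\u0645\u06CC\u200C\u062E\u0648\u0627\u0647\u0645";
        // Simulated extractor output with the semi-space dropped.
        String extracted = expected.replace("\u200C", "");

        System.out.println(semiSpaceOffsets(expected));   // [2]
        System.out.println(semiSpaceOffsets(extracted));  // []
        System.out.println(describeCodePoints(extracted));
    }
}
```

An empty offsets list for text that visibly contains compound words suggests the character is lost during extraction (or was never encoded in the PDF at all), which helps decide where a PDFTextStripper subclass would need to intervene.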

Could you give me a hint about which method of PDFTextStripper I should extend to handle semi-space
characters?
Kind regards,
Amir