pdfbox-users mailing list archives

From "Hesham G." <heshamgne...@gmail.com>
Subject Problem extracting text in Enter chars
Date Thu, 31 Dec 2009 10:41:51 GMT
Hello,

I have a one-page PDF file. When I try to extract its text using:
String pageData = stripper.getText( pdfFile );

It drops some of the line breaks (Enter characters) between lines, so the last word of one line
and the first word of the next line run together as a single word, with no space between them.
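To illustrate how this symptom can arise, here is a simplified, hypothetical sketch in plain Java. It is not PDFBox's actual PDFTextStripper logic, only an assumed position-based extractor. It assumes each text run carries an x position, a baseline y and a width, and that runs arrive in reading order; the names Run, extract, sampleRuns, lineTolerance and spaceGap are invented for this illustration (records require Java 16+). A line break is emitted only when the baseline moves by more than lineTolerance, and a space only when the horizontal gap exceeds spaceGap. If a real line change is missed, the next run jumps back to the left margin, the computed gap is negative, and nothing at all is inserted, which yields fused words like the ones described above.

```java
import java.util.List;

public class LineBreakSketch {

    // Hypothetical positioned text run (e.g. one word): x = left edge,
    // y = baseline (larger y = further down the page in this sketch), width = advance width.
    record Run(String text, float x, float y, float width) {}

    // Joins runs (assumed to be in reading order) into text.
    // lineTolerance: baseline shift that counts as a new line (hypothetical parameter).
    // spaceGap: horizontal gap that counts as a word space (hypothetical parameter).
    static String extract(List<Run> runs, float lineTolerance, float spaceGap) {
        StringBuilder out = new StringBuilder();
        Run prev = null;
        for (Run r : runs) {
            if (prev != null) {
                if (Math.abs(r.y() - prev.y()) > lineTolerance) {
                    out.append('\n');                 // baseline moved: start a new line
                } else if (r.x() - (prev.x() + prev.width()) > spaceGap) {
                    out.append(' ');                  // same line, visible gap: word space
                }
                // Otherwise nothing is inserted. After a missed line change the next run
                // starts back at the left margin, the gap is negative, and words fuse.
            }
            out.append(r.text());
            prev = r;
        }
        return out.toString();
    }

    // Two short lines of sample text, 10 units apart vertically (made-up coordinates).
    static List<Run> sampleRuns() {
        return List.of(
                new Run("the", 10f, 100f, 20f),
                new Run("last", 35f, 100f, 25f),
                new Run("first", 10f, 110f, 25f),
                new Run("word", 40f, 110f, 30f));
    }

    public static void main(String[] args) {
        System.out.println(extract(sampleRuns(), 5f, 2f));   // line change detected
        System.out.println(extract(sampleRuns(), 15f, 2f));  // line change missed
    }
}
```

With lineTolerance = 5 the sample prints "the last" and "first word" on separate lines; with lineTolerance = 15 the 10-unit baseline shift is treated as the same line and the output becomes "the lastfirst word".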

However, if I copy the text manually from the PDF and paste it into a text editor, the line breaks
appear after exactly the lines that cause the problem in PDFBox.
You can download the PDF file here to try it:
http://www.4shared.com/file/185259485/5d937eb/Enters-sample.html

Is there a way to fix this ?

Best regards,
Hesham

