pdfbox-users mailing list archives

From Andreas Lehmkühler <andr...@lehmi.de>
Subject Re: White Spaces
Date Thu, 05 Sep 2013 09:59:43 GMT
Hi,

> On 5 September 2013 at 08:15, Omid Rashidi <sh.omidrashidi@gmail.com>
> wrote:
>
>
> Hi
>
> I want to extract table data from a PDF file with PDFBox in Java,
> but runs of more than one space are removed when I extract the text.
>
>
>             PDDocument _pd=PDDocument.load("3.pdf");
>             PDFTextStripper _txt=new PDFTextStripper();
>             System.out.println(_txt.getText(_pd));
>
> How can I prevent the white space from being removed?
This is just a guess, as you didn't provide a sample PDF.

Many PDFs don't contain any white space characters at all. Most likely all
characters are placed directly using specific coordinates. To insert some
space, the PDF simply adds an additional offset to the coordinates of the
next character, so there are no space characters for the text extraction to
preserve.
To sum it up, I'm pretty sure that everything works as expected.
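
To illustrate the point about coordinates, here is a minimal sketch of how a
run of positioned characters could be turned back into a line of text with
padding. It is plain Java and does not use PDFBox. The class GapSpacing, the
Glyph type and the spaceWidth parameter are all made up for this example, and
this is not how PDFTextStripper itself decides where to put spaces.

```java
import java.util.List;

// Hypothetical illustration only; none of these names come from PDFBox.
class GapSpacing {

    // One positioned character: the character, its left x coordinate
    // and its advance width, all in the same (assumed) unit.
    static final class Glyph {
        final char ch;
        final double x;
        final double width;

        Glyph(char ch, double x, double width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    // Rebuilds one line of text from glyphs sorted left to right.
    // For each horizontal gap between the end of the previous glyph and
    // the start of the next one, emits round(gap / spaceWidth) spaces.
    static String render(List<Glyph> glyphs, double spaceWidth) {
        StringBuilder sb = new StringBuilder();
        double cursor = Double.NaN; // right edge of the previous glyph
        for (Glyph g : glyphs) {
            if (!Double.isNaN(cursor)) {
                int spaces = (int) Math.round((g.x - cursor) / spaceWidth);
                // Zero or negative gaps (touching or overlapping glyphs)
                // produce no spaces.
                for (int i = 0; i < spaces; i++) {
                    sb.append(' ');
                }
            }
            sb.append(g.ch);
            cursor = g.x + g.width;
        }
        return sb.toString();
    }
}
```

With glyphs 'A' at x=0, 'B' at x=5 and 'C' at x=25, each 5 units wide, and an
assumed spaceWidth of 5, render gives "AB" followed by three spaces and "C".
The file itself contains no space characters at any point; the padding is
derived purely from the offsets.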

> thanks,
> rashidi

BR
Andreas Lehmkühler
