pdfbox-users mailing list archives

From Gilad Denneboom <gilad.denneb...@gmail.com>
Subject Re: all spaces between english words is lost after extraction
Date Thu, 21 Dec 2017 10:14:33 GMT
Try playing around with different values for the spacing tolerance, using
the setSpacingTolerance() method of PDFTextStripper.
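A minimal sketch of that suggestion, assuming PDFBox 2.x (where `PDDocument.load(File)` is available) and a PDF path passed on the command line. It extracts the same document at a few spacing-tolerance values so the outputs can be compared. The class name `SpacingToleranceDemo`, the helper `configuredStripper`, and the particular values tried are illustrative only. Lowering the tolerance tends to make `PDFTextStripper` more willing to insert a word separator between glyphs, although other settings such as `setAverageCharTolerance()` also influence that decision.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class SpacingToleranceDemo {

    // Hypothetical helper: builds a stripper with a custom spacing tolerance.
    // PDFBox's default is 0.5f; smaller values tend to produce more spaces.
    static PDFTextStripper configuredStripper(float spacingTolerance) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setSpacingTolerance(spacingTolerance);
        return stripper;
    }

    public static void main(String[] args) throws IOException {
        // Assumes PDFBox 2.x; in 3.x use Loader.loadPDF(File) instead.
        File pdf = new File(args[0]);
        try (PDDocument document = PDDocument.load(pdf)) {
            // Try a few illustrative values and print each result for comparison.
            for (float tolerance : new float[] {0.5f, 0.3f, 0.1f}) {
                String text = configuredStripper(tolerance).getText(document);
                System.out.println("=== spacingTolerance = " + tolerance + " ===");
                System.out.println(text);
            }
        }
    }
}
```

Run it against the problem PDF and check which value restores the word gaps without splitting words apart.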

On Wed, Dec 20, 2017 at 3:46 AM, Dan Liu <139250065@qq.com> wrote:

> Hello all:
>     I extract the text following the code at
> https://www.tutorialkart.com/pdfbox/how-to-extract-coordinates-or-position-of-characters-in-pdf/ ,
> but all spaces between English words are lost.
>
> Such as:
> "severe acute respiratory syndrome"
>
> becomes:
> severeacuterespiratorysyndrome
>
> The attachment contains the original text.
>
>
> ------------------
>
> With best regards
> Daniel
>
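The spacing tolerance acts roughly as a threshold on the horizontal gap between consecutive glyphs. Many PDFs contain no actual space characters, so the extractor has to infer word boundaries from glyph positions. The sketch below illustrates that idea in plain Java (16+ for the record syntax). It is a deliberately simplified, hypothetical heuristic, not PDFBox's actual algorithm: `Glyph`, `join`, and the rule "insert a space when the gap exceeds the space width times the tolerance" are assumptions made for illustration. A related caveat, assuming PDFBox 2.x behaviour: `PDFTextStripper` emits word separators through a separate `writeWordSeparator()` call. Code that overrides `writeString()` and prints each `TextPosition` itself may therefore lose the spaces regardless of the tolerance.

```java
import java.util.List;

public class GapSpacingSketch {

    // Hypothetical glyph model: x is the left edge, width the advance width,
    // spaceWidth the width of a space character in the glyph's font.
    record Glyph(String text, float x, float width, float spaceWidth) {}

    // Joins glyphs left to right, inserting ' ' whenever the gap between the
    // end of the previous glyph and the start of the next exceeds
    // previous.spaceWidth * tolerance. Simplified illustration only.
    static String join(List<Glyph> glyphs, float tolerance) {
        StringBuilder out = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                float gap = g.x() - (prev.x() + prev.width());
                if (gap > prev.spaceWidth() * tolerance) {
                    out.append(' ');
                }
            }
            out.append(g.text());
            prev = g;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two "words" with a 5-unit gap between "b" and "c"; a space is 4 units wide.
        List<Glyph> glyphs = List.of(
                new Glyph("a", 0f, 5f, 4f),
                new Glyph("b", 5f, 5f, 4f),
                new Glyph("c", 15f, 5f, 4f),
                new Glyph("d", 20f, 5f, 4f));
        System.out.println(join(glyphs, 0.5f)); // ab cd (threshold 2 < gap 5)
        System.out.println(join(glyphs, 2.0f)); // abcd  (threshold 8 > gap 5)
    }
}
```

A tolerance that is too high merges words, as in "severeacuterespiratorysyndrome". One that is too low can split words at tight kerning, which is why trying several values is worthwhile.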
