pdfbox-users mailing list archives

From "Dan Liu" <139250...@qq.com>
Subject Re: all spaces between english words is lost after extraction
Date Thu, 21 Dec 2017 10:16:30 GMT
setSpacingTolerance has no effect, even when I set it to 0.001.

------------------
Best regards,

Liu Dan
Email: 139250065@qq.com
Mobile: 13925006500

------------------ Original ------------------
From: "Gilad Denneboom" <gilad.denneboom@gmail.com>
Date: Thu, Dec 21, 2017 06:14 PM
To: "users" <users@pdfbox.apache.org>
Subject: Re: all spaces between english words is lost after extraction

Try playing around with different values for the Spacing Tolerance, by
using the setSpacingTolerance method of PDFTextStripper.
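
To make the suggestion concrete, here is a rough sketch of what a spacing tolerance conceptually controls: a space is emitted when the horizontal gap between two glyphs exceeds some fraction of the expected space width. This is a simplified illustration, not PDFBox's actual implementation; the `Glyph` class, the `join` method, and the sample coordinates are all hypothetical. In PDFBox itself the value is set with `PDFTextStripper.setSpacingTolerance`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a gap-threshold word-break heuristic.
// This is NOT PDFBox code; names and numbers are made up for the example.
class SpacingToleranceSketch {

    // One glyph with its left edge and width on the page (hypothetical type).
    static final class Glyph {
        final String text;
        final double x;
        final double width;

        Glyph(String text, double x, double width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    // Joins glyphs left to right, inserting a space whenever the gap to the
    // previous glyph is larger than tolerance * spaceWidth.
    static String join(List<Glyph> glyphs, double spaceWidth, double tolerance) {
        StringBuilder out = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                double gap = g.x - (prev.x + prev.width);
                if (gap > tolerance * spaceWidth) {
                    out.append(' ');
                }
            }
            out.append(g.text);
            prev = g;
        }
        return out.toString();
    }

    // Sample line: "ab" and "cd" separated by a gap of 3 units.
    static List<Glyph> sampleLine() {
        return Arrays.asList(
            new Glyph("a", 0, 5), new Glyph("b", 5, 5),
            new Glyph("c", 13, 5), new Glyph("d", 18, 5));
    }

    public static void main(String[] args) {
        // Threshold 0.5 * 4 = 2: the gap of 3 counts as a word break.
        System.out.println(join(sampleLine(), 4.0, 0.5));
        // Threshold 1.0 * 4 = 4: the gap of 3 is too small, words merge.
        System.out.println(join(sampleLine(), 4.0, 1.0));
    }
}
```

In this sketch a lower tolerance inserts spaces more readily, so smaller values are the direction to try when words run together.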

On Wed, Dec 20, 2017 at 3:46 AM, Dan Liu <139250065@qq.com> wrote:

> Hello all,
>     I extract the text using the code from
> https://www.tutorialkart.com/pdfbox/how-to-extract-coordinates-or-position-of-characters-in-pdf/
> but all spaces between English words are lost.
>
> Such as:
> "severe acute respiratory syndrome"
>
> becomes:
> severeacuterespiratorysyndrome
>
> The attachment contains the original text.
>
>
> ------------------
>
> With best regards
> Daniel
>
>