pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: all spaces between english words is lost after extraction
Date Wed, 20 Dec 2017 08:43:19 GMT
Hi,

Please upload your file to a sharehoster. Also mention what PDFBox 
version you are using.

If the PDF doesn't contain space characters (most PDFs don't), then you 
won't get any positions for them.

High-level PDFBox text extraction (i.e. just getting the text) creates 
spaces by using heuristics.
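
To illustrate the kind of heuristic meant here, below is a minimal sketch in plain Java. It is not PDFBox's actual code: the Glyph class, the joinWithSpaces method and the tolerance value are hypothetical names chosen for this example. The idea is that a space is emitted whenever the horizontal gap between two consecutive glyphs is larger than some fraction of the width of a space.

```java
import java.util.Arrays;
import java.util.List;

public class GapSpacing {
    /** Hypothetical glyph: the character, its left x position, and its width. */
    public static final class Glyph {
        final char ch;
        final double x;
        final double width;

        public Glyph(char ch, double x, double width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Rebuilds one line of text from positioned glyphs (assumed sorted left to
     * right), inserting a space wherever the gap to the previous glyph is
     * larger than spaceWidth * tolerance.
     */
    public static String joinWithSpaces(List<Glyph> glyphs, double spaceWidth, double tolerance) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                // Distance from the right edge of the previous glyph to the left edge of this one.
                double gap = g.x - (prev.x + prev.width);
                if (gap > spaceWidth * tolerance) {
                    sb.append(' ');
                }
            }
            sb.append(g.ch);
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "ab cd" drawn without any space glyph: 'c' simply starts further right.
        List<Glyph> line = Arrays.asList(
                new Glyph('a', 0.0, 5.0),
                new Glyph('b', 5.0, 5.0),
                new Glyph('c', 13.0, 5.0),
                new Glyph('d', 18.0, 5.0));
        System.out.println(joinWithSpaces(line, 3.0, 0.5)); // prints "ab cd"
    }
}
```

Code that only prints the individual character positions never runs a step like this, so the words come out glued together. In PDFBox the high-level route is PDFTextStripper.getText(), which applies its own, more elaborate spacing heuristics.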

Tilman

Am 20.12.2017 um 03:46 schrieb Dan Liu:
> Hello all:
>     I extracted the text using the code from 
> https://www.tutorialkart.com/pdfbox/how-to-extract-coordinates-or-position-of-characters-in-pdf/
> , but all spaces between English words are lost.
>
> Such as:
> "severe acute respiratory syndrome"
>
> becomes:
> severeacuterespiratorysyndrome
>
> The attachment is the original text.
>
>
> ------------------
>
> With best regards
> Daniel
>
>


