pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Content of pdf moved around
Date Sat, 10 Jan 2015 13:24:30 GMT
Hi,

The PDF attachment didn't come through (attachments never do on this list), but
you can try calling PDFTextStripper.setSortByPosition(true) before extracting
the text.
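
To illustrate why sorting by position can change the output: text in a PDF
is stored in the order the producing application wrote it, which need not
match the visual top-to-bottom order. With PDFBox itself you would call
setSortByPosition(true) on the PDFTextStripper and then getText() on the
loaded document. The sketch below does not use PDFBox and is not its
implementation; it only shows the general idea with plain Java. The class
name, the Fragment type, and all coordinates are invented for illustration,
since the real layout of the attached PDF is unknown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortByPositionSketch {

    /** A piece of text plus hypothetical page coordinates (y measured from the top). */
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Returns the fragment texts ordered top-to-bottom, then left-to-right. */
    static List<String> sortByPosition(List<Fragment> fragments) {
        // Copy first so the caller's list (the "stream order") is left untouched.
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y)
                              .thenComparingDouble(f -> f.x));
        List<String> texts = new ArrayList<>();
        for (Fragment f : sorted) {
            texts.add(f.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        // Invented example: the order in which the fragments were written to the
        // file differs from where they sit on the page.
        List<Fragment> streamOrder = Arrays.asList(
                new Fragment("Nom:", 50, 120),
                new Fragment("Titulaire:", 50, 180),
                new Fragment("Resp.:", 50, 200),
                new Fragment("Type:", 50, 140));

        List<String> unsorted = new ArrayList<>();
        for (Fragment f : streamOrder) {
            unsorted.add(f.text);
        }
        System.out.println("Stream order:   " + unsorted);
        System.out.println("Position order: " + sortByPosition(streamOrder));
    }
}
```

With these made-up coordinates, the position order comes out as Nom:, Type:,
Titulaire:, Resp.:. Whether the real PDF behaves the same way depends on its
actual layout, so treat this only as a picture of what the option is for.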

Tilman
On 10.01.2015 at 14:04, Renaud Billen wrote:
> Hello,
>
> I have a small problem extracting text from some PDFs: some words swap
> order with others.
>
> With the PDF attached to this mail, if I use "Save as Text" in Adobe Reader, I get:
>
> Référence: LIX-673LIX-6737
>
>
> Nom: The test company
>
>
> Type:
> Ouverture: 24/04/2007
>
> Titulaire: BD
> Resp.: LIX
> Co-Resp.: BB
> Client
>
>
>
>
> But with PDFBox I get:
>
> Référence: LIX-6737
> Nom: The test company
> Titulaire: BD
> Resp.: LIX
> Co-Resp.: BB
> Type:
> Ouverture: 24/04/2007
> Client
>
>
> Could you tell me if something can be done to solve this problem?
>
> Thanks,
> Renaud
>
>

