pdfbox-users mailing list archives

From "Ray Morris" <ray.morris.brisb...@bigpond.com>
Subject Re: Content of pdf moved around
Date Sat, 10 Jan 2015 21:48:13 GMT
Please unsubscribe ray.morris.brisbane@bigpond.com

I briefly had the ambition to teach myself how to maintain bookmarks and XML 
metadata for sheet music libraries but gave up that idea because of the 
complexity of PDF files.

-----Original Message----- 
From: Tilman Hausherr
Sent: Saturday, January 10, 2015 11:24 PM
To: users@pdfbox.apache.org
Subject: Re: Content of pdf moved around

Hi,

The PDF didn't go through (never does), but you can try to use
PDFTextStripper.setSortByPosition().
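
In PDFBox the setter takes a boolean, so the call is setSortByPosition(true) on the PDFTextStripper instance before getText() is invoked. The sketch below uses only the Java standard library and is not PDFBox code. It illustrates the general idea behind that switch: text can be emitted in the order it appears in the content stream, or reordered by page position first. The Fragment class, the method names, and the coordinates are all invented for illustration, and the real layout of the attached PDF is unknown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortByPositionSketch {

    /** Hypothetical text fragment: a string plus the page position where it starts (y grows downward). */
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Joins the fragments exactly in the order given, i.e. content-stream order. */
    static String inStreamOrder(List<Fragment> fragments) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            sb.append(f.text).append('\n');
        }
        return sb.toString();
    }

    /** Sorts a copy top to bottom, then left to right, and joins the result. */
    static String sortedByPosition(List<Fragment> fragments) {
        List<Fragment> copy = new ArrayList<>(fragments);
        copy.sort(Comparator.comparingDouble((Fragment f) -> f.y)
                .thenComparingDouble(f -> f.x));
        return inStreamOrder(copy);
    }

    /** Invented sample: "Titulaire:" is written to the stream early but placed lowest on the page. */
    static List<Fragment> sample() {
        return Arrays.asList(
                new Fragment("Nom:", 50, 100),
                new Fragment("Titulaire:", 50, 160),
                new Fragment("Type:", 50, 120),
                new Fragment("Ouverture:", 50, 140));
    }

    public static void main(String[] args) {
        System.out.println("Stream order:");
        System.out.print(inStreamOrder(sample()));
        System.out.println("Sorted by position:");
        System.out.print(sortedByPosition(sample()));
    }
}
```

Whether sorting by position gives the wanted order for a particular file depends on how its text is actually laid out, so it is worth comparing both outputs.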

Tilman

On 10.01.2015 at 14:04, Renaud Billen wrote:
> Hello,
>
> I have a small issue with extracting the text of some PDFs, where
> some words switch order with others.
>
> With the PDF attached to this mail, if I use "Save as Text" from Adobe
> Reader, I get:
>
> Référence: LIX-673LIX-6737
>
>
> Nom: The test company
>
>
> Type:
> Ouverture: 24/04/2007
>
> Titulaire: BD
> Resp.: LIX
> Co-Resp.: BB
> Client
>
>
>
>
> But with PDFBox I get:
>
> Référence: LIX-6737
> Nom: The test company
> Titulaire: BD
> Resp.: LIX
> Co-Resp.: BB
> Type:
> Ouverture: 24/04/2007
> Client
>
>
> Could you tell me if something can be done to solve this problem?
>
> Thanks,
> Renaud
>
>

