From: Ray Morris
Subject: Re: Content of pdf moved around
Date: Sun, 11 Jan 2015 07:48:13 +1000

Please unsubscribe ray.morris.brisbane@bigpond.com

I briefly had the ambition to teach myself how to maintain bookmarks and XML metadata for sheet music libraries, but gave up that idea because of the complexity of PDF files.

-----Original Message-----
From: Tilman Hausherr
Sent: Saturday, January 10, 2015 11:24 PM
To: users@pdfbox.apache.org
Subject: Re: Content of pdf moved around

Hi,

The PDF didn't go through (never does), but you can try to use PDFTextStripper.setSortByPosition().

Tilman

On 10.01.2015 at 14:04, Renaud Billen wrote:
> Hello,
>
> I have a little issue with the extraction of the text of some pdfs, where
> some words are switching order with others.
>
> With the pdf attached to this mail, if I use "save as text" from adobe
> reader, I get :
>
> Référence: LIX-673LIX-6737
>
>
> Nom: The test company
>
>
> Type:
> Ouverture: 24/04/2007
>
> Titulaire: BD
> Resp.: LIX
> Co-Resp.: BB
> Client
>
>
>
>
> But with pdfbox I get :
>
> Référence: LIX-6737
> Nom: The test company
> Titulaire: BD
> Resp.: LIX
> Co-Resp.: BB
> Type:
> Ouverture: 24/04/2007
> Client
>
>
> Could you tell me if something can be done to solve this problem?
>
> Thanks,
> Renaud
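
A note on the PDFTextStripper.setSortByPosition() suggestion: by default the stripper emits text in the order it occurs in the page's content stream, which need not match the visual layout. Calling setSortByPosition(true) before getText() asks it to order text by position on the page instead. The sketch below only illustrates that idea in plain Java, without PDFBox. The Fragment class, the sample() helper and all coordinates are invented for illustration (the attachment never reached the list, so the real positions are unknown), and PDFBox's own comparator is more involved than this simple sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortByPositionSketch {

    // Hypothetical stand-in for a positioned piece of text; not a PDFBox class.
    static final class Fragment {
        final String text;
        final double x;
        final double y; // assumed to grow downward from the top of the page

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Order fragments top-to-bottom, then left-to-right, and return their text.
    static List<String> sortByPosition(List<Fragment> streamOrder) {
        List<Fragment> sorted = new ArrayList<>(streamOrder);
        sorted.sort(Comparator.<Fragment>comparingDouble(f -> f.y)
                .thenComparingDouble(f -> f.x));
        List<String> texts = new ArrayList<>();
        for (Fragment f : sorted) {
            texts.add(f.text);
        }
        return texts;
    }

    // Labels from the report, listed in the order pdfbox returned them.
    // The coordinates are made up so that "Type:" sits above "Titulaire:".
    static List<Fragment> sample() {
        return Arrays.asList(
                new Fragment("Nom:", 50, 120),
                new Fragment("Titulaire:", 50, 180),
                new Fragment("Resp.:", 50, 195),
                new Fragment("Co-Resp.:", 50, 210),
                new Fragment("Type:", 50, 140),
                new Fragment("Ouverture:", 50, 155),
                new Fragment("Client", 50, 230));
    }

    public static void main(String[] args) {
        // Prints the labels in visual (top-to-bottom) order.
        System.out.println(sortByPosition(sample()));
    }
}
```

With these invented coordinates the sorted order is Nom, Type, Ouverture, Titulaire, Resp., Co-Resp., Client, which is the order Adobe Reader produced. Whether setSortByPosition(true) gives the same result on the real file depends on where the text actually sits on the page.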