Date: Thu, 14 Nov 2013 16:02:07 +0000
Subject: Re: Unable to get text from this pdf - why?
From: James Green
To: users

Pretty much confirms our thoughts. Regrettably I don't have Acrobat (only
Reader) here but we did notice the loss of selectable text.

Thanks for your time.

On 14 November 2013 15:42, Gilad Denneboom wrote:

> It seems that GS converted the text in the file to graphical elements. You
> can see it in Acrobat if you open the Contents panel, and you can also see
> that the text in the file is not selectable, and therefore can't be
> extracted.
> You'll need to look for a solution in GS. It has nothing to do with how
> PDFBox works, as there's just no text to read in that file.
>
>
> On Thu, Nov 14, 2013 at 4:29 PM, James Green wrote:
>
> > This was created via a fairly obtuse means but suffice it to say it should
> > still work.
> >
> > https://www.dropbox.com/s/uaq5sqmlf88108p/sample-from-pdf.pdf
> >
> > This was me creating a document in LibreOffice Writer, exporting that as a
> > pdf then loading the pdf into DocumentViewer (Evince, although Adobe
> > Reader could also be used). This is then printed to a java application via
> > the windows PScript dll where the java app runs the received postscript
> > through Ghostscript to get PDF and finally imported into PDFBox.
> >
> > This used to work a few weeks ago, and we are unsure why it does not now.
> > Printing an odt directly from Writer into the Java app works fine.
> >
> > This is using PDFBox 1.8.2.
> >
> > Thanks,
> >
> > James