Date: Mon, 12 May 2014 08:19:28 -0400
Subject: Re: PDF text extraction result different from what they look in PDF reader application
From: Tres Finocchiaro
To: users@pdfbox.apache.org
Cc: Andreas Lehmkühler

> Is there a way to prevent this? I mean a way to configure PDFBox not
> to extract the scanned text and get the right displayed text?

@Andreas,

This may be the bug report to follow. In short, not yet.

https://issues.apache.org/jira/browse/PDFBOX-1912