Message-ID: <1219764442.1235429882687.JavaMail.jira@brutus>
Date: Mon, 23 Feb 2009 14:58:02 -0800 (PST)
From: "Brian Carrier (JIRA)"
To: pdfbox-dev@incubator.apache.org
Reply-To: pdfbox-dev@incubator.apache.org
Subject: [jira] Resolved: (PDFBOX-43) spaces in extracted text
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

     [ https://issues.apache.org/jira/browse/PDFBOX-43?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Carrier resolved PDFBOX-43.
---------------------------------

    Resolution: Incomplete

I found the original bug report here:
http://sourceforge.net/tracker/index.php?func=detail&aid=1153181&group_id=78314&atid=552832

It does not have any of the files mentioned, so we can't reproduce.

> spaces in extracted text
> ------------------------
>
>                 Key: PDFBOX-43
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-43
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1153181
> Originally submitted by benlitchfield on 2005-02-27 17:45.
> See "Wenjie broken text.pdf". There are spaces
> between words.
> Ben
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES
> user_id=601708
> This issue is fixed for non-standard Type3 fonts,
> which "Wenjie broken text.pdf" uses.
> The extra spaces in ocalc.pdf are a different problem that is
> still being looked into.
> Ben
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES
> user_id=601708
> FYI, this problem is seen with PDFs that use Type3 fonts. A
> solution is in the works.
> Ben
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES
> user_id=601708
> In ocalc.pdf there are some spacing issues as well.
> For example:
> "deviat e from the existing textb ooks, I would certain ly make
> major changes by emphasizing several"
> [comment on SourceForge]
> Originally sent by fuwenjie.
> Logged In: YES
> user_id=1219597
> I found that the font size is sometimes not assigned
> correctly. In that case, every font size is 1.0.
> Under that circumstance, the width is occasionally
> incorrect as well. In those cases, the width is
> often less than 1.0, which is obviously impossible.
> A word in the original text may break into several parts, and
> the return value of GetY() for each part may not be right, causing
> the characters to overlap with others.
> The incorrect Y value and width may prevent us from re-forming
> the word according to the Y value and width of each part.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
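
For readers of the archive, the re-forming step described in the last quoted comment can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not PDFBox code: the Fragment class, the rebuild_lines function, and both tolerance values are hypothetical names and numbers chosen for the example. It assumes each extracted fragment carries a left edge, a Y value, and a width, and it inserts a space only when the gap between neighbouring fragments exceeds a tolerance.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One extracted piece of text (hypothetical structure, not PDFBox's)."""
    text: str
    x: float      # left edge of the fragment
    y: float      # vertical position of the fragment
    width: float  # horizontal extent reported for the fragment


def rebuild_lines(fragments, y_tolerance=1.0, gap_tolerance=1.0):
    """Group fragments into lines by Y, then join them left to right.

    A space is inserted only when the gap between the end of one
    fragment and the start of the next exceeds gap_tolerance.
    """
    # Cluster fragments whose Y is close to the first fragment of a line.
    lines = []
    for frag in sorted(fragments, key=lambda f: f.y):
        if lines and abs(frag.y - lines[-1][0].y) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    result = []
    for line in lines:
        line.sort(key=lambda f: f.x)
        text = line[0].text
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            text += (" " if gap > gap_tolerance else "") + cur.text
        result.append(text)
    return result


# Widths are plausible: "deviat" ends exactly where "e" begins.
good = [Fragment("deviat", 10, 100, 30),
        Fragment("e", 40, 100.2, 5),
        Fragment("from", 50, 100, 20)]

# Same fragments, but "deviat" reports a width below 1.0.
bad = [Fragment("deviat", 10, 100, 0.6),
       Fragment("e", 40, 100.2, 5),
       Fragment("from", 50, 100, 20)]

print(rebuild_lines(good))
print(rebuild_lines(bad))
```

With plausible widths the two pieces of "deviate" are joined. When the first piece reports a width below 1.0, the computed gap becomes large and a spurious space appears, which resembles the "deviat e" output quoted from ocalc.pdf. Whether that is the actual cause in that file cannot be confirmed without the original PDFs.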