pdfbox-dev mailing list archives

From "Brian Carrier (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (PDFBOX-43) spaces in extracted text
Date Mon, 23 Feb 2009 22:58:02 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-43?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Carrier resolved PDFBOX-43.
---------------------------------

    Resolution: Incomplete

I found the original bug report here:

http://sourceforge.net/tracker/index.php?func=detail&aid=1153181&group_id=78314&atid=552832

None of the files it mentions are attached, so we can't reproduce the problem.

> spaces in extracted text
> ------------------------
>
>                 Key: PDFBOX-43
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-43
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1153181
> Originally submitted by benlitchfield on 2005-02-27 17:45.
> See "Wenjie broken text.pdf". There are spaces
> between words.
> Ben
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES 
> user_id=601708
> This issue is fixed for non-standard Type3 fonts,
> which "Wenjie broken text.pdf" uses.
> The extra spaces in ocalc.pdf are a different problem that is
> still being looked into.
> Ben
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES 
> user_id=601708
> FYI, this problem is seen with PDFs that use Type3 fonts. A
> solution is in the works.
> Ben
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES 
> user_id=601708
> In ocalc.pdf there are some spacing issues as well.
> For example:
> "deviat e from the existing textb ooks, I would certain ly make 
> major changes by emphasizing several"
> [comment on SourceForge]
> Originally sent by fuwenjie.
> Logged In: YES 
> user_id=1219597
> I found that the font size is sometimes not assigned correctly;
> in that case every font size is reported as 1.0. In that
> situation the width is occasionally wrong as well, and is then
> often less than 1.0, which is obviously impossible.
> A word in the original text may be broken into several parts, and
> the value returned by GetY() for each part may be wrong, causing
> characters to overlap with others.
> The incorrect Y value and width make it hard to reassemble the
> word from the Y value and width of each part.
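
To illustrate the last comment, here is a minimal sketch of how an underestimated width could turn into a spurious space when fragments are rejoined by position. It is not PDFBox code: the Fragment class, the join method, and both threshold parameters are hypothetical names invented for this example, and the coordinates are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class WordRebuildSketch {

    /** Hypothetical text fragment; not a PDFBox class. */
    static final class Fragment {
        final String text;
        final float x;      // left edge of the fragment
        final float y;      // baseline position
        final float width;  // reported width of the fragment

        Fragment(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Joins fragments in reading order. A line break is emitted when the
     * baseline changes by more than yTolerance; a space is emitted when the
     * horizontal gap to the previous fragment exceeds gapThreshold.
     */
    static String join(List<Fragment> fragments, float yTolerance, float gapThreshold) {
        StringBuilder out = new StringBuilder();
        Fragment prev = null;
        for (Fragment f : fragments) {
            if (prev != null) {
                boolean sameLine = Math.abs(f.y - prev.y) <= yTolerance;
                float gap = f.x - (prev.x + prev.width);
                if (!sameLine) {
                    out.append('\n');
                } else if (gap > gapThreshold) {
                    out.append(' ');
                }
            }
            out.append(f.text);
            prev = f;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Widths reported correctly: the second fragment starts where the first ends.
        List<Fragment> good = new ArrayList<>();
        good.add(new Fragment("deviat", 10f, 100f, 30f));
        good.add(new Fragment("e", 40f, 100f, 5f));
        System.out.println(join(good, 0.5f, 1.5f)); // prints "deviate"

        // Same positions, but the first width is reported as less than 1.0.
        List<Fragment> bad = new ArrayList<>();
        bad.add(new Fragment("deviat", 10f, 100f, 0.8f));
        bad.add(new Fragment("e", 40f, 100f, 5f));
        System.out.println(join(bad, 0.5f, 1.5f)); // prints "deviat e"
    }
}
```

With the correct width the computed gap is zero and the parts rejoin; with a width below 1.0 the gap looks like a full word space. A wrong Y value would fail the baseline test in the same way and split the word across lines.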

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

