pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-1001) TextPosition.getHeight() returns erroneous value for some PDFs
Date Mon, 04 Jul 2011 06:49:22 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059335#comment-13059335 ]

Andreas Lehmkühler commented on PDFBOX-1001:

Sorry, I missed the comment about the privacy of the PDF in question.

I guess the main problem is that, in most cases, the vertical displacement is not the same
as the font height. These two values are just mixed up. We have to add a separate value for
the height, which should take the font height (from the font itself) and the different
scaling factors into account.
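
As a rough illustration of the idea above, a height value could be derived from the font's
own metrics instead of from the vertical displacement. This is only a sketch under stated
assumptions: the class name GlyphHeightSketch, the method textHeight and its parameters are
hypothetical and are not part of the PDFBox API. It also assumes that the font height is
given in glyph space units (1/1000 of a text space unit, which holds for most font types
but not for Type 3 fonts) and that all scaling factors have already been combined into a
single vertical scale.

```java
// Hypothetical sketch; none of these names exist in PDFBox.
final class GlyphHeightSketch {

    // Assumption: 1000 glyph space units per text space unit
    // (true for most font types; Type 3 fonts define their own font matrix).
    private static final double GLYPH_UNITS_PER_TEXT_UNIT = 1000.0;

    /**
     * Derives a text height from the font's own height and the scaling factors.
     *
     * @param fontHeight    height taken from the font itself, in glyph space units
     * @param fontSize      font size from the text state
     * @param verticalScale assumed combined vertical scaling factor
     * @return a non-negative height in the scaled coordinate system
     */
    static double textHeight(double fontHeight, double fontSize, double verticalScale) {
        double heightInTextSpace = fontHeight / GLYPH_UNITS_PER_TEXT_UNIT * fontSize;
        // A flipped y-axis gives a negative scale; the magnitude is still the height.
        return Math.abs(heightInTextSpace * verticalScale);
    }

    public static void main(String[] args) {
        // Illustrative input values only, not taken from any real PDF.
        double upright = textHeight(718.0, 12.0, 1.0);
        double flipped = textHeight(718.0, 12.0, -1.0);
        System.out.println("upright = " + upright + ", flipped = " + flipped);
    }
}
```

Because the result depends only on the font metrics and the scaling, it stays stable even
when the vertical displacement is zero, negative, or unrelated to the glyph size.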

> TextPosition.getHeight() returns erroneous value for some PDFs
> --------------------------------------------------------------
>                 Key: PDFBOX-1001
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1001
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.4.0, 1.5.0
>         Environment: Solaris, WinXP
>            Reporter: Emil Wacker
> For a PDF that worked fine under 1.2.1 the height value returned is negative and the
> wrong value (i.e. using Math.abs() won't fix it). Other PDFs work fine.
> PDF Debug shows "Creator:Crystal Reports" and "Producer:PDF-XChange (XCPRO30.DLL v3.30.0064)
> (Windows 2k)"
> And when examining the 'Stream' items, the text is not what displays.
> Any suggestions on what to look for so that I can do differential analysis against other
> PDFs to see what they do/not have in common with this one?
> (It's client data so I can't post the PDF.)
> It's stopping us from moving off 1.2.1 (and later versions fix another issue we have
> of seeing question marks instead of the actual characters).

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira