pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] [Resolved] (PDFBOX-1001) TextPosition.getHeight() returns erroneous value for some PDFs
Date Tue, 30 Aug 2011 17:30:37 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-1001.

       Resolution: Fixed
    Fix Version/s: 1.7.0
         Assignee: Andreas Lehmkühler

I fixed the calculation in revision 1163297 as proposed by Emil.

Thanks for the patch and the example!

> TextPosition.getHeight() returns erroneous value for some PDFs
> --------------------------------------------------------------
>                 Key: PDFBOX-1001
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1001
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.4.0, 1.5.0
>         Environment: Solaris, WinXP
>            Reporter: Emil Wacker
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.7.0
>         Attachments: dreher.pdf
> For a PDF that worked fine under 1.2.1 the height value returned is negative and the wrong value (i.e. using Math.abs() won't fix it). Other PDFs work fine.
> PDF Debug shows "Creator:Crystal Reports" and "Producer:PDF-XChange (XCPRO30.DLL v3.30.0064) (Windows 2k)"
> And when examining the 'Stream' items, the text is not what displays.
> Any suggestions on what to look for so that I can do differential analysis against other PDFs to see what they do/not have in common with this one?
> (It's client data so I can't post the PDF.)
> It's stopping us from moving off 1.2.1 (and later versions fix another issue we have of seeing question marks instead of the actual characters).

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

