pdfbox-dev mailing list archives

From "Villu Ruusmann (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PDFBOX-624) Misplaced text
Date Mon, 01 Mar 2010 20:02:05 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Villu Ruusmann updated PDFBOX-624:

    Fix Version/s: 1.1.0

> Misplaced text
> --------------
>                 Key: PDFBOX-624
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-624
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox, Text extraction, Utilities
>    Affects Versions: 1.0.0
>            Reporter: Villu Ruusmann
>            Priority: Critical
>             Fix For: 1.1.0
>         Attachments: documenta_math-fixed.txt, documenta_math.pdf, documenta_math.txt, documenta_math_page4-fixed.png, documenta_math_page4.png, FontBox.patch
> Thomas Fischer reported to users@pdfbox.apache.org that org.apache.pdfbox.ExtractText
> interchanges the typographic ligatures "fi" and "fl". The sample document "documenta_math.pdf"
> was created using TeX and AFPL Ghostscript 6.50.
> I used PDFBox 1.0.1-SNAPSHOT to verify this problem. The "fi" ligature behaves correctly
> (i.e., text extraction yields "finite" and "infinite", not "flnite" and "inflnite"), but the
> overall text layout is a complete mess. Please see the PDF text extraction result "documenta_math.txt"
> and the PDF rendering result "documenta_math_page4.png".
> The cause of the horizontal text misplacement is not yet known. It could affect all
> PDF documents that were created using TeX.
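
For readers who want to check extracted text for the ligature swap described above, here is a minimal sketch. It assumes the extractor emits the Unicode presentation forms U+FB01 ("fi") and U+FB02 ("fl"), which may not hold for every document. The class LigatureCheck and its methods are hypothetical helpers for illustration and are not part of PDFBox; only java.text.Normalizer from the standard library is used.

```java
import java.text.Normalizer;

public class LigatureCheck {
    // Unicode presentation forms for the two ligatures named in the report.
    static final char FI = '\uFB01';
    static final char FL = '\uFB02';

    /** Expands ligature code points to plain letters using NFKC normalization. */
    static String expandLigatures(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    /**
     * Returns true if the text contains "flnite", the misspelling quoted in the
     * report. This is a hypothetical heuristic, not a general swap detector.
     */
    static boolean looksSwapped(String extracted) {
        return expandLigatures(extracted).contains("flnite");
    }

    public static void main(String[] args) {
        String good = "in" + FI + "nite"; // ligature extracted correctly
        String bad = "in" + FL + "nite";  // ligature interchanged
        System.out.println(expandLigatures(good) + " swapped=" + looksSwapped(good));
        System.out.println(expandLigatures(bad) + " swapped=" + looksSwapped(bad));
    }
}
```

The check only covers the words quoted in the issue; it says nothing about the horizontal misplacement, whose cause the report leaves open.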

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
