pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] [Resolved] (PDFBOX-1824) [PATCH] CFF fonts render wrong glyphs
Date Thu, 02 Jan 2014 10:09:51 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Andreas Lehmkühler resolved PDFBOX-1824.

       Resolution: Fixed
    Fix Version/s: 2.0.0
         Assignee: Andreas Lehmkühler

I added the proposed patch(es) in revision 1554779.

Thanks for the contribution!

> [PATCH] CFF fonts render wrong glyphs
> -------------------------------------
>                 Key: PDFBOX-1824
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1824
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: Andreas Lehmkühler
>              Labels: patch
>             Fix For: 2.0.0
>         Attachments: 1.patch, 2.patch, 3.patch, all.patch, calluna-11.pdf, patched.jpg,
> I've found three very closely related CFF encoding issues in v2.0.0 when using PDFToImage.
> Problem 1
> ---------
> Look at line 7 of the poem, it should be "And the mouldering dust that years have made"
> but instead says "Afld the fioulderiflg dust that years have fiade"
> The CFF font is assumed to use CIDs, but it does not if it's not a ROS font.
> Therefore we add a check for CFF ROS class.
> Patch 1 fixes this.
> Problem 2
> ---------
> Look at line 3 "of right shoice" should be "of right choice".
> Likewise on line 2 of the 2nd paragraph "And a staunsh" should be "And a staunch",
> the st and ch ligatures are incorrect.
> This is because the font is a CFF ROS CID Font and the glyphs for the st and ch ligatures
> both have no name. The CFF format achieves this by using SIDs beyond the size of the
> index, which map to .notdef. So there is a unique SID for each glyph, but not a unique name.
> Unfortunately, PDFBox assumes that Type 1 fonts have glyphs with unique names, and this
> assumption appears throughout the codebase. Because a glyph name and a SID perform essentially
> the same role, I recommend a simple solution to the problem: when an SID beyond the size of
> the string index is encountered, instead of mapping it to .notdef it should be mapped to
> a new name with the prefix "SID", for example mapping SID 409 to the name "SID409". That way
> each glyph will have a unique name, which is what PDFBox assumes.
> Patch 2 fixes this.
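
For readers of the archive, the naming rule proposed under Problem 2 can be sketched roughly as follows. This is only an illustration of the idea, not the contents of patch 2: the class `SidGlyphNames`, the method `nameForSid`, and the use of a single flat list standing in for the font's string table are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the PDFBox API.
class SidGlyphNames {

    // Returns the glyph name for a SID. The list stands in for the font's
    // string table (an assumption: real CFF parsing is more involved).
    static String nameForSid(int sid, List<String> strings) {
        if (sid >= 0 && sid < strings.size()) {
            // SID is inside the table: use the stored name.
            return strings.get(sid);
        }
        // SID is beyond the table: synthesize a unique name instead of
        // collapsing every such glyph to ".notdef".
        return "SID" + sid;
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList(".notdef", "space", "exclam");
        System.out.println(nameForSid(1, strings));
        System.out.println(nameForSid(409, strings));
    }
}
```

With this rule, two unnamed glyphs such as the st and ch ligatures get different names because their SIDs differ, so code that keys on glyph names no longer confuses them.
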
> Problem 3
> ---------
> Look at line 2, "That creepeth oÉer ruins old!" the word "o'er" is incorrectly rendered
> as "oÉer". This is because the Encoding entry in the PDF maps code 201 from "Eacute" in the
> base encoding to "quoteright", but this is being ignored by PDFBox.
> In the CFFGlyph2D constructor PDFBox examines the font's built-in charset. When the name
> "quoteright" is encountered it is looked up in the PDF Encoding (i.e. nameToCode) where
> it is changed to code 201. Thus code 201 is associated with the "quoteright" glyph in the
> codeToGlyph map. This is correct.
> However, later when the "Eacute" glyph is encountered, its built-in charset code is also
> 201 (which is standard) and so the codeToGlyph map entry is overwritten, resulting in
> code 201 being associated with the "Eacute" glyph. 
> The solution is to build the codeToGlyph map in a strict order: first populate it with the
> font's built-in charset, then the PDF Encoding overwrites any entries which it defines.
> Patch 3 fixes this (and also replaces patch 2)
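
The ordering described under Problem 3 can be sketched in a few lines. Again this is a simplified illustration, not the code from patch 3: the class `CodeToGlyphOrder`, the method `build`, and the plain `Map<Integer, String>` representation (code to glyph name) are assumptions for the example and do not mirror the CFFGlyph2D constructor.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch for illustration only; not part of the PDFBox API.
class CodeToGlyphOrder {

    // Builds the code-to-glyph map in a strict two-step order.
    static Map<Integer, String> build(Map<Integer, String> builtIn,
                                      Map<Integer, String> pdfEncoding) {
        // Step 1: start from the font's built-in mapping.
        Map<Integer, String> codeToGlyph = new LinkedHashMap<>(builtIn);
        // Step 2: the PDF Encoding overwrites any codes it defines.
        codeToGlyph.putAll(pdfEncoding);
        return codeToGlyph;
    }

    public static void main(String[] args) {
        Map<Integer, String> builtIn = new HashMap<>();
        builtIn.put(65, "A");
        builtIn.put(201, "Eacute");

        Map<Integer, String> pdfEncoding = new HashMap<>();
        pdfEncoding.put(201, "quoteright");

        Map<Integer, String> result = build(builtIn, pdfEncoding);
        System.out.println(result.get(201));
        System.out.println(result.get(65));
    }
}
```

Because the PDF Encoding is applied last, the result no longer depends on the order in which glyphs happen to be visited, so code 201 stays bound to "quoteright" while codes the Encoding does not mention keep their built-in glyphs.
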

This message was sent by Atlassian JIRA
