pdfbox-dev mailing list archives

From "Tilman Hausherr (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PDFBOX-4318) PDFont.encode results change on identical input
Date Sun, 16 Sep 2018 11:31:00 GMT
Tilman Hausherr created PDFBOX-4318:
---------------------------------------

             Summary: PDFont.encode results change on identical input
                 Key: PDFBOX-4318
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4318
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 2.0.11, 3.0.0 PDFBox
            Reporter: Tilman Hausherr


As reported by Daniel Wildschut on the user mailing list:

Hello, we use PDFBox to fill in PDF Forms and stumbled on a potential bug while sanitizing
the input.

{quote}We call PDFont.encode to check beforehand if a given character can be inserted using
the given font.

However we noticed that the results of the method call can change depending on what other
strings have been checked before.

Apparently PDType1Font stores previous results in a codeToBytesMap, which then causes the
unexpected behavior.

I'd say that the key used in "codeToBytesMap.put(code, bytes);" is wrong; you probably want
to use the method parameter "unicode" instead.

I tested 2.0.11, the current 2.0.x branch and the 3.0.x branch and was able to reproduce the
problem with all of them.

Code to reproduce: {quote}


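{code:java}
import java.io.IOException;

import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class PDFBoxEncodeTest
{
    public static void main( final String[] args )
    {
        // HELVETICA_BOLD is a shared static instance, so anything encode()
        // caches inside it persists between the calls below
        final PDType1Font font = PDType1Font.HELVETICA_BOLD;
        tryEncode(font, "\u0080"); // U+0080, a C1 control character with no glyph
        tryEncode(font, "€");      // U+20AC EURO SIGN, code 128 in WinAnsiEncoding
        tryEncode(font, "\u0080"); // same input as the first call
    }

    private static void tryEncode(final PDFont font, final String str) {
        try {
            font.encode(str);
            System.out.println("Character " + str.codePointAt(0) + " can be encoded in Font " + font);
        } catch (final IOException | IllegalArgumentException e) {
            System.out.println("Character " + str.codePointAt(0) + " cannot be encoded in Font " + font + ": " + e.getMessage());
        }
    }
}
{code}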

{quote}

Expected output:

Character 128 cannot be encoded in Font PDType1Font Helvetica-Bold: U+0080 ('.notdef') is not available in this font Helvetica-Bold encoding: WinAnsiEncoding
Character 8364 can be encoded in Font PDType1Font Helvetica-Bold
Character 128 cannot be encoded in Font PDType1Font Helvetica-Bold: U+0080 ('.notdef') is not available in this font Helvetica-Bold encoding: WinAnsiEncoding

Actual output:

Character 128 cannot be encoded in Font PDType1Font Helvetica-Bold: U+0080 ('.notdef') is not available in this font Helvetica-Bold encoding: WinAnsiEncoding
Character 8364 can be encoded in Font PDType1Font Helvetica-Bold
Character 128 can be encoded in Font PDType1Font Helvetica-Bold
{quote}
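The collision described in the report can be modelled without PDFBox. The sketch below is a hypothetical, simplified stand-in, not PDFBox's actual PDType1Font code: the class names (EncodeCacheDemo, SimpleFont), the keyByUnicode switch and the two tiny lookup tables are assumptions made for illustration, and it assumes, as the report implies, that the cache is read using the Unicode code point as the key. The point it shows is that WinAnsiEncoding places the euro sign at code 128 (0x80), the same number as the code point U+0080, so storing the result for "€" under its byte value lets a later lookup for U+0080 hit the cache and skip the availability check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a per-font encode cache (not PDFBox source code).
public class EncodeCacheDemo
{
    // Tiny subset of a glyph list: Unicode code point -> glyph name.
    // Anything not listed (such as U+0080) resolves to ".notdef".
    private static final Map<Integer, String> GLYPH_LIST = new HashMap<>();
    // Tiny subset of WinAnsiEncoding: glyph name -> single-byte code.
    private static final Map<String, Integer> WIN_ANSI = new HashMap<>();

    static
    {
        GLYPH_LIST.put(0x0041, "A");
        GLYPH_LIST.put(0x20AC, "Euro");
        WIN_ANSI.put("A", 0x41);
        WIN_ANSI.put("Euro", 0x80); // the euro sign is code 128 in WinAnsiEncoding
    }

    public static final class SimpleFont
    {
        // false reproduces the reported bug, true applies the suggested fix
        private final boolean keyByUnicode;
        private final Map<Integer, byte[]> codeToBytesMap = new HashMap<>();

        public SimpleFont(boolean keyByUnicode)
        {
            this.keyByUnicode = keyByUnicode;
        }

        public byte[] encode(int unicode)
        {
            // The cache is read using the Unicode code point as the key...
            byte[] bytes = codeToBytesMap.get(unicode);
            if (bytes != null)
            {
                return bytes; // cache hit: the availability check below is skipped
            }
            String name = GLYPH_LIST.getOrDefault(unicode, ".notdef");
            Integer code = WIN_ANSI.get(name);
            if (code == null)
            {
                throw new IllegalArgumentException(String.format(
                        "U+%04X ('%s') is not available in this encoding", unicode, name));
            }
            bytes = new byte[] { code.byteValue() };
            // ...but the buggy variant stores it under the encoded byte value.
            codeToBytesMap.put(keyByUnicode ? unicode : code, bytes);
            return bytes;
        }
    }

    // Mirrors the reporter's tryEncode(): true if encode() succeeds, false if it throws.
    public static boolean canEncode(SimpleFont font, int unicode)
    {
        try
        {
            font.encode(unicode);
            return true;
        }
        catch (IllegalArgumentException e)
        {
            return false;
        }
    }

    public static void main(String[] args)
    {
        for (boolean keyByUnicode : new boolean[] { false, true })
        {
            SimpleFont font = new SimpleFont(keyByUnicode);
            System.out.println((keyByUnicode ? "keyed by unicode: " : "keyed by code:    ")
                    + canEncode(font, 0x80) + " "
                    + canEncode(font, 0x20AC) + " "
                    + canEncode(font, 0x80));
        }
    }
}
```

Running main prints false true true for the code-keyed variant, matching the actual output above, and false true false for the Unicode-keyed variant, matching the expected output. Keying both the read and the write by the same value (the input code point) is what makes the result independent of call order.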




