pdfbox-users mailing list archives

From Andrea Vacondio <andrea.vacon...@gmail.com>
Subject UTF16 encoded string to PDFDocEncoding
Date Mon, 10 Jul 2017 17:22:12 GMT
Hi, we came across a case where we are cloning outline items and the
original outline title is a UTF-16BE encoded text string containing
U+00A0 (non-breaking space). When we later use the decoded string to set
the title of a new outline item, the A0 is recognised as a € sign.
Here is a simple test:

        COSString victim = COSString
                .parseHex("FEFF004300680061007000740065007200A0");
        PDOutlineItem node = new PDOutlineItem();
        node.setTitle(victim.getString());

If you look at the node dictionary you'll see that the title value is
Chapter€ instead of Chapter followed by a non-breaking space.
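
For reference, here is a minimal sketch using only the JDK (no PDFBox calls) that decodes the same hex string and prints the last code point. The class and helper names (NbspTitleCheck, hexToBytes, decodeUtf16Be) are made up for illustration. It only confirms that the input decodes to "Chapter" plus U+00A0. Since code 0xA0 is the Euro sign in PDFDocEncoding, my assumption is that the character gets written out as the single byte A0 somewhere on the way into the new title, but the sketch does not prove that.

```java
import java.nio.charset.StandardCharsets;

public class NbspTitleCheck {

    // Converts a hex string (even length, no separators) to raw bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Decodes a PDF text string that starts with the UTF-16BE byte order mark FEFF.
    // The two BOM bytes are skipped; the rest is decoded as UTF-16BE.
    static String decodeUtf16Be(byte[] bytes) {
        return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String title = decodeUtf16Be(hexToBytes("FEFF004300680061007000740065007200A0"));
        char last = title.charAt(title.length() - 1);
        // Prints: length=8 last=U+00A0
        System.out.printf("length=%d last=U+%04X%n", title.length(), (int) last);
    }
}
```

So the decoded Java string is correct at this point, and the € can only appear when the string is encoded again for the new dictionary entry.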
