pdfbox-users mailing list archives

From "Karcher, Glenn" <gkarc...@sjm.com>
Subject Problem/Question about UTF-16 characters
Date Sun, 13 Oct 2013 20:33:37 GMT

I am having a problem when attempting to output a string containing Unicode characters.  If
the Unicode code point fits in a single byte (e.g., the Registered Trademark symbol,
U+00AE), the character is output correctly.  However, if the character needs two bytes (e.g.,
the Trademark character (TM), U+2122), the string is generated as UTF-16BE as expected, but the
output file is drawn with the BOM bytes FE and FF and the bytes 21 and 22 each rendered as a
separate single-byte character.
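
For reference, the byte values described above can be reproduced with plain Java, independent of PDFBox. This is only a minimal sketch: the class name Utf16BytesDemo and the helper hex are made up for illustration, and it shows what the standard Java charsets produce, not what PDFBox does internally.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BytesDemo {
    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00AE (Registered) fits in one Latin-1 byte.
        System.out.println(hex("\u00AE".getBytes(StandardCharsets.ISO_8859_1))); // AE
        // U+2122 (Trademark) in Java's UTF-16 charset: big-endian with a leading BOM.
        System.out.println(hex("\u2122".getBytes(StandardCharsets.UTF_16)));     // FE FF 21 22
    }
}
```

The four bytes on the second line are the ones that show up as four separate glyphs in the generated file.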

Is there something that I need to initialize to properly handle the UTF-16 characters (the
most likely solution)?  Is it a bug in PDFBox?  Is it a quirk in Reader X (least likely since
I have seen the TM character being displayed correctly in other documents)?

Any help and pointers on how to deal with this problem will be greatly appreciated.

I am using PDFBox 1.8.2 and Adobe Reader X (Version 10.1.8) and here is a simple program to
demonstrate the problem:

package example;
import java.io.*;
import org.apache.pdfbox.exceptions.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.edit.*;
import org.apache.pdfbox.pdmodel.font.*;
public class PDFUnicodeExample
{
    public static void main(String[] args)
    {
        PDDocument document = null;
        try
        {
            document = new PDDocument();
            PDPage page = new PDPage();
            document.addPage(page);
            PDPageContentStream cs = new PDPageContentStream(document, page);
            PDFont font = PDType1Font.HELVETICA;

            // U+00AE (Registered) fits in a single byte and is drawn correctly.
            cs.beginText();
            cs.setFont(font, 16.0f);
            cs.moveTextPositionByAmount(100, 700);
            cs.drawString("Reg TM \u00AE ");
            cs.endText();

            // U+2122 (Trademark) needs two bytes; this is the line that shows the problem.
            cs.beginText();
            cs.setFont(font, 16.0f);
            cs.moveTextPositionByAmount(100, 680);
            cs.drawString("TM \u2122 ");
            cs.endText();

            cs.close();
            document.save("Unicode Example.pdf");
            document.close();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        catch (COSVisitorException e)
        {
            e.printStackTrace();
        }
    }
}

Best regards,
--Glenn Karcher
